
MP3 ID3 metadata extraction is ANSI-only; non-ACP filenames can silently lose tags

Historical reference only: stale-v0.72a-experimental-clean and analysis\stale-v0.72a-experimental-clean are retired reference sources, not active branch targets or current baselines. Use them only as provenance or idea-extraction sources; landed status is determined against main. See Historical References.

Summary

Current main now prefers MediaInfo.dll for audio/video metadata, including MP3, before it falls back to the legacy built-ins and finally to id3lib for MPEG audio.

That change reduces the default MP3 exposure when MediaInfo.dll is present, but the vendored id3lib still exposes only narrow filename APIs and still opens or updates files through narrow CRT / CreateFileA paths. When the DLL is absent or yields no useful metadata, MP3 fallback parsing can still fail or silently miss metadata on filenames that do not round-trip through the active ANSI code page. The same integration also still depends on local workarounds for broken id3lib Unicode getters and mixed ID3v1/ID3v2 Unicode corruption.

Current Main Status

On 2026-04-18, main was updated to extract the optional MediaInfo.dll loader into a shared MediaInfo_DLL.cpp helper and to route both File Info and shared-file tag extraction through a common precedence order:

  1. MediaInfo.dll first for audio/video, including MP3
  2. built-in RIFF / RM / WM readers next
  3. id3lib last for explicit MPEG-audio fallback extensions only
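
The precedence order above can be pictured as a simple first-useful-result chain. The following is a minimal sketch only: `MediaTags`, `TagReader`, and `ExtractWithPrecedence` are hypothetical names invented for illustration and are not the identifiers used in MediaInfo_DLL.cpp, FileInfoDialog.cpp, or KnownFile.cpp.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified metadata record (not an eMule type).
struct MediaTags {
    std::wstring title;
    std::wstring artist;
    std::wstring album;

    // "Useful" here simply means at least one field was populated.
    bool HasUsefulData() const {
        return !title.empty() || !artist.empty() || !album.empty();
    }
};

// A reader returns std::nullopt when it cannot handle the file at all,
// for example when the extension is not one it is allowed to parse.
using TagReader =
    std::function<std::optional<MediaTags>(const std::wstring& path)>;

// Tries each reader in the order given and returns the first result that
// carries useful data. An empty std::function models an unavailable
// backend, such as an optional DLL that failed to load.
std::optional<MediaTags> ExtractWithPrecedence(
    const std::wstring& path, const std::vector<TagReader>& readers)
{
    for (const TagReader& reader : readers) {
        if (!reader)
            continue;  // backend not available, move to the next one
        std::optional<MediaTags> tags = reader(path);
        if (tags && tags->HasUsefulData())
            return tags;
    }
    return std::nullopt;  // no backend produced useful metadata
}
```

In this model the caller would pass the readers in the documented order (MediaInfo.dll, then the built-in RIFF / RM / WM readers, then id3lib), and the id3lib reader itself would return no result for anything other than the explicit MPEG-audio fallback extensions.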

This is an intentional risk-reduction step, not a full bug closure. The ANSI-only id3lib dependency still exists as a fallback path when MediaInfo.dll is missing or returns no useful metadata, so the item is closed as Wont-Fix by product decision rather than Done.

Product Decision

On 2026-04-26, the remaining MP3 id3lib ANSI fallback risk was explicitly accepted for Release 1. MediaInfo.dll remains the preferred metadata path, but no id3lib removal, replacement, Unicode port, or fallback behavior change should be scheduled unless that product decision is explicitly reversed.

Evidence In Current Tree

  • srchybrid/FileInfoDialog.cpp:731-764
      • local ID3_GetStringW(...) wrapper documents two live id3lib Unicode bugs:
          • only trust id3lib Unicode when the frame is already UTF-16
          • do not use ID3_FieldImpl::Get(unicode_t*, ..., itemNum) because GetRawUnicodeTextItem is broken
  • srchybrid/FileInfoDialog.cpp:883-888
      • current main does:
          • CStringA strFilePathA(pFile->GetFilePath());
          • myTag.Link(strFilePathA, ID3TT_ID3V2);
  • srchybrid/KnownFile.cpp:1416-1426
      • same ANSI path downcast and myTag.Link(strFilePathA, ...) pattern
  • repos\third_party\emulebb-id3lib\include\id3\tag.h:99,134
      • public API only exposes Link(const char *fileInfo, ...)
  • repos\third_party\emulebb-id3lib\src\tag_impl.h:118,153,179
      • implementation also only exposes Link(const char*)
      • _file_name is stored as narrow dami::String
  • repos\third_party\emulebb-id3lib\src\utils.cpp:330-451
      • file helpers still call ifstream / fstream::open(name.c_str(), ...)
  • repos\third_party\emulebb-id3lib\src\tag_file.cpp:49-64,132-141,320-321
      • truncate path still uses CreateFileA
      • file-link state is still stored as narrow _file_name
      • replacement path still uses narrow remove(...) / rename(...)
  • repos\third_party\emulebb-id3lib\src\field_binary.cpp:128-163
      • binary file import/export still uses narrow fopen_s

Why This Is A Real Bug

CStringA conversion is locale-dependent and lossy for characters outside the active ANSI code page. That means the MP3 file selected by the user can become unopenable to id3lib even though the rest of eMule can still access it via UTF-16 Win32 paths.
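
To make the failure mode concrete, here is a portable, simplified model of a lossy narrowing conversion. It is an illustration under stated assumptions, not the code path eMule or id3lib runs: `NarrowLossy` and `PathSurvivesNarrowing` are hypothetical helpers, and Latin-1 stands in for "the active ANSI code page". On Windows, a real check would more likely call WideCharToMultiByte with CP_ACP and inspect its lpUsedDefaultChar output.

```cpp
#include <string>

// Simplified stand-in for a CStringA-style narrowing conversion.
// Assumption: the target code page is modeled as Latin-1, so any UTF-16
// code unit above 0xFF cannot be represented and is replaced with '?'.
std::string NarrowLossy(const std::u16string& wide, bool* lostData)
{
    std::string narrow;
    narrow.reserve(wide.size());
    bool lost = false;
    for (char16_t unit : wide) {
        if (unit <= 0xFF) {
            narrow.push_back(static_cast<char>(unit));
        } else {
            narrow.push_back('?');  // substitution: the original name is gone
            lost = true;
        }
    }
    if (lostData)
        *lostData = lost;
    return narrow;
}

// True when the path can be narrowed without any substitution, i.e. the
// narrow name still identifies the same file.
bool PathSurvivesNarrowing(const std::u16string& widePath)
{
    bool lost = false;
    NarrowLossy(widePath, &lost);
    return !lost;
}
```

Under this model a path such as `C:\music\song.mp3` survives, while a path containing CJK characters is rewritten with `?` placeholders and no longer names the file on disk, which is why a narrow-only open fails even though wide Win32 calls on the original path succeed.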

The current integration already acknowledges that parts of id3lib's Unicode path are broken. The app-side ID3_GetStringW wrapper and the forced ID3v2-first read path are not speculative cleanup notes; they are live workarounds for current dependency bugs.
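
The first documented rule ("only trust id3lib Unicode when the frame is already UTF-16") amounts to a small guard in front of the Unicode getter. The sketch below shows that rule in isolation. `FrameEncoding`, `TextField`, and `ReadFieldText` are hypothetical stand-ins rather than id3lib or eMule types, and widening the narrow bytes as Latin-1 is an assumption made for the example; the real wrapper may convert differently.

```cpp
#include <string>

// Hypothetical mirror of a text frame's stored encoding (not id3lib's enum).
enum class FrameEncoding { Latin1, Utf16 };

// Hypothetical view of one text field as seen through both getters.
struct TextField {
    FrameEncoding encoding;
    std::string narrowText;   // what a narrow getter would return
    std::u16string wideText;  // what a Unicode getter would return
};

// Uses the Unicode getter's result only when the frame is stored as
// UTF-16; otherwise ignores it and widens the narrow bytes instead.
std::u16string ReadFieldText(const TextField& field)
{
    if (field.encoding == FrameEncoding::Utf16)
        return field.wideText;

    // Assumption: narrow frame bytes are Latin-1, so each byte maps to
    // the code point with the same numeric value.
    std::u16string widened;
    widened.reserve(field.narrowText.size());
    for (unsigned char byte : field.narrowText)
        widened.push_back(static_cast<char16_t>(byte));
    return widened;
}
```

The point of the guard is that the Unicode getter's output is never consulted for non-UTF-16 frames, so a corrupt conversion inside the dependency cannot reach the UI.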

User-Visible / Runtime Impact

  • MP3 metadata display can fail for files whose name or parent directories contain characters outside the active ANSI code page
  • media-tag extraction behavior varies by system locale / code page
  • multi-item Unicode fields and mixed ID3v1 + ID3v2 files still rely on brittle defensive paths in app code

Cross-Variant Status

  • workspaces\v0.72a\app\eMule-main\srchybrid\FileInfoDialog.cpp and KnownFile.cpp
      • primary MP3 metadata now routes through MediaInfo.dll first
      • residual ANSI-only id3lib risk remains in the MPEG-audio fallback path
      • no audited sibling tree contains a Unicode-safe id3lib port
  • analysis\stale-v0.72a-experimental-clean
      • provided the helper-extraction reference for MediaInfo_DLL.cpp
      • still removed id3lib instead of fixing it (docs\DEP-STATUS.md cites commit 907e675)
  • analysis\emuleai
      • current workspace snapshot has no active ID3_Tag / ID3_GetStringW call sites in srchybrid, so there is no direct cherry-pickable Unicode port there either
      • current release notes reinforce the same direction by replacing legacy ID3Lib integration with MediaInfoLib / bundled MediaInfo behavior

Historical Fix Shape

  1. Keep the current MediaInfo.dll-first routing as the default MP3 path.
  2. Preferred close-stock closure:
      • retire the id3lib fallback after packaging/licensing review proves a first-class MediaInfo path is available for the supported builds
      • keep behavior deterministic when MediaInfo is absent by documenting the reduced metadata surface instead of falling back to ANSI-only file opens
  3. If id3lib remains strategic, do the real vendor fix:
      • add wide or explicit UTF-8-normalized filename support to ID3_Tag and ID3_TagImpl
      • replace narrow helpers in utils.cpp, tag_file.cpp, and field_binary.cpp
      • revalidate Update() / Strip() temp-file replace logic on Win32 and long paths
      • fix or re-verify ID3_FieldImpl::Get(unicode_t*, ...) / GetRawUnicodeTextItem
  4. Alternative dependency decision:
      • remove or replace id3lib, as the sibling trees already chose to do

Historical Validation Target

  • place an MP3 under a path containing characters outside the active ANSI code page, ideally also under a long path
  • verify the preferred path with MediaInfo.dll present still populates File Info and shared metadata tags
  • verify the fallback path without MediaInfo.dll still populates title / artist / album where id3lib can read the path
  • include a mixed ID3v1 + ID3v2 sample and a multi-item Unicode text-frame sample

No validation target remains active under the current product decision.