Same-hash KnownFile replacement can unshare or mis-track equivalent files

Summary

CKnownFileList::SafeAddKFile() no longer resolves every same-MD4 collision by destructively replacing the old entry with the new one. Current main preserves live shared/download owners, adopts an incoming live owner only over an inactive known-file record, and keeps inactive collisions non-destructive.

The shared-file side also has a persisted duplicate-path cache that prevents the most visible startup duplicate-path regression.

Fixed Mainline Scope

The first implementation slice landed on main in commits 4c974a3, c495525, and d7aa382:

  • persist duplicate shared paths in a shareddups.dat sidecar
  • reuse remembered duplicate shared paths at startup when path, size, date, and canonical MD4 still match
  • include the duplicate-path cache in the shared startup-cache save and purge flow
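
The startup reuse rule above can be sketched as a single predicate. This is a minimal illustration only: the record layout, the type names DuplicatePathRecord and FileStamp, and the function CanReuseDuplicate are hypothetical and do not describe the actual shareddups.dat format or the code on main. Exact string comparison of paths is also a simplification.

```cpp
#include <array>
#include <cstdint>
#include <string>

using Md4Digest = std::array<std::uint8_t, 16>;

// Hypothetical in-memory form of one remembered duplicate shared path.
struct DuplicatePathRecord {
    std::wstring path;          // duplicate path as it was remembered
    std::uint64_t size = 0;     // file size in bytes
    std::int64_t modified = 0;  // file date (assumed: seconds since epoch)
    Md4Digest canonicalMd4{};   // MD4 of the authoritative shared entry
};

// Hypothetical snapshot of what is on disk at startup.
struct FileStamp {
    std::wstring path;
    std::uint64_t size;
    std::int64_t modified;
};

// A remembered duplicate is reused only when path, size, date, and
// canonical MD4 all still match; any mismatch means the file needs rehashing.
bool CanReuseDuplicate(const DuplicatePathRecord& rec,
                       const FileStamp& onDisk,
                       const Md4Digest& canonicalMd4)
{
    return rec.path == onDisk.path
        && rec.size == onDisk.size
        && rec.modified == onDisk.modified
        && rec.canonicalMd4 == canonicalMd4;
}
```

Under this sketch, a changed size or date simply fails the check, so the file falls back to the normal hashing path rather than being trusted from the cache.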

The core collision slice landed on main in commit 05eabec with parity-test coverage in emulebb-build-tests commit b0ab2e8:

  • add a seam-level ResolveKnownFileCollision(...) policy
  • keep existing live shared or downloading known-file entries on same-MD4 collision
  • adopt incoming shared/downloading entries only when the existing known-file entry is inactive
  • merge compatible statistics into the retained owner
  • update equivalent-path spelling without removing the authoritative live owner
  • teach shared-file hashing and completed-download handoff to cleanly unwind when the known-file list rejects a duplicate live owner

Destructive replacement now survives in exactly one branch: an incoming live owner taking over from an inactive known-file entry.
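
The collision policy described in this section can be sketched as a small pure function. The names below (KnownFileState, CollisionAction, ResolveCollisionSketch) are hypothetical and chosen for illustration; the real ResolveKnownFileCollision(...) signature is not shown in this note and may differ. Statistics merging and path-spelling updates are left out.

```cpp
// Hypothetical summary of the ownership state of a known-file entry.
struct KnownFileState {
    bool shared;       // entry is currently a shared file
    bool downloading;  // entry is currently an active download
};

// Hypothetical outcome of a same-MD4 collision.
enum class CollisionAction {
    KeepExisting,   // existing entry stays authoritative; incoming is not inserted
    AdoptIncoming   // incoming live owner replaces an inactive record
};

// An entry is "live" when it is shared or downloading.
inline bool IsLive(const KnownFileState& s)
{
    return s.shared || s.downloading;
}

CollisionAction ResolveCollisionSketch(const KnownFileState& existing,
                                       const KnownFileState& incoming)
{
    // A live existing owner is never displaced by a same-MD4 collision.
    if (IsLive(existing))
        return CollisionAction::KeepExisting;

    // An inactive record may be taken over by an incoming live owner.
    if (IsLive(incoming))
        return CollisionAction::AdoptIncoming;

    // Inactive versus inactive: stay non-destructive.
    return CollisionAction::KeepExisting;
}
```

Keeping the decision free of side effects is what makes a seam-level policy like this easy to cover with parity tests; callers that receive KeepExisting for their incoming live file are the ones that have to unwind cleanly.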

Why This Matters

Representative low-drama failure cases:

  • a shared file is moved between shared directories
  • the same hashed file exists in two shared directories
  • startup or reload rediscovers an equivalent file before share state fully settles
  • a previously downloaded/shared file is reintroduced through another path

Under destructive same-MD4 replacement, each of these cases can leave a file unshared, leave GUI state out of sync with the real share state, or drop the authoritative shared instance.

Comparison Notes

  • analysis\emuleai goes much further with duplicate-path/history tracking and shows that the problem space is real
  • the focused Xtreme mod archive still carries the historical warning on this surface, which suggests the logical flaw has been known for a long time

The branch does not need eMuleAI's full duplicate-history feature set to justify fixing the core destructive replacement behavior.

Later Option

If the narrow MD4-only fix still leaves too much ambiguity, a local strong-hash sidecar is a viable later stabilization option:

  • keep MD4 as the protocol and known.met identity
  • add a separate local cache keyed by path / size / mtime with a strong content hash such as BLAKE3
  • use that sidecar only to distinguish true same-content rediscovery from local same-MD4 ambiguity before merging or replacing KnownFile state

That should be treated as a local persistence/consolidation aid, not as a first-pass protocol change.
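
As a rough sketch of what such a sidecar could look like, the following keeps a local map from path / size / mtime to a 32-byte strong digest and uses it only to classify two same-MD4 files. Everything here is an assumption for illustration: the type and function names are hypothetical, the digest is treated as an opaque value computed elsewhere (for example by a BLAKE3 implementation), and no on-disk format is implied.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using StrongDigest = std::array<std::uint8_t, 32>;

// Hypothetical sidecar key: identifies one local file revision.
struct SidecarKey {
    std::wstring path;
    std::uint64_t size;
    std::int64_t mtime;

    bool operator<(const SidecarKey& o) const
    {
        return std::tie(path, size, mtime) < std::tie(o.path, o.size, o.mtime);
    }
};

// Local-only cache; MD4 remains the protocol and known.met identity.
class StrongHashSidecar {
public:
    void Remember(const SidecarKey& key, const StrongDigest& digest)
    {
        m_entries[key] = digest;
    }

    // Returns nullptr when no digest is cached for this file revision.
    const StrongDigest* Lookup(const SidecarKey& key) const
    {
        auto it = m_entries.find(key);
        return it == m_entries.end() ? nullptr : &it->second;
    }

private:
    std::map<SidecarKey, StrongDigest> m_entries;
};

enum class ContentMatch { Same, Different, Unknown };

// Classifies two files that already share an MD4.
ContentMatch CompareSameMd4(const StrongHashSidecar& sidecar,
                            const SidecarKey& a, const SidecarKey& b)
{
    const StrongDigest* da = sidecar.Lookup(a);
    const StrongDigest* db = sidecar.Lookup(b);
    if (da == nullptr || db == nullptr)
        return ContentMatch::Unknown;  // no evidence either way
    return *da == *db ? ContentMatch::Same : ContentMatch::Different;
}
```

The Unknown result matters: when either digest is missing, a caller in this sketch would fall back to the existing MD4-only policy rather than merge or replace KnownFile state on a guess.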

Validation

  • python scripts\build_emule_tests.py --workspace-root EMULE_WORKSPACE_ROOT\workspaces\v0.72a --app-root EMULE_WORKSPACE_ROOT\workspaces\v0.72a\app\eMule-main --build-output-mode ErrorsOnly --run -- --test-case=*Known-file* — 18 passed
  • full parity test run built cleanly but still hit the pre-existing environment-sensitive long-path current-directory case in other_functions.tests.cpp
  • python -m emule_workspace build app --workspace-root EMULE_WORKSPACE_ROOT --workspace-name v0.72a --config Debug --platform x64 --build-output-mode ErrorsOnly --variant main — passed, including CFG verification
  • python scripts\shared-files-ui-e2e.py --workspace-root EMULE_WORKSPACE_ROOT\workspaces\v0.72a --app-root EMULE_WORKSPACE_ROOT\workspaces\v0.72a\app\eMule-main --configuration Debug --scenario duplicate-startup-reuse — passed