Shared-file hashing fails too eagerly on transient sharing and lock violations
Closure
Closed on 2026-05-24.
- app commit dbe818d (BUG-031: retry transient hash open failures) adds a bounded hashing-open retry wrapper for ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION, used by both CKnownFile::CreateFromFile() and CKnownFile::CreateAICHHashSetOnly().
- test commit 9dcb7d7 covers retryable error classification and retry-budget boundaries.
Validation:
- python -m emule_workspace validate
- python -m emule_workspace build app --variant main --config Debug --platform x64 --build-output-mode ErrorsOnly
- python -m emule_workspace build app --variant main --config Release --platform x64 --build-output-mode ErrorsOnly
- python -m emule_workspace build tests --config Debug --platform x64 --build-output-mode ErrorsOnly
- python -m emule_workspace build tests --config Release --platform x64 --build-output-mode ErrorsOnly
- Debug and Release native suites: known_file_hash_open and parity.
Summary
Current main performs a single long-path-safe open attempt when hashing a discovered
shared file. If the file is still being copied, moved, or finalized by another process and
the open fails with ERROR_SHARING_VIOLATION or ERROR_LOCK_VIOLATION, hashing fails
immediately.
eMuleAI carries a small, local retry wrapper for that exact path. The fix is narrow and fits the current branch goal: it reduces spurious hashing failures caused by transient open errors without changing the broader hashing architecture.
Evidence In Current Tree
- srchybrid/KnownFile.cpp
  - CKnownFile::CreateFromFile(...) calls OpenFileStreamSharedReadLongPath(...) once and bails immediately on failure
  - CKnownFile::CreateAICHHashSetOnly() follows the same one-shot open model
- analysis\emuleai\srchybrid\KnownFile.cpp
  - adds IsRetryableHashOpenError(...)
  - adds OpenFileStreamSharedReadForHashing(...)
  - retries a bounded number of times for:
    - ERROR_SHARING_VIOLATION
    - ERROR_LOCK_VIOLATION
  - preserves the real Win32 failure reason if the retry budget is exhausted
GitHub references from eMuleAI commit 8e34bdec2b7e4fe9e4307df9d80f691804be99ed:
- retry helper and retryable error classification: KnownFile.cpp
- hashing open call sites: KnownFile.cpp
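The error classification and retry-budget check listed above can be sketched as follows. This is a minimal, portable illustration, not the eMuleAI code: the name IsRetryableHashOpenError comes from the list above, but its signature here is assumed, ShouldRetryHashOpen and CheckRetryBudgetBoundary are hypothetical names, and the Win32 error codes are redeclared as constants so the snippet builds without windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Win32 error codes, redeclared so the sketch builds without <windows.h>.
// The numeric values match winerror.h.
constexpr std::uint32_t kErrorFileNotFound     = 2;   // ERROR_FILE_NOT_FOUND
constexpr std::uint32_t kErrorAccessDenied     = 5;   // ERROR_ACCESS_DENIED
constexpr std::uint32_t kErrorSharingViolation = 32;  // ERROR_SHARING_VIOLATION
constexpr std::uint32_t kErrorLockViolation    = 33;  // ERROR_LOCK_VIOLATION

// Assumed signature. Only the two transient codes named in this report
// are treated as retryable; everything else is a hard failure.
constexpr bool IsRetryableHashOpenError(std::uint32_t error)
{
    return error == kErrorSharingViolation || error == kErrorLockViolation;
}

// Hypothetical budget check. attemptsMade counts the opens already tried,
// so a retry is allowed only while attemptsMade < maxAttempts.
constexpr bool ShouldRetryHashOpen(std::uint32_t error,
                                   unsigned attemptsMade,
                                   unsigned maxAttempts)
{
    return IsRetryableHashOpenError(error) && attemptsMade < maxAttempts;
}

// Compile-time checks of the classification.
static_assert(IsRetryableHashOpenError(kErrorSharingViolation), "transient");
static_assert(IsRetryableHashOpenError(kErrorLockViolation), "transient");
static_assert(!IsRetryableHashOpenError(kErrorAccessDenied), "hard failure");
static_assert(!IsRetryableHashOpenError(kErrorFileNotFound), "hard failure");

// Runtime check of the budget boundary with an illustrative budget of 3.
inline void CheckRetryBudgetBoundary()
{
    const unsigned maxAttempts = 3;
    assert(ShouldRetryHashOpen(kErrorSharingViolation, 2, maxAttempts));
    assert(!ShouldRetryHashOpen(kErrorSharingViolation, 3, maxAttempts));
}
```

Keeping the classification as a pure function makes it easy to test the retryable set and the budget boundary separately from any real file access.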
Why This Matters
This is not a speculative performance tweak. Shared-file discovery and startup hashing can see files while they are still in transition on disk:
- copy into a shared directory
- move between shared directories
- rename/finalize workflows from another application
On the current tree, those transient windows become immediate hash failures even though the file may be readable a few hundred milliseconds later.
Likely Fix Shape
Keep the fix local to hashing opens:
- add a helper in KnownFile.cpp that retries shared-read open for a short bounded window on ERROR_SHARING_VIOLATION / ERROR_LOCK_VIOLATION
- reuse it in both:
  - CreateFromFile()
  - CreateAICHHashSetOnly()
- preserve the already-landed BUG-025 Win32 error logging
Do not bundle this with a larger handle-based hashing rewrite.
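One possible shape for such a helper is sketched below, under stated assumptions. OpenForHashingWithRetry, HashOpenResult, OpenAttempt, and DemoTransientThenSuccess are hypothetical names, not identifiers from the tree. The single open attempt is injected as a callback so the sketch stays portable; in KnownFile.cpp the callback would presumably wrap the existing long-path-safe shared-read open. The attempt count and delay are parameters because this report does not state the actual retry budget.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Win32 error codes, redeclared so the sketch builds without <windows.h>.
constexpr std::uint32_t kErrorSuccess          = 0;   // ERROR_SUCCESS
constexpr std::uint32_t kErrorSharingViolation = 32;  // ERROR_SHARING_VIOLATION
constexpr std::uint32_t kErrorLockViolation    = 33;  // ERROR_LOCK_VIOLATION

// Hypothetical result type. lastError keeps the code from the final attempt
// so the caller can still log the real failure reason.
struct HashOpenResult {
    bool opened;
    std::uint32_t lastError;
    unsigned attempts;
};

// One open attempt: returns 0 on success, otherwise a Win32 error code.
using OpenAttempt = std::function<std::uint32_t()>;

inline HashOpenResult OpenForHashingWithRetry(const OpenAttempt& tryOpen,
                                              unsigned maxAttempts,
                                              std::chrono::milliseconds delay)
{
    if (maxAttempts == 0)
        maxAttempts = 1;  // always try at least once

    HashOpenResult result{false, kErrorSuccess, 0};
    while (result.attempts < maxAttempts) {
        ++result.attempts;
        result.lastError = tryOpen();
        if (result.lastError == kErrorSuccess) {
            result.opened = true;
            break;
        }
        const bool transient = result.lastError == kErrorSharingViolation
                            || result.lastError == kErrorLockViolation;
        if (!transient || result.attempts == maxAttempts)
            break;  // hard failure or budget exhausted: lastError is preserved
        std::this_thread::sleep_for(delay);
    }
    return result;
}

// Usage: a fake opener that is blocked twice, then succeeds on the third try.
inline void DemoTransientThenSuccess()
{
    unsigned calls = 0;
    const HashOpenResult r = OpenForHashingWithRetry(
        [&calls]() -> std::uint32_t {
            return ++calls < 3 ? kErrorSharingViolation : kErrorSuccess;
        },
        5, std::chrono::milliseconds(1));
    assert(r.opened && r.attempts == 3);
}
```

Because non-retryable errors break out of the loop on the first attempt, hard failures are not delayed, and the error code handed back to the caller is always the one from the last real open attempt.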
Validation Target
- place a file into a shared directory while another process still holds it open
- verify transient sharing/lock cases no longer fail immediately
- verify hard failures still report the real Win32 reason after the retry window expires
- re-check startup hashing on large shared trees with active file churn
Product Decision
2026-04-19: This remained a valid narrow hardening candidate, but it was
explicitly delayed. It was originally tracked as Blocked because the backlog
status model had no dedicated Deferred state.
2026-05-01: Marked Deferred after adding Deferred as a first-class backlog
status. The delay is intentional; this is not a release blocker and should not
be scheduled unless the release scope changes.
2026-05-24: Revalidated during the eMuleAI release-history/code review. This remains one of the few eMuleAI bug fixes that is not already clearly covered by current eMuleBB hardening. Keep it deferred, but preserve the implementation links above so the future fix can stay narrow.