
Shared-file hashing fails too eagerly on transient sharing and lock violations

Closure

Closed on 2026-05-24.

  • app commit dbe818d (BUG-031: retry transient hash open failures) adds a bounded hashing-open retry wrapper for ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION, used by both CKnownFile::CreateFromFile() and CKnownFile::CreateAICHHashSetOnly().
  • test commit 9dcb7d7 covers retryable error classification and retry-budget boundaries.

Validation:

  • python -m emule_workspace validate
  • python -m emule_workspace build app --variant main --config Debug --platform x64 --build-output-mode ErrorsOnly
  • python -m emule_workspace build app --variant main --config Release --platform x64 --build-output-mode ErrorsOnly
  • python -m emule_workspace build tests --config Debug --platform x64 --build-output-mode ErrorsOnly
  • python -m emule_workspace build tests --config Release --platform x64 --build-output-mode ErrorsOnly
  • Debug and Release native suites: known_file_hash_open and parity.

Summary

Current main performs a single long-path-safe open attempt when hashing a discovered shared file. If the file is still being copied, moved, or finalized by another process and the open fails with ERROR_SHARING_VIOLATION or ERROR_LOCK_VIOLATION, hashing fails immediately.

eMuleAI carries a small, local retry wrapper for that exact path. The fix is narrow and fits the current branch goal: it reduces false-negative hashing failures caused by transient sharing/lock conflicts, without changing the broader hashing architecture.

Evidence In Current Tree

  • srchybrid/KnownFile.cpp
    • CKnownFile::CreateFromFile(...) calls OpenFileStreamSharedReadLongPath(...) once and bails immediately on failure
    • CKnownFile::CreateAICHHashSetOnly() follows the same one-shot open model
  • analysis\emuleai\srchybrid\KnownFile.cpp
    • adds IsRetryableHashOpenError(...)
    • adds OpenFileStreamSharedReadForHashing(...)
    • retries a bounded number of times for:
      • ERROR_SHARING_VIOLATION
      • ERROR_LOCK_VIOLATION
    • preserves the real Win32 failure reason if the retry budget is exhausted
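A minimal C++17 sketch of the retry shape described above, assuming the one-shot open reports a Win32-style error code (0 on success). The names IsTransientOpenError, OpenWithBoundedRetry, and OpenOutcome are hypothetical; they are not the eMuleAI functions or the dbe818d code. The error constants mirror the winerror.h values so the sketch builds without <windows.h>, and the attempt count and delay are caller-supplied assumptions rather than values taken from either codebase.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Win32 error codes mirrored so the sketch builds without <windows.h>.
constexpr std::uint32_t kErrorSuccess = 0;            // ERROR_SUCCESS
constexpr std::uint32_t kErrorAccessDenied = 5;       // ERROR_ACCESS_DENIED
constexpr std::uint32_t kErrorSharingViolation = 32;  // ERROR_SHARING_VIOLATION
constexpr std::uint32_t kErrorLockViolation = 33;     // ERROR_LOCK_VIOLATION

// Hypothetical classifier: only transient sharing/lock conflicts are retried.
constexpr bool IsTransientOpenError(std::uint32_t err) {
    return err == kErrorSharingViolation || err == kErrorLockViolation;
}

// Result of a retried open: the final error code and how many attempts ran.
struct OpenOutcome {
    std::uint32_t error;  // kErrorSuccess, or the last real error observed
    int attempts;
};

using OpenAttempt = std::function<std::uint32_t()>;
using Sleeper = std::function<void(std::chrono::milliseconds)>;

// Hypothetical bounded retry wrapper around a single open attempt.
// The sleep is injectable so tests do not need real delays.
OpenOutcome OpenWithBoundedRetry(const OpenAttempt& tryOpen, int maxAttempts,
                                 std::chrono::milliseconds delay,
                                 const Sleeper& sleepFor = [](std::chrono::milliseconds d) {
                                     std::this_thread::sleep_for(d);
                                 }) {
    if (maxAttempts < 1)
        maxAttempts = 1;  // always make at least one attempt
    OpenOutcome outcome{kErrorSuccess, 0};
    while (outcome.attempts < maxAttempts) {
        ++outcome.attempts;
        outcome.error = tryOpen();
        if (!IsTransientOpenError(outcome.error))
            break;  // success or hard failure: stop and report it as-is
        if (outcome.attempts < maxAttempts)
            sleepFor(delay);  // transient conflict: wait, then try again
    }
    return outcome;  // on an exhausted budget, error is the last sharing/lock code
}
```

For example, an opener that fails twice with a sharing violation and then succeeds yields error 0 after 3 attempts with a budget of 5. Classifying only the two transient codes means hard failures such as access denied surface on the first attempt with their original code, and an exhausted budget still reports the real sharing/lock error instead of a generic one.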

GitHub references from eMuleAI commit 8e34bdec2b7e4fe9e4307df9d80f691804be99ed:

Why This Matters

This is not a speculative performance tweak. Shared-file discovery and startup hashing can see files while they are still in transition on disk:

  • copy into a shared directory
  • move between shared directories
  • rename/finalize workflows from another application

On the current tree, those transient windows become immediate hash failures even though the file may be readable a few hundred milliseconds later.
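The size of that window can be made concrete with simple arithmetic. Assuming a fixed delay d between N attempts and ignoring the time spent inside each open call, the last attempt starts (N - 1) * d after the first, so a transient lock is ridden out only if it is released within that window. The 6 x 100 ms budget below is purely illustrative; it is not the budget used in dbe818d, and RetryWindow / BudgetCovers are hypothetical helper names.

```cpp
#include <chrono>

using std::chrono::milliseconds;

// Time from the first open attempt to the last one, assuming a fixed delay
// between attempts and negligible time spent inside each open call.
constexpr milliseconds RetryWindow(int maxAttempts, milliseconds delay) {
    return maxAttempts <= 1 ? milliseconds(0) : delay * (maxAttempts - 1);
}

// A transient lock released `releaseAfter` after the first attempt is seen
// by some attempt only if it is released no later than the last attempt.
constexpr bool BudgetCovers(int maxAttempts, milliseconds delay, milliseconds releaseAfter) {
    return releaseAfter <= RetryWindow(maxAttempts, delay);
}

// Illustrative budget only: 6 attempts, 100 ms apart, spans 500 ms.
static_assert(RetryWindow(6, milliseconds(100)) == milliseconds(500), "6 x 100 ms spans 500 ms");
static_assert(BudgetCovers(6, milliseconds(100), milliseconds(300)), "a 300 ms copy is covered");
static_assert(!BudgetCovers(6, milliseconds(100), milliseconds(800)), "an 800 ms hold is not");
```

The same arithmetic bounds the cost on the failure path: a file that is still held after the window adds at most (N - 1) * d of delay before the real error is reported.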

Likely Fix Shape

Keep the fix local to hashing opens:

  1. add a helper in KnownFile.cpp that retries shared-read open for a short bounded window on ERROR_SHARING_VIOLATION / ERROR_LOCK_VIOLATION
  2. reuse it in both:
    • CreateFromFile()
    • CreateAICHHashSetOnly()
  3. preserve the already-landed BUG-025 Win32 error logging

Do not bundle this with a larger handle-based hashing rewrite.

Validation Target

  • place a file into a shared directory while another process still holds it open
  • verify transient sharing/lock cases no longer fail immediately
  • verify hard failures still report the real Win32 reason after the retry window expires
  • re-check startup hashing on large shared trees with active file churn
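The first three validation targets can also be exercised deterministically, without real files or sleeps, by simulating the other process on a fake clock. This is a hypothetical harness: FakeHolder, RunScenario, and OpenRetrying are illustrative names, the block carries its own compact retry loop so it stands alone, and the budget used in the scenarios (6 attempts, 100 ms apart) is an assumption. It complements, and does not replace, the real check of placing a file in a shared directory while another process holds it open.

```cpp
#include <cstdint>
#include <functional>

// Win32-style codes (values from winerror.h), mirrored so the sketch is portable.
constexpr std::uint32_t kOk = 0;            // ERROR_SUCCESS
constexpr std::uint32_t kAccessDenied = 5;  // ERROR_ACCESS_DENIED
constexpr std::uint32_t kSharing = 32;      // ERROR_SHARING_VIOLATION
constexpr std::uint32_t kLock = 33;         // ERROR_LOCK_VIOLATION

// Compact bounded retry loop; "sleeping" just advances the fake clock.
std::uint32_t OpenRetrying(const std::function<std::uint32_t()>& tryOpen,
                           int maxAttempts, int delayMs,
                           const std::function<void(int)>& advanceClock) {
    if (maxAttempts < 1)
        maxAttempts = 1;
    std::uint32_t err = kOk;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        err = tryOpen();
        if (err != kSharing && err != kLock)
            return err;  // success or hard failure: report it unchanged
        if (attempt < maxAttempts)
            advanceClock(delayMs);
    }
    return err;  // budget exhausted: the last real sharing/lock code
}

// Simulated other process: holds the file until releaseAtMs on the fake clock.
struct FakeHolder {
    int nowMs;
    int releaseAtMs;
    std::uint32_t whileHeld;     // error seen while the file is still held
    std::uint32_t afterRelease;  // result once released (kOk or a hard error)
    std::uint32_t TryOpen() const {
        return nowMs < releaseAtMs ? whileHeld : afterRelease;
    }
};

// Runs one scenario from t = 0 and returns the error the hashing open would report.
std::uint32_t RunScenario(FakeHolder holder, int maxAttempts, int delayMs) {
    holder.nowMs = 0;
    return OpenRetrying([&holder] { return holder.TryOpen(); }, maxAttempts, delayMs,
                        [&holder](int ms) { holder.nowMs += ms; });
}
```

With that budget, a holder that releases at 250 ms is opened on the fourth attempt (t = 300 ms), a holder that never releases is reported with its own sharing or lock violation once the budget is spent, and an access-denied failure is returned without any retry.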

Product Decision

2026-04-19: This remained a valid narrow hardening candidate, but it was explicitly delayed. It was originally tracked as Blocked because the backlog status model had no dedicated Deferred state.

2026-05-01: Marked Deferred after adding Deferred as a first-class backlog status. The delay is intentional; this is not a release blocker and should not be scheduled unless the release scope changes.

2026-05-24: Revalidated during the eMuleAI release-history/code review. This remains one of the few eMuleAI bug fixes that is not already clearly covered by current eMuleBB hardening. Keep it deferred, but preserve the implementation links above so the future fix can stay narrow.