Ignored helper-thread launch failures can hang shutdown waits

Summary

Several long-lived helper objects start worker threads in their constructors and ignore AfxBeginThread(...) failure. Their shutdown paths then wait on a thread-ended event that is only signaled by the worker body. Under resource exhaustion or CRT/MFC thread-start failure, shutdown can block forever.

Current Main Evidence

  • srchybrid\UploadDiskIOThread.cpp calls AfxBeginThread(RunProc, this) in the constructor and ignores the returned CWinThread*, which is NULL when the launch fails.
  • CUploadDiskIOThread::EndThread() sets m_Run = RUN_STOP, posts to m_hPort, and waits on m_eventThreadEnded.
  • srchybrid\PartFileWriteThread.cpp follows the same constructor/start and EndThread() pattern.
  • srchybrid\UploadBandwidthThrottler.cpp calls AfxBeginThread(...) in the constructor, ignores failure, and later waits on m_eventThreadEnded after clearing m_bRun.

Risk

The normal path is fine when thread creation succeeds. The failure path is the problem: no worker exists to initialize state or signal the completion event, yet shutdown assumes that worker is alive. This is a low-frequency but concrete hang risk during low-memory, thread-quota, or process-teardown stress.
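The hazard can be reproduced in a small portable model. The sketch below is an illustration only, not the project's code: Event and ShutdownWouldHang are hypothetical names, std::thread stands in for AfxBeginThread(...), and a bounded wait replaces the unbounded wait on m_eventThreadEnded so the example terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the event behind m_eventThreadEnded.
class Event {
public:
    void Set() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_signaled = true;
        m_cv.notify_all();
    }

    // Returns true if the event was signaled before the timeout expired.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        return m_cv.wait_for(lock, timeout, [this] { return m_signaled; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_signaled = false;
};

// Returns true when the shutdown wait timed out, i.e. when an unbounded
// wait would have blocked forever.
bool ShutdownWouldHang(bool launchSucceeded) {
    Event threadEnded;
    std::thread worker;
    if (launchSucceeded) {
        // Only the worker body ever signals the event.
        worker = std::thread([&threadEnded] { threadEnded.Set(); });
    }
    assert(launchSucceeded == worker.joinable());

    // Shutdown path: waits regardless of whether a worker exists.
    const bool signaled = threadEnded.WaitFor(std::chrono::seconds(1));
    if (worker.joinable()) {
        worker.join();
    }
    return !signaled;
}
```

With launchSucceeded set to false, nothing ever calls Set(), so the wait can only end by timing out. This is the situation the affected EndThread() paths are in when the launch result is ignored.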

Broadband Fit

This is narrow defensive hardening. It preserves the current worker-thread architecture and only makes the startup-failure and shutdown paths explicit.

Acceptance Criteria

  • [x] capture and check every helper AfxBeginThread(...) result
  • [x] record whether the worker actually started
  • [x] make EndThread() a no-op or bounded failure path when startup failed
  • [x] avoid posting to invalid completion ports or waiting on events that cannot be signaled
  • [x] log thread-start failures with rate limiting, at a level suitable for shutdown diagnostics
  • [x] add a seam or targeted test that simulates worker launch failure for each affected helper
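One possible shape for these criteria, as a minimal sketch rather than the actual implementation: the launch call is injected so a test can force failure, the result is recorded, and EndThread() becomes a no-op when no worker exists. HelperThread, Launcher, LaunchStdThread, and FailLaunch are hypothetical names. std::thread stands in for AfxBeginThread(...), and join() stands in for the wait on m_eventThreadEnded.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <system_error>
#include <thread>
#include <utility>

// Hypothetical stand-in for a helper such as CUploadDiskIOThread.
class HelperThread {
public:
    // Test seam: returns false to simulate a failed thread launch.
    using Launcher = std::function<bool(std::thread&, std::function<void()>)>;

    explicit HelperThread(const Launcher& launch) {
        // Capture the launch result instead of ignoring it.
        m_started = launch(m_worker, [this] { Run(); });
        // A real helper would log the failure once here.
    }

    ~HelperThread() { EndThread(); }

    bool Started() const { return m_started; }

    // Returns true if a worker was stopped, false if there was nothing to stop.
    bool EndThread() {
        if (!m_started) {
            return false;  // no worker: nothing to signal, nothing to wait for
        }
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        assert(m_worker.joinable());
        m_worker.join();  // models the wait on the thread-ended event
        m_started = false;
        return true;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_stop; });
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
    bool m_started = false;
    std::thread m_worker;
};

// Production-style launcher for the model: reports failure instead of throwing.
inline bool LaunchStdThread(std::thread& out, std::function<void()> body) {
    try {
        out = std::thread(std::move(body));
        return true;
    } catch (const std::system_error&) {
        return false;
    }
}

// Test launcher: simulates thread-start failure without creating a thread.
inline bool FailLaunch(std::thread&, std::function<void()>) {
    return false;
}
```

A test can construct HelperThread with FailLaunch and check that Started() is false and that EndThread() returns immediately. Constructing it with LaunchStdThread exercises the normal start and stop path.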

Resolution

Current main captures helper-thread launch results for upload disk I/O, part-file writes, and the upload bandwidth throttler in app commit 7cbdbc9. Failed launches now log once, avoid waiting on an event no worker can signal, and avoid posting work to missing IOCP handles. IOCP workers also signal their shutdown event if early completion-port creation fails.
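The worker-side half of the fix can be sketched the same way. This is a portable model under stated assumptions, not the code from the commit: WorkerBody and ShutdownCompletes are hypothetical names, createPort stands in for the early completion-port creation, and a std::promise plays the role of the shutdown event.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical worker body. createPort models the early completion-port
// creation; ended models the shutdown event the owner waits on.
void WorkerBody(const std::function<bool()>& createPort, std::promise<void> ended) {
    assert(static_cast<bool>(createPort));
    if (!createPort()) {
        // Early initialization failure: still signal, so the owner's
        // shutdown wait cannot block on a worker that already gave up.
        ended.set_value();
        return;
    }
    // ... a real worker would service the completion port here ...
    ended.set_value();
}

// Returns true if the shutdown event was signaled within the bound.
bool ShutdownCompletes(bool portCreationSucceeds) {
    std::promise<void> ended;
    std::future<void> endedFuture = ended.get_future();
    std::thread worker(WorkerBody,
                       [portCreationSucceeds] { return portCreationSucceeds; },
                       std::move(ended));
    const bool signaled =
        endedFuture.wait_for(std::chrono::seconds(5)) == std::future_status::ready;
    worker.join();
    return signaled;
}
```

In this model the event is signaled on both exit paths, so ShutdownCompletes returns true whether or not port creation succeeds.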

Targeted native coverage lives in helper_thread_launch.tests.cpp in tests commit 60ec43a.