WebSocket and legacy socket leak-churn gate

Summary

Add a Beta 0.7.3 release gate that repeatedly churns WebSocket, REST, and legacy socket close paths while measuring handles, threads, queued socket state, and process memory.

This is the release proof for known legacy Win32/MFC socket hazards: accepted client lifetime, CAsyncSocket-style helper state, async close/reset ordering, queued send cleanup, and thread-object ownership.

Risk Being Covered

  • accepted-client or listener thread objects leaking after socket churn
  • helper-window or async-socket state retaining stale socket pointers
  • reset/close races leaving queued send buffers attached to dead sockets
  • self-deleting socket objects running callbacks after owner teardown
  • process handle/thread/private-byte growth hidden by otherwise passing request counts

Execution Plan

  1. Add a leak-churn harness mode that records baseline process handles, threads, GDI objects, USER objects, private bytes, and accepted WebSocket thread counts.
  2. Run repeated HTTP and HTTPS connect/reset cycles, including idle accepted sockets, partial headers, partial declared bodies, slow responses, and queued-response disconnects.
  3. Run legacy listen-socket churn for short-lived accepted peer-like sockets where the harness can do so without joining the public network.
  4. Run stop/start cycles after churn to prove WebSocket global state is reusable only after accepted clients drain.
  5. Compare post-drain resource counts against baseline and documented tolerance.
  6. Emit a release artifact containing counts before churn, peak counts, counts after drain, socket outcome totals, and any stuck thread/handle identifiers.
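
As a rough illustration of step 2, the sketch below shows one way a connect/reset cycle could be driven from Python. It is not the harness code: the function names, the outcome keys, and the partial-header payload are assumptions made for this example. The abortive close relies on SO_LINGER with a zero timeout, which makes the close send an RST instead of a normal FIN.

```python
import socket
import struct
import sys

# Layout of struct linger differs by platform: two u_short on Windows, two int elsewhere.
_LINGER_FMT = "HH" if sys.platform == "win32" else "ii"


def connect_reset_cycle(host, port, partial=b"", timeout=5.0):
    """Connect, optionally send an incomplete request, then abort the connection."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        if partial:
            sock.sendall(partial)
        # l_onoff=1, l_linger=0: close() discards unsent data and resets the connection.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack(_LINGER_FMT, 1, 0))
    finally:
        sock.close()


def run_churn(host, port, cycles,
              partial=b"GET /api/v1/app HTTP/1.1\r\nHost: localhost"):
    """Run repeated connect/reset cycles and count outcomes (hypothetical keys)."""
    outcomes = {"completed": 0, "connect_failed": 0}
    for _ in range(cycles):
        try:
            connect_reset_cycle(host, port, partial)
            outcomes["completed"] += 1
        except OSError:
            outcomes["connect_failed"] += 1
    return outcomes
```

The default payload deliberately stops before the header terminator, so the server sees a partial header followed by a reset. The other profiles in step 2 (idle accepted sockets, partial declared bodies, slow responses) would need their own payloads and timing.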

Acceptance Criteria

  • [x] live REST reports include baseline and post-adversity resource snapshots
  • [x] 1k+ HTTP connect/reset churn mode is implemented with selectable smoke and soak budgets
  • [x] HTTPS connect/reset churn mode is implemented with selectable smoke and soak budgets
  • [x] 1k+ HTTP connect/reset cycles complete without unbounded handle, thread, or memory growth
  • [x] 1k+ HTTPS connect/reset cycles complete without unbounded handle, thread, or memory growth
  • [x] accepted-client threads drain before WebSocket termination state is closed
  • [x] queued send buffers are released when clients reset during response send
  • [x] stop/start after churn succeeds once the post-drain state passes its resource thresholds
  • [x] leak-churn reports include baseline, peak, and post-drain process resource counts
  • [x] leak-churn reports evaluate Beta 0.7.3 resource thresholds and fail on violations
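
The drain-related criteria above imply that the harness waits for resource counts to settle before it takes the post-drain snapshot. A minimal sketch of such a wait follows, assuming a caller-supplied `sample` callable that returns the current count (for example, the process thread count). The helper name, its parameters, and the polling approach are assumptions for illustration, not the harness's actual interface.

```python
import time


def wait_for_drain(sample, baseline, tolerance=0, timeout_s=30.0,
                   interval_s=0.25, clock=time.monotonic, sleep=time.sleep):
    """Poll sample() until it is within tolerance of baseline or the timeout expires.

    Returns (drained, last_value) so the caller can report the stuck count.
    """
    deadline = clock() + timeout_s
    while True:
        current = sample()
        if current - baseline <= tolerance:
            return True, current
        if clock() >= deadline:
            return False, current
        sleep(interval_s)
```

A caller might pass a sampler that reads the thread count of the process under test and treat a `False` result as a gate failure, recording the last observed value in the report.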

Progress Evidence

  • Test harness commit: 1d97dd4.
  • Test harness commit: e88e067.
  • Test harness commit: ae3a840.
  • Test harness commit: 941c439.
  • Test harness commit: b8729d3.
  • Test harness commit: 352a2d2.
  • Build orchestration commit: 94d1044.
  • Build orchestration commit: 3ec3674.
  • Added process resource snapshots to live REST reports after launch and after REST socket adversity/stress.
  • Added resource deltas for handles, GDI objects, USER objects, private bytes, and working set bytes.
  • Added --rest-leak-churn-budget {off,smoke,soak} and --rest-leak-churn-cycles; soak defaults to 1000 HTTP connect/reset cycles.
  • Extended leak churn to HTTPS profiles with stalled TLS connect-close, partial TLS record reset, and partial ClientHello reset cycles.
  • Exposed the same controls through supported workspace live-e2e parameters -RestLeakChurnBudget and -RestLeakChurnCycles.
  • Added focused Python coverage for resource-delta calculation.
  • HTTPS smoke artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-121849-eMule-main-release\result.json. The run passed 100/100 HTTPS leak-churn cycles with enforced thresholds, resource_thresholds.ok=true, and zero threshold violations. Observed post-drain deltas were handles +1, private bytes +45056, and working set bytes +57344.
  • Added release thresholds for leak-churn deltas: post-drain handles <= 64, process threads <= 4, GDI objects <= 32, USER objects <= 32, private bytes <= 256 MiB, and working set bytes <= 256 MiB; peak handles <= 128, process threads <= 32, GDI objects <= 64, USER objects <= 64, private bytes <= 384 MiB, and working set bytes <= 384 MiB.
  • Thread-count smoke artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122517-eMule-main-release\result.json. The run passed with thread count 19 -> 19 -> 19 across baseline, peak, and post-drain snapshots, thread_count delta 0, and zero threshold violations.
  • Added --rest-stop-start-after-churn to the REST smoke harness and -RestStopStartAfterChurn to the supported workspace live E2E entrypoint. The stop/start proof is gated behind leak churn and only runs after the thresholded post-drain check passes.
  • HTTPS stop/start-after-churn artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-123736-eMule-main-release\result.json. The run passed 100/100 HTTPS leak-churn cycles with enforced thresholds, resource_thresholds.ok=true, and zero threshold violations; old process 15520 closed in 8590.308 ms, the profile relaunched as process 16760, and /api/v1/app returned 200 after relaunch.
  • HTTP stop/start soak artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203500-eMule-main-release\result.json. The run passed 1000/1000 HTTP leak-churn cycles with resource_thresholds.ok=true, zero threshold violations, and successful post-churn relaunch from process 15664 to process 5876.
  • HTTPS stop/start soak artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203928-eMule-main-release\result.json. The run passed 1000/1000 HTTPS leak-churn cycles with resource_thresholds.ok=true, zero threshold violations, and successful post-churn relaunch from process 11000 to process 7040.
  • Queued-send reset cleanup is covered by the passing reset_during_response_send and reset_during_error_response_send socket probes in repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203041-eMule-main-release\result.json, plus the thresholded HTTP/HTTPS leak-churn stop/start soaks above.
  • HTTP soak artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122017-eMule-main-release\result.json. The run passed 1000/1000 HTTP leak-churn cycles with enforced thresholds, resource_thresholds.ok=true, and zero threshold violations. Observed post-drain deltas were handles +1, private bytes +442368, and working set bytes +745472.
  • HTTPS soak artifact: repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122141-eMule-main-release\result.json. The run passed 1000/1000 HTTPS leak-churn cycles with enforced thresholds, resource_thresholds.ok=true, and zero threshold violations. Observed post-drain deltas were GDI objects +1, private bytes +131563520, and working set bytes +131661824; handles finished below baseline.
  • App hardening commit d75919a preserves non-TLS WebSocket positive partial sends instead of treating stale WSAGetLastError state as a fatal send failure.
  • App hardening commit 1ca4d49 snapshots the legacy Web shared-files list before rendering, avoiding live map traversal while the shared-file list can be updated.
  • App hardening commit b33efb8 locks Web graph history reads and writes around m_Params.PointsForWeb.
  • App hardening commit 2cca5ac releases the locale allocated during Web template reload and guards the timestamp path against locale/time conversion failure.
  • App hardening commit 942c484 initializes WebSocket queued-send chunks before allocation of their payload buffers, making allocation-failure cleanup safe.
  • Validation:
  • python -m pytest tests\python\test_rest_api_smoke.py tests\python\test_live_e2e_suite.py -q
  • python -m emule_workspace build app --config Release --platform x64
  • python -m emule_workspace test python --path tests\python\test_rest_api_smoke.py --quiet
  • python -m emule_workspace test python --path tests\python\test_live_e2e_suite.py --quiet
  • python scripts\rest-api-smoke.py --help
  • git -C repos\emulebb-build-tests diff --check
  • git -C repos\emulebb-build diff --check
  • python -m emule_workspace validate
  • python -m emule_workspace test live-e2e --config Release --platform x64 --suite rest-api --skip-live-seed-refresh --rest-webserver-scheme https --rest-coverage-budget smoke --rest-stress-budget off --rest-socket-adversity-budget off --rest-tls-handshake-adversity-budget off --rest-leak-churn-budget smoke --rest-stop-start-after-churn --rest-server-search-count 0 --rest-kad-search-count 0 --rest-download-trigger-count 0
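
The threshold evaluation described in the evidence above can be sketched as follows. The limits are the delta thresholds quoted in this item. The metric key names, the snapshot shape, and the result layout are assumptions for illustration and may not match the real `resource_thresholds` block in `result.json`.

```python
MIB = 1024 * 1024

# Delta limits relative to baseline, taken from the thresholds listed in this item.
POST_DRAIN_LIMITS = {
    "handles": 64, "threads": 4, "gdi_objects": 32, "user_objects": 32,
    "private_bytes": 256 * MIB, "working_set_bytes": 256 * MIB,
}
PEAK_LIMITS = {
    "handles": 128, "threads": 32, "gdi_objects": 64, "user_objects": 64,
    "private_bytes": 384 * MIB, "working_set_bytes": 384 * MIB,
}


def evaluate_thresholds(baseline, peak, post_drain):
    """Compare peak and post-drain deltas against their limits.

    Each snapshot is a dict keyed by the (hypothetical) metric names above.
    """
    violations = []
    phases = (("peak", peak, PEAK_LIMITS),
              ("post_drain", post_drain, POST_DRAIN_LIMITS))
    for phase, snapshot, limits in phases:
        for metric, limit in limits.items():
            delta = snapshot[metric] - baseline[metric]
            if delta > limit:
                violations.append({"phase": phase, "metric": metric,
                                   "delta": delta, "limit": limit})
    return {"ok": not violations, "violations": violations}
```

Under these limits, the HTTPS soak's observed post-drain private-bytes delta of +131563520 is below the 256 MiB (268435456-byte) limit, which is consistent with that run reporting zero violations. A counter that finishes below baseline produces a negative delta and passes.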

Pending Release Evidence

  • None. Legacy listen-socket public-network churn remains out of Beta 0.7.3 scope unless a future release gate promotes a deterministic offline seam.

Relationship To Other Items