WebSocket and legacy socket leak-churn gate
Summary
Add a Beta 0.7.3 release gate that repeatedly churns WebSocket, REST, and legacy socket close paths while measuring handles, threads, queued socket state, and process memory.
This is the release proof for known legacy Win32/MFC socket hazards: accepted client lifetime, CAsyncSocket-style helper state, async close/reset ordering, queued send cleanup, and thread-object ownership.
Risk Being Covered
- accepted-client or listener thread objects leaking after socket churn
- helper-window or async-socket state retaining stale socket pointers
- reset/close races leaving queued send buffers attached to dead sockets
- self-deleting socket objects running callbacks after owner teardown
- process handle/thread/private-byte growth hidden by otherwise passing request counts
Execution Plan
- Add a leak-churn harness mode that records baseline process handles, threads, GDI objects, USER objects, private bytes, and accepted WebSocket thread counts.
- Run repeated HTTP and HTTPS connect/reset cycles, including idle accepted sockets, partial headers, partial declared bodies, slow responses, and queued-response disconnects.
- Run legacy listen-socket churn for short-lived accepted peer-like sockets where the harness can do so without joining the public network.
- Run stop/start cycles after churn to prove WebSocket global state is reusable only after accepted clients drain.
- Compare post-drain resource counts against baseline and documented tolerance.
- Emit a release artifact containing counts before churn, peak counts, counts after drain, socket outcome totals, and any stuck thread/handle identifiers.
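One connect/reset cycle from the plan above can be sketched as follows. This is a minimal illustration, not the harness itself: `reset_close`, `churn`, `start_sink_listener`, and the outcome keys are hypothetical names, and the local sink listener only stands in for the app under test. The abortive close relies on `SO_LINGER` with a zero timeout, which makes `close()` send a TCP RST instead of a FIN.

```python
import os
import socket
import struct
import threading

# struct linger holds two u_short fields on Windows and two ints elsewhere.
_LINGER_FORMAT = "hh" if os.name == "nt" else "ii"


def reset_close(sock: socket.socket) -> None:
    """Abortive close: SO_LINGER on with a zero timeout makes close() send RST."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(_LINGER_FORMAT, 1, 0))
    sock.close()


def churn(host: str, port: int, cycles: int, payload: bytes) -> dict:
    """Run connect / partial-send / reset cycles and tally socket outcomes."""
    outcomes = {"reset_sent": 0, "connect_failed": 0, "send_failed": 0}
    for _ in range(cycles):
        try:
            sock = socket.create_connection((host, port), timeout=5.0)
        except OSError:
            outcomes["connect_failed"] += 1
            continue
        try:
            sock.sendall(payload)  # e.g. headers without the terminating blank line
        except OSError:
            outcomes["send_failed"] += 1
            sock.close()
            continue
        reset_close(sock)
        outcomes["reset_sent"] += 1
    return outcomes


def start_sink_listener():
    """Hypothetical stand-in for the app under test: accept and drop connections."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(64)
    srv.settimeout(0.2)  # lets the accept loop notice the stop flag
    stop = threading.Event()

    def serve():
        while not stop.is_set():
            try:
                conn, _ = srv.accept()
            except OSError:
                continue  # accept timeout, or a connection aborted before accept
            conn.close()
        srv.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return srv.getsockname()[1], stop, thread


if __name__ == "__main__":
    port, stop, thread = start_sink_listener()
    partial_headers = b"GET /api/v1/app HTTP/1.1\r\nHost: localhost\r\n"
    print(churn("127.0.0.1", port, 100, partial_headers))
    stop.set()
    thread.join()
```

Because the client closes with RST, it leaves no sockets in TIME_WAIT, so a long soak should not run out of ephemeral ports on the harness side.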
Acceptance Criteria
- [x] live REST reports include baseline and post-adversity resource snapshots
- [x] 1k+ HTTP connect/reset churn mode is implemented with selectable smoke and soak budgets
- [x] HTTPS connect/reset churn mode is implemented with selectable smoke and soak budgets
- [x] 1k+ HTTP connect/reset cycles complete without unbounded handle, thread, or memory growth
- [x] 1k+ HTTPS connect/reset cycles complete without unbounded handle, thread, or memory growth
- [x] accepted-client threads drain before WebSocket termination state is closed
- [x] queued send buffers are released when clients reset during response send
- [x] stop/start after churn succeeds once the thresholded post-drain check has passed
- [x] leak-churn reports include baseline, peak, and post-drain process resource counts
- [x] leak-churn reports evaluate Beta 0.7.3 resource thresholds and fail on violations
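The baseline, peak, and post-drain bookkeeping named in these criteria might look like the sketch below. The `ResourceSnapshot` fields, the function names, and the report keys are hypothetical, and the sample numbers are made up for illustration. How the counts are sampled is out of scope here; on Windows, APIs such as `GetProcessHandleCount` and `GetGuiResources` expose the handle and GDI/USER counts.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ResourceSnapshot:
    """One sample of process resource counts (field names are assumptions)."""
    handles: int
    threads: int
    gdi_objects: int
    user_objects: int
    private_bytes: int
    working_set_bytes: int


def peak_of(samples: list[ResourceSnapshot]) -> ResourceSnapshot:
    """Per-metric maximum across all samples."""
    return ResourceSnapshot(**{
        f.name: max(getattr(s, f.name) for s in samples)
        for f in fields(ResourceSnapshot)
    })


def delta(later: ResourceSnapshot, baseline: ResourceSnapshot) -> dict[str, int]:
    """Per-metric growth relative to baseline; negative means it shrank."""
    return {
        f.name: getattr(later, f.name) - getattr(baseline, f.name)
        for f in fields(ResourceSnapshot)
    }


def build_report(baseline: ResourceSnapshot,
                 during: list[ResourceSnapshot],
                 post_drain: ResourceSnapshot) -> dict:
    """Assemble the three snapshots plus deltas into one report dict."""
    peak = peak_of([baseline, *during, post_drain])
    return {
        "baseline": asdict(baseline),
        "peak": asdict(peak),
        "post_drain": asdict(post_drain),
        "peak_delta": delta(peak, baseline),
        "post_drain_delta": delta(post_drain, baseline),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real run.
    base = ResourceSnapshot(500, 19, 40, 30, 50_000_000, 60_000_000)
    mid = ResourceSnapshot(540, 21, 41, 30, 58_000_000, 70_000_000)
    end = ResourceSnapshot(501, 19, 40, 30, 50_045_056, 60_057_344)
    print(build_report(base, [mid], end)["post_drain_delta"])
```

Including the baseline and post-drain samples in the peak calculation keeps the peak delta non-negative even when no mid-churn sample was taken.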
Progress Evidence
- Test harness commit: `1d97dd4`.
- Test harness commit: `e88e067`.
- Test harness commit: `ae3a840`.
- Test harness commit: `941c439`.
- Test harness commit: `b8729d3`.
- Test harness commit: `352a2d2`.
- Build orchestration commit: `94d1044`.
- Build orchestration commit: `3ec3674`.
- Added process resource snapshots to live REST reports after launch and after REST socket adversity/stress.
- Added resource deltas for handles, GDI objects, USER objects, private bytes, and working set bytes.
- Added `--rest-leak-churn-budget {off,smoke,soak}` and `--rest-leak-churn-cycles`; soak defaults to 1000 HTTP connect/reset cycles.
- Extended leak churn to HTTPS profiles with stalled TLS connect-close, partial TLS record reset, and partial ClientHello reset cycles.
- Exposed the same controls through supported workspace `live-e2e` parameters `-RestLeakChurnBudget` and `-RestLeakChurnCycles`.
- Added focused Python coverage for resource-delta calculation.
- HTTPS smoke artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-121849-eMule-main-release\result.json`. The run passed 100/100 HTTPS leak-churn cycles with enforced thresholds, `resource_thresholds.ok=true`, and zero threshold violations. Observed post-drain deltas were handles `+1`, private bytes `+45056`, and working set bytes `+57344`.
- Added release thresholds for leak-churn deltas: post-drain handles <= 64, process threads <= 4, GDI objects <= 32, USER objects <= 32, private bytes <= 256 MiB, and working set bytes <= 256 MiB; peak handles <= 128, process threads <= 32, GDI objects <= 64, USER objects <= 64, private bytes <= 384 MiB, and working set bytes <= 384 MiB.
- Thread-count smoke artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122517-eMule-main-release\result.json`. The run passed with thread count `19 -> 19 -> 19` across baseline, peak, and post-drain snapshots, `thread_count` delta `0`, and zero threshold violations.
- Added `--rest-stop-start-after-churn` to the REST smoke harness and `-RestStopStartAfterChurn` to the supported workspace live E2E entrypoint. The stop/start proof is gated behind leak churn and only runs after the thresholded post-drain check passes.
- HTTPS stop/start-after-churn artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-123736-eMule-main-release\result.json`. The run passed 100/100 HTTPS leak-churn cycles with enforced thresholds, `resource_thresholds.ok=true`, and zero threshold violations; old process `15520` closed in `8590.308` ms, the profile relaunched as process `16760`, and `/api/v1/app` returned `200` after relaunch.
- HTTP stop/start soak artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203500-eMule-main-release\result.json`. The run passed 1000/1000 HTTP leak-churn cycles with `resource_thresholds.ok=true`, zero threshold violations, and successful post-churn relaunch from process `15664` to process `5876`.
- HTTPS stop/start soak artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203928-eMule-main-release\result.json`. The run passed 1000/1000 HTTPS leak-churn cycles with `resource_thresholds.ok=true`, zero threshold violations, and successful post-churn relaunch from process `11000` to process `7040`.
- Queued-send reset cleanup is covered by the passing `reset_during_response_send` and `reset_during_error_response_send` socket probes in `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-203041-eMule-main-release\result.json`, plus the thresholded HTTP/HTTPS leak-churn stop/start soaks above.
- HTTP soak artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122017-eMule-main-release\result.json`. The run passed 1000/1000 HTTP leak-churn cycles with enforced thresholds, `resource_thresholds.ok=true`, and zero threshold violations. Observed post-drain deltas were handles `+1`, private bytes `+442368`, and working set bytes `+745472`.
- HTTPS soak artifact: `repos\emulebb-build-tests\reports\rest-api-smoke\20260508-122141-eMule-main-release\result.json`. The run passed 1000/1000 HTTPS leak-churn cycles with enforced thresholds, `resource_thresholds.ok=true`, and zero threshold violations. Observed post-drain deltas were GDI objects `+1`, private bytes `+131563520`, and working set bytes `+131661824`; handles finished below baseline.
- App hardening commit `d75919a` preserves non-TLS WebSocket positive partial sends instead of treating stale `WSAGetLastError` state as a fatal send failure.
- App hardening commit `1ca4d49` snapshots the legacy Web shared-files list before rendering, avoiding live map traversal while the shared-file list can be updated.
- App hardening commit `b33efb8` locks Web graph history reads and writes around `m_Params.PointsForWeb`.
- App hardening commit `2cca5ac` releases the locale allocated during Web template reload and guards the timestamp path against locale/time conversion failure.
- App hardening commit `942c484` initializes WebSocket queued-send chunks before allocation of their payload buffers, making allocation-failure cleanup safe.
- Validation:
  - `python -m pytest tests\python\test_rest_api_smoke.py tests\python\test_live_e2e_suite.py -q`
  - `python -m emule_workspace build app --config Release --platform x64`
  - `python -m emule_workspace test python --path tests\python\test_rest_api_smoke.py --quiet`
  - `python -m emule_workspace test python --path tests\python\test_live_e2e_suite.py --quiet`
  - `python scripts\rest-api-smoke.py --help`
  - `git -C repos\emulebb-build-tests diff --check`
  - `git -C repos\emulebb-build diff --check`
  - `python -m emule_workspace validate`
  - `python -m emule_workspace test live-e2e --config Release --platform x64 --suite rest-api --skip-live-seed-refresh --rest-webserver-scheme https --rest-coverage-budget smoke --rest-stress-budget off --rest-socket-adversity-budget off --rest-tls-handshake-adversity-budget off --rest-leak-churn-budget smoke --rest-stop-start-after-churn --rest-server-search-count 0 --rest-kad-search-count 0 --rest-download-trigger-count 0`
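The release thresholds listed in the evidence above can be checked with something like the following sketch. The limit values are the ones stated in this document. Everything else is an assumption: the metric key names (apart from `thread_count`, which appears in the artifacts), the `evaluate` function, and the result shape, which loosely mirrors `resource_thresholds.ok`. Metrics missing from the input are skipped here; a stricter gate might treat them as failures.

```python
MIB = 1024 * 1024

# Limits on delta-from-baseline, as stated in this document.
POST_DRAIN_LIMITS = {
    "handles": 64,
    "thread_count": 4,
    "gdi_objects": 32,
    "user_objects": 32,
    "private_bytes": 256 * MIB,
    "working_set_bytes": 256 * MIB,
}
PEAK_LIMITS = {
    "handles": 128,
    "thread_count": 32,
    "gdi_objects": 64,
    "user_objects": 64,
    "private_bytes": 384 * MIB,
    "working_set_bytes": 384 * MIB,
}


def evaluate(post_drain_delta: dict, peak_delta: dict) -> dict:
    """Compare deltas against limits and collect every violation."""
    violations = []
    phases = (
        ("post_drain", POST_DRAIN_LIMITS, post_drain_delta),
        ("peak", PEAK_LIMITS, peak_delta),
    )
    for phase, limits, deltas in phases:
        for metric, limit in limits.items():
            value = deltas.get(metric)
            if value is None:
                continue  # assumption: metrics not reported are skipped
            if value > limit:
                violations.append(
                    {"phase": phase, "metric": metric, "delta": value, "limit": limit}
                )
    return {"ok": not violations, "violations": violations}


if __name__ == "__main__":
    # Post-drain deltas quoted for the HTTPS soak run; peak deltas are not
    # quoted in this document, so none are passed here.
    https_soak = {
        "gdi_objects": 1,
        "private_bytes": 131563520,
        "working_set_bytes": 131661824,
    }
    print(evaluate(https_soak, {}))
```

The largest quoted delta, 131563520 private bytes, is roughly 125.5 MiB and sits under the 256 MiB post-drain limit, which is consistent with the run reporting zero violations.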
Pending Release Evidence
- None. Legacy listen-socket public-network churn remains out of Beta 0.7.3 scope unless a future release gate promotes a deterministic offline seam.