Interpreted event loop: socket/TLS workloads degrade 50-100x under ambient process load (investigation) #198

Description

@nickna

Symptom

TlsServerAndClientHandshake_Interpreted (guest-code TLS server + client handshake over loopback) runs in 0.27-0.32s in isolation but 13.6-30.3s inside the full SharpTS.Tests suite. This is the single largest contributor to the suite's wall-clock critical path.

This is filed as a runtime concern, not a test concern: whatever stalls the interp event loop ~100x under load likely affects real interpreted socket/TLS workloads on busy hosts.

What the investigation established (all measured)

  • Not the test or TLS: the pure-C# twin (TlsServerAndClientHandshake, raw SslStream, no interpreter) runs in 0.09-0.12s in the same full-suite runs. Machine, Defender, lsass, loopback all exonerated.
  • Not class state: the whole TlsModuleTests class alone -> 0.32s.
  • Not any single neighbor category: pairing with NetModule/HttpModule, Cluster/ChildProcess, or Crypto/Https classes -> 0.18-0.22s.
  • Partially the subprocess tier: pairing with IntegrationTests+PackagingTests (dotnet child-process spawns, DLL compiles to disk) -> 2.6s (10x). Full effect needs suite breadth.
  • Not raw CPU parallelism; the stall is inversely correlated with thread count: maxParallelThreads=8 -> 30.3s, the default aggressive setting (~16) -> 19.6s, maxParallelThreads=24 -> 13.6s.
  • Parallel-knob tuning is a dead end: suite wall-clock time is flat at 53-62s across the aggressive, conservative, 8, and 24 settings; DOTNET_ThreadPool_ForceMinWorkerThreads=128 was catastrophic (294s wall, 10x test-second inflation). Current xunit config is near-optimal; do not re-tune without new evidence.
  • Stack samples during the stall window show heavy GC churn (PollGC workers, spin-waiters), in-process IL compilation (MetadataBuilder/Ecma335 frames), and compiled-test threads parked in emitted $Runtime..cctor via InitClassSlow.
  • The ScheduleTimer(0, ...) hop path used by SharpTSTlsSocket to marshal handshake completions onto the event loop does wake a blocked loop (verified). The adjacent issue #194 (Interpreter.ScheduleTimer: cross-thread timers with delay > 10ms never wake a blocked event loop, worst-case latency up to 60s) covers a different case: only delays above 10ms fail to wake the loop.

Mechanism (test harness side)

Every test runs as Task.Run(work) + task.Wait(timeout) (SharpTS.Tests/Infrastructure/TestHarness.cs), so each concurrent test parks a thread-pool thread for its full duration. The interpreter event loop itself runs on a thread-pool thread, and its I/O completions queue behind the parked workers. The TLS test makes many small loop-hops per handshake stage, so per-hop latency multiplies.
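
The effect described above can be reproduced in miniature outside the suite. The following is a minimal sketch, not the actual TestHarness.cs code: RunLikeHarness is a hypothetical stand-in for the Task.Run(work) + task.Wait(timeout) pattern, and MeasureHopLatency is a hypothetical helper that times how long one trivial work item sits in the thread-pool queue while a number of pool threads are blocked. The numbers it returns depend on core count and the pool's thread-injection heuristics, so no expected output is given.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ParkedWorkerSketch
{
    // Hypothetical stand-in for the harness pattern named in the issue:
    // run the work on the thread pool and block the caller until it ends.
    // Returns false if the work did not finish within the timeout.
    public static bool RunLikeHarness(Action work, TimeSpan timeout)
    {
        Task task = Task.Run(work);
        return task.Wait(timeout);
    }

    // Blocks up to `blockedWorkers` pool threads, then measures how long a
    // trivial queued work item waits before it starts running.
    public static TimeSpan MeasureHopLatency(int blockedWorkers, TimeSpan holdFor)
    {
        using var release = new ManualResetEventSlim(false);
        using var done = new ManualResetEventSlim(false);

        var parked = new Task[blockedWorkers];
        for (int i = 0; i < blockedWorkers; i++)
        {
            // Each task occupies a pool thread until released or timed out.
            parked[i] = Task.Run(() => { release.Wait(holdFor); });
        }

        TimeSpan latency = TimeSpan.Zero;
        var sw = Stopwatch.StartNew();
        // The "hop": a tiny callback that has to queue behind the parked work.
        ThreadPool.QueueUserWorkItem(_ => { latency = sw.Elapsed; done.Set(); });

        done.Wait();            // wait for the hop to run
        release.Set();          // unpark the blocked workers
        Task.WaitAll(parked);   // keep the events alive until all tasks exit
        return latency;
    }
}
```

Calling MeasureHopLatency with a blockedWorkers value well above the machine's core count and comparing against blockedWorkers = 0 gives a rough per-hop queue delay; a handshake that makes many such hops would pay that delay once per hop.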

Next step

Black-box sampling is exhausted. Needs in-situ instrumentation: env-gated stage timestamps in SharpTSTlsSocket (connect / authenticate / per-hop ScheduleTimer-to-callback latency) plus event-loop wake-to-dispatch latency, run inside the full suite to localize which hop class eats the time (TP queue delay vs GC suspension vs TryTake wake).
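
One possible shape for the env-gated stage timestamps is sketched below. Everything in it is an assumption for illustration: the StageTrace class, its members, and the SHARPTS_TLS_TRACE variable name do not exist in SharpTS today, and the stage labels are left to the call sites. It only shows the gating and the elapsed-time arithmetic using Stopwatch.GetTimestamp.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

public static class StageTrace
{
    // Hypothetical switch: tracing is off unless SHARPTS_TLS_TRACE=1 is set.
    // The setter exists so a test can turn tracing on without the variable.
    public static bool Enabled { get; set; } =
        Environment.GetEnvironmentVariable("SHARPTS_TLS_TRACE") == "1";

    // Thread-safe buffer: samples may arrive from the event loop thread
    // and from I/O completion threads.
    private static readonly ConcurrentQueue<(string Stage, double Ms)> Samples = new();

    // Take a timestamp at the start of a stage (for example, just before
    // posting a hop to the event loop).
    public static long Mark() => Stopwatch.GetTimestamp();

    // Record the elapsed time since `start` under a stage label. Does
    // nothing when tracing is disabled.
    public static void Record(string stage, long start)
    {
        if (!Enabled) return;
        double ms = (Stopwatch.GetTimestamp() - start) * 1000.0 / Stopwatch.Frequency;
        Samples.Enqueue((stage, ms));
    }

    // Remove and return everything recorded so far, in arrival order.
    public static List<(string Stage, double Ms)> Drain()
    {
        var result = new List<(string Stage, double Ms)>();
        while (Samples.TryDequeue(out var sample)) result.Add(sample);
        return result;
    }
}
```

A call site would take Mark() immediately before scheduling a hop and call Record as the first statement of the callback, which yields the per-hop ScheduleTimer-to-callback latency; the same pair around connect and authenticate gives the stage timings. Draining at test teardown and grouping by stage label would show which hop class dominates.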

Related: #94 (dotnet test speed - this is its biggest single line item), #194 (timer wake gap), #110 (Test262 Pass->Timeout under parallel-regen contention - plausibly the same amplification mechanism on the Test262 side).
