feat: add client-side profiling trigger#326
Conversation
Adds an optional client-side trigger that fires POST /start_profile
at the performance phase start and /stop_profile at run end, so a
profiled run can be driven from a YAML/CLI flag without coupling
endpoints to any vendor harness.
Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig
hung off Settings. URLs are auto-derived per entry in
endpoint_config.endpoints (strip /v1, append engine-specific path).
Default-off; warn-don't-fail throughout.
Report.txt gets a Profiling section and a sibling profiling.json is
written next to result_summary.json when the trigger is enabled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces client-side profiling triggers for inference engines (specifically supporting vLLM) around the performance phase of a benchmark run. It adds a --profile CLI option, config schema updates, and logic to trigger start/stop profiling endpoints. The review feedback highlights a critical issue where the profiling stop requests are synchronous blocking network calls executed within the main event loop during the cleanup phase. This can block the event loop or prevent subsequent cleanup steps if interrupted, and should be run asynchronously using an executor.
| if profile_starts: | ||
| stop_reason = "phase_end" if session_completed_normally else "abort" | ||
| for i, start_rec in enumerate(profile_starts): | ||
| if start_rec["status"] != 200 or i >= len(profile_stop_urls): | ||
| continue | ||
| rec = _post_profile(profile_stop_urls[i]) | ||
| rec["stop_reason"] = stop_reason | ||
| if rec["status"] == 200: | ||
| logger.info("Profile stop: %s -> 200 OK", profile_stop_urls[i]) | ||
| else: | ||
| logger.warning( | ||
| "Profile stop: %s -> %s", | ||
| profile_stop_urls[i], | ||
| rec["error"] or rec["status"], | ||
| ) | ||
| profile_stops.append(rec) |
There was a problem hiding this comment.
The profiling stop requests are synchronous blocking network calls (_post_profile uses urllib.request.urlopen with a 2-second timeout). Running these sequentially in the main event loop thread inside the finally block can block the event loop for several seconds if the endpoints are slow or unresponsive.
Furthermore, if the user interrupts the cleanup process (e.g., by pressing Ctrl+C a second time because it appears hung), a KeyboardInterrupt will be raised during these blocking calls, which will bypass the rest of the critical cleanup (such as shutting down the HTTP client, closing the publisher, and terminating launcher services).
To prevent this, we should run the profiling stop requests asynchronously in a thread pool using loop.run_in_executor and wrap the entire block in a try...except BaseException to ensure that any network failures or interrupts during profiling stop do not prevent the rest of the cleanup from executing.
if profile_starts:
try:
stop_reason = "phase_end" if session_completed_normally else "abort"
for i, start_rec in enumerate(profile_starts):
if start_rec["status"] != 200 or i >= len(profile_stop_urls):
continue
try:
rec = await loop.run_in_executor(
None, _post_profile, profile_stop_urls[i]
)
rec["stop_reason"] = stop_reason
if rec["status"] == 200:
logger.info("Profile stop: %s -> 200 OK", profile_stop_urls[i])
else:
logger.warning(
"Profile stop: %s -> %s",
profile_stop_urls[i],
rec["error"] or rec["status"],
)
profile_stops.append(rec)
except Exception as e:
logger.warning(
"Failed to stop profile for %s: %s",
profile_stop_urls[i],
e,
)
except BaseException as e:
logger.warning("Profiling stop cleanup was interrupted: %s", e)
Adds an optional client-side trigger that fires POST /start_profile at the performance phase start and /stop_profile at run end, so a profiled run can be driven from a YAML/CLI flag without coupling endpoints to any vendor harness.
Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig hung off Settings. URLs are auto-derived per entry in endpoint_config.endpoints (strip /v1, append engine-specific path). Default-off; warn-don't-fail throughout.
Report.txt gets a Profiling section and a sibling profiling.json is written next to result_summary.json when the trigger is enabled.
closes #324
Type of change
Related issues
Testing
Checklist