feat: add client-side profiling trigger by maxyanghu · Pull Request #326 · mlcommons/endpoints

maxyanghu · 2026-05-26T19:47:32Z

Adds an optional client-side trigger that fires POST /start_profile at the performance phase start and /stop_profile at run end, so a profiled run can be driven from a YAML/CLI flag without coupling endpoints to any vendor harness.

Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig hung off Settings. URLs are auto-derived per entry in endpoint_config.endpoints (strip /v1, append engine-specific path). Default-off; warn-don't-fail throughout.

Report.txt gets a Profiling section and a sibling profiling.json is written next to result_summary.json when the trigger is enabled.

closes #324

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

Adds an optional client-side trigger that fires POST /start_profile at the performance phase start and /stop_profile at run end, so a profiled run can be driven from a YAML/CLI flag without coupling endpoints to any vendor harness. Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig hung off Settings. URLs are auto-derived per entry in endpoint_config.endpoints (strip /v1, append engine-specific path). Default-off; warn-don't-fail throughout. Report.txt gets a Profiling section and a sibling profiling.json is written next to result_summary.json when the trigger is enabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-26T19:47:45Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request introduces client-side profiling triggers for inference engines (specifically supporting vLLM) around the performance phase of a benchmark run. It adds a --profile CLI option, config schema updates, and logic to trigger start/stop profiling endpoints. The review feedback highlights a critical issue where the profiling stop requests are synchronous blocking network calls executed within the main event loop during the cleanup phase. This can block the event loop or prevent subsequent cleanup steps if interrupted, and should be run asynchronously using an executor.

gemini-code-assist · 2026-05-26T19:49:10Z

+            if profile_starts:
+                stop_reason = "phase_end" if session_completed_normally else "abort"
+                for i, start_rec in enumerate(profile_starts):
+                    if start_rec["status"] != 200 or i >= len(profile_stop_urls):
+                        continue
+                    rec = _post_profile(profile_stop_urls[i])
+                    rec["stop_reason"] = stop_reason
+                    if rec["status"] == 200:
+                        logger.info("Profile stop: %s -> 200 OK", profile_stop_urls[i])
+                    else:
+                        logger.warning(
+                            "Profile stop: %s -> %s",
+                            profile_stop_urls[i],
+                            rec["error"] or rec["status"],
+                        )
+                    profile_stops.append(rec)


The profiling stop requests are synchronous blocking network calls (_post_profile uses urllib.request.urlopen with a 2-second timeout). Running these sequentially in the main event loop thread inside the finally block can block the event loop for several seconds if the endpoints are slow or unresponsive.

Furthermore, if the user interrupts the cleanup process (e.g., by pressing Ctrl+C a second time because it appears hung), a KeyboardInterrupt will be raised during these blocking calls, which will bypass the rest of the critical cleanup (such as shutting down the HTTP client, closing the publisher, and terminating launcher services).

To prevent this, we should run the profiling stop requests asynchronously in a thread pool using loop.run_in_executor and wrap the entire block in a try...except BaseException to ensure that any network failures or interrupts during profiling stop do not prevent the rest of the cleanup from executing.

if profile_starts: try: stop_reason = "phase_end" if session_completed_normally else "abort" for i, start_rec in enumerate(profile_starts): if start_rec["status"] != 200 or i >= len(profile_stop_urls): continue try: rec = await loop.run_in_executor( None, _post_profile, profile_stop_urls[i] ) rec["stop_reason"] = stop_reason if rec["status"] == 200: logger.info("Profile stop: %s -> 200 OK", profile_stop_urls[i]) else: logger.warning( "Profile stop: %s -> %s", profile_stop_urls[i], rec["error"] or rec["status"], ) profile_stops.append(rec) except Exception as e: logger.warning( "Failed to stop profile for %s: %s", profile_stop_urls[i], e, ) except BaseException as e: logger.warning("Profiling stop cleanup was interrupted: %s", e)

maxyanghu requested a review from a team May 26, 2026 19:47

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

wangshangsam requested a review from arekay-nv May 27, 2026 05:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add client-side profiling trigger#326

feat: add client-side profiling trigger#326
maxyanghu wants to merge 1 commit into
mlcommons:mainfrom
CentML:feat/maxhu-add-profiler-trigger

maxyanghu commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

maxyanghu commented May 26, 2026

Type of change

Related issues

Testing

Checklist

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant