Create `app._experimental_server()` version of LLM Inference Examples by molocule · Pull Request #1580 · modal-labs/modal-examples

molocule · 2026-06-02T00:26:06Z

Type of Change

New example for the GitHub repo
- New example for the documentation site (Linked from a discoverable page, e.g. via the sidebar in /docs/examples)
Example updates (Bug fixes, new features, etc.)
Other (Changes to the codebase, but not to examples)

Monitoring Checklist

Example is configured for testing in the synthetic monitoring system, or lambda-test: false is provided in the example frontmatter and I have gotten approval from a maintainer
- Example is tested by executing with modal run, or an alternative cmd is provided in the example frontmatter (e.g. cmd: ["modal", "serve"])
- Example is tested by running the cmd with no arguments, or the args are provided in the example frontmatter (e.g. args: ["--prompt", "Formula for room temperature superconductor:"])
- Example does not require third-party dependencies besides fastapi to be installed locally (e.g. does not import requests or torch in the global scope or other code executed locally)

Documentation Site Checklist

Content

Example is documented with comments throughout, in a Literate Programming style
All media assets for the example that are rendered in the documentation site page are retrieved from modal-cdn.com

Build Stability

Example pins all dependencies in container images
- Example pins container images to a stable tag like v1, not a dynamic tag like latest
- Example specifies a python_version for the base image, if it is used
- Example pins all dependencies to at least SemVer minor version, ~=x.y.z or ==x.y, or we expect this example to work across major versions of the dependency and are committed to maintenance across those versions
  - Example dependencies with version < 1 are pinned to patch version, ==0.y.z

Outside Contributors

You're great! Thanks for your contribution.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

thomasjpfan · 2026-06-05T14:12:42Z

+        """Start SGLang server process and wait for it to be ready"""
+        self.proc = _start_server()
+        wait_for_server_ready()


In the lift and shift world, I think we can use a readiness probe to specify this:

    @app._experimental_server(
        readiness_probe=modal.Probe.with_http("/healthz", SGLANG_PORT)
    )

We do not have with_http yet, but it would work much like Kubernetes's readinessProbe with httpGet.
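
For readers unfamiliar with HTTP readiness probes, here is a minimal sketch of the polling behaviour being described. It uses only the Python standard library and is not Modal's implementation. The function name `wait_for_http_ready` and all of its parameters are hypothetical, and treating any 2xx response as "ready" is an assumption made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http_ready(
    url: str,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    request_timeout_s: float = 2.0,
) -> bool:
    """Poll `url` until it answers with a 2xx status or `timeout_s` elapses.

    Hypothetical helper: illustrates the idea of an HTTP readiness check,
    not the API of any particular library.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timed out, or an error status (4xx/5xx):
            # the server is not ready yet, so fall through and retry.
            pass
        time.sleep(interval_s)
    return False
```

A caller would pass something like `http://127.0.0.1:<port>/healthz` and only start routing traffic once the function returns `True`. A declarative probe moves this loop out of user code and into the platform.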

thomasjpfan · 2026-06-05T14:16:21Z

+    @modal.exit()
+    def stop(self):
+        """Terminate the SGLang server process"""
+        self.proc.terminate()
+        self.proc.wait()


In the lift and shift world:

    app._experimental_server(
        name="Server",
        cmd=["sglang", ...],  # When you pass `cmd` you can no longer decorate a class
        readiness_probe=modal.Probe.with_http("/healthz", SGLANG_PORT),
    )

devin-ai-integration

Devin Review found 2 new potential issues.

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-06-12T19:36:38Z

+    # allow generous time for all replicas to spin up based on rough heuristic;
+    # remove this sleep and increase CONTAINERS
+    # to observe session routing changes during autoscaling
+    await asyncio.sleep(5 + ((CONTAINERS - 10) // 2))


🚩 server_sticky.py sleep heuristic gives only 1 second wait with default CONTAINERS=2

At line 134, asyncio.sleep(5 + ((CONTAINERS - 10) // 2)) evaluates to asyncio.sleep(1) when CONTAINERS=2. The comment says "allow generous time for all replicas to spin up", but 1 second may not be enough for containers to become ready. The formula only exceeds its 5-second base when CONTAINERS >= 12, and it shrinks the wait when CONTAINERS < 10. If the second container isn't ready after 1 second, the test would see routing to a single container, which could make results flaky (possibly causing false assertion failures for the sticky test).
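
The arithmetic in this comment can be checked directly. The sketch below evaluates the expression from the diff for a few container counts. The `floored_delay` variant is a hypothetical illustration of one way to avoid very short waits, not something proposed in the PR.

```python
def spin_up_delay(containers: int) -> int:
    """Seconds slept by the heuristic quoted in the diff."""
    return 5 + ((containers - 10) // 2)


def floored_delay(containers: int, minimum: int = 5) -> int:
    """Hypothetical variant: never sleep for less than `minimum` seconds."""
    return max(minimum, spin_up_delay(containers))


for n in (2, 10, 12, 20):
    print(n, spin_up_delay(n), floored_delay(n))
# prints:
# 2 1 5
# 10 5 5
# 12 6 6
# 20 10 10
```

Python's `//` floors toward negative infinity, so `(2 - 10) // 2` is `-4` and the default configuration sleeps for one second.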

devin-ai-integration

Devin Review found 3 new potential issues.

devin-ai-integration · 2026-06-12T21:55:19Z

    - name: Install the modal client
      shell: bash
-      run: uv pip install --system modal
+      run: uv pip install --system --prerelease allow modal


🚩 CI now installs pre-release modal versions

The setup action changes from uv pip install --system modal to uv pip install --system --prerelease allow modal, which means CI will pick up pre-release versions of the modal package. This is presumably intentional since the PR uses new API methods like app._experimental_server and Server.get_url() that may only exist in pre-release builds. Worth verifying this is temporary (for testing the new API) or intended as permanent.

yeah, let's remove this once the release goes out and before merging

devin-ai-integration

Devin Review found 4 new potential issues.

devin-ai-integration · 2026-06-12T23:02:58Z

🚩 Incomplete migration: sglang_snapshot.py and http_server.py still use old API

Files 06_gpu_and_ml/llm-serving/sglang_snapshot.py and 07_web/http_server.py still use the old import modal.experimental + @modal.experimental.http_server + @modal.concurrent pattern. These were not touched by this PR. If the old API is being deprecated, these will need updates in a follow-up. The http_server_sticky.py appears to be the predecessor of server_sticky.py (both exist simultaneously).

devin-ai-integration

Devin Review found 2 new potential issues.

devin-ai-integration · 2026-06-12T23:16:49Z

 import aiohttp
 import modal
-import modal.experimental
+from modal.server import Server


📝 Info: AGENTS.md: from modal.server import Server is acceptable for submodule imports

Three files (lfm_snapshot.py, sglang_kitchen_sink.py, vllm_low_latency.py) use from modal.server import Server. AGENTS.md says to prefer modal.X over direct imports, but Server is not available on the top-level modal module — it's only accessible via modal.server.Server. Since the rule was written for items like Image, Volume, etc. that are available as modal.Image, modal.Volume, this import pattern is the practical way to access the Server class and doesn't violate the spirit of the rule.

devin-ai-integration

Devin Review found 4 new potential issues.

devin-ai-integration

Devin Review found 1 new potential issue.

charlesfrye · 2026-06-14T03:07:53Z

We need .aio on the calls to get_url (e.g. here) to avoid warnings when the examples are run (e.g. here).

create server version of examples

c48badf

This comment was marked as resolved.

molocule marked this pull request as draft June 2, 2026 00:48

Update 06_gpu_and_ml/llm-serving/sglang_kitchen_sink_server.py

af12ee5

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

thomasjpfan reviewed Jun 5, 2026

molocule added 5 commits June 10, 2026 13:34

Merge branch 'main' into create-server-version-of-examples

9673f1d

Merge branch 'main' into create-server-version-of-examples

5ffc9ac

rename

607c244

restored

48ea1ee

fix routing region

648e406

molocule marked this pull request as ready for review June 12, 2026 16:39

molocule added 4 commits June 12, 2026 12:41

fix

da5bca1

missing cinna

afd5f49

references

41d53f4

fix

1185219

This comment was marked as resolved.

fix

f024187

This comment was marked as resolved.

molocule added 3 commits June 12, 2026 14:55

Update sglang_low_latency.py

7ca852a

Update lfm_snapshot.py

1ee170d

fix

f6593be

This comment was marked as resolved.

molocule added 2 commits June 12, 2026 15:07

fix

225fbf6

Update server_sticky.py

f31614d

devin-ai-integration Bot reviewed Jun 12, 2026

Comment thread 06_gpu_and_ml/llm-serving/lfm_snapshot.py Outdated

Comment thread 07_web/server_sticky.py

fix

7e1169c

devin-ai-integration Bot reviewed Jun 12, 2026

molocule added 3 commits June 12, 2026 15:41

Update server.py

71cda1c

Update lfm_snapshot.py

9011450

Update lfm_snapshot.py

614cc33

molocule added 3 commits June 12, 2026 15:50

add

ec2d0ab

Update vllm_low_latency.py

3f386ca

Update sglang_snapshot.py

96e5441

This comment was marked as resolved.

molocule added 3 commits June 12, 2026 16:05

Update sglang_kitchen_sink.py

11cedef

cls -> server

8a940be

Update server_sticky.py

04c7e5a

This comment was marked as resolved.

molocule added 3 commits June 12, 2026 16:56

Update server.py

245b57c

Update server.py

e1f7205

Update action.yml

ca20d4b

devin-ai-integration Bot reviewed Jun 12, 2026

molocule added 3 commits June 12, 2026 18:49

fix import

0f8f351

fix

5a001e5

Update lfm_snapshot.py

40a703a

devin-ai-integration Bot reviewed Jun 12, 2026

fix sync vs async

9994372

devin-ai-integration Bot reviewed Jun 12, 2026

fix

0f9607a

devin-ai-integration Bot reviewed Jun 12, 2026

Comment thread 06_gpu_and_ml/llm-serving/lfm_snapshot.py Outdated

Comment thread 06_gpu_and_ml/llm-serving/lfm_snapshot.py

Comment thread 06_gpu_and_ml/llm-serving/vllm_low_latency.py

Comment thread 06_gpu_and_ml/llm-serving/sglang_kitchen_sink.py

molocule added 3 commits June 12, 2026 20:07

empty

f4c525f

Update lfm_snapshot.py

f4e623a

Update lfm_snapshot.py

d41599b

devin-ai-integration Bot reviewed Jun 13, 2026

Comment thread 06_gpu_and_ml/llm-serving/sglang_low_latency.py
