
feat: use ruff analyze graph to detect files for the code bundle #7595

Description

@pingsutw

Why

When you call flyte.run(...) with the default copy_style="loaded_modules", the SDK decides which local files to ship in the code bundle by inspecting sys.modules at runtime (list_imported_modules_as_files). That has two downsides:

  • It can miss files. Only modules already imported by the time bundling runs are captured. Lazy imports (inside a function body), conditional imports (for example under if TYPE_CHECKING:), and modules first imported on the cluster can be silently left out of the bundle, which surfaces as a ModuleNotFoundError at execution time.
  • It depends on interpreter state, so the bundle contents can vary with how/where flyte run is invoked.

A user asked for a more reliable, static way to determine the dependency set. ruff analyze graph builds a first-party import dependency graph statically (nothing is actually imported) and is fast. This is a nice-to-have, not a blocker, and a good candidate for a community contribution.

Concrete example: works with ruff analyze graph, broken today

# helper.py
def compute() -> int:
    return 42

# main.py  (the entrypoint you pass to `flyte.run`)
import flyte

env = flyte.TaskEnvironment(name="demo")

@env.task
async def main() -> int:
    from helper import compute  # lazy, function-level import
    return compute()

Run it: flyte.run(main).

  • Today (loaded_modules): at packaging time on your laptop, main() never executes, so the
    from helper import compute line never runs and helper is not in sys.modules. The bundle
    ships without helper.py, and the task fails on the cluster with ModuleNotFoundError: No module named 'helper'.
  • With ruff analyze graph: static analysis sees the from helper import ... edge regardless of
    where it appears, so helper.py is included and the run succeeds.

The same gap applies to imports guarded by if TYPE_CHECKING: that are later needed at runtime, and
to any module only reachable through a code path that isn't exercised locally before packaging.
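
The sys.modules gap can be reproduced without Flyte. The sketch below is a standalone illustration, not SDK code: it writes a throwaway module to a temp directory (named helper_lazy_demo here, a hypothetical name chosen to avoid clashing with any real helper module) and checks sys.modules before and after the function containing the lazy import is called.

```python
import sys
import tempfile
from pathlib import Path

# Create a throwaway project directory containing only the helper module.
# "helper_lazy_demo" is a made-up name used for this illustration.
project = Path(tempfile.mkdtemp())
(project / "helper_lazy_demo.py").write_text(
    "def compute() -> int:\n    return 42\n"
)
sys.path.insert(0, str(project))


def main() -> int:
    # Lazy, function-level import: executes only when main() is called.
    from helper_lazy_demo import compute

    return compute()


# "Packaging time": main() is defined but has never been called,
# so a sys.modules scan cannot see the helper.
seen_before_call = "helper_lazy_demo" in sys.modules

result = main()

# Only after execution does the module show up in sys.modules.
seen_after_call = "helper_lazy_demo" in sys.modules

print(seen_before_call, seen_after_call, result)
```

This prints False True 42: a sys.modules scan taken before main() runs has no way to know the helper exists.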

What to change

Note: the code bundling logic lives in the flyteorg/flyte-sdk repo (Python SDK), under src/flyte/_code_bundle/. The paths below refer to that repo. This issue is tracked here for v2 visibility.

Reimplement the existing loaded_modules detection on top of ruff analyze graph, which emits a JSON map of file -> [files it imports]. No new copy_style value or public API change — just make loaded_modules discover files statically from the import graph instead of from runtime sys.modules. Starting from the entrypoint module(s), walk that graph to collect the transitive set of first-party files, then feed them into the existing bundling path.
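
A minimal sketch of the graph walk described above, assuming the JSON has already been parsed into a dict of file -> list of imported files. The function name transitive_first_party and the path normalization are assumptions made for illustration; the exact shape of the paths ruff emits (relative vs. absolute) should be verified against real output before relying on this.

```python
from collections import deque
from pathlib import Path


def transitive_first_party(
    graph: dict[str, list[str]],
    entrypoints: list[str],
    source_path: str,
) -> list[str]:
    """Collect files reachable from the entrypoints, limited to source_path.

    Hypothetical helper: paths in `graph` are assumed to be either absolute
    or relative to `source_path`.
    """
    root = Path(source_path).resolve()

    def norm(p: str) -> Path:
        # Joining with an absolute path keeps the absolute path unchanged.
        return (root / p).resolve()

    normalized = {norm(k): [norm(v) for v in vs] for k, vs in graph.items()}

    seen: set[Path] = set()
    queue = deque(norm(e) for e in entrypoints)
    while queue:
        current = queue.popleft()
        # Skip already-visited files (handles import cycles) and anything
        # outside source_path (third-party / stdlib).
        if current in seen or not current.is_relative_to(root):
            continue
        seen.add(current)
        queue.extend(normalized.get(current, []))

    return sorted(p.relative_to(root).as_posix() for p in seen)


example_graph = {
    "main.py": ["helper.py"],
    "helper.py": ["pkg/util.py", "../site-packages/requests/__init__.py"],
    "pkg/util.py": ["main.py"],  # cycle back to the entrypoint
    "unrelated.py": ["helper.py"],  # not reachable from main.py
}
files = transitive_first_party(example_graph, ["main.py"], "/proj")
print(files)
```

Here unrelated.py is left out because nothing reachable from main.py imports it, and the path that resolves outside /proj is dropped by the source_path filter.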

  • src/flyte/_code_bundle/_utils.py — in ls_files(), the copy_file_detection == "loaded_modules" branch currently calls list_imported_modules_as_files(str(source_path), sys_modules). Replace that implementation so it shells out to ruff analyze graph, parses the JSON, and resolves the transitive imports of the entrypoint under source_path. Keep filtering to files within source_path (drop third-party/stdlib), so the returned file list matches the existing contract.
  • The CopyFiles literal stays as-is (Literal["loaded_modules", "all", "none", "custom"]) — no new value.
  • Handle ruff not being installed: check shutil.which("ruff") and, if it returns None, fall back to the current sys.modules approach with a clear log message. Don't hard-fail.
  • tests/ — cover a project where a module is imported lazily/conditionally and assert loaded_modules now includes it.
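
The ruff invocation and fallback from the bullets above might be sketched as follows. This is an assumption-laden outline rather than the SDK's implementation: load_import_graph is a hypothetical helper, running ruff with "." from source_path is one plausible way to invoke it, and depending on the installed ruff version additional flags may be needed. Returning None signals the caller to use the existing sys.modules path.

```python
import json
import logging
import shutil
import subprocess
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def load_import_graph(
    source_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[dict[str, list[str]]]:
    """Return the parsed import graph, or None if the caller should fall back.

    Hypothetical helper; `which` is injectable so the fallback is testable.
    """
    ruff = which("ruff")
    if ruff is None:
        logger.warning(
            "ruff not found on PATH; falling back to sys.modules detection"
        )
        return None

    # Assumption: analyzing "." from source_path covers the project files.
    proc = subprocess.run(
        [ruff, "analyze", "graph", "."],
        cwd=source_path,
        capture_output=True,
        text=True,
        check=False,  # do not raise; a failure triggers the fallback instead
    )
    if proc.returncode != 0:
        logger.warning("ruff analyze graph failed: %s", proc.stderr.strip())
        return None

    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        logger.warning("could not parse ruff analyze graph output as JSON")
        return None


# Simulate a machine without ruff: the helper reports None, no exception.
graph_without_ruff = load_import_graph(".", which=lambda _name: None)
print(graph_without_ruff)
```

Treating a non-zero exit code or unparsable output the same way as a missing binary is a design choice made here to honor "don't hard-fail"; the issue itself only requires the missing-binary case.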

Outcome

  • copy_style="loaded_modules" (the default) bundles the full transitive first-party import set of the entrypoint, including lazily/conditionally imported local modules
  • Third-party and stdlib files are excluded (bundle stays minimal)
  • Graceful fallback to the old sys.modules behavior when ruff is not on PATH
  • Tests cover the lazy/conditional-import case that loaded_modules previously missed
  • No public API change (CopyFiles values unchanged)

Getting started

  • See the current logic: list_imported_modules_as_files and ls_files in src/flyte/_code_bundle/_utils.py (the copy_file_detection == "loaded_modules" branch).
  • Try the tool: ruff analyze graph path/to/entrypoint.py (outputs JSON). Docs: https://docs.astral.sh/ruff/
  • Entry points to read: src/flyte/_code_bundle/bundle.py (build_code_bundle), src/flyte/_run.py (copy_style).
  • Test: pytest tests/ — add a fixture project under a tmp dir with a lazy import.
  • Setup: see CONTRIBUTING in the flyte-sdk repo.
