Why
When you flyte.run(...) with the default copy_style="loaded_modules", the SDK decides which local files to ship in the code bundle by inspecting sys.modules at runtime (list_imported_modules_as_files). That has two downsides:
- It can miss files. Only modules already imported by the time bundling runs are captured. Lazy imports, conditional imports (
if TYPE_CHECKING:, inside-function imports), or modules imported later on the cluster can be silently left out of the bundle, surfacing as a ModuleNotFoundError at execution time.
- It depends on interpreter state, so the bundle contents can vary with how/where
flyte run is invoked.
A user asked for a more reliable, static way to determine the dependency set. ruff analyze graph produces a first-party import dependency graph statically (no need to actually import anything) and is fast. This is a nice-to-have, not a blocker — a good community contribution.
Concrete example: works with ruff analyze graph, broken today
# helper.py
def compute() -> int:
return 42
# main.py (the entrypoint you `flyte.run`)
import flyte
env = flyte.TaskEnvironment(name="demo")
@env.task
async def main() -> int:
from helper import compute # lazy, function-level import
return compute()
Run it: flyte.run(main).
- Today (
loaded_modules): at packaging time on your laptop, main() never executes, so the
from helper import compute line never runs and helper is not in sys.modules. The bundle
ships without helper.py, and the task fails on the cluster with ModuleNotFoundError: No module named 'helper'.
- With
ruff analyze graph: static analysis sees the from helper import ... edge regardless of
where it appears, so helper.py is included and the run succeeds.
The same gap applies to imports guarded by if TYPE_CHECKING: that are later needed at runtime, and
to any module only reachable through a code path that isn't exercised locally before packaging.
What to change
Note: the code bundling logic lives in the flyteorg/flyte-sdk repo (Python SDK), under src/flyte/_code_bundle/. The paths below refer to that repo. This issue is tracked here for v2 visibility.
Reimplement the existing loaded_modules detection on top of ruff analyze graph, which emits a JSON map of file -> [files it imports]. No new copy_style value or public API change — just make loaded_modules discover files statically from the import graph instead of from runtime sys.modules. Starting from the entrypoint module(s), walk that graph to collect the transitive set of first-party files, then feed them into the existing bundling path.
src/flyte/_code_bundle/_utils.py — in ls_files(), the copy_file_detection == "loaded_modules" branch currently calls list_imported_modules_as_files(str(source_path), sys_modules). Replace that implementation so it shells out to ruff analyze graph, parses the JSON, and resolves the transitive imports of the entrypoint under source_path. Keep filtering to files within source_path (drop third-party/stdlib), so the returned file list matches the existing contract.
- The
CopyFiles literal stays as-is (Literal["loaded_modules", "all", "none", "custom"]) — no new value.
- Handle
ruff not being installed: detect shutil.which("ruff") and fall back to the current sys.modules approach with a clear log. Don't hard-fail.
tests/ — cover a project where a module is imported lazily/conditionally and assert loaded_modules now includes it.
Outcome
Getting started
- See the current logic:
list_imported_modules_as_files and ls_files in src/flyte/_code_bundle/_utils.py (the copy_file_detection == "loaded_modules" branch).
- Try the tool:
ruff analyze graph path/to/entrypoint.py (outputs JSON). Docs: https://docs.astral.sh/ruff/
- Entry points to read:
src/flyte/_code_bundle/bundle.py (build_code_bundle), src/flyte/_run.py (copy_style).
- Test:
pytest tests/ — add a fixture project under a tmp dir with a lazy import.
- Setup: see CONTRIBUTING in the flyte-sdk repo.
Why
When you
flyte.run(...)with the defaultcopy_style="loaded_modules", the SDK decides which local files to ship in the code bundle by inspectingsys.modulesat runtime (list_imported_modules_as_files). That has two downsides:if TYPE_CHECKING:, inside-function imports), or modules imported later on the cluster can be silently left out of the bundle, surfacing as aModuleNotFoundErrorat execution time.flyte runis invoked.A user asked for a more reliable, static way to determine the dependency set.
ruff analyze graphproduces a first-party import dependency graph statically (no need to actually import anything) and is fast. This is a nice-to-have, not a blocker — a good community contribution.Concrete example: works with
ruff analyze graph, broken todayRun it:
flyte.run(main).loaded_modules): at packaging time on your laptop,main()never executes, so thefrom helper import computeline never runs andhelperis not insys.modules. The bundleships without
helper.py, and the task fails on the cluster withModuleNotFoundError: No module named 'helper'.ruff analyze graph: static analysis sees thefrom helper import ...edge regardless ofwhere it appears, so
helper.pyis included and the run succeeds.The same gap applies to imports guarded by
if TYPE_CHECKING:that are later needed at runtime, andto any module only reachable through a code path that isn't exercised locally before packaging.
What to change
Reimplement the existing
loaded_modulesdetection on top ofruff analyze graph, which emits a JSON map offile -> [files it imports]. No newcopy_stylevalue or public API change — just makeloaded_modulesdiscover files statically from the import graph instead of from runtimesys.modules. Starting from the entrypoint module(s), walk that graph to collect the transitive set of first-party files, then feed them into the existing bundling path.src/flyte/_code_bundle/_utils.py— inls_files(), thecopy_file_detection == "loaded_modules"branch currently callslist_imported_modules_as_files(str(source_path), sys_modules). Replace that implementation so it shells out toruff analyze graph, parses the JSON, and resolves the transitive imports of the entrypoint undersource_path. Keep filtering to files withinsource_path(drop third-party/stdlib), so the returned file list matches the existing contract.CopyFilesliteral stays as-is (Literal["loaded_modules", "all", "none", "custom"]) — no new value.ruffnot being installed: detectshutil.which("ruff")and fall back to the currentsys.modulesapproach with a clear log. Don't hard-fail.tests/— cover a project where a module is imported lazily/conditionally and assertloaded_modulesnow includes it.Outcome
copy_style="loaded_modules"(the default) bundles the full transitive first-party import set of the entrypoint, including lazily/conditionally imported local modulessys.modulesbehavior whenruffis not onPATHloaded_modulespreviously missedCopyFilesvalues unchanged)Getting started
list_imported_modules_as_filesandls_filesinsrc/flyte/_code_bundle/_utils.py(thecopy_file_detection == "loaded_modules"branch).ruff analyze graph path/to/entrypoint.py(outputs JSON). Docs: https://docs.astral.sh/ruff/src/flyte/_code_bundle/bundle.py(build_code_bundle),src/flyte/_run.py(copy_style).pytest tests/— add a fixture project under a tmp dir with a lazy import.