python-3.12.10-atls model files #78
Code Review
This pull request introduces pre-fetching of ImmuneBuilder model weights during the build process. It adds a new Node.js script (download-immunebuilder-weights.mjs) to download these weights from Zenodo, updates package.json to trigger this script during build, configures config.json to stage the weights, and updates .gitignore to ignore the downloaded files. The review feedback focuses on improving the download script by replacing synchronous file system operations (statSync, unlinkSync, existsSync) with their asynchronous counterparts from node:fs/promises to prevent blocking the event loop and avoid TOCTOU race conditions. Additionally, it is recommended to add a timeout to the fetch request to prevent hanging and to verify that res.body is not null before streaming.
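The timeout and null-body recommendations can be illustrated with a minimal sketch. It assumes Node.js 18 or later (global `fetch` and `AbortController`). The names `withTimeout` and `openBody` are hypothetical and do not come from the PR, and `fetchImpl` is an injectable stand-in for `fetch` so the helper can be exercised offline.

```javascript
// Hypothetical helper (not from the PR): run `task(signal)` and abort it
// if it has not settled within `ms` milliseconds.
async function withTimeout(ms, task) {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error(`timed out after ${ms} ms`)),
    ms,
  );
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // never leave a timer pending once the task settles
  }
}

// Hypothetical helper: fetch `url` and return its body stream, failing
// loudly on HTTP errors and on responses that carry no body.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
async function openBody(url, signal, fetchImpl = fetch) {
  const res = await fetchImpl(url, { redirect: 'follow', signal });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  if (res.body === null) throw new Error(`empty response body for ${url}`);
  return res.body;
}
```

Because the timer is cleared as soon as `task` settles, the deadline only covers what happens inside `task`. To bound the whole download rather than just the response headers, the streaming step would need to run inside the same `task` callback and receive the same `signal`.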
Download and pack model weights during python-3.12.10-atls env build
Greptile Summary
This PR adds a Node.js script that pre-fetches ~1.3 GB of ImmuneBuilder model weights from Zenodo at build time, staging them into shared/immunebuilder-weights/ so they can be bundled into the python-3.12.10-atls runenv instead of being downloaded on first use at runtime.
- download-immunebuilder-weights.mjs: Idempotent via size-based cache check, uses atomic .part → final rename, and cleans up partial files on error. Two suggestions: add a per-request AbortController timeout to prevent indefinite hangs on stalled Zenodo connections, and switch to Promise.all for concurrent downloads.
- config.json: Correctly wires shared/immunebuilder-weights/ → share/immunebuilder-weights/ for linux-x64 and macosx-aarch64.
- package.json / .gitignore: Build script updated to run fetch-weights before pl-py-builder; weight directory correctly excluded from git.
Confidence Score: 4/5
Safe to merge; the download script is well-structured with atomic writes and cleanup on failure, and the config/package changes are correct.
The download script handles the happy path correctly and all config changes are straightforward. The missing per-request timeout means a stalled Zenodo connection can block the build indefinitely, and sequential downloads of ~1.3 GB inflate CI time unnecessarily. Neither blocks correctness, but both are worth addressing before this runs routinely in CI.
python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs — timeout handling and download concurrency
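The concurrency suggestion could look roughly like the sketch below. `downloadAll` is a hypothetical wrapper, and `downloadOne` stands in for whatever per-file download function the script defines; neither name is taken from the PR.

```javascript
// Hypothetical wrapper (not from the PR): start every download at once.
// `downloadOne` is an assumed async function that downloads a single file.
async function downloadAll(files, downloadOne) {
  // Promise.all resolves with results in input order once every download
  // finishes, and rejects as soon as any single download rejects.
  return Promise.all(files.map((file) => downloadOne(file)));
}
```

Note that a rejection from `Promise.all` does not cancel the downloads still in flight. Passing one shared `AbortController` signal to every download would be one way to stop them after the first failure.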
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Build as pnpm build
    participant Script as download-immunebuilder-weights.mjs
    participant FS as shared/immunebuilder-weights/
    participant Zenodo as zenodo.org/record/7258553
    participant Builder as pl-py-builder
    Build->>Script: pnpm run fetch-weights
    loop For each of 8 model files
        Script->>FS: existsSync + statSync (size check)
        alt cached (size matches)
            Script-->>Script: skip
        else missing or wrong size
            Script->>Zenodo: fetch(url, redirect:follow)
            Zenodo-->>Script: response body stream
            Script->>FS: pipeline to file.part
            Script->>Script: verify size
            Script->>FS: rename .part to final
        end
    end
    Script-->>Build: exit 0
    Build->>Builder: pl-py-builder
    Builder->>FS: copyFiles shared to share (linux-x64, macosx-aarch64)
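The per-file flow in the diagram (size check, stream to a .part file, verify, rename) can be sketched with the asynchronous `node:fs/promises` calls that the review recommends in place of `existsSync` and `statSync`. This is an illustration under stated assumptions, not the PR's actual code: `isCached`, `fetchToFile`, and the `openSource` callback are hypothetical names, and `openSource` is assumed to return a Node readable stream or async iterable of the file's bytes.

```javascript
import { createWriteStream } from 'node:fs';
import { rename, rm, stat } from 'node:fs/promises';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: true when `path` exists with exactly `expectedSize` bytes.
// A single stat() call replaces the existsSync + statSync pair, so there is
// no window between "does it exist?" and "how big is it?".
async function isCached(path, expectedSize) {
  try {
    return (await stat(path)).size === expectedSize;
  } catch (err) {
    if (err.code === 'ENOENT') return false; // missing file: not cached
    throw err;
  }
}

// Hypothetical sketch of the per-file flow shown in the diagram.
async function fetchToFile(dest, expectedSize, openSource) {
  if (await isCached(dest, expectedSize)) return 'cached';
  const part = `${dest}.part`;
  try {
    // Stream the bytes into the temporary .part file.
    await pipeline(await openSource(), createWriteStream(part));
    const { size } = await stat(part);
    if (size !== expectedSize) {
      throw new Error(`size mismatch for ${dest}: got ${size}, expected ${expectedSize}`);
    }
    await rename(part, dest); // publish only a complete, verified file
    return 'downloaded';
  } catch (err) {
    await rm(part, { force: true }); // drop the partial file; no error if absent
    throw err;
  }
}
```

With a real `fetch` response, `openSource` could return `Readable.fromWeb(res.body)` (from `node:stream`) to adapt the web stream for `pipeline`.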
Reviews (1): Last reviewed commit: "Download and pack model weights during p..."