Skip to content

python-3.12.10-atls model files#78

Open
mzueva wants to merge 2 commits into
mainfrom
mzueva/immune-builder-model
Open

python-3.12.10-atls model files#78
mzueva wants to merge 2 commits into
mainfrom
mzueva/immune-builder-model

Conversation

@mzueva
Copy link
Copy Markdown
Contributor

@mzueva mzueva commented May 27, 2026

Download and pack model weights during python-3.12.10-atls env build

Greptile Summary

This PR adds a Node.js script that pre-fetches ~1.3 GB of ImmuneBuilder model weights from Zenodo at build time, staging them into shared/immunebuilder-weights/ so they can be bundled into the python-3.12.10-atls runenv instead of being downloaded on first use at runtime.

  • Download script (download-immunebuilder-weights.mjs): Idempotent via size-based cache check, uses atomic .part \u2192 final rename, and cleans up partial files on error. Two suggestions: add a per-request AbortController timeout to prevent indefinite hangs on stalled Zenodo connections, and switch to Promise.all for concurrent downloads.
  • config.json: Correctly wires shared/immunebuilder-weights/ \u2192 share/immunebuilder-weights/ for linux-x64 and macosx-aarch64.
  • package.json / .gitignore: Build script updated to run fetch-weights before pl-py-builder; weight directory correctly excluded from git.

Confidence Score: 4/5

Safe to merge; the download script is well-structured with atomic writes and cleanup on failure, and the config/package changes are correct.

The download script handles the happy path correctly and all config changes are straightforward. The missing per-request timeout means a stalled Zenodo connection can block the build indefinitely, and sequential downloads of ~1.3 GB inflate CI time unnecessarily. Neither blocks correctness, but both are worth addressing before this runs routinely in CI.

python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs — timeout handling and download concurrency

Important Files Changed

Filename Overview
python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs New script to pre-fetch ~1.3 GB of ImmuneBuilder model weights from Zenodo; idempotent with size-based cache check and atomic rename. Missing a per-request timeout and uses sequential downloads.
python-3.12.10-atls/config.json Adds copyFiles entry mapping shared/immunebuilder-weights/ to share/immunebuilder-weights/ for linux-x64 and macosx-aarch64; correctly omits unsupported platforms.
python-3.12.10-atls/package.json Adds fetch-weights script and prepends it to build via &&.
.gitignore Correctly excludes the shared/immunebuilder-weights/ directory from version control.

Sequence Diagram

sequenceDiagram
    participant Build as pnpm build
    participant Script as download-immunebuilder-weights.mjs
    participant FS as shared/immunebuilder-weights/
    participant Zenodo as zenodo.org/record/7258553
    participant Builder as pl-py-builder

    Build->>Script: pnpm run fetch-weights
    loop For each of 8 model files
        Script->>FS: existsSync + statSync (size check)
        alt cached (size matches)
            Script-->>Script: skip
        else missing or wrong size
            Script->>Zenodo: fetch(url, redirect:follow)
            Zenodo-->>Script: response body stream
            Script->>FS: pipeline to file.part
            Script->>Script: verify size
            Script->>FS: rename .part to final
        end
    end
    Script-->>Build: exit 0
    Build->>Builder: pl-py-builder
    Builder->>FS: copyFiles shared to share (linux-x64, macosx-aarch64)
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs:43-46
No fetch/stream timeout is set. Each model file is up to ~214 MB, and Zenodo is a public academic mirror that can stall mid-transfer. Without an `AbortController` timeout, a stalled connection will block the build indefinitely — the process will never proceed past `pipeline(res.body, ...)` even if the remote end has gone silent.

```suggestion
  const url = `${BASE}/${name}?download=1`;
  console.log(`  ↓ ${name} from ${url}`);
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 15 * 60 * 1000);
  let res;
  try {
    res = await fetch(url, { redirect: 'follow', signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
```

### Issue 2 of 2
python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs:62-67
Downloads are sequential — each file waits for the previous one to complete. The 8 files total roughly 1.3 GB from a single Zenodo record; downloading them concurrently would cut wall-clock build time substantially.

```suggestion
mkdirSync(targetDir, { recursive: true });
console.log(`Fetching ImmuneBuilder weights into ${targetDir}`);
await Promise.all(FILES.map(f => downloadOne(f)));
console.log('ImmuneBuilder weights ready');
```

Reviews (1): Last reviewed commit: "Download and pack model weights during p..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces pre-fetching of ImmuneBuilder model weights during the build process. It adds a new Node.js script (download-immunebuilder-weights.mjs) to download these weights from Zenodo, updates package.json to trigger this script during build, configures config.json to stage the weights, and updates .gitignore to ignore the downloaded files. The review feedback focuses on improving the download script by replacing synchronous file system operations (statSync, unlinkSync, existsSync) with their asynchronous counterparts from node:fs/promises to prevent blocking the event loop and avoid TOCTOU race conditions. Additionally, it is recommended to add a timeout to the fetch request to prevent hanging and to verify that res.body is not null before streaming.

Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs Outdated
Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs
Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs Outdated
Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs
Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs Outdated
Comment thread python-3.12.10-atls/scripts/download-immunebuilder-weights.mjs Outdated
@mzueva mzueva changed the title python-3.12.10-atls model python-3.12.10-atls model files May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant