feat: optimize model downloads by skipping duplicate weight formats by Jont828 · Pull Request #248 · ai-dynamo/modelexpress

Jont828 · 2026-04-22T00:50:24Z

Summary

Add --weight-format CLI flag (auto/safetensors/pytorch/all) and MODEL_EXPRESS_WEIGHT_FORMAT env var to control which weight file formats are downloaded
In auto mode (default), prefer safetensors over pytorch/h5/msgpack, deduplicate sharded vs consolidated weights using index file presence, and exclude GGUF files
Add LICENSE, LICENSE.md, LICENSE.txt, and NOTICE to the default ignored files list since they are never used by model runtimes

Closes #173

Test plan

All 292 existing tests pass
13 new unit tests for WeightFormat enum and filter_files_by_weight_format() logic
cargo clippy passes with no warnings
Manual test: download a dual-format model (e.g. openai/gpt-oss-20b) with --weight-format auto and verify only safetensors files are fetched
Manual test: --weight-format pytorch downloads only .bin files
Manual test: --weight-format all preserves current behavior
Manual test: MODEL_EXPRESS_WEIGHT_FORMAT=safetensors env var works

Summary by CodeRabbit

Release Notes

New Features
- Added --weight-format global option to specify model weight file format preferences: auto, safetensors, pytorch, or all (defaults to auto)
- Added MODEL_EXPRESS_WEIGHT_FORMAT environment variable for weight format configuration
Documentation
- Updated CLI documentation with new weight-format option
- Updated deployment documentation with environment variable details

Jont828 · 2026-04-22T00:57:11Z

The lint PR title job is failing with Resource not accessible by integration — looks like it's a permissions issue with the add_label: true setting in the workflow. The GITHUB_TOKEN for fork PRs only gets read access, so it can't add labels to the upstream repo. The PR title itself follows conventional commits format fine. Happy to adjust anything if needed though!

coderabbitai · 2026-04-22T00:57:45Z

Walkthrough

This pull request introduces weight format optimization for model downloads. A new WeightFormat enum (with variants Auto, Safetensors, Pytorch, All) is added with CLI/environment variable support. The weight_format parameter is propagated through all model download APIs from client to server. The HuggingFace provider now implements smart filtering to prefer safetensors, skip redundant format variants, and deduplicate sharded vs. consolidated weights based on index file presence.

Changes

Cohort / File(s)	Summary
Type Definitions & Enum `modelexpress_common/src/models.rs`	New `WeightFormat` enum with `Auto`/`Safetensors`/`Pytorch`/`All` variants; includes `FromStr`, `Display`, `ValueEnum` trait implementations for CLI parsing and serialization.
Protocol & Conversion `modelexpress_common/proto/model.proto`, `modelexpress_common/src/lib.rs`	Added `WeightFormat` enum and `weight_format` field to `ModelDownloadRequest` message; bidirectional `From` conversions between `models::WeightFormat` and `grpc::model::WeightFormat` with test coverage.
Configuration `modelexpress_common/src/client_config.rs`, `docs/CLI.md`, `docs/DEPLOYMENT.md`	Added `weight_format` field to `ClientArgs` and `ClientConfig`; introduced `--weight-format` CLI flag and `MODEL_EXPRESS_WEIGHT_FORMAT` environment variable with default `auto` and allowed values `auto\|safetensors\|pytorch\|all`.
Provider Interface & Base Implementation `modelexpress_common/src/providers.rs`	Updated `ModelProviderTrait::download_model` signature to accept `weight_format: WeightFormat` parameter; expanded default `is_ignored` to exclude additional metadata files (`LICENSE`, `LICENSE.md`, `LICENSE.txt`, `NOTICE`).
HuggingFace Provider `modelexpress_common/src/providers/huggingface.rs`	Implemented weight-format filtering logic with functions `filter_files_by_weight_format`, `is_non_preferred_weight`, `auto_filter_weight_files` to intelligently prefer safetensors, handle sharded vs. consolidated deduplication via index files, and fallback to pytorch; updated `download_model` to apply filtering and added comprehensive unit tests.
NGC Provider `modelexpress_common/src/providers/ngc.rs`	Updated `download_model` signature to include `_weight_format` parameter (unused/underscore-prefixed); updated test calls accordingly.
Download Module `modelexpress_common/src/download.rs`	Updated `download_model` signature to accept `weight_format` parameter and forward to provider implementation; updated all tests to pass `WeightFormat::default()`.
Client Library APIs `modelexpress_client/src/lib.rs`	Re-exported `WeightFormat`; added `weight_format` parameter to all public model download/request methods (`preload_model_to_cache`, `request_model`, `request_model_server_only`, `request_model_with_provider`, `request_model_with_provider_and_fallback`, `request_model_with_smart_fallback`, `download_model_directly`); threaded `weight_format` into gRPC `ModelDownloadRequest` and internal download calls; updated unit/integration tests.
Server Handler `modelexpress_server/src/services.rs`	Parsed `weight_format` from `ModelDownloadRequest`; threaded into `ModelDownloadTracker::ensure_model_downloaded` signature and forwarded to `download::download_model`; updated unit tests to populate `weight_format` field.
Client Binaries & Tests `modelexpress_client/src/bin/test_client.rs`, `modelexpress_client/src/bin/fallback_test.rs`, `modelexpress_client/src/bin/modules/handlers.rs`, `workspace-tests/tests/integration_tests.rs`	Updated imports and call sites to pass `WeightFormat::default()` through model download entry points; thread `weight_format` from config into download flows across `SmartFallback`, `ServerOnly`, and `Direct` strategies.
Architecture Documentation `docs/ARCHITECTURE.md`	Updated `ModelProviderTrait::download_model` signature documentation to reflect new `weight_format: WeightFormat` parameter.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop skip and a hop through the weight format,
Safetensors first, then pytorch as backup,
No more downloading everything at once,
Smart filtering keeps the disk space intact,
A rabbit's delight: efficient downloads!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly summarizes the main change: adding weight-format optimization to skip duplicate weight formats during model downloads.
Linked Issues check	✅ Passed	All coding objectives from issue `#173` are met: weight-format enum with smart defaults, CLI flag and env var support, auto/safetensors/pytorch/all modes, index-file deduplication logic, and default ignored files expanded.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to implementing weight-format optimization per issue `#173`; no unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

modelexpress_client/src/lib.rs (2)

304-312: ⚠️ Potential issue | 🟠 Major

Use the client’s configured cache for direct fallback.

Both instance fallback paths switch to direct download with CacheConfig::discover()/download_model_directly(), which ignores self.cache_config and the provider-specific cache resolution in get_cache_dir(). A client configured with a custom cache can report fallback success while downloading into a different cache root.

Proposed fix

-                Self::download_model_directly(model_name, provider, ignore_weights, weight_format)
-                    .await
+                download::download_model(
+                    model_name,
+                    provider,
+                    Some(self.get_cache_dir(provider)),
+                    ignore_weights,
+                    weight_format,
+                )
+                .await
+                .map(|_| ())
+                .map_err(|e| {
+                    modelexpress_common::Error::Server(format!("Direct download failed: {e}"))
+                        .into()
+                })

-                    let cache_dir = CacheConfig::discover().ok().map(|config| config.local_path);
+                    let cache_dir = Some(self.get_cache_dir(provider));
                     match download::download_model(&model_name, provider, cache_dir, ignore_weights, weight_format).await {

Also applies to: 417-419

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/src/lib.rs` around lines 304 - 312, The fallback path
uses CacheConfig::discover()/download_model_directly() which ignores the
client's configured cache; update the fallback branches in the code paths that
call Self::download_model_directly (seen around the Err(e) block and the similar
block at lines 417-419) to use the client's self.cache_config (or resolve the
provider-specific dir via self.get_cache_dir(provider, &self.cache_config)) and
pass that cache/config into download_model_directly (or change
download_model_directly signature to accept a cache_config or explicit
cache_dir) so the direct download writes to the same cache the client was
configured to use.

298-299: ⚠️ Potential issue | 🟠 Major

Propagate weight_format through the file streaming request.

When shared_storage is disabled, the filtered server download is followed by a separate stream_model_files request that does not include weight_format. The server streams all cached files without applying the selected format filter, potentially delivering duplicate formats or GGUF files that the download request rejected.

Extend the ModelFilesRequest protobuf message to carry weight_format, update stream_model_files_from_server() to pass it, and ensure the server applies the same filtering during file enumeration.

Implementation changes needed

-                    self.stream_model_files_from_server(model_name, provider)
+                    self.stream_model_files_from_server(model_name, provider, weight_format)
                         .await?;

     pub async fn stream_model_files_from_server(
         &mut self,
         model_name: &str,
         provider: ModelProvider,
+        weight_format: WeightFormat,
     ) -> CommonResult<()> {

         let grpc_request = tonic::Request::new(ModelFilesRequest {
             model_name: model_name.to_string(),
             provider: modelexpress_common::grpc::model::ModelProvider::from(provider) as i32,
             chunk_size,
+            weight_format: modelexpress_common::grpc::model::WeightFormat::from(weight_format) as i32,
         });

Also applies to: 403-404, 466-470

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/src/lib.rs` around lines 298 - 299, The stream path is
missing the selected weight_format so the server streams all cached files;
extend the ModelFilesRequest protobuf to include a weight_format field, update
the client call sites (e.g., where
self.stream_model_files_from_server(model_name, provider) is invoked at the
noted locations) and change the signature of stream_model_files_from_server(...)
to accept and set that weight_format into the ModelFilesRequest, and finally
update the server-side handler that enumerates/caches files to apply the same
weight_format filter when streaming; ensure both client and server use the same
enum/type for weight_format so filtering behavior matches the download request.

🧹 Nitpick comments (2)

docs/ARCHITECTURE.md (1)

420-430: Keep the module inventory in sync with the new type.

The provider trait signature now documents WeightFormat, but the models module row still omits it.
Proposed documentation update
-| `models` | `Status`, `ModelProvider`, `ModelStatus`, `ModelStatusResponse` |
+| `models` | `Status`, `ModelProvider`, `WeightFormat`, `ModelStatus`, `ModelStatusResponse` |
As per coding guidelines, docs/ARCHITECTURE.md should be updated when making architecture or gRPC-service related changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/ARCHITECTURE.md` around lines 420 - 430, The models module inventory is
out of sync: the ModelProviderTrait signature now uses the WeightFormat type but
the `models` row in docs/ARCHITECTURE.md omits it; update that row to list
`WeightFormat` alongside `Status`, `ModelProvider`, `ModelStatus`, and
`ModelStatusResponse` so the documented exports match the actual API (reference
symbols: ModelProviderTrait, WeightFormat, models).

modelexpress_common/src/lib.rs (1)

269-283: Assert the wire mapping, not only round-tripping.

A symmetric mapping mistake would still pass this test while putting the wrong enum value on the wire. Add direct assertions for each Rust variant against the expected gRPC variant.

Proposed test hardening

     #[test]
     fn test_weight_format_conversion_both_ways() {
-        let formats = vec![
-            models::WeightFormat::Auto,
-            models::WeightFormat::Safetensors,
-            models::WeightFormat::Pytorch,
-            models::WeightFormat::All,
+        let formats = [
+            (
+                models::WeightFormat::Auto,
+                grpc::model::WeightFormat::Auto,
+            ),
+            (
+                models::WeightFormat::Safetensors,
+                grpc::model::WeightFormat::Safetensors,
+            ),
+            (
+                models::WeightFormat::Pytorch,
+                grpc::model::WeightFormat::Pytorch,
+            ),
+            (
+                models::WeightFormat::All,
+                grpc::model::WeightFormat::All,
+            ),
         ];
 
-        for format in formats {
-            let grpc_format: grpc::model::WeightFormat = format.into();
+        for (format, expected_grpc_format) in formats {
+            let grpc_format: grpc::model::WeightFormat = format.into();
+            assert_eq!(grpc_format, expected_grpc_format);
+
             let back_to_model: models::WeightFormat = grpc_format.into();
             assert_eq!(format, back_to_model);
         }
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/lib.rs` around lines 269 - 283, The test
test_weight_format_conversion_both_ways currently only round-trips
models::WeightFormat <-> grpc::model::WeightFormat, which can hide symmetric
mapping errors; update the test to assert the exact expected wire enum for each
Rust variant (models::WeightFormat::Auto, ::Safetensors, ::Pytorch, ::All) by
comparing each direct conversion to the exact grpc::model::WeightFormat variant
you expect (e.g., assert that models::WeightFormat::Auto.into() equals the
specific grpc::model::WeightFormat::Auto value), and likewise verify any
non-roundtrip expected mappings explicitly rather than relying solely on
back-and-forth equality.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelexpress_common/src/providers/huggingface.rs`:
- Around line 195-213: The filter_files_by_weight_format behavior is
inconsistent: Auto currently excludes .gguf but Safetensors/Pytorch keep it
because is_weight_file() doesn't recognize .gguf and is_non_preferred_weight()
early-returns false; update the logic so .gguf is treated as a non-preferred
weight in Safetensors and Pytorch modes too (only WeightFormat::All should keep
GGUF). Concretely, either extend is_weight_file() to consider ".gguf" as a
weight file or change is_non_preferred_weight(model_filename:,
prefer_safetensors:bool) to explicitly return true for ".gguf" when
prefer_safetensors or prefer_pytorch is set, and ensure
filter_files_by_weight_format (the WeightFormat::Safetensors and ::Pytorch arms)
rely on that updated check.
- Around line 229-265: The dedup logic in auto_filter_weight_files currently
only ignores "model.safetensors" and "pytorch_model.bin" when their
corresponding index files exist; update auto_filter_weight_files to compute
consolidated filenames dynamically by scanning filenames for any entries that
end with ".index.json", stripping the ".index.json" suffix to produce the
consolidated name(s), and then filter out those consolidated filenames (e.g.,
"consolidated.safetensors") when a matching index file exists; use the existing
filenames iterator and Self::is_weight_file to locate index files and compare
against filenames rather than hardcoding "model.safetensors" or
"pytorch_model.bin".

---

Outside diff comments:
In `@modelexpress_client/src/lib.rs`:
- Around line 304-312: The fallback path uses
CacheConfig::discover()/download_model_directly() which ignores the client's
configured cache; update the fallback branches in the code paths that call
Self::download_model_directly (seen around the Err(e) block and the similar
block at lines 417-419) to use the client's self.cache_config (or resolve the
provider-specific dir via self.get_cache_dir(provider, &self.cache_config)) and
pass that cache/config into download_model_directly (or change
download_model_directly signature to accept a cache_config or explicit
cache_dir) so the direct download writes to the same cache the client was
configured to use.
- Around line 298-299: The stream path is missing the selected weight_format so
the server streams all cached files; extend the ModelFilesRequest protobuf to
include a weight_format field, update the client call sites (e.g., where
self.stream_model_files_from_server(model_name, provider) is invoked at the
noted locations) and change the signature of stream_model_files_from_server(...)
to accept and set that weight_format into the ModelFilesRequest, and finally
update the server-side handler that enumerates/caches files to apply the same
weight_format filter when streaming; ensure both client and server use the same
enum/type for weight_format so filtering behavior matches the download request.

---

Nitpick comments:
In `@docs/ARCHITECTURE.md`:
- Around line 420-430: The models module inventory is out of sync: the
ModelProviderTrait signature now uses the WeightFormat type but the `models` row
in docs/ARCHITECTURE.md omits it; update that row to list `WeightFormat`
alongside `Status`, `ModelProvider`, `ModelStatus`, and `ModelStatusResponse` so
the documented exports match the actual API (reference symbols:
ModelProviderTrait, WeightFormat, models).

In `@modelexpress_common/src/lib.rs`:
- Around line 269-283: The test test_weight_format_conversion_both_ways
currently only round-trips models::WeightFormat <-> grpc::model::WeightFormat,
which can hide symmetric mapping errors; update the test to assert the exact
expected wire enum for each Rust variant (models::WeightFormat::Auto,
::Safetensors, ::Pytorch, ::All) by comparing each direct conversion to the
exact grpc::model::WeightFormat variant you expect (e.g., assert that
models::WeightFormat::Auto.into() equals the specific
grpc::model::WeightFormat::Auto value), and likewise verify any non-roundtrip
expected mappings explicitly rather than relying solely on back-and-forth
equality.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d01c349-fbfe-441f-9191-9faebc3ce321

📥 Commits

Reviewing files that changed from the base of the PR and between a8a2fd4 and 0a5fd80.

📒 Files selected for processing (17)

docs/ARCHITECTURE.md
docs/CLI.md
docs/DEPLOYMENT.md
modelexpress_client/src/bin/fallback_test.rs
modelexpress_client/src/bin/modules/handlers.rs
modelexpress_client/src/bin/test_client.rs
modelexpress_client/src/lib.rs
modelexpress_common/proto/model.proto
modelexpress_common/src/client_config.rs
modelexpress_common/src/download.rs
modelexpress_common/src/lib.rs
modelexpress_common/src/models.rs
modelexpress_common/src/providers.rs
modelexpress_common/src/providers/huggingface.rs
modelexpress_common/src/providers/ngc.rs
modelexpress_server/src/services.rs
workspace-tests/tests/integration_tests.rs

coderabbitai · 2026-04-22T00:57:48Z

+    fn filter_files_by_weight_format(
+        filenames: &[String],
+        weight_format: WeightFormat,
+    ) -> Vec<String> {
+        match weight_format {
+            WeightFormat::All => filenames.to_vec(),
+            WeightFormat::Safetensors => filenames
+                .iter()
+                .filter(|f| !Self::is_non_preferred_weight(f, true))
+                .cloned()
+                .collect(),
+            WeightFormat::Pytorch => filenames
+                .iter()
+                .filter(|f| !Self::is_non_preferred_weight(f, false))
+                .cloned()
+                .collect(),
+            WeightFormat::Auto => Self::auto_filter_weight_files(filenames),
+        }
+    }


⚠️ Potential issue | 🟡 Minor

GGUF is excluded in Auto but kept in Safetensors/Pytorch modes.

is_non_preferred_weight() early-returns false for any file where is_weight_file() is false, and is_weight_file() does not include .gguf. So model.gguf passes through unfiltered in Safetensors/Pytorch modes while being excluded in Auto. A user running --weight-format safetensors on a repo like TheBloke's mixed safetensors+GGUF uploads would still get GGUF shards — likely not their intent.

Since the PR goal is "exclude GGUF files by default", consider making GGUF exclusion consistent across Auto/Safetensors/Pytorch (only All should pull GGUF).

♻️ One possible shape for the fix

fn is_non_preferred_weight(filename: &str, prefer_safetensors: bool) -> bool { - if !Self::is_weight_file(filename) { + // Treat .gguf as a weight-format file for filtering purposes so that + // explicit Safetensors/Pytorch modes also exclude it. + let is_weightish = Self::is_weight_file(filename) || filename.ends_with(".gguf"); + if !is_weightish { return false; } if prefer_safetensors { !filename.ends_with(".safetensors") } else { !filename.ends_with(".bin") } }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/providers/huggingface.rs` around lines 195 - 213, The filter_files_by_weight_format behavior is inconsistent: Auto currently excludes .gguf but Safetensors/Pytorch keep it because is_weight_file() doesn't recognize .gguf and is_non_preferred_weight() early-returns false; update the logic so .gguf is treated as a non-preferred weight in Safetensors and Pytorch modes too (only WeightFormat::All should keep GGUF). Concretely, either extend is_weight_file() to consider ".gguf" as a weight file or change is_non_preferred_weight(model_filename:, prefer_safetensors:bool) to explicitly return true for ".gguf" when prefer_safetensors or prefer_pytorch is set, and ensure filter_files_by_weight_format (the WeightFormat::Safetensors and ::Pytorch arms) rely on that updated check.

coderabbitai · 2026-04-22T00:57:48Z

+    fn auto_filter_weight_files(filenames: &[String]) -> Vec<String> {
+        let has_safetensors = filenames.iter().any(|f| f.ends_with(".safetensors"));
+        let has_safetensors_index = filenames
+            .iter()
+            .any(|f| f.ends_with(".safetensors.index.json"));
+        let has_pytorch_index = filenames
+            .iter()
+            .any(|f| f == "pytorch_model.bin.index.json");
+
+        filenames
+            .iter()
+            .filter(|f| {
+                // Drop weight files in non-preferred formats
+                if Self::is_weight_file(f) || f.ends_with(".gguf") {
+                    if has_safetensors {
+                        if !f.ends_with(".safetensors") {
+                            return false;
+                        }
+                    } else if f.ends_with(".h5") || f.ends_with(".msgpack") || f.ends_with(".gguf")
+                    {
+                        return false;
+                    }
+                }
+
+                // Deduplicate sharded vs. consolidated within chosen format
+                if has_safetensors_index && *f == "model.safetensors" {
+                    return false;
+                }
+                if has_pytorch_index && *f == "pytorch_model.bin" {
+                    return false;
+                }
+
+                true
+            })
+            .cloned()
+            .collect()
+    }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the structure and find the file fd -t f "huggingface.rs" --type f

Repository: ai-dynamo/modelexpress

Length of output: 114

🏁 Script executed:

# Read the specific file and the function in context wc -l modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 119

🏁 Script executed:

# Examine the function and surrounding context sed -n '200,270p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 2695

🏁 Script executed:

# Search for tests related to auto_filter_weight_files rg -i "auto_filter_weight_files|weight.*file.*test|filter.*weight" --type rs -A 5

Repository: ai-dynamo/modelexpress

Length of output: 92

🏁 Script executed:

# Search for issue `#173` or related discussions rg -i "issue.*173|#173|dedup.*weight" --type rs --type md

Repository: ai-dynamo/modelexpress

Length of output: 92

🏁 Script executed:

# Look for test fixtures or test data that might show model layouts fd -t f "test" -type d | head -20

Repository: ai-dynamo/modelexpress

Length of output: 236

🏁 Script executed:

# Search for references to Diffusers pipelines or consolidated.safetensors in the codebase rg -i "diffuser|consolidated.safetensors|model-.*safetensors" --type rs --type md --type json

Repository: ai-dynamo/modelexpress

Length of output: 92

🏁 Script executed:

# Find is_weight_file definition sed -n '1,228p' modelexpress_common/src/providers/huggingface.rs | tail -100

Repository: ai-dynamo/modelexpress

Length of output: 3650

🏁 Script executed:

# Search for tests without rg type flag find . -name "*.rs" -type f | xargs grep -l "auto_filter_weight_files"

Repository: ai-dynamo/modelexpress

Length of output: 116

🏁 Script executed:

# Search for test directories and files find . -path "*/test*" -name "*.rs" -type f

Repository: ai-dynamo/modelexpress

Length of output: 244

🏁 Script executed:

# Look for issue `#173` reference grep -r "173" . --include="*.rs" --include="*.md"

Repository: ai-dynamo/modelexpress

Length of output: 125

🏁 Script executed:

# Search for Diffusers or consolidated patterns in the repo grep -r "diffuser\|consolidated" . --include="*.rs" --include="*.md" --include="*.json"

Repository: ai-dynamo/modelexpress

Length of output: 779

🏁 Script executed:

# Find the is_weight_file definition grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 48

🏁 Script executed:

# Read the is_weight_file function sed -n '260,300p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1449

🏁 Script executed:

# Search for all test functions related to filter_weight grep -n "fn test.*filter\|fn test.*weight" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 533

🏁 Script executed:

# Look at test cases in detail grep -n "test_filter_weight_format_auto" modelexpress_common/src/providers/huggingface.rs -A 50

Repository: ai-dynamo/modelexpress

Length of output: 5029

🏁 Script executed:

# Check if there are any tests with non-standard names or consolidated grep -n "consolidated\|diffusion\|unet" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 361

🏁 Script executed:

# Search for the complete is_weight_file definition more carefully grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rs -A 20

Repository: ai-dynamo/modelexpress

Length of output: 48

🏁 Script executed:

# Search for is_weight_file more thoroughly grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 278

🏁 Script executed:

# Read around those lines to understand the function sed -n '155,175p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 604

🏁 Script executed:

# Let's look at the broader context - find where is_weight_file is defined awk '/fn is_weight_file/,/^ \}/' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 48

🏁 Script executed:

# Get the line numbers where is_weight_file appears grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 278

🏁 Script executed:

# Read that section sed -n '145,185p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1378

🏁 Script executed:

# Search the entire codebase for is_weight_file definition grep -r "fn is_weight_file" . --include="*.rs"

Repository: ai-dynamo/modelexpress

Length of output: 150

🏁 Script executed:

# Check the imports and see if it's defined elsewhere or in a trait grep -n "is_weight_file\|trait.*Weight\|impl.*Weight" modelexpress_common/src/providers/huggingface.rs | head -20

Repository: ai-dynamo/modelexpress

Length of output: 278

🏁 Script executed:

# Let's read more of the file to understand the structure - look for impl blocks grep -n "^impl\|^fn\|^ fn" modelexpress_common/src/providers/huggingface.rs | head -40

Repository: ai-dynamo/modelexpress

Length of output: 1684

🏁 Script executed:

# Try to find any reference to the function or trait definition with more context awk '/is_weight_file/ { for(i=NR-5;i<=NR+5;i++) if (i>0) print i": " lines[i%11]; } { lines[NR%11]=$0 }' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1510

🏁 Script executed:

# Let's search in models.rs since we saw references to weight formats there grep -n "is_weight_file" modelexpress_common/src/models.rs

Repository: ai-dynamo/modelexpress

Length of output: 48

🏁 Script executed:

# Read the full models.rs file structure around weight-related code grep -n "fn.*weight\|is_weight" modelexpress_common/src/models.rs

Repository: ai-dynamo/modelexpress

Length of output: 301

🏁 Script executed:

# Found it! Read the providers.rs file to find is_weight_file definition grep -n "fn is_weight_file" modelexpress_common/src/providers.rs -A 20

Repository: ai-dynamo/modelexpress

Length of output: 588

🏁 Script executed:

# Also check what trait this belongs to grep -n "trait.*Provider\|fn is_weight_file" modelexpress_common/src/providers.rs -B 5

Repository: ai-dynamo/modelexpress

Length of output: 469

🏁 Script executed:

# Let's see what is in providers.rs more broadly wc -l modelexpress_common/src/providers.rs

Repository: ai-dynamo/modelexpress

Length of output: 106

🏁 Script executed:

# Read the relevant section sed -n '1,100p' modelexpress_common/src/providers.rs

Repository: ai-dynamo/modelexpress

Length of output: 3455

🏁 Script executed:

# Now let's look for any test data or references to actual models find . -name "*.json" -o -name "*.md" | xargs grep -l "safetensors\|pytorch_model" | head -10

Repository: ai-dynamo/modelexpress

Length of output: 123

🏁 Script executed:

# Check if there are integration tests that download real models grep -r "download_model\|diffuser\|consolidated" . --include="*.rs" | grep -v "Binary"

Repository: ai-dynamo/modelexpress

Length of output: 5023

🏁 Script executed:

# Let's examine the actual test to understand what the review is asking sed -n '1016,1030p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 613

🏁 Script executed:

# Search the documentation or comments for any references to model layouts grep -r "consolidated\|sharded\|index.json" . --include="*.md" --include="*.rs" -B 2 -A 2

Repository: ai-dynamo/modelexpress

Length of output: 7364

🏁 Script executed:

# Check the git history or issues for any mention of non-standard model names grep -r "consolidated.safetensors\|diffusion_pytorch" . --include="*.rs" --include="*.md"

Repository: ai-dynamo/modelexpress

Length of output: 48

🏁 Script executed:

# Let's check if there are any recent HF model examples that might use non-standard names # First, let's see what documentation exists find . -name "*.md" -type f | head -20

Repository: ai-dynamo/modelexpress

Length of output: 630

🏁 Script executed:

# Let's check the integration tests to see what models are actually tested grep -n "model_name\|download_model" workspace-tests/tests/integration_tests.rs -A 2 -B 2 | head -50

Repository: ai-dynamo/modelexpress

Length of output: 1259

🏁 Script executed:

# Check if there's any test data or fixture definitions find . -name "*.json" -o -name "fixtures" -type d

Repository: ai-dynamo/modelexpress

Length of output: 173

🏁 Script executed:

# Let's see what models are tested in integration tests sed -n '1,100p' workspace-tests/tests/integration_tests.rs

Repository: ai-dynamo/modelexpress

Length of output: 3623

🏁 Script executed:

# Search for any environment variables or config that might specify test models grep -r "MODEL\|HF_" . --include="*.rs" --include="*.md" | grep -v "Binary" | head -20

Repository: ai-dynamo/modelexpress

Length of output: 2115

🏁 Script executed:

# Check the CLAUDE.md or any documentation about the codebase cat ./CLAUDE.md

Repository: ai-dynamo/modelexpress

Length of output: 5517

🏁 Script executed:

# Let's check if there are any real HF models that use non-standard consolidated names # by looking at the actual HF hub documentation or searching for such patterns rg -i "consolidated\.safetensors|diffusion_pytorch_model" . --include="*.rs" --include="*.md"

Repository: ai-dynamo/modelexpress

Length of output: 507

🏁 Script executed:

# Let's verify the subdirectory filtering logic more carefully sed -n '174,195p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1188

🏁 Script executed:

# Check how files are processed before auto_filter_weight_files is called sed -n '300,380p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 3104

🏁 Script executed:

# Let's look at how the filtering works step by step # First, understand what gets passed to auto_filter_weight_files sed -n '320,345p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 981

🏁 Script executed:

# Check test data more thoroughly to understand what models are used grep -n "bert-tiny" workspace-tests/tests/integration_tests.rs -B 5 -A 10

Repository: ai-dynamo/modelexpress

Length of output: 746

🏁 Script executed:

# Now let me check if there are subdirectory files being filtered out # and understand the full filtering chain better awk '/filter_files_by_weight_format/,/^ \}/' modelexpress_common/src/providers/huggingface.rs | head -60

Repository: ai-dynamo/modelexpress

Length of output: 2278

🏁 Script executed:

# Let me understand the exact logic flow more carefully # Check if there are any tests that would catch the non-standard name issue sed -n '988,1048p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 2739

🏁 Script executed:

# Let's create a test case that would expose the issue the reviewer mentions # First, understand what the current behavior would be with consolidated.safetensors cat > /tmp/test_consolidated.txt << 'EOF' Scenario: Model with consolidated.safetensors and sharded files Files in model: - config.json - consolidated.safetensors (consolidated file) - consolidated-00001-of-00002.safetensors (shard 1) - consolidated-00002-of-00002.safetensors (shard 2) - consolidated.safetensors.index.json (index file) Current behavior: 1. has_safetensors = true (consolidated.safetensors exists) 2. has_safetensors_index = true (.safetensors.index.json exists) 3. Filter logic checks: - if has_safetensors_index && *f == "model.safetensors" -> skip - This does NOT match consolidated.safetensors 4. Result: ALL files kept (consolidated + shards + index) - Should only keep: shards + index Expected behavior: - Detect that consolidated.safetensors.index.json exists - Strip .index.json to get "consolidated.safetensors" - Remove consolidated.safetensors from result - Keep shards and index EOF cat /tmp/test_consolidated.txt

Repository: ai-dynamo/modelexpress

Length of output: 946

🏁 Script executed:

# Check if there are real models on HF that use this pattern # by looking at what the reviewer claims grep -n "consolidated" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 361

🏁 Script executed:

# Let's also verify if the review's concern about the hardcoded names is valid # by checking if there are any comments or documentation about supported patterns sed -n '174,267p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 3805

Hardcoded consolidated filenames limit dedup to two naming patterns.

The dedup logic only checks for model.safetensors and pytorch_model.bin when index files exist. Models using other consolidated filenames (e.g., consolidated.safetensors + consolidated.safetensors.index.json) would download both the consolidated file and all shards, defeating the dedup goal.

Instead of hardcoding the two canonical names, derive the consolidated filename from each index file by stripping .index.json suffix. This handles arbitrary consolidated/sharded naming conventions while remaining backward compatible.

Note: The test matrix uses prajjwal1/bert-tiny, which is small and unlikely to use sharded weights, so this doesn't surface in current tests.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/providers/huggingface.rs` around lines 229 - 265, The dedup logic in auto_filter_weight_files currently only ignores "model.safetensors" and "pytorch_model.bin" when their corresponding index files exist; update auto_filter_weight_files to compute consolidated filenames dynamically by scanning filenames for any entries that end with ".index.json", stripping the ".index.json" suffix to produce the consolidated name(s), and then filter out those consolidated filenames (e.g., "consolidated.safetensors") when a matching index file exists; use the existing filenames iterator and Self::is_weight_file to locate index files and compare against filenames rather than hardcoding "model.safetensors" or "pytorch_model.bin".

Add --weight-format CLI flag (auto/safetensors/pytorch/all) to control which weight files are downloaded. In auto mode (the default), safetensors files are preferred over pytorch/h5/msgpack, sharded vs consolidated duplicates are deduplicated, and GGUF files are excluded. This prevents downloading 2-3x the data for repos that publish weights in multiple formats. Also adds LICENSE, LICENSE.md, LICENSE.txt, and NOTICE to the default ignored files list since these are never used by model runtimes. Closes ai-dynamo#173 Signed-off-by: Jonathan Tong <jonathan.tong@live.com> Signed-off-by: Jont828 <jt572@cornell.edu>

copy-pr-bot · 2026-06-11T00:27:20Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Jont828 had a problem deploying to GITLAB April 22, 2026 00:50 — with GitHub Actions Failure

pull-request-size Bot added the size/XL label Apr 22, 2026

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

Jont828 force-pushed the Jont828/optimize-weight-downloads branch from 0a5fd80 to 17af317 Compare June 11, 2026 00:27

github-actions Bot added the feat label Jun 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: optimize model downloads by skipping duplicate weight formats#248

feat: optimize model downloads by skipping duplicate weight formats#248
Jont828 wants to merge 1 commit into
ai-dynamo:mainfrom
Jont828:Jont828/optimize-weight-downloads

Jont828 commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

Jont828 commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

copy-pr-bot Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Jont828 commented Apr 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

Jont828 commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

copy-pr-bot Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jont828 commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading