feat: optimize model downloads by skipping duplicate weight formats #248

Open
Jont828 wants to merge 1 commit into ai-dynamo:main from Jont828:Jont828/optimize-weight-downloads

Jont828 commented Apr 22, 2026

Summary

  • Add --weight-format CLI flag (auto/safetensors/pytorch/all) and MODEL_EXPRESS_WEIGHT_FORMAT env var to control which weight file formats are downloaded
  • In auto mode (default), prefer safetensors over pytorch/h5/msgpack, deduplicate sharded vs consolidated weights using index file presence, and exclude GGUF files
  • Add LICENSE, LICENSE.md, LICENSE.txt, and NOTICE to the default ignored files list since they are never used by model runtimes
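
The third bullet can be sketched as a simple name check. This is only an illustration: `ADDED_IGNORED_FILES` and `is_added_ignored_file` are hypothetical names, the list contains only the four entries this PR adds (the pre-existing ignore list is not shown in this conversation), and matching on the final path component is an assumption.

```rust
/// File names this PR adds to the default ignore list.
/// Pre-existing entries are omitted because they are not shown here.
const ADDED_IGNORED_FILES: [&str; 4] = ["LICENSE", "LICENSE.md", "LICENSE.txt", "NOTICE"];

/// Hypothetical helper: true when the final path component is one of the
/// newly ignored metadata files.
pub fn is_added_ignored_file(path: &str) -> bool {
    // Compare only the file name so that nested copies also match
    // (an assumption of this sketch, not confirmed by the PR).
    let name = path.rsplit('/').next().unwrap_or(path);
    ADDED_IGNORED_FILES.contains(&name)
}
```

Exact-name matching keeps files such as `LICENSE.rst` or `license` out of the list, so any broader matching would need to be an explicit choice.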

Closes #173

Test plan

  • All 292 existing tests pass
  • 13 new unit tests for WeightFormat enum and filter_files_by_weight_format() logic
  • cargo clippy passes with no warnings
  • Manual test: download a dual-format model (e.g. openai/gpt-oss-20b) with --weight-format auto and verify only safetensors files are fetched
  • Manual test: --weight-format pytorch downloads only .bin files
  • Manual test: --weight-format all preserves current behavior
  • Manual test: MODEL_EXPRESS_WEIGHT_FORMAT=safetensors env var works

Summary by CodeRabbit

Release Notes

  • New Features

    • Added --weight-format global option to specify model weight file format preferences: auto, safetensors, pytorch, or all (defaults to auto)
    • Added MODEL_EXPRESS_WEIGHT_FORMAT environment variable for weight format configuration
  • Documentation

    • Updated CLI documentation with new weight-format option
    • Updated deployment documentation with environment variable details

Jont828 (Author) commented Apr 22, 2026

The lint PR title job is failing with "Resource not accessible by integration". It looks like a permissions issue with the add_label: true setting in the workflow: the GITHUB_TOKEN for fork PRs only gets read access, so it can't add labels to the upstream repo. The PR title itself follows the conventional commits format. Happy to adjust anything if needed, though!

coderabbitai Bot (Contributor) commented Apr 22, 2026

Walkthrough

This pull request introduces weight format optimization for model downloads. A new WeightFormat enum (with variants Auto, Safetensors, Pytorch, All) is added with CLI/environment variable support. The weight_format parameter is propagated through all model download APIs from client to server. The HuggingFace provider now implements smart filtering to prefer safetensors, skip redundant format variants, and deduplicate sharded vs. consolidated weights based on index file presence.
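
As a rough illustration of the enum described above, here is a minimal standalone sketch. It is not the PR's code: the real type also implements clap's `ValueEnum`, which is left out so the example needs only the standard library. The case-insensitive parsing and the `weight_format_from_env` helper are assumptions made for illustration.

```rust
use std::fmt;
use std::str::FromStr;

/// Illustrative stand-in for the PR's `WeightFormat` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WeightFormat {
    #[default]
    Auto,
    Safetensors,
    Pytorch,
    All,
}

impl FromStr for WeightFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption of this sketch.
        match s.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(Self::Auto),
            "safetensors" => Ok(Self::Safetensors),
            "pytorch" => Ok(Self::Pytorch),
            "all" => Ok(Self::All),
            other => Err(format!("unknown weight format: {other}")),
        }
    }
}

impl fmt::Display for WeightFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Self::Auto => "auto",
            Self::Safetensors => "safetensors",
            Self::Pytorch => "pytorch",
            Self::All => "all",
        };
        f.write_str(name)
    }
}

/// Hypothetical helper: interpret an optional MODEL_EXPRESS_WEIGHT_FORMAT
/// value, falling back to the default (`Auto`) when it is unset.
pub fn weight_format_from_env(value: Option<&str>) -> Result<WeightFormat, String> {
    match value {
        Some(v) => v.parse(),
        None => Ok(WeightFormat::default()),
    }
}
```

A caller could pass `std::env::var("MODEL_EXPRESS_WEIGHT_FORMAT").ok().as_deref()` to the helper. How the CLI flag and the environment variable are prioritized is handled by the PR's argument parsing and is not modeled here.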

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Type Definitions & Enum | modelexpress_common/src/models.rs | New WeightFormat enum with Auto/Safetensors/Pytorch/All variants; includes FromStr, Display, ValueEnum trait implementations for CLI parsing and serialization. |
| Protocol & Conversion | modelexpress_common/proto/model.proto, modelexpress_common/src/lib.rs | Added WeightFormat enum and weight_format field to ModelDownloadRequest message; bidirectional From conversions between models::WeightFormat and grpc::model::WeightFormat with test coverage. |
| Configuration | modelexpress_common/src/client_config.rs, docs/CLI.md, docs/DEPLOYMENT.md | Added weight_format field to ClientArgs and ClientConfig; introduced --weight-format CLI flag and MODEL_EXPRESS_WEIGHT_FORMAT environment variable with default auto and allowed values auto, safetensors, pytorch, all. |
| Provider Interface & Base Implementation | modelexpress_common/src/providers.rs | Updated ModelProviderTrait::download_model signature to accept weight_format: WeightFormat parameter; expanded default is_ignored to exclude additional metadata files (LICENSE, LICENSE.md, LICENSE.txt, NOTICE). |
| HuggingFace Provider | modelexpress_common/src/providers/huggingface.rs | Implemented weight-format filtering logic with functions filter_files_by_weight_format, is_non_preferred_weight, auto_filter_weight_files to intelligently prefer safetensors, handle sharded vs. consolidated deduplication via index files, and fallback to pytorch; updated download_model to apply filtering and added comprehensive unit tests. |
| NGC Provider | modelexpress_common/src/providers/ngc.rs | Updated download_model signature to include _weight_format parameter (unused/underscore-prefixed); updated test calls accordingly. |
| Download Module | modelexpress_common/src/download.rs | Updated download_model signature to accept weight_format parameter and forward to provider implementation; updated all tests to pass WeightFormat::default(). |
| Client Library APIs | modelexpress_client/src/lib.rs | Re-exported WeightFormat; added weight_format parameter to all public model download/request methods (preload_model_to_cache, request_model, request_model_server_only, request_model_with_provider, request_model_with_provider_and_fallback, request_model_with_smart_fallback, download_model_directly); threaded weight_format into gRPC ModelDownloadRequest and internal download calls; updated unit/integration tests. |
| Server Handler | modelexpress_server/src/services.rs | Parsed weight_format from ModelDownloadRequest; threaded into ModelDownloadTracker::ensure_model_downloaded signature and forwarded to download::download_model; updated unit tests to populate weight_format field. |
| Client Binaries & Tests | modelexpress_client/src/bin/test_client.rs, modelexpress_client/src/bin/fallback_test.rs, modelexpress_client/src/bin/modules/handlers.rs, workspace-tests/tests/integration_tests.rs | Updated imports and call sites to pass WeightFormat::default() through model download entry points; thread weight_format from config into download flows across SmartFallback, ServerOnly, and Direct strategies. |
| Architecture Documentation | docs/ARCHITECTURE.md | Updated ModelProviderTrait::download_model signature documentation to reflect new weight_format: WeightFormat parameter. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop skip and a hop through the weight format,
Safetensors first, then pytorch as backup,
No more downloading everything at once,
Smart filtering keeps the disk space intact,
A rabbit's delight: efficient downloads!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding weight-format optimization to skip duplicate weight formats during model downloads. |
| Linked Issues check | ✅ Passed | All coding objectives from issue #173 are met: weight-format enum with smart defaults, CLI flag and env var support, auto/safetensors/pytorch/all modes, index-file deduplication logic, and default ignored files expanded. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing weight-format optimization per issue #173; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelexpress_client/src/lib.rs (2)

304-312: ⚠️ Potential issue | 🟠 Major

Use the client’s configured cache for direct fallback.

Both instance fallback paths switch to direct download with CacheConfig::discover()/download_model_directly(), which ignores self.cache_config and the provider-specific cache resolution in get_cache_dir(). A client configured with a custom cache can report fallback success while downloading into a different cache root.

Proposed fix
-                Self::download_model_directly(model_name, provider, ignore_weights, weight_format)
-                    .await
+                download::download_model(
+                    model_name,
+                    provider,
+                    Some(self.get_cache_dir(provider)),
+                    ignore_weights,
+                    weight_format,
+                )
+                .await
+                .map(|_| ())
+                .map_err(|e| {
+                    modelexpress_common::Error::Server(format!("Direct download failed: {e}"))
+                        .into()
+                })

-                    let cache_dir = CacheConfig::discover().ok().map(|config| config.local_path);
+                    let cache_dir = Some(self.get_cache_dir(provider));
                     match download::download_model(&model_name, provider, cache_dir, ignore_weights, weight_format).await {

Also applies to: 417-419

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/src/lib.rs` around lines 304 - 312, The fallback path
uses CacheConfig::discover()/download_model_directly() which ignores the
client's configured cache; update the fallback branches in the code paths that
call Self::download_model_directly (seen around the Err(e) block and the similar
block at lines 417-419) to use the client's self.cache_config (or resolve the
provider-specific dir via self.get_cache_dir(provider, &self.cache_config)) and
pass that cache/config into download_model_directly (or change
download_model_directly signature to accept a cache_config or explicit
cache_dir) so the direct download writes to the same cache the client was
configured to use.

298-299: ⚠️ Potential issue | 🟠 Major

Propagate weight_format through the file streaming request.

When shared_storage is disabled, the filtered server download is followed by a separate stream_model_files request that does not include weight_format. The server streams all cached files without applying the selected format filter, potentially delivering duplicate formats or GGUF files that the download request rejected.

Extend the ModelFilesRequest protobuf message to carry weight_format, update stream_model_files_from_server() to pass it, and ensure the server applies the same filtering during file enumeration.

Implementation changes needed
-                    self.stream_model_files_from_server(model_name, provider)
+                    self.stream_model_files_from_server(model_name, provider, weight_format)
                         .await?;

     pub async fn stream_model_files_from_server(
         &mut self,
         model_name: &str,
         provider: ModelProvider,
+        weight_format: WeightFormat,
     ) -> CommonResult<()> {
         let grpc_request = tonic::Request::new(ModelFilesRequest {
             model_name: model_name.to_string(),
             provider: modelexpress_common::grpc::model::ModelProvider::from(provider) as i32,
             chunk_size,
+            weight_format: modelexpress_common::grpc::model::WeightFormat::from(weight_format) as i32,
         });

Also applies to: 403-404, 466-470

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/src/lib.rs` around lines 298 - 299, The stream path is
missing the selected weight_format so the server streams all cached files;
extend the ModelFilesRequest protobuf to include a weight_format field, update
the client call sites (e.g., where
self.stream_model_files_from_server(model_name, provider) is invoked at the
noted locations) and change the signature of stream_model_files_from_server(...)
to accept and set that weight_format into the ModelFilesRequest, and finally
update the server-side handler that enumerates/caches files to apply the same
weight_format filter when streaming; ensure both client and server use the same
enum/type for weight_format so filtering behavior matches the download request.
🧹 Nitpick comments (2)
docs/ARCHITECTURE.md (1)

420-430: Keep the module inventory in sync with the new type.

The provider trait signature now documents WeightFormat, but the models module row still omits it.

Proposed documentation update
-| `models` | `Status`, `ModelProvider`, `ModelStatus`, `ModelStatusResponse` |
+| `models` | `Status`, `ModelProvider`, `WeightFormat`, `ModelStatus`, `ModelStatusResponse` |

As per coding guidelines, docs/ARCHITECTURE.md should be updated when making architecture or gRPC-service related changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/ARCHITECTURE.md` around lines 420 - 430, The models module inventory is
out of sync: the ModelProviderTrait signature now uses the WeightFormat type but
the `models` row in docs/ARCHITECTURE.md omits it; update that row to list
`WeightFormat` alongside `Status`, `ModelProvider`, `ModelStatus`, and
`ModelStatusResponse` so the documented exports match the actual API (reference
symbols: ModelProviderTrait, WeightFormat, models).
modelexpress_common/src/lib.rs (1)

269-283: Assert the wire mapping, not only round-tripping.

A symmetric mapping mistake would still pass this test while putting the wrong enum value on the wire. Add direct assertions for each Rust variant against the expected gRPC variant.

Proposed test hardening
     #[test]
     fn test_weight_format_conversion_both_ways() {
-        let formats = vec![
-            models::WeightFormat::Auto,
-            models::WeightFormat::Safetensors,
-            models::WeightFormat::Pytorch,
-            models::WeightFormat::All,
+        let formats = [
+            (
+                models::WeightFormat::Auto,
+                grpc::model::WeightFormat::Auto,
+            ),
+            (
+                models::WeightFormat::Safetensors,
+                grpc::model::WeightFormat::Safetensors,
+            ),
+            (
+                models::WeightFormat::Pytorch,
+                grpc::model::WeightFormat::Pytorch,
+            ),
+            (
+                models::WeightFormat::All,
+                grpc::model::WeightFormat::All,
+            ),
         ];
 
-        for format in formats {
-            let grpc_format: grpc::model::WeightFormat = format.into();
+        for (format, expected_grpc_format) in formats {
+            let grpc_format: grpc::model::WeightFormat = format.into();
+            assert_eq!(grpc_format, expected_grpc_format);
+
             let back_to_model: models::WeightFormat = grpc_format.into();
             assert_eq!(format, back_to_model);
         }
     }
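
To make the concern concrete, the following standalone sketch shows a swapped mapping that a round-trip test cannot catch. `ModelFormat`, `WireFormat`, and the `*_buggy` functions are hypothetical stand-ins for `models::WeightFormat`, `grpc::model::WeightFormat`, and their `From` conversions; they are not code from this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFormat {
    Auto,
    Safetensors,
    Pytorch,
    All,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WireFormat {
    Auto,
    Safetensors,
    Pytorch,
    All,
}

/// Deliberately buggy: Safetensors and Pytorch are swapped on the wire.
fn to_wire_buggy(f: ModelFormat) -> WireFormat {
    match f {
        ModelFormat::Auto => WireFormat::Auto,
        ModelFormat::Safetensors => WireFormat::Pytorch,
        ModelFormat::Pytorch => WireFormat::Safetensors,
        ModelFormat::All => WireFormat::All,
    }
}

/// The inverse carries the same swap, so the pair is still a bijection.
fn from_wire_buggy(w: WireFormat) -> ModelFormat {
    match w {
        WireFormat::Auto => ModelFormat::Auto,
        WireFormat::Pytorch => ModelFormat::Safetensors,
        WireFormat::Safetensors => ModelFormat::Pytorch,
        WireFormat::All => ModelFormat::All,
    }
}

/// A round-trip check in the style of the original test.
fn round_trip_passes() -> bool {
    [
        ModelFormat::Auto,
        ModelFormat::Safetensors,
        ModelFormat::Pytorch,
        ModelFormat::All,
    ]
    .iter()
    .all(|&f| from_wire_buggy(to_wire_buggy(f)) == f)
}

/// A direct assertion on the wire value, in the style of the hardened test.
fn direct_mapping_passes() -> bool {
    to_wire_buggy(ModelFormat::Safetensors) == WireFormat::Safetensors
}
```

With the swapped pair, `round_trip_passes()` still returns true while `direct_mapping_passes()` returns false, which is why the hardened test compares against the expected gRPC variant directly.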
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/lib.rs` around lines 269 - 283, The test
test_weight_format_conversion_both_ways currently only round-trips
models::WeightFormat <-> grpc::model::WeightFormat, which can hide symmetric
mapping errors; update the test to assert the exact expected wire enum for each
Rust variant (models::WeightFormat::Auto, ::Safetensors, ::Pytorch, ::All) by
comparing each direct conversion to the exact grpc::model::WeightFormat variant
you expect (e.g., assert that models::WeightFormat::Auto.into() equals the
specific grpc::model::WeightFormat::Auto value), and likewise verify any
non-roundtrip expected mappings explicitly rather than relying solely on
back-and-forth equality.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelexpress_common/src/providers/huggingface.rs`:
- Around line 195-213: The filter_files_by_weight_format behavior is
inconsistent: Auto currently excludes .gguf but Safetensors/Pytorch keep it
because is_weight_file() doesn't recognize .gguf and is_non_preferred_weight()
early-returns false; update the logic so .gguf is treated as a non-preferred
weight in Safetensors and Pytorch modes too (only WeightFormat::All should keep
GGUF). Concretely, either extend is_weight_file() to consider ".gguf" as a
weight file or change is_non_preferred_weight(filename: &str,
prefer_safetensors: bool) to return true for ".gguf" whether prefer_safetensors
is true (Safetensors mode) or false (Pytorch mode), and ensure
filter_files_by_weight_format (the WeightFormat::Safetensors and ::Pytorch arms)
rely on that updated check.
- Around line 229-265: The dedup logic in auto_filter_weight_files currently
only ignores "model.safetensors" and "pytorch_model.bin" when their
corresponding index files exist; update auto_filter_weight_files to compute
consolidated filenames dynamically by scanning filenames for any entries that
end with ".index.json", stripping the ".index.json" suffix to produce the
consolidated name(s), and then filter out those consolidated filenames (e.g.,
"consolidated.safetensors") when a matching index file exists; use the existing
filenames iterator and Self::is_weight_file to locate index files and compare
against filenames rather than hardcoding "model.safetensors" or
"pytorch_model.bin".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d01c349-fbfe-441f-9191-9faebc3ce321

📥 Commits

Reviewing files that changed from the base of the PR and between a8a2fd4 and 0a5fd80.

📒 Files selected for processing (17)
  • docs/ARCHITECTURE.md
  • docs/CLI.md
  • docs/DEPLOYMENT.md
  • modelexpress_client/src/bin/fallback_test.rs
  • modelexpress_client/src/bin/modules/handlers.rs
  • modelexpress_client/src/bin/test_client.rs
  • modelexpress_client/src/lib.rs
  • modelexpress_common/proto/model.proto
  • modelexpress_common/src/client_config.rs
  • modelexpress_common/src/download.rs
  • modelexpress_common/src/lib.rs
  • modelexpress_common/src/models.rs
  • modelexpress_common/src/providers.rs
  • modelexpress_common/src/providers/huggingface.rs
  • modelexpress_common/src/providers/ngc.rs
  • modelexpress_server/src/services.rs
  • workspace-tests/tests/integration_tests.rs

Comment on lines +195 to +213
    fn filter_files_by_weight_format(
        filenames: &[String],
        weight_format: WeightFormat,
    ) -> Vec<String> {
        match weight_format {
            WeightFormat::All => filenames.to_vec(),
            WeightFormat::Safetensors => filenames
                .iter()
                .filter(|f| !Self::is_non_preferred_weight(f, true))
                .cloned()
                .collect(),
            WeightFormat::Pytorch => filenames
                .iter()
                .filter(|f| !Self::is_non_preferred_weight(f, false))
                .cloned()
                .collect(),
            WeightFormat::Auto => Self::auto_filter_weight_files(filenames),
        }
    }

⚠️ Potential issue | 🟡 Minor

GGUF is excluded in Auto but kept in Safetensors/Pytorch modes.

is_non_preferred_weight() early-returns false for any file where is_weight_file() is false, and is_weight_file() does not include .gguf. So model.gguf passes through unfiltered in Safetensors/Pytorch modes while being excluded in Auto. A user running --weight-format safetensors on a repo like TheBloke's mixed safetensors+GGUF uploads would still get GGUF shards — likely not their intent.

Since the PR goal is "exclude GGUF files by default", consider making GGUF exclusion consistent across Auto/Safetensors/Pytorch (only All should pull GGUF).

♻️ One possible shape for the fix
     fn is_non_preferred_weight(filename: &str, prefer_safetensors: bool) -> bool {
-        if !Self::is_weight_file(filename) {
+        // Treat .gguf as a weight-format file for filtering purposes so that
+        // explicit Safetensors/Pytorch modes also exclude it.
+        let is_weightish = Self::is_weight_file(filename) || filename.ends_with(".gguf");
+        if !is_weightish {
             return false;
         }
         if prefer_safetensors {
             !filename.ends_with(".safetensors")
         } else {
             !filename.ends_with(".bin")
         }
     }
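
A standalone version of the suggested shape might look like the sketch below. The extension list in `is_weight_file` is an assumption, since the provider's real definition is not shown in this comment. `keep_preferred` is a hypothetical wrapper that plays the role of the Safetensors and Pytorch arms of `filter_files_by_weight_format`.

```rust
/// Assumed set of weight extensions; the real `is_weight_file()` may differ.
fn is_weight_file(filename: &str) -> bool {
    [".safetensors", ".bin", ".h5", ".msgpack"]
        .iter()
        .any(|ext| filename.ends_with(*ext))
}

/// Returns true for weight-like files that are not in the preferred format.
/// `.gguf` counts as weight-like so the explicit modes exclude it as well.
fn is_non_preferred_weight(filename: &str, prefer_safetensors: bool) -> bool {
    let is_weightish = is_weight_file(filename) || filename.ends_with(".gguf");
    if !is_weightish {
        // Configs, tokenizers, and index files are never filtered here.
        return false;
    }
    if prefer_safetensors {
        !filename.ends_with(".safetensors")
    } else {
        !filename.ends_with(".bin")
    }
}

/// Hypothetical wrapper: keep everything except non-preferred weights.
fn keep_preferred(filenames: &[&str], prefer_safetensors: bool) -> Vec<String> {
    filenames
        .iter()
        .filter(|f| !is_non_preferred_weight(f, prefer_safetensors))
        .map(|f| f.to_string())
        .collect()
}
```

Under these assumptions, a listing of `config.json`, `model.safetensors`, `pytorch_model.bin`, and `model.gguf` keeps only `config.json` plus the weights in the requested format, in either explicit mode.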

Comment on lines +229 to +265
    fn auto_filter_weight_files(filenames: &[String]) -> Vec<String> {
        let has_safetensors = filenames.iter().any(|f| f.ends_with(".safetensors"));
        let has_safetensors_index = filenames
            .iter()
            .any(|f| f.ends_with(".safetensors.index.json"));
        let has_pytorch_index = filenames
            .iter()
            .any(|f| f == "pytorch_model.bin.index.json");

        filenames
            .iter()
            .filter(|f| {
                // Drop weight files in non-preferred formats
                if Self::is_weight_file(f) || f.ends_with(".gguf") {
                    if has_safetensors {
                        if !f.ends_with(".safetensors") {
                            return false;
                        }
                    } else if f.ends_with(".h5") || f.ends_with(".msgpack") || f.ends_with(".gguf")
                    {
                        return false;
                    }
                }

                // Deduplicate sharded vs. consolidated within chosen format
                if has_safetensors_index && *f == "model.safetensors" {
                    return false;
                }
                if has_pytorch_index && *f == "pytorch_model.bin" {
                    return false;
                }

                true
            })
            .cloned()
            .collect()
    }
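
The deduplication step above hardcodes `model.safetensors` and `pytorch_model.bin`. One way to derive the consolidated names from whatever index files are present is sketched below. `drop_consolidated_with_index` is a hypothetical helper, not code from the PR, and it covers only the dedup step, not the format preference.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop any consolidated file whose `<name>.index.json`
/// companion is present in the same listing.
pub fn drop_consolidated_with_index(filenames: &[String]) -> Vec<String> {
    const INDEX_SUFFIX: &str = ".index.json";

    // For "model.safetensors.index.json" this yields "model.safetensors".
    // Any directory prefix is preserved, so only same-directory files match.
    let consolidated: HashSet<&str> = filenames
        .iter()
        .filter_map(|f| f.strip_suffix(INDEX_SUFFIX))
        .collect();

    filenames
        .iter()
        .filter(|f| !consolidated.contains(f.as_str()))
        .cloned()
        .collect()
}
```

The sketch treats every `*.index.json` as a weight index. A real implementation would probably restrict the match to weight extensions before dropping anything.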

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the structure and find the file
fd -t f "huggingface.rs" --type f

Repository: ai-dynamo/modelexpress

Length of output: 114


🏁 Script executed:

# Read the specific file and the function in context
wc -l modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 119


🏁 Script executed:

# Examine the function and surrounding context
sed -n '200,270p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 2695


🏁 Script executed:

# Search for tests related to auto_filter_weight_files
rg -i "auto_filter_weight_files|weight.*file.*test|filter.*weight" --type rs -A 5

Repository: ai-dynamo/modelexpress

Length of output: 92


🏁 Script executed:

# Search for issue `#173` or related discussions
rg -i "issue.*173|#173|dedup.*weight" --type rs --type md

Repository: ai-dynamo/modelexpress

Length of output: 92


🏁 Script executed:

# Look for test fixtures or test data that might show model layouts
fd -t f "test" -type d | head -20

Repository: ai-dynamo/modelexpress

Length of output: 236


🏁 Script executed:

# Search for references to Diffusers pipelines or consolidated.safetensors in the codebase
rg -i "diffuser|consolidated.safetensors|model-.*safetensors" --type rs --type md --type json

Repository: ai-dynamo/modelexpress

Length of output: 92


🏁 Script executed:

# Find is_weight_file definition
sed -n '1,228p' modelexpress_common/src/providers/huggingface.rs | tail -100

Repository: ai-dynamo/modelexpress

Length of output: 3650


🏁 Script executed:

# Search for tests without rg type flag
find . -name "*.rs" -type f | xargs grep -l "auto_filter_weight_files"

Repository: ai-dynamo/modelexpress

Length of output: 116


🏁 Script executed:

# Search for test directories and files
find . -path "*/test*" -name "*.rs" -type f

Repository: ai-dynamo/modelexpress

Length of output: 244


🏁 Script executed:

# Look for issue `#173` reference
grep -r "173" . --include="*.rs" --include="*.md"

Repository: ai-dynamo/modelexpress

Length of output: 125


🏁 Script executed:

# Search for Diffusers or consolidated patterns in the repo
grep -r "diffuser\|consolidated" . --include="*.rs" --include="*.md" --include="*.json"

Repository: ai-dynamo/modelexpress

Length of output: 779


🏁 Script executed:

# Find the is_weight_file definition
grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 48


🏁 Script executed:

# Read the is_weight_file function
sed -n '260,300p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1449


🏁 Script executed:

# Search for all test functions related to filter_weight
grep -n "fn test.*filter\|fn test.*weight" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 533


🏁 Script executed:

# Look at test cases in detail
grep -n "test_filter_weight_format_auto" modelexpress_common/src/providers/huggingface.rs -A 50

Repository: ai-dynamo/modelexpress

Length of output: 5029


🏁 Script executed:

# Check if there are any tests with non-standard names or consolidated
grep -n "consolidated\|diffusion\|unet" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 361


🏁 Script executed:

# Search for the complete is_weight_file definition more carefully
grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rs -A 20

Repository: ai-dynamo/modelexpress

Length of output: 48


🏁 Script executed:

# Search for is_weight_file more thoroughly
grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 278


🏁 Script executed:

# Read around those lines to understand the function
sed -n '155,175p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 604


🏁 Script executed:

# Let's look at the broader context - find where is_weight_file is defined
awk '/fn is_weight_file/,/^    \}/' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 48


🏁 Script executed:

# Get the line numbers where is_weight_file appears
grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 278


🏁 Script executed:

# Read that section
sed -n '145,185p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1378


🏁 Script executed:

# Search the entire codebase for is_weight_file definition
grep -r "fn is_weight_file" . --include="*.rs"

Repository: ai-dynamo/modelexpress

Length of output: 150


🏁 Script executed:

# Check the imports and see if it's defined elsewhere or in a trait
grep -n "is_weight_file\|trait.*Weight\|impl.*Weight" modelexpress_common/src/providers/huggingface.rs | head -20

Repository: ai-dynamo/modelexpress

Length of output: 278


🏁 Script executed:

# Let's read more of the file to understand the structure - look for impl blocks
grep -n "^impl\|^fn\|^    fn" modelexpress_common/src/providers/huggingface.rs | head -40

Repository: ai-dynamo/modelexpress

Length of output: 1684


🏁 Script executed:

# Try to find any reference to the function or trait definition with more context
awk '/is_weight_file/ { for(i=NR-5;i<=NR+5;i++) if (i>0) print i": " lines[i%11]; } { lines[NR%11]=$0 }' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1510


🏁 Script executed:

# Let's search in models.rs since we saw references to weight formats there
grep -n "is_weight_file" modelexpress_common/src/models.rs

Repository: ai-dynamo/modelexpress

Length of output: 48


🏁 Script executed:

# Read the full models.rs file structure around weight-related code
grep -n "fn.*weight\|is_weight" modelexpress_common/src/models.rs

Repository: ai-dynamo/modelexpress

Length of output: 301


🏁 Script executed:

# Found it! Read the providers.rs file to find is_weight_file definition
grep -n "fn is_weight_file" modelexpress_common/src/providers.rs -A 20

Repository: ai-dynamo/modelexpress

Length of output: 588


🏁 Script executed:

# Also check what trait this belongs to
grep -n "trait.*Provider\|fn is_weight_file" modelexpress_common/src/providers.rs -B 5

Repository: ai-dynamo/modelexpress

Length of output: 469


🏁 Script executed:

# Let's see what is in providers.rs more broadly
wc -l modelexpress_common/src/providers.rs

Repository: ai-dynamo/modelexpress

Length of output: 106


🏁 Script executed:

# Read the relevant section
sed -n '1,100p' modelexpress_common/src/providers.rs

Repository: ai-dynamo/modelexpress

Length of output: 3455


🏁 Script executed:

# Now let's look for any test data or references to actual models
find . -name "*.json" -o -name "*.md" | xargs grep -l "safetensors\|pytorch_model" | head -10

Repository: ai-dynamo/modelexpress

Length of output: 123


🏁 Script executed:

# Check if there are integration tests that download real models
grep -r "download_model\|diffuser\|consolidated" . --include="*.rs" | grep -v "Binary"

Repository: ai-dynamo/modelexpress

Length of output: 5023


🏁 Script executed:

# Let's examine the actual test to understand what the review is asking
sed -n '1016,1030p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 613


🏁 Script executed:

# Search the documentation or comments for any references to model layouts
grep -r "consolidated\|sharded\|index.json" . --include="*.md" --include="*.rs" -B 2 -A 2

Repository: ai-dynamo/modelexpress

Length of output: 7364


🏁 Script executed:

# Check the git history or issues for any mention of non-standard model names
grep -r "consolidated.safetensors\|diffusion_pytorch" . --include="*.rs" --include="*.md"

Repository: ai-dynamo/modelexpress

Length of output: 48


🏁 Script executed:

# Let's check if there are any recent HF model examples that might use non-standard names
# First, let's see what documentation exists
find . -name "*.md" -type f | head -20

Repository: ai-dynamo/modelexpress

Length of output: 630


🏁 Script executed:

# Let's check the integration tests to see what models are actually tested
grep -n "model_name\|download_model" workspace-tests/tests/integration_tests.rs -A 2 -B 2 | head -50

Repository: ai-dynamo/modelexpress

Length of output: 1259


🏁 Script executed:

# Check if there's any test data or fixture definitions
find . -name "*.json" -o -name "fixtures" -type d

Repository: ai-dynamo/modelexpress

Length of output: 173


🏁 Script executed:

# Let's see what models are tested in integration tests
sed -n '1,100p' workspace-tests/tests/integration_tests.rs

Repository: ai-dynamo/modelexpress

Length of output: 3623


🏁 Script executed:

# Search for any environment variables or config that might specify test models
grep -r "MODEL\|HF_" . --include="*.rs" --include="*.md" | grep -v "Binary" | head -20

Repository: ai-dynamo/modelexpress

Length of output: 2115


🏁 Script executed:

# Check the CLAUDE.md or any documentation about the codebase
cat ./CLAUDE.md

Repository: ai-dynamo/modelexpress

Length of output: 5517


🏁 Script executed:

# Let's check if there are any real HF models that use non-standard consolidated names
# by looking at the actual HF hub documentation or searching for such patterns
rg -i "consolidated\.safetensors|diffusion_pytorch_model" . -g "*.rs" -g "*.md"

Repository: ai-dynamo/modelexpress

Length of output: 507


🏁 Script executed:

# Let's verify the subdirectory filtering logic more carefully
sed -n '174,195p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 1188


🏁 Script executed:

# Check how files are processed before auto_filter_weight_files is called
sed -n '300,380p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 3104


🏁 Script executed:

# Let's look at how the filtering works step by step
# First, understand what gets passed to auto_filter_weight_files
sed -n '320,345p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 981


🏁 Script executed:

# Check test data more thoroughly to understand what models are used
grep -n "bert-tiny" workspace-tests/tests/integration_tests.rs -B 5 -A 10

Repository: ai-dynamo/modelexpress

Length of output: 746


🏁 Script executed:

# Now let me check if there are subdirectory files being filtered out
# and understand the full filtering chain better
awk '/filter_files_by_weight_format/,/^    \}/' modelexpress_common/src/providers/huggingface.rs | head -60

Repository: ai-dynamo/modelexpress

Length of output: 2278


🏁 Script executed:

# Let me understand the exact logic flow more carefully
# Check if there are any tests that would catch the non-standard name issue
sed -n '988,1048p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 2739


🏁 Script executed:

# Let's create a test case that would expose the issue the reviewer mentions
# First, understand what the current behavior would be with consolidated.safetensors
cat > /tmp/test_consolidated.txt << 'EOF'
Scenario: Model with consolidated.safetensors and sharded files

Files in model:
- config.json
- consolidated.safetensors (consolidated file)
- consolidated-00001-of-00002.safetensors (shard 1)
- consolidated-00002-of-00002.safetensors (shard 2)  
- consolidated.safetensors.index.json (index file)

Current behavior:
1. has_safetensors = true (consolidated.safetensors exists)
2. has_safetensors_index = true (.safetensors.index.json exists)
3. Filter logic checks:
   - if has_safetensors_index && *f == "model.safetensors" -> skip
   - This does NOT match consolidated.safetensors
4. Result: ALL files kept (consolidated + shards + index)
   - Should only keep: shards + index

Expected behavior:
- Detect that consolidated.safetensors.index.json exists
- Strip .index.json to get "consolidated.safetensors"
- Remove consolidated.safetensors from result
- Keep shards and index
EOF
cat /tmp/test_consolidated.txt

Repository: ai-dynamo/modelexpress

Length of output: 946


🏁 Script executed:

# Check if there are real models on HF that use this pattern
# by looking at what the reviewer claims
grep -n "consolidated" modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 361


🏁 Script executed:

# Let's also verify if the review's concern about the hardcoded names is valid
# by checking if there are any comments or documentation about supported patterns
sed -n '174,267p' modelexpress_common/src/providers/huggingface.rs

Repository: ai-dynamo/modelexpress

Length of output: 3805


Hardcoded consolidated filenames limit dedup to two naming patterns.

The dedup logic only drops model.safetensors and pytorch_model.bin when their index files exist. A repo that uses another consolidated filename (e.g., consolidated.safetensors alongside consolidated.safetensors.index.json) would still download both the consolidated file and every shard, defeating the dedup goal.

Instead of hardcoding the two canonical names, derive the consolidated filename from each index file by stripping the .index.json suffix. For the canonical names this yields the same result as today (model.safetensors.index.json gives model.safetensors, pytorch_model.bin.index.json gives pytorch_model.bin), so the change stays backward compatible while also handling arbitrary consolidated/sharded naming conventions.
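
A minimal sketch of that suggestion, assuming the filter receives a slice of repo-relative filenames. The function name `drop_indexed_consolidated` is hypothetical and is not the PR's actual `auto_filter_weight_files`; it only illustrates deriving the redundant name from each index file instead of comparing against two fixed strings.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the PR's code): remove every consolidated weight
/// file that has a matching `<name>.index.json` in the same listing.
fn drop_indexed_consolidated<'a>(filenames: &[&'a str]) -> Vec<&'a str> {
    // Each `<name>.index.json` implies that `<name>` is the consolidated file
    // made redundant by the shards the index describes.
    let redundant: HashSet<&str> = filenames
        .iter()
        .filter_map(|f| f.strip_suffix(".index.json"))
        .collect();

    // Keep everything except the derived consolidated names; shards and the
    // index files themselves pass through untouched, in their original order.
    filenames
        .iter()
        .copied()
        .filter(|f| !redundant.contains(f))
        .collect()
}
```

Because the comparison uses the full relative path, an index in a subdirectory only affects the consolidated file next to it. A real implementation would probably also confirm that the derived name is a weight file before dropping it.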

Note: The test matrix uses prajjwal1/bert-tiny, which is small and unlikely to use sharded weights, so this doesn't surface in current tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/providers/huggingface.rs` around lines 229 - 265, The
dedup logic in auto_filter_weight_files currently only ignores
"model.safetensors" and "pytorch_model.bin" when their corresponding index files
exist; update auto_filter_weight_files to compute consolidated filenames
dynamically by scanning filenames for any entries that end with ".index.json",
stripping the ".index.json" suffix to produce the consolidated name(s), and then
filter out those consolidated filenames (e.g., "consolidated.safetensors") when
a matching index file exists; use the existing filenames iterator and
Self::is_weight_file to locate index files and compare against filenames rather
than hardcoding "model.safetensors" or "pytorch_model.bin".

Add --weight-format CLI flag (auto/safetensors/pytorch/all) to control
which weight files are downloaded. In auto mode (the default), safetensors
files are preferred over pytorch/h5/msgpack, sharded vs consolidated
duplicates are deduplicated, and GGUF files are excluded. This prevents
downloading 2-3x the data for repos that publish weights in multiple formats.
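
The format preference described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the function name `auto_filter_sketch` is hypothetical, formats are detected by file extension only, and when no safetensors file is present the sketch assumes all other non-GGUF weight files are kept.

```rust
/// Hypothetical sketch of the "auto" mode preference (not the PR's code).
fn auto_filter_sketch<'a>(filenames: &[&'a str]) -> Vec<&'a str> {
    // Assumption: a file counts as safetensors purely by its extension.
    let has_safetensors = filenames.iter().any(|f| f.ends_with(".safetensors"));

    // Assumption: these suffixes stand in for pytorch/h5/msgpack weights,
    // including the index file that accompanies sharded pytorch weights.
    let legacy_suffixes = [".bin", ".bin.index.json", ".h5", ".msgpack"];

    filenames
        .iter()
        .copied()
        .filter(|f| {
            // GGUF files are always excluded in auto mode.
            if f.ends_with(".gguf") {
                return false;
            }
            let is_legacy = legacy_suffixes.iter().any(|ext| f.ends_with(*ext));
            // Drop legacy formats only when a safetensors alternative exists.
            !(has_safetensors && is_legacy)
        })
        .collect()
}
```

Matching on `.bin` alone would also catch non-weight files such as optimizer or training-argument dumps, so a real filter would need a stricter notion of "weight file" than this sketch uses.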

Also adds LICENSE, LICENSE.md, LICENSE.txt, and NOTICE to the default
ignored files list since these are never used by model runtimes.

Closes ai-dynamo#173

Signed-off-by: Jonathan Tong <jonathan.tong@live.com>
Signed-off-by: Jont828 <jt572@cornell.edu>
@Jont828 Jont828 force-pushed the Jont828/optimize-weight-downloads branch from 0a5fd80 to 17af317 Compare June 11, 2026 00:27
@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the feat label Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Optimize model downloads by skipping duplicate weight formats
