feat: optimize model downloads by skipping duplicate weight formats#248
feat: optimize model downloads by skipping duplicate weight formats#248Jont828 wants to merge 1 commit into
Conversation
|
The lint PR title job is failing with |
WalkthroughThis pull request introduces weight format optimization for model downloads. A new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelexpress_client/src/lib.rs (2)
304-312:⚠️ Potential issue | 🟠 MajorUse the client’s configured cache for direct fallback.
Both instance fallback paths switch to direct download with
CacheConfig::discover()/download_model_directly(), which ignoresself.cache_configand the provider-specific cache resolution inget_cache_dir(). A client configured with a custom cache can report fallback success while downloading into a different cache root.Proposed fix
- Self::download_model_directly(model_name, provider, ignore_weights, weight_format) - .await + download::download_model( + model_name, + provider, + Some(self.get_cache_dir(provider)), + ignore_weights, + weight_format, + ) + .await + .map(|_| ()) + .map_err(|e| { + modelexpress_common::Error::Server(format!("Direct download failed: {e}")) + .into() + })- let cache_dir = CacheConfig::discover().ok().map(|config| config.local_path); + let cache_dir = Some(self.get_cache_dir(provider)); match download::download_model(&model_name, provider, cache_dir, ignore_weights, weight_format).await {Also applies to: 417-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_client/src/lib.rs` around lines 304 - 312, The fallback path uses CacheConfig::discover()/download_model_directly() which ignores the client's configured cache; update the fallback branches in the code paths that call Self::download_model_directly (seen around the Err(e) block and the similar block at lines 417-419) to use the client's self.cache_config (or resolve the provider-specific dir via self.get_cache_dir(provider, &self.cache_config)) and pass that cache/config into download_model_directly (or change download_model_directly signature to accept a cache_config or explicit cache_dir) so the direct download writes to the same cache the client was configured to use.
298-299:⚠️ Potential issue | 🟠 MajorPropagate
weight_formatthrough the file streaming request.When
shared_storageis disabled, the filtered server download is followed by a separatestream_model_filesrequest that does not includeweight_format. The server streams all cached files without applying the selected format filter, potentially delivering duplicate formats or GGUF files that the download request rejected.Extend the
ModelFilesRequestprotobuf message to carryweight_format, updatestream_model_files_from_server()to pass it, and ensure the server applies the same filtering during file enumeration.Implementation changes needed
- self.stream_model_files_from_server(model_name, provider) + self.stream_model_files_from_server(model_name, provider, weight_format) .await?;pub async fn stream_model_files_from_server( &mut self, model_name: &str, provider: ModelProvider, + weight_format: WeightFormat, ) -> CommonResult<()> {let grpc_request = tonic::Request::new(ModelFilesRequest { model_name: model_name.to_string(), provider: modelexpress_common::grpc::model::ModelProvider::from(provider) as i32, chunk_size, + weight_format: modelexpress_common::grpc::model::WeightFormat::from(weight_format) as i32, });Also applies to: 403-404, 466-470
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_client/src/lib.rs` around lines 298 - 299, The stream path is missing the selected weight_format so the server streams all cached files; extend the ModelFilesRequest protobuf to include a weight_format field, update the client call sites (e.g., where self.stream_model_files_from_server(model_name, provider) is invoked at the noted locations) and change the signature of stream_model_files_from_server(...) to accept and set that weight_format into the ModelFilesRequest, and finally update the server-side handler that enumerates/caches files to apply the same weight_format filter when streaming; ensure both client and server use the same enum/type for weight_format so filtering behavior matches the download request.
🧹 Nitpick comments (2)
docs/ARCHITECTURE.md (1)
420-430: Keep the module inventory in sync with the new type.The provider trait signature now documents
WeightFormat, but themodelsmodule row still omits it.Proposed documentation update
-| `models` | `Status`, `ModelProvider`, `ModelStatus`, `ModelStatusResponse` | +| `models` | `Status`, `ModelProvider`, `WeightFormat`, `ModelStatus`, `ModelStatusResponse` |As per coding guidelines,
docs/ARCHITECTURE.mdshould be updated when making architecture or gRPC-service related changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/ARCHITECTURE.md` around lines 420 - 430, The models module inventory is out of sync: the ModelProviderTrait signature now uses the WeightFormat type but the `models` row in docs/ARCHITECTURE.md omits it; update that row to list `WeightFormat` alongside `Status`, `ModelProvider`, `ModelStatus`, and `ModelStatusResponse` so the documented exports match the actual API (reference symbols: ModelProviderTrait, WeightFormat, models).modelexpress_common/src/lib.rs (1)
269-283: Assert the wire mapping, not only round-tripping.A symmetric mapping mistake would still pass this test while putting the wrong enum value on the wire. Add direct assertions for each Rust variant against the expected gRPC variant.
Proposed test hardening
#[test] fn test_weight_format_conversion_both_ways() { - let formats = vec![ - models::WeightFormat::Auto, - models::WeightFormat::Safetensors, - models::WeightFormat::Pytorch, - models::WeightFormat::All, + let formats = [ + ( + models::WeightFormat::Auto, + grpc::model::WeightFormat::Auto, + ), + ( + models::WeightFormat::Safetensors, + grpc::model::WeightFormat::Safetensors, + ), + ( + models::WeightFormat::Pytorch, + grpc::model::WeightFormat::Pytorch, + ), + ( + models::WeightFormat::All, + grpc::model::WeightFormat::All, + ), ]; - for format in formats { - let grpc_format: grpc::model::WeightFormat = format.into(); + for (format, expected_grpc_format) in formats { + let grpc_format: grpc::model::WeightFormat = format.into(); + assert_eq!(grpc_format, expected_grpc_format); + let back_to_model: models::WeightFormat = grpc_format.into(); assert_eq!(format, back_to_model); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/lib.rs` around lines 269 - 283, The test test_weight_format_conversion_both_ways currently only round-trips models::WeightFormat <-> grpc::model::WeightFormat, which can hide symmetric mapping errors; update the test to assert the exact expected wire enum for each Rust variant (models::WeightFormat::Auto, ::Safetensors, ::Pytorch, ::All) by comparing each direct conversion to the exact grpc::model::WeightFormat variant you expect (e.g., assert that models::WeightFormat::Auto.into() equals the specific grpc::model::WeightFormat::Auto value), and likewise verify any non-roundtrip expected mappings explicitly rather than relying solely on back-and-forth equality.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelexpress_common/src/providers/huggingface.rs`:
- Around line 195-213: The filter_files_by_weight_format behavior is
inconsistent: Auto currently excludes .gguf but Safetensors/Pytorch keep it
because is_weight_file() doesn't recognize .gguf and is_non_preferred_weight()
early-returns false; update the logic so .gguf is treated as a non-preferred
weight in Safetensors and Pytorch modes too (only WeightFormat::All should keep
GGUF). Concretely, either extend is_weight_file() to consider ".gguf" as a
weight file or change is_non_preferred_weight(model_filename:,
prefer_safetensors:bool) to explicitly return true for ".gguf" when
prefer_safetensors or prefer_pytorch is set, and ensure
filter_files_by_weight_format (the WeightFormat::Safetensors and ::Pytorch arms)
rely on that updated check.
- Around line 229-265: The dedup logic in auto_filter_weight_files currently
only ignores "model.safetensors" and "pytorch_model.bin" when their
corresponding index files exist; update auto_filter_weight_files to compute
consolidated filenames dynamically by scanning filenames for any entries that
end with ".index.json", stripping the ".index.json" suffix to produce the
consolidated name(s), and then filter out those consolidated filenames (e.g.,
"consolidated.safetensors") when a matching index file exists; use the existing
filenames iterator and Self::is_weight_file to locate index files and compare
against filenames rather than hardcoding "model.safetensors" or
"pytorch_model.bin".
---
Outside diff comments:
In `@modelexpress_client/src/lib.rs`:
- Around line 304-312: The fallback path uses
CacheConfig::discover()/download_model_directly() which ignores the client's
configured cache; update the fallback branches in the code paths that call
Self::download_model_directly (seen around the Err(e) block and the similar
block at lines 417-419) to use the client's self.cache_config (or resolve the
provider-specific dir via self.get_cache_dir(provider, &self.cache_config)) and
pass that cache/config into download_model_directly (or change
download_model_directly signature to accept a cache_config or explicit
cache_dir) so the direct download writes to the same cache the client was
configured to use.
- Around line 298-299: The stream path is missing the selected weight_format so
the server streams all cached files; extend the ModelFilesRequest protobuf to
include a weight_format field, update the client call sites (e.g., where
self.stream_model_files_from_server(model_name, provider) is invoked at the
noted locations) and change the signature of stream_model_files_from_server(...)
to accept and set that weight_format into the ModelFilesRequest, and finally
update the server-side handler that enumerates/caches files to apply the same
weight_format filter when streaming; ensure both client and server use the same
enum/type for weight_format so filtering behavior matches the download request.
---
Nitpick comments:
In `@docs/ARCHITECTURE.md`:
- Around line 420-430: The models module inventory is out of sync: the
ModelProviderTrait signature now uses the WeightFormat type but the `models` row
in docs/ARCHITECTURE.md omits it; update that row to list `WeightFormat`
alongside `Status`, `ModelProvider`, `ModelStatus`, and `ModelStatusResponse` so
the documented exports match the actual API (reference symbols:
ModelProviderTrait, WeightFormat, models).
In `@modelexpress_common/src/lib.rs`:
- Around line 269-283: The test test_weight_format_conversion_both_ways
currently only round-trips models::WeightFormat <-> grpc::model::WeightFormat,
which can hide symmetric mapping errors; update the test to assert the exact
expected wire enum for each Rust variant (models::WeightFormat::Auto,
::Safetensors, ::Pytorch, ::All) by comparing each direct conversion to the
exact grpc::model::WeightFormat variant you expect (e.g., assert that
models::WeightFormat::Auto.into() equals the specific
grpc::model::WeightFormat::Auto value), and likewise verify any non-roundtrip
expected mappings explicitly rather than relying solely on back-and-forth
equality.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8d01c349-fbfe-441f-9191-9faebc3ce321
📒 Files selected for processing (17)
docs/ARCHITECTURE.mddocs/CLI.mddocs/DEPLOYMENT.mdmodelexpress_client/src/bin/fallback_test.rsmodelexpress_client/src/bin/modules/handlers.rsmodelexpress_client/src/bin/test_client.rsmodelexpress_client/src/lib.rsmodelexpress_common/proto/model.protomodelexpress_common/src/client_config.rsmodelexpress_common/src/download.rsmodelexpress_common/src/lib.rsmodelexpress_common/src/models.rsmodelexpress_common/src/providers.rsmodelexpress_common/src/providers/huggingface.rsmodelexpress_common/src/providers/ngc.rsmodelexpress_server/src/services.rsworkspace-tests/tests/integration_tests.rs
| fn filter_files_by_weight_format( | ||
| filenames: &[String], | ||
| weight_format: WeightFormat, | ||
| ) -> Vec<String> { | ||
| match weight_format { | ||
| WeightFormat::All => filenames.to_vec(), | ||
| WeightFormat::Safetensors => filenames | ||
| .iter() | ||
| .filter(|f| !Self::is_non_preferred_weight(f, true)) | ||
| .cloned() | ||
| .collect(), | ||
| WeightFormat::Pytorch => filenames | ||
| .iter() | ||
| .filter(|f| !Self::is_non_preferred_weight(f, false)) | ||
| .cloned() | ||
| .collect(), | ||
| WeightFormat::Auto => Self::auto_filter_weight_files(filenames), | ||
| } | ||
| } |
There was a problem hiding this comment.
GGUF is excluded in Auto but kept in Safetensors/Pytorch modes.
is_non_preferred_weight() early-returns false for any file where is_weight_file() is false, and is_weight_file() does not include .gguf. So model.gguf passes through unfiltered in Safetensors/Pytorch modes while being excluded in Auto. A user running --weight-format safetensors on a repo like TheBloke's mixed safetensors+GGUF uploads would still get GGUF shards — likely not their intent.
Since the PR goal is "exclude GGUF files by default", consider making GGUF exclusion consistent across Auto/Safetensors/Pytorch (only All should pull GGUF).
♻️ One possible shape for the fix
fn is_non_preferred_weight(filename: &str, prefer_safetensors: bool) -> bool {
- if !Self::is_weight_file(filename) {
+ // Treat .gguf as a weight-format file for filtering purposes so that
+ // explicit Safetensors/Pytorch modes also exclude it.
+ let is_weightish = Self::is_weight_file(filename) || filename.ends_with(".gguf");
+ if !is_weightish {
return false;
}
if prefer_safetensors {
!filename.ends_with(".safetensors")
} else {
!filename.ends_with(".bin")
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelexpress_common/src/providers/huggingface.rs` around lines 195 - 213, The
filter_files_by_weight_format behavior is inconsistent: Auto currently excludes
.gguf but Safetensors/Pytorch keep it because is_weight_file() doesn't recognize
.gguf and is_non_preferred_weight() early-returns false; update the logic so
.gguf is treated as a non-preferred weight in Safetensors and Pytorch modes too
(only WeightFormat::All should keep GGUF). Concretely, either extend
is_weight_file() to consider ".gguf" as a weight file or change
is_non_preferred_weight(model_filename:, prefer_safetensors:bool) to explicitly
return true for ".gguf" when prefer_safetensors or prefer_pytorch is set, and
ensure filter_files_by_weight_format (the WeightFormat::Safetensors and
::Pytorch arms) rely on that updated check.
| fn auto_filter_weight_files(filenames: &[String]) -> Vec<String> { | ||
| let has_safetensors = filenames.iter().any(|f| f.ends_with(".safetensors")); | ||
| let has_safetensors_index = filenames | ||
| .iter() | ||
| .any(|f| f.ends_with(".safetensors.index.json")); | ||
| let has_pytorch_index = filenames | ||
| .iter() | ||
| .any(|f| f == "pytorch_model.bin.index.json"); | ||
|
|
||
| filenames | ||
| .iter() | ||
| .filter(|f| { | ||
| // Drop weight files in non-preferred formats | ||
| if Self::is_weight_file(f) || f.ends_with(".gguf") { | ||
| if has_safetensors { | ||
| if !f.ends_with(".safetensors") { | ||
| return false; | ||
| } | ||
| } else if f.ends_with(".h5") || f.ends_with(".msgpack") || f.ends_with(".gguf") | ||
| { | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| // Deduplicate sharded vs. consolidated within chosen format | ||
| if has_safetensors_index && *f == "model.safetensors" { | ||
| return false; | ||
| } | ||
| if has_pytorch_index && *f == "pytorch_model.bin" { | ||
| return false; | ||
| } | ||
|
|
||
| true | ||
| }) | ||
| .cloned() | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the structure and find the file
fd -t f "huggingface.rs" --type fRepository: ai-dynamo/modelexpress
Length of output: 114
🏁 Script executed:
# Read the specific file and the function in context
wc -l modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 119
🏁 Script executed:
# Examine the function and surrounding context
sed -n '200,270p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 2695
🏁 Script executed:
# Search for tests related to auto_filter_weight_files
rg -i "auto_filter_weight_files|weight.*file.*test|filter.*weight" --type rs -A 5Repository: ai-dynamo/modelexpress
Length of output: 92
🏁 Script executed:
# Search for issue `#173` or related discussions
rg -i "issue.*173|#173|dedup.*weight" --type rs --type mdRepository: ai-dynamo/modelexpress
Length of output: 92
🏁 Script executed:
# Look for test fixtures or test data that might show model layouts
fd -t f "test" -type d | head -20Repository: ai-dynamo/modelexpress
Length of output: 236
🏁 Script executed:
# Search for references to Diffusers pipelines or consolidated.safetensors in the codebase
rg -i "diffuser|consolidated.safetensors|model-.*safetensors" --type rs --type md --type jsonRepository: ai-dynamo/modelexpress
Length of output: 92
🏁 Script executed:
# Find is_weight_file definition
sed -n '1,228p' modelexpress_common/src/providers/huggingface.rs | tail -100Repository: ai-dynamo/modelexpress
Length of output: 3650
🏁 Script executed:
# Search for tests without rg type flag
find . -name "*.rs" -type f | xargs grep -l "auto_filter_weight_files"Repository: ai-dynamo/modelexpress
Length of output: 116
🏁 Script executed:
# Search for test directories and files
find . -path "*/test*" -name "*.rs" -type fRepository: ai-dynamo/modelexpress
Length of output: 244
🏁 Script executed:
# Look for issue `#173` reference
grep -r "173" . --include="*.rs" --include="*.md"Repository: ai-dynamo/modelexpress
Length of output: 125
🏁 Script executed:
# Search for Diffusers or consolidated patterns in the repo
grep -r "diffuser\|consolidated" . --include="*.rs" --include="*.md" --include="*.json"Repository: ai-dynamo/modelexpress
Length of output: 779
🏁 Script executed:
# Find the is_weight_file definition
grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 48
🏁 Script executed:
# Read the is_weight_file function
sed -n '260,300p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 1449
🏁 Script executed:
# Search for all test functions related to filter_weight
grep -n "fn test.*filter\|fn test.*weight" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 533
🏁 Script executed:
# Look at test cases in detail
grep -n "test_filter_weight_format_auto" modelexpress_common/src/providers/huggingface.rs -A 50Repository: ai-dynamo/modelexpress
Length of output: 5029
🏁 Script executed:
# Check if there are any tests with non-standard names or consolidated
grep -n "consolidated\|diffusion\|unet" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 361
🏁 Script executed:
# Search for the complete is_weight_file definition more carefully
grep -n "fn is_weight_file" modelexpress_common/src/providers/huggingface.rs -A 20Repository: ai-dynamo/modelexpress
Length of output: 48
🏁 Script executed:
# Search for is_weight_file more thoroughly
grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 278
🏁 Script executed:
# Read around those lines to understand the function
sed -n '155,175p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 604
🏁 Script executed:
# Let's look at the broader context - find where is_weight_file is defined
awk '/fn is_weight_file/,/^ \}/' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 48
🏁 Script executed:
# Get the line numbers where is_weight_file appears
grep -n "is_weight_file" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 278
🏁 Script executed:
# Read that section
sed -n '145,185p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 1378
🏁 Script executed:
# Search the entire codebase for is_weight_file definition
grep -r "fn is_weight_file" . --include="*.rs"Repository: ai-dynamo/modelexpress
Length of output: 150
🏁 Script executed:
# Check the imports and see if it's defined elsewhere or in a trait
grep -n "is_weight_file\|trait.*Weight\|impl.*Weight" modelexpress_common/src/providers/huggingface.rs | head -20Repository: ai-dynamo/modelexpress
Length of output: 278
🏁 Script executed:
# Let's read more of the file to understand the structure - look for impl blocks
grep -n "^impl\|^fn\|^ fn" modelexpress_common/src/providers/huggingface.rs | head -40Repository: ai-dynamo/modelexpress
Length of output: 1684
🏁 Script executed:
# Try to find any reference to the function or trait definition with more context
awk '/is_weight_file/ { for(i=NR-5;i<=NR+5;i++) if (i>0) print i": " lines[i%11]; } { lines[NR%11]=$0 }' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 1510
🏁 Script executed:
# Let's search in models.rs since we saw references to weight formats there
grep -n "is_weight_file" modelexpress_common/src/models.rsRepository: ai-dynamo/modelexpress
Length of output: 48
🏁 Script executed:
# Read the full models.rs file structure around weight-related code
grep -n "fn.*weight\|is_weight" modelexpress_common/src/models.rsRepository: ai-dynamo/modelexpress
Length of output: 301
🏁 Script executed:
# Found it! Read the providers.rs file to find is_weight_file definition
grep -n "fn is_weight_file" modelexpress_common/src/providers.rs -A 20Repository: ai-dynamo/modelexpress
Length of output: 588
🏁 Script executed:
# Also check what trait this belongs to
grep -n "trait.*Provider\|fn is_weight_file" modelexpress_common/src/providers.rs -B 5Repository: ai-dynamo/modelexpress
Length of output: 469
🏁 Script executed:
# Let's see what is in providers.rs more broadly
wc -l modelexpress_common/src/providers.rsRepository: ai-dynamo/modelexpress
Length of output: 106
🏁 Script executed:
# Read the relevant section
sed -n '1,100p' modelexpress_common/src/providers.rsRepository: ai-dynamo/modelexpress
Length of output: 3455
🏁 Script executed:
# Now let's look for any test data or references to actual models
find . -name "*.json" -o -name "*.md" | xargs grep -l "safetensors\|pytorch_model" | head -10Repository: ai-dynamo/modelexpress
Length of output: 123
🏁 Script executed:
# Check if there are integration tests that download real models
grep -r "download_model\|diffuser\|consolidated" . --include="*.rs" | grep -v "Binary"Repository: ai-dynamo/modelexpress
Length of output: 5023
🏁 Script executed:
# Let's examine the actual test to understand what the review is asking
sed -n '1016,1030p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 613
🏁 Script executed:
# Search the documentation or comments for any references to model layouts
grep -r "consolidated\|sharded\|index.json" . --include="*.md" --include="*.rs" -B 2 -A 2Repository: ai-dynamo/modelexpress
Length of output: 7364
🏁 Script executed:
# Check the git history or issues for any mention of non-standard model names
grep -r "consolidated.safetensors\|diffusion_pytorch" . --include="*.rs" --include="*.md"Repository: ai-dynamo/modelexpress
Length of output: 48
🏁 Script executed:
# Let's check if there are any recent HF model examples that might use non-standard names
# First, let's see what documentation exists
find . -name "*.md" -type f | head -20Repository: ai-dynamo/modelexpress
Length of output: 630
🏁 Script executed:
# Let's check the integration tests to see what models are actually tested
grep -n "model_name\|download_model" workspace-tests/tests/integration_tests.rs -A 2 -B 2 | head -50Repository: ai-dynamo/modelexpress
Length of output: 1259
🏁 Script executed:
# Check if there's any test data or fixture definitions
find . -name "*.json" -o -name "fixtures" -type dRepository: ai-dynamo/modelexpress
Length of output: 173
🏁 Script executed:
# Let's see what models are tested in integration tests
sed -n '1,100p' workspace-tests/tests/integration_tests.rsRepository: ai-dynamo/modelexpress
Length of output: 3623
🏁 Script executed:
# Search for any environment variables or config that might specify test models
grep -r "MODEL\|HF_" . --include="*.rs" --include="*.md" | grep -v "Binary" | head -20Repository: ai-dynamo/modelexpress
Length of output: 2115
🏁 Script executed:
# Check the CLAUDE.md or any documentation about the codebase
cat ./CLAUDE.mdRepository: ai-dynamo/modelexpress
Length of output: 5517
🏁 Script executed:
# Let's check if there are any real HF models that use non-standard consolidated names
# by looking at the actual HF hub documentation or searching for such patterns
rg -i "consolidated\.safetensors|diffusion_pytorch_model" . --include="*.rs" --include="*.md"Repository: ai-dynamo/modelexpress
Length of output: 507
🏁 Script executed:
# Let's verify the subdirectory filtering logic more carefully
sed -n '174,195p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 1188
🏁 Script executed:
# Check how files are processed before auto_filter_weight_files is called
sed -n '300,380p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 3104
🏁 Script executed:
# Let's look at how the filtering works step by step
# First, understand what gets passed to auto_filter_weight_files
sed -n '320,345p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 981
🏁 Script executed:
# Check test data more thoroughly to understand what models are used
grep -n "bert-tiny" workspace-tests/tests/integration_tests.rs -B 5 -A 10Repository: ai-dynamo/modelexpress
Length of output: 746
🏁 Script executed:
# Now let me check if there are subdirectory files being filtered out
# and understand the full filtering chain better
awk '/filter_files_by_weight_format/,/^ \}/' modelexpress_common/src/providers/huggingface.rs | head -60Repository: ai-dynamo/modelexpress
Length of output: 2278
🏁 Script executed:
# Let me understand the exact logic flow more carefully
# Check if there are any tests that would catch the non-standard name issue
sed -n '988,1048p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 2739
🏁 Script executed:
# Let's create a test case that would expose the issue the reviewer mentions
# First, understand what the current behavior would be with consolidated.safetensors
cat > /tmp/test_consolidated.txt << 'EOF'
Scenario: Model with consolidated.safetensors and sharded files
Files in model:
- config.json
- consolidated.safetensors (consolidated file)
- consolidated-00001-of-00002.safetensors (shard 1)
- consolidated-00002-of-00002.safetensors (shard 2)
- consolidated.safetensors.index.json (index file)
Current behavior:
1. has_safetensors = true (consolidated.safetensors exists)
2. has_safetensors_index = true (.safetensors.index.json exists)
3. Filter logic checks:
- if has_safetensors_index && *f == "model.safetensors" -> skip
- This does NOT match consolidated.safetensors
4. Result: ALL files kept (consolidated + shards + index)
- Should only keep: shards + index
Expected behavior:
- Detect that consolidated.safetensors.index.json exists
- Strip .index.json to get "consolidated.safetensors"
- Remove consolidated.safetensors from result
- Keep shards and index
EOF
cat /tmp/test_consolidated.txtRepository: ai-dynamo/modelexpress
Length of output: 946
🏁 Script executed:
# Check if there are real models on HF that use this pattern
# by looking at what the reviewer claims
grep -n "consolidated" modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 361
🏁 Script executed:
# Let's also verify if the review's concern about the hardcoded names is valid
# by checking if there are any comments or documentation about supported patterns
sed -n '174,267p' modelexpress_common/src/providers/huggingface.rsRepository: ai-dynamo/modelexpress
Length of output: 3805
Hardcoded consolidated filenames limit dedup to two naming patterns.
The dedup logic only checks for model.safetensors and pytorch_model.bin when index files exist. Models using other consolidated filenames (e.g., consolidated.safetensors + consolidated.safetensors.index.json) would download both the consolidated file and all shards, defeating the dedup goal.
Instead of hardcoding the two canonical names, derive the consolidated filename from each index file by stripping .index.json suffix. This handles arbitrary consolidated/sharded naming conventions while remaining backward compatible.
Note: The test matrix uses prajjwal1/bert-tiny, which is small and unlikely to use sharded weights, so this doesn't surface in current tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelexpress_common/src/providers/huggingface.rs` around lines 229 - 265, The
dedup logic in auto_filter_weight_files currently only ignores
"model.safetensors" and "pytorch_model.bin" when their corresponding index files
exist; update auto_filter_weight_files to compute consolidated filenames
dynamically by scanning filenames for any entries that end with ".index.json",
stripping the ".index.json" suffix to produce the consolidated name(s), and then
filter out those consolidated filenames (e.g., "consolidated.safetensors") when
a matching index file exists; use the existing filenames iterator and
Self::is_weight_file to locate index files and compare against filenames rather
than hardcoding "model.safetensors" or "pytorch_model.bin".
Add --weight-format CLI flag (auto/safetensors/pytorch/all) to control which weight files are downloaded. In auto mode (the default), safetensors files are preferred over pytorch/h5/msgpack, sharded vs consolidated duplicates are deduplicated, and GGUF files are excluded. This prevents downloading 2-3x the data for repos that publish weights in multiple formats. Also adds LICENSE, LICENSE.md, LICENSE.txt, and NOTICE to the default ignored files list since these are never used by model runtimes. Closes ai-dynamo#173 Signed-off-by: Jonathan Tong <jonathan.tong@live.com> Signed-off-by: Jont828 <jt572@cornell.edu>
0a5fd80 to
17af317
Compare
Summary
--weight-formatCLI flag (auto/safetensors/pytorch/all) andMODEL_EXPRESS_WEIGHT_FORMATenv var to control which weight file formats are downloadedautomode (default), prefer safetensors over pytorch/h5/msgpack, deduplicate sharded vs consolidated weights using index file presence, and exclude GGUF filesLICENSE,LICENSE.md,LICENSE.txt, andNOTICEto the default ignored files list since they are never used by model runtimesCloses #173
Test plan
WeightFormatenum andfilter_files_by_weight_format()logiccargo clippypasses with no warningsopenai/gpt-oss-20b) with--weight-format autoand verify only safetensors files are fetched--weight-format pytorchdownloads only.binfiles--weight-format allpreserves current behaviorMODEL_EXPRESS_WEIGHT_FORMAT=safetensorsenv var worksSummary by CodeRabbit
Release Notes
New Features
--weight-formatglobal option to specify model weight file format preferences:auto,safetensors,pytorch, orall(defaults toauto)MODEL_EXPRESS_WEIGHT_FORMATenvironment variable for weight format configurationDocumentation