V0.2.1 small patch release for infmax v3 support #4

Merged
RogerLiu312 merged 26 commits into main from feature/infmax_v3 on Apr 28, 2026

@RogerLiu312
Collaborator

sflow v0.2.1 Release Notes

Release date: April 2026
Previous release: v0.2.0 (March 2026)


Highlights

sflow v0.2.1 is a documentation and workflow polish release for the InfMax v3 migration path. It documents the branch behavior for CSV-driven execution, self-contained YAML batch submission, replica variable domains, node placement, and probe orchestration.


User-Facing Changes

CLI and Batch Workflows

  • Single-row CSV execution with sflow run --bulk-input is now documented. Pass --row with exactly one selector to run a specific CSV row.
  • Advanced --row selectors are documented for run, compose, and batch: repeated flags, comma lists, Python-style slices with exclusive end, open-ended slices, and negative indices such as --row=-1.
  • sflow batch --bulk-submit is documented for submitting self-contained YAML files, folders, or glob patterns without CSV merging.
  • Auto-derived node counts are documented. Single-job and bulk-submit batch modes can derive --nodes from the Slurm backend; bulk-input mode requires either --nodes or a CSV node-count column.
  • --sflow-version is documented for pinning the git ref installed by generated sbatch scripts.
  • Expression-aware --sbatch-extra-args is documented. Extra sbatch directives can resolve ${{ variables.X }} or shorthand ${{ X }} from config defaults, CLI --set, and CSV row values.
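The selector forms listed above (repeated flags, comma lists, slices with exclusive end, open-ended slices, negative indices) can be illustrated with a minimal Python sketch. This is not sflow's parser: the function name `parse_row_selectors` is hypothetical, and the sketch assumes 0-based row numbering and that out-of-range single indices raise an error, neither of which is confirmed by these notes.

```python
def parse_row_selectors(selectors, n_rows):
    """Expand --row style selectors into a list of row indices.

    Assumptions (not confirmed by the release notes): rows are 0-based,
    results keep the order in which selectors were given, and a single
    index outside the CSV raises IndexError.
    """
    rows = list(range(n_rows))
    picked = []
    for selector in selectors:              # one entry per repeated --row flag
        for part in selector.split(","):    # comma lists such as "0,2,5"
            part = part.strip()
            if not part:
                continue
            if ":" in part:
                # Python-style slice: exclusive end, open ends allowed.
                fields = part.split(":")
                if len(fields) > 3:
                    raise ValueError(f"bad slice selector: {part!r}")
                bounds = [int(f) if f.strip() else None for f in fields]
                picked.extend(rows[slice(*bounds)])
            else:
                # Negative values count from the end, so -1 is the last row.
                picked.append(rows[int(part)])
    return picked


if __name__ == "__main__":
    # Equivalent in spirit to: --row=0 --row=2:4 --row=-1 on a 6-row CSV
    print(parse_row_selectors(["0", "2:4", "-1"], 6))
```

Reusing Python's own `slice` object keeps the exclusive-end and open-ended behaviour identical to list slicing, which is what "Python-style" suggests.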

Variables and Replica Sweeps

  • Variable domain metadata is documented through ${{ variables.NAME.domain }}.
  • Replica sweep behavior is clarified: ${{ variables.NAME }} resolves to the per-replica value, while ${{ variables.NAME.domain }} remains the full domain list.
  • Domain overrides via --set are documented: JSON-style list values update the variable domain, and the variable value becomes the first list item.
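The relationship between a variable's value and its domain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not sflow's implementation: the `{"value": ..., "domain": ...}` dictionary layout and the helper names `apply_set_override` and `replica_views` are hypothetical, and rejecting an empty list is a choice made for the sketch.

```python
import json


def apply_set_override(variables, name, raw):
    """Apply a --set style override to a {name: {"value", "domain"}} mapping.

    A JSON-style list replaces the domain and makes its first item the value;
    anything else only replaces the value. The data layout is an assumption.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = raw  # not valid JSON, keep it as a plain string
    entry = dict(variables.get(name, {}))
    if isinstance(parsed, list):
        if not parsed:
            raise ValueError("domain override must not be empty (sketch choice)")
        entry["domain"] = parsed
        entry["value"] = parsed[0]
    else:
        entry["value"] = parsed
    return {**variables, name: entry}


def replica_views(entry):
    """Yield what each replica would see during a sweep over the domain.

    "value" stands for ${{ variables.NAME }} (per-replica) and "domain"
    stands for ${{ variables.NAME.domain }} (always the full list).
    """
    for value in entry["domain"]:
        yield {"value": value, "domain": list(entry["domain"])}


if __name__ == "__main__":
    variables = {"CONCURRENCY": {"value": 1, "domain": [1]}}
    variables = apply_set_override(variables, "CONCURRENCY", "[4, 8, 16]")
    for view in replica_views(variables["CONCURRENCY"]):
        print(view)
```

The variable name `CONCURRENCY` is only an example; any variable with a domain would behave the same way in this sketch.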

Resources and Placement

  • resources.nodes.exclude is documented for removing nodes from the placement pool before applying indices, count, or GPU packing.
  • Negative node indices are clarified, including the fact that negative indices are resolved after exclude filtering.
  • Default Slurm placement is documented: when a task does not set resources.nodes, sflow passes the full backend allocation to srun.
  • GPU packing behavior is documented, including multi-node expansion when a GPU request is an exact multiple of gpus_per_node.
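The ordering described above, exclude first and indices second, matters most for negative indices. The following Python sketch shows one way that ordering could work; `select_nodes` is a hypothetical helper, the node names are made up, and count and GPU packing are left out on purpose because their exact rules are not spelled out in these notes.

```python
def select_nodes(allocation, exclude=(), indices=None):
    """Pick nodes from a backend allocation.

    Nodes listed in `exclude` are removed first; `indices` (which may be
    negative) are then resolved against the filtered pool. With no
    `indices`, the whole filtered pool is returned.
    """
    excluded = set(exclude)
    pool = [node for node in allocation if node not in excluded]
    if indices is None:
        return pool
    selected = []
    for index in indices:
        if not -len(pool) <= index < len(pool):
            raise IndexError(
                f"node index {index} is out of range for a pool of {len(pool)}"
            )
        selected.append(pool[index])
    return selected


if __name__ == "__main__":
    allocation = ["node0", "node1", "node2", "node3"]
    # -1 refers to the last node that survives exclusion, not to node3.
    print(select_nodes(allocation, exclude=["node3"], indices=[-1]))
```

Resolving indices against the filtered pool means a task that asks for index -1 never lands on an excluded node, which is consistent with the clarified behaviour in the notes.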

Probes

  • Probe timing defaults are documented, including timeout: 1200 for readiness probes and each_check_timeout: 30.
  • HTTP probes (http_get and http_post) are documented with examples.
  • Multiple readiness probes are documented as AND semantics: all readiness probes must trigger before a task becomes ready.
  • Failure probes are documented as fail-fast signals that mark tasks as failed by probe and cancel downstream work.
  • Replica HTTP probe deduplication is documented for parallel replicas with identical HTTP probes.
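The readiness and failure semantics above can be summarized as a small decision function. This is a sketch only, not the orchestrator's code: the `TaskState` names and `evaluate_probes` are hypothetical, probe results are reduced to booleans, and treating a task with no readiness probes as ready is an assumption of the sketch.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    READY = "ready"
    FAILED_BY_PROBE = "failed_by_probe"


def evaluate_probes(readiness_results, failure_results=()):
    """Combine probe outcomes into a task state.

    Each entry is True if that probe has triggered. Failure probes are
    checked first so that they act as a fail-fast signal; readiness uses
    AND semantics across all readiness probes.
    """
    if any(failure_results):
        return TaskState.FAILED_BY_PROBE
    # all([]) is True, so a task without readiness probes counts as ready
    # here; that is an assumption of this sketch.
    if all(readiness_results):
        return TaskState.READY
    return TaskState.PENDING


if __name__ == "__main__":
    print(evaluate_probes([True, False]))          # one probe still waiting
    print(evaluate_probes([True, True]))           # every probe triggered
    print(evaluate_probes([True, True], [True]))   # failure probe wins
```

Cancelling downstream work after a failure and enforcing the timeout and each_check_timeout values would sit in the surrounding scheduler loop and are not modeled here.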

Documentation Updated

  • docs/user/cli.md
  • docs/user/variables.md
  • docs/user/resources.md
  • docs/user/probes.md
  • docs/user/quick-reference.md
  • docs/user/configuration.md
  • docs/user/architecture.md

… variable expressions in batch scripts. Add tests for expression handling and ensure backward compatibility with existing args.
…on for bulk input operations. Update CLI help text and enhance parsing logic to handle new formats. Extend tests to cover new functionality, ensuring correct behavior for various row selection scenarios.
…perations. Enhance tests to verify presence and correctness of the new column in output files.
…es in task allocation. Update documentation to clarify usage and provide examples. Improve validation logic for node exclusion and indices, ensuring correct error handling for out-of-range values. Extend unit tests to cover new functionality and edge cases.
…ted tests. Adjust local_variable_domain.yaml to reflect new concurrency domain values. Improve script execution to verify per-replica value resolution and domain access in task scripts.
…r CSV inputs for variable overrides. Modify related documentation and tests to reflect this change, ensuring consistency across batch and compose operations. Breaking change: with --bulk-input, --set now overrides CSV values, because that better matches real-world usage.
…nhance `full_sample_tests.sh` to verify default sflow version aligns with the current environment. Update `batch.py` to determine the effective sflow version based on the installed package or git reference. Extend unit tests to cover new version resolution logic and ensure correct behavior in batch operations.
… indices and exclusion lists. Update `build_task_graph` to resolve these expressions correctly, allowing for dynamic node selection in workflows. Extend unit tests to validate the new functionality, ensuring correct behavior for both indices and exclusion scenarios.
…s. Update `ProbesConfig` to accept a list of readiness probes, ensuring backward compatibility with single probe objects. Modify `build_task_graph` to handle multiple readiness probes and adjust orchestrator logic to require all probes to trigger for task readiness. Extend unit tests to validate the new functionality and ensure compatibility with existing configurations.
…w enhancements, variable domain metadata, resource management updates, and probe behavior clarifications. Update related documentation files to reflect these changes.
RogerLiu312 merged commit 84c2fa3 into main on Apr 28, 2026
3 checks passed