Skip to content

Add logs_size to con-duct ls output#406

Draft
Copilot wants to merge 6 commits into
mainfrom
copilot/add-files-size-to-ls-f
Draft

Add logs_size to con-duct ls output#406
Copilot wants to merge 6 commits into
mainfrom
copilot/add-files-size-to-ls-f

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 8, 2026

Users accumulating large log directories had no quick way to correlate session file sizes with other execution metrics like duration and memory usage.

Changes

  • New logs_size field: Computed at query time by globbing all files matching the session prefix and summing their sizes. Not stored in the info JSON — computed on the fly in load_duct_runs only when needed.
  • Lazy computation: logs_size is only computed when it is actually requested — i.e. when "logs_size" is in args.fields or referenced in args.eval_filter. This avoids unnecessary glob + stat calls on large/NFS log directories when the field isn't needed.
  • Humanized display: Uses the existing !S conversion ("{value!S}"), so it renders as e.g. 4.2 MB in summaries/pyout output and as raw bytes in json/json_pp/yaml.
  • Default field: Added to the default --fields list so it shows up without any extra flags.
  • COMPUTED_FIELDS list: Distinguishes fields computed at query time from those present in the info JSON schema, allowing the schema conformance test to exclude them.
  • OSError handling: Files that cannot be accessed (e.g. due to permissions) emit a WARNING and are skipped; the sizes of accessible files are still summed. The try/except OSError wraps both is_file() and stat() to correctly catch PermissionError on Python 3.12+.

Example

$ con-duct ls
Prefix: .duct/logs/2024.03.01T12.00.00_
  Command: python train.py
  Exit Code: 0
  Wall Clock Time: 3621.042 sec
  Peak Rss: 12.4 GB
  Logs Size: 587.3 MB

Copilot AI linked an issue Apr 8, 2026 that may be closed by this pull request
@yarikoptic yarikoptic added the semver-minor Increment the minor version when merged label Apr 8, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.52%. Comparing base (2162da9) to head (c19eaf9).

Files with missing lines Patch % Lines
src/con_duct/ls.py 76.92% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #406      +/-   ##
==========================================
- Coverage   91.87%   91.52%   -0.35%     
==========================================
  Files          15       15              
  Lines        1120     1133      +13     
  Branches      139      141       +2     
==========================================
+ Hits         1029     1037       +8     
- Misses         69       72       +3     
- Partials       22       24       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI and others added 2 commits April 8, 2026 11:56
- Add `files_size` to `VALUE_TRANSFORMATION_MAP` with `{value!S}` (humanized size)
- Add `COMPUTED_FIELDS` list to distinguish computed fields from JSON schema fields
- Add `compute_files_size()` function that sums sizes of all session files via glob
- Compute `files_size` in `load_duct_runs` after loading each run
- Add `files_size` to default `--fields` in `_create_ls_parser`
- Update `test_schema.py` to exclude computed fields from schema equality check
- Add tests for `compute_files_size`, `load_duct_runs`, and `ls` output

Agent-Logs-Url: https://github.com/con/duct/sessions/829c6010-40e6-416a-99be-6f9361c06057

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copilot AI changed the title [WIP] Add files_size to ls -F command Add files_size to con-duct ls output Apr 8, 2026
Copilot AI requested a review from yarikoptic April 8, 2026 12:02
Comment thread src/con_duct/ls.py Outdated
if path.is_file():
try:
total += path.stat().st_size
except OSError:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warrants at least logging for debug

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add logging ... since unexpected, I think it could even be WARNING

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added lgr.debug("Could not get size of %s: %s", path_str, e) in the except OSError block in ca997ba.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgraded to lgr.warning(...) in c19eaf9.

Comment thread test/test_ls.py Outdated
result = self._run_ls(["file1_info.json"], "summaries", args)
assert "Files Size:" in result
# Should show humanized size (Bytes or kB etc.) not raw integer
assert any(unit in result for unit in ["Byte", "kB", "MB", "GB"])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of those tests warrant just addition of testing within existing tests instead of breeding their own -- I bet we do not have a test per each column. @copilot please review and adjust to retain coverage and testing but reduce code added in the tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated in ca997ba:

  • Merged the empty-prefix case into test_compute_files_size_sums_all_files as a second assertion
  • Removed test_load_duct_runs_includes_files_size — added assert "files_size" in result[0] to test_load_duct_runs_sanity instead
  • Removed test_ls_files_size_in_output and test_ls_files_size_humanized_in_summaries — added files_size to the _run_ls helper's default fields, then added the assertions to test_ls_sanity (humanized check) and test_ls_json_output (field presence check)

…to existing tests

Agent-Logs-Url: https://github.com/con/duct/sessions/74a1b4cf-2d72-4835-af90-44d1e11414f3

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
@asmacdo
Copy link
Copy Markdown
Member

asmacdo commented Apr 21, 2026

@copilot and @yarikoptic, thanks for the iteration so far. After a claude-augmented review, I request these follow-ups

1. Only compute files_size when it's actually needed

Right now compute_files_size runs inside load_duct_runs for every session on every con-duct ls invocation, even when the user didn't ask for it (e.g. con-duct ls -F command exit_code). That turns a pure-JSON read into glob + stat per session and is a real regression for the PR's own target use case (large log directories, often on NFS/HPC).

Please make it lazy — only compute when the field is requested. Concretely:

  • Skip the call entirely if "files_size" is not in args.fields and args.eval_filter does not reference "files_size" (a simple substring check on the filter string is fine — false positives just mean we compute when we didn't need to, which is the status quo).
  • Thread that decision into load_duct_runs (e.g. an extra parameter, or compute it in process_run_data where fields is already available — whichever fits the existing structure better).
  • Add a test that asserts compute_files_size is not called when the field isn't requested (e.g. unittest.mock.patch on compute_files_size + assert not called).

2. Add a test for the OSError WARNING branch

The lgr.warning("Could not get size of %s: %s", ...) branch was added in response to review feedback but has no test. Please cover it — e.g. unittest.mock.patch("pathlib.Path.stat", side_effect=PermissionError("denied")) plus assert the WARNING was emitted and that compute_files_size still returned a total for the other files.

3. Fix the stale docstring

compute_files_size's docstring still says "Files that cannot be accessed (e.g. due to permissions) are silently skipped." After the WARNING upgrade that's no longer true. Please update it to describe the current behavior and also document that directories and broken symlinks are excluded.

4. MISC

  • rename files_size -> logs_size to read more naturally as a sum across multiple files.
  • Add an assertion in the existing test_compute_files_size_sums_all_files that a subdirectory inside the prefix is not counted, to lock in the is_file() behavior.

Thanks!

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 21, 2026

@copilot and @yarikoptic, thanks for the iteration so far. After a claude-augmented review, I request these follow-ups

1. Only compute files_size when it's actually needed

...

Done in 37c199c:

  1. Lazy computationcompute_logs_size is now only called when "logs_size" is in args.fields or referenced in args.eval_filter. A new compute_size: bool = False parameter was added to load_duct_runs. Two tests cover this: one asserts the function is called when requested (compute_size=True), the other asserts mock_size.assert_not_called() when not requested.

  2. OSError WARNING testtest_compute_logs_size_warns_on_inaccessible_file patches Path.stat to raise PermissionError for a specific file, then asserts the WARNING is emitted and the total only counts the accessible file. Also discovered that Python 3.12's is_file() re-raises PermissionError (EACCES is not in _IGNORED_ERRNOS), so the try/except was moved to wrap is_file() as well.

  3. Docstring updated — now states that directories and broken symlinks are excluded, and that inaccessible files emit a WARNING.

  4. Renamed files_sizelogs_size (including compute_files_sizecompute_logs_size) in ls.py, cli.py, and tests.

  5. Subdirectory assertion added in test_compute_logs_size_sums_all_files — creates a directory matching the prefix and asserts it's not counted in the total.

Copilot AI changed the title Add files_size to con-duct ls output Add logs_size to con-duct ls output Apr 21, 2026
Copilot AI requested a review from asmacdo April 21, 2026 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor Increment the minor version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Include files_size into ls -F

3 participants