
feat(codex): include archived sessions #1176

Open
yashau wants to merge 1 commit into ryoppippi:main from yashau:codex/include-archived-sessions

@yashau commented May 28, 2026

Summary

Codex usage discovery now includes archived sessions by default for Codex homes. When CODEX_HOME points at a Codex home such as ~/.codex, the Rust adapter reads both sessions/ and archived_sessions/ so focused and unified Codex reports include archived conversation history.

This ports the behavior proposed in #849 to the current Rust implementation after the TypeScript adapter was retired.

What Changed

  • Added Codex home discovery for both sessions/ and archived_sessions/.
  • Kept direct JSONL directory behavior unchanged for saved codex exec --json output.
  • Added shared Codex file collection helpers so event loading and aggregate streaming use the same ordered file list.
  • Deduplicated active and archived files by relative JSONL path before parsing, so if a file exists in both directories the active sessions/ copy wins.
  • Updated the Codex guide and adapter source notes to document archived session coverage and de-duplication behavior.
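The discovery rule in the first two bullets can be sketched as follows. This is a minimal illustration, not the PR's code: `usage_dirs_for_root` is a hypothetical name, and it assumes that a root counts as a Codex home when at least one of `sessions/` or `archived_sessions/` exists. Otherwise the root is scanned directly as a JSONL directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the PR's `codex_usage_paths_from_homes`):
/// expand one CODEX_HOME entry into the directories that should be scanned.
pub fn usage_dirs_for_root(root: &Path) -> Vec<PathBuf> {
    // Active first, archived second: a later "first copy wins" de-duplication
    // relies on this order to make `sessions/` take precedence.
    let dirs: Vec<PathBuf> = ["sessions", "archived_sessions"]
        .iter()
        .map(|sub| root.join(sub))
        .filter(|candidate| candidate.is_dir())
        .collect();

    if dirs.is_empty() {
        // Assumption: no Codex-home layout means a direct JSONL directory,
        // e.g. saved `codex exec --json` output, which is scanned as-is.
        vec![root.to_path_buf()]
    } else {
        dirs
    }
}
```

In this sketch the returned order carries meaning. Precedence is decided by position, so nothing downstream needs to know which directory is "active".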

Why

Archived Codex sessions are still part of a user's local usage history, but the Rust adapter only read active sessions. That undercounted token and cost totals for users with archived conversations. This change makes the default behavior match what users expect from ccusage codex, while preserving explicit/direct JSONL-directory support.

Implementation Notes

The key decision is to de-duplicate at the file discovery layer by relative JSONL path before parsing. That prevents double counting a session copied from sessions/ to archived_sessions/, while still allowing distinct nested paths with the same basename to be counted separately.
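A minimal sketch of that file-level de-duplication follows. It assumes the directories arrive already ordered with `sessions/` before `archived_sessions/`, each paired with its JSONL files, which mirrors the `Vec<(PathBuf, Vec<PathBuf>)>` shape shown later in the review. `dedupe_by_relative_path` and `relative_key` are hypothetical names. Keying on a `PathBuf` rather than a `String` is a choice made for this sketch.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Key a file by its path below the session directory it was found in,
/// e.g. `2026/05/28/rollout-a.jsonl`, so nested files that share a
/// basename stay distinct.
fn relative_key(dir: &Path, file: &Path) -> PathBuf {
    file.strip_prefix(dir).unwrap_or(file).to_path_buf()
}

/// Hypothetical sketch: keep only the first occurrence of each relative path.
/// `groups` must list `sessions/` before `archived_sessions/` so that the
/// active copy is the one that survives.
pub fn dedupe_by_relative_path(
    groups: Vec<(PathBuf, Vec<PathBuf>)>,
) -> Vec<(PathBuf, Vec<PathBuf>)> {
    let mut seen: HashSet<PathBuf> = HashSet::new();
    let mut out = Vec::with_capacity(groups.len());
    for (dir, files) in groups {
        let kept: Vec<PathBuf> = files
            .into_iter()
            // `insert` returns false when the key is already present,
            // which filters out the later (archived) copy.
            .filter(|file| seen.insert(relative_key(&dir, file)))
            .collect();
        out.push((dir, kept));
    }
    out
}
```

Because the key is the relative path and not the file name, `2026/05/27/a.jsonl` and `2026/05/28/a.jsonl` are both kept. Only an exact relative-path match across the two directories is dropped.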

The aggregate path matters because normal table output can stream/aggregate without first building a full event vector. This PR updates that path as well as the event loader so --json, table output, and focused reports all see the same file set.

Testing

  • cargo test --manifest-path rust/Cargo.toml --workspace codex -- --nocapture
  • cargo check --manifest-path rust/Cargo.toml --workspace
  • git diff --check
  • Local smoke test against a real Codex home with 18 active JSONL files and 33 archived JSONL files:
    • active-only: 289,650,416 total tokens, $214.107875 estimated cost
    • default active + archived: 1,665,817,444 total tokens, $1,250.900924 estimated cost
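Comparing the two smoke-test runs shows what the archived files added after de-duplication: 1,376,167,028 tokens and $1,036.793049. The combined total is therefore roughly 5.75 times the active-only figure. The sketch below simply recomputes this from the reported numbers. Costs are held as integer micro-dollars (the six decimal places reported) so that the subtraction is exact. All constant and function names are illustrative.

```rust
// Figures copied from the smoke test above; names are illustrative.
pub const ACTIVE_TOKENS: u64 = 289_650_416;
pub const COMBINED_TOKENS: u64 = 1_665_817_444;
pub const ACTIVE_COST_MICRO_USD: u64 = 214_107_875;
pub const COMBINED_COST_MICRO_USD: u64 = 1_250_900_924;

/// Tokens and cost (in micro-dollars) contributed by archived files that
/// survived de-duplication. Every active file is counted in both runs,
/// so the difference is attributable to archived files only.
pub fn archived_contribution() -> (u64, u64) {
    (
        COMBINED_TOKENS - ACTIVE_TOKENS,
        COMBINED_COST_MICRO_USD - ACTIVE_COST_MICRO_USD,
    )
}

/// Render micro-dollars as a dollar string with six decimals.
pub fn format_micro_usd(micro: u64) -> String {
    format!("${}.{:06}", micro / 1_000_000, micro % 1_000_000)
}
```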

I also ran the full cargo test --manifest-path rust/Cargo.toml --workspace suite. 191 tests passed. Two pre-existing, timezone-sensitive tests that are unrelated to this Codex change failed on this Windows machine:

  • commands::tests::builds_statusline_today_filter_from_timezone
  • tests::formats_dates_with_timezone

Both failures reproduce when run individually and show named-timezone conversion behaving like UTC on this machine.

AI-assisted: This code was written with assistance from AI.


Summary by cubic

Include archived Codex sessions by default to fix undercounted usage and align ccusage reports with real totals. Direct JSONL directories still work the same.

  • New Features
    • Read from sessions/ and archived_sessions/ when CODEX_HOME points to a Codex home.
    • De-duplicate by relative JSONL path; the active sessions/ copy wins if both exist.
    • Keep saved codex exec --json directories unchanged.
    • Use shared file discovery in both the event loader and aggregate streaming for consistent results.

Written for commit d29832d. Summary will update on new commits.


Summary by CodeRabbit

  • New Features

    • CODEX_HOME accepts multiple comma-separated roots
    • Automatically discovers both sessions/ and archived_sessions/ (falls back to JSONL dirs when subfolders are absent)
    • Cross-directory deduplication; active sessions take precedence over archived to avoid double-counting
  • Documentation

    • Updated CODEX_HOME docs and environment-variable table to reflect multi-root and directory discovery behavior
  • Tests

    • Added tests validating deduplication across active and archived session locations
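The comma-separated CODEX_HOME form mentioned above could be parsed along these lines. This is a sketch under stated assumptions. `parse_codex_home_roots` is a hypothetical name. Trimming whitespace and skipping empty entries are choices of this sketch rather than documented behavior. Tilde expansion and the default used when CODEX_HOME is unset are not shown.

```rust
use std::path::PathBuf;

/// Hypothetical sketch: split a CODEX_HOME value such as
/// "/home/u/.codex, /mnt/backup/codex" into individual roots.
/// Assumption: surrounding whitespace is trimmed and empty entries are skipped.
pub fn parse_codex_home_roots(value: &str) -> Vec<PathBuf> {
    value
        .split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(PathBuf::from)
        .collect()
}
```

Each resulting root would then be expanded independently, either into its `sessions/` and `archived_sessions/` subdirectories or kept as a direct JSONL directory.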


Load Codex usage from both sessions and archived_sessions when a CODEX_HOME entry is detected as a Codex home. Direct JSONL directories still load as before for saved codex exec output.

Deduplicate active and archived files by relative JSONL path before parsing so copied archived sessions do not double count. The aggregate streaming path and event loader now share the same file discovery behavior.

Update Codex docs to describe archived session coverage and active-session precedence.
@yashau (Author) commented May 28, 2026

@coderabbitai @cubic-dev-ai please review this PR.

@github-actions (Contributor)

This PR was auto-closed. Only contributors approved with lgtm can open PRs. Open an issue first.

Maintainers review auto-closed issues and reopen worthwhile ones. Issues that do not meet the quality bar in CONTRIBUTING.md may not be reopened or receive a reply.

If a maintainer replies lgtmi, your future issues will stay open. If a maintainer replies lgtm, your future issues and PRs will stay open.

See CONTRIBUTING.md.

@coderabbitai (bot) commented May 28, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a17af7c-ed56-4b32-89a7-ec270ec1f439

📥 Commits

Reviewing files that changed from the base of the PR and between 284899c and d29832d.

📒 Files selected for processing (5)
  • docs/guide/codex/index.md
  • rust/crates/ccusage/src/adapter/codex/README.md
  • rust/crates/ccusage/src/adapter/codex/aggregate.rs
  • rust/crates/ccusage/src/adapter/codex/loader.rs
  • rust/crates/ccusage/src/adapter/codex/paths.rs

📝 Walkthrough


Extends Codex discovery and loading to support multiple CODEX_HOME roots and both sessions/ and archived_sessions/, collecting usage files across directories and deduplicating by relative session path so active sessions/ entries take precedence.

Changes

Codex Multi-Directory Session Loading

  • Documentation updates (docs/guide/codex/index.md, rust/crates/ccusage/src/adapter/codex/README.md): User guide and adapter README now document comma-separated CODEX_HOME roots, discovery of sessions/ and archived_sessions/, direct JSONL root handling, and deduplication precedence favoring sessions/ over archived_sessions/.
  • Path discovery and file collection (rust/crates/ccusage/src/adapter/codex/paths.rs): codex_usage_paths_from_homes detects sessions/ and archived_sessions/; adds collect_codex_usage_files and collect_deduped_codex_usage_files to gather and dedupe files by relative session path. Unit tests validate discovery and fallback behavior.
  • Event loading from multiple directories (rust/crates/ccusage/src/adapter/codex/loader.rs): Adds load_codex_events_from_directories to read events from one or many session directories (serial or parallel reads per single_thread) and perform a final cross-directory dedupe_codex_events. load_codex_events_inner delegates to the new loader; tests assert archived/active dedupe.
  • Group aggregation across directories (rust/crates/ccusage/src/adapter/codex/aggregate.rs): Adds load_groups_from_directories to aggregate deduped file groups across directories using shared dedupe shards and serial/parallel processing; load_groups and per-directory aggregator updated to use Codex-specific file collection. Tests verify per-day token totals with duplicates present.
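The "serial or parallel reads per single_thread" switch described in this walkthrough can be illustrated with a small order-preserving map over files. This is a hypothetical sketch. `map_files` is an invented name, and `std::thread::scope` stands in for whatever parallel runtime the crate actually uses. The property it demonstrates is that results come back in input order, so the active-before-archived ordering chosen during discovery survives parallelism.

```rust
use std::path::PathBuf;
use std::thread;

/// Hypothetical sketch: apply `parse` to every file, serially or across
/// scoped threads, returning results in the original file order.
pub fn map_files<T, F>(files: &[PathBuf], single_thread: bool, parse: F) -> Vec<T>
where
    T: Send,
    F: Fn(&PathBuf) -> T + Sync,
{
    if single_thread || files.len() < 2 {
        return files.iter().map(&parse).collect();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every file lands in exactly one chunk.
    let chunk_size = (files.len() + workers - 1) / workers;
    let parse = &parse;
    thread::scope(|scope| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(parse).collect::<Vec<T>>()))
            .collect();
        // Joining handles in spawn order concatenates chunks in file order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Chunking by contiguous ranges, rather than handing files out round-robin, is what makes simple concatenation restore the original order.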

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement

🐰 I hopped through sessions, old and new,
Roots joined and duplicates few;
Active paths win, archives stay mild —
One JSONL story, tidy and styled. ✨


github-actions (bot) closed this May 28, 2026
@cubic-dev-ai (bot) commented May 28, 2026

@coderabbitai @cubic-dev-ai please review this PR.

@yashau I have started the AI code review. It will take a few minutes to complete.

@coderabbitai (bot) commented May 28, 2026

@yashau Sure! I'll kick off a full review of this PR right away.

✅ Actions performed

Full review triggered.

@yashau (Author) commented May 28, 2026

As requested, I opened a contribution issue first: #1177. If maintainers approve it and reopen this PR, the PR can serve as a ready-made implementation reference.

@cubic-dev-ai (bot) left a comment

2 issues found across 5 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="rust/crates/ccusage/src/adapter/codex/paths.rs">

<violation number="1" location="rust/crates/ccusage/src/adapter/codex/paths.rs:58">
P1: Deduplication key is global across all CODEX_HOME roots, causing valid files from different homes to be incorrectly dropped when they share the same relative path.</violation>

<violation number="2" location="rust/crates/ccusage/src/adapter/codex/paths.rs:72">
P2: Non-UTF-8 path components are silently discarded in deduplication keys, which can cause false duplicate detection and undercounted usage.</violation>
</file>


    pub(super) fn collect_deduped_codex_usage_files(
        sessions_dirs: &[PathBuf],
    ) -> Vec<(PathBuf, Vec<PathBuf>)> {
        let mut seen = FxHashSet::default();

P1: Deduplication key is global across all CODEX_HOME roots, causing valid files from different homes to be incorrectly dropped when they share the same relative path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rust/crates/ccusage/src/adapter/codex/paths.rs, line 58:

<comment>Deduplication key is global across all CODEX_HOME roots, causing valid files from different homes to be incorrectly dropped when they share the same relative path.</comment>

<file context>
@@ -31,3 +44,72 @@ pub(super) fn codex_home_paths() -> Result<Vec<PathBuf>> {
+pub(super) fn collect_deduped_codex_usage_files(
+    sessions_dirs: &[PathBuf],
+) -> Vec<(PathBuf, Vec<PathBuf>)> {
+    let mut seen = FxHashSet::default();
+    let mut grouped_files = Vec::new();
+    for sessions_dir in sessions_dirs {
</file context>
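If the P1 finding is judged valid, one possible fix is to scope the key to the CODEX_HOME root it came from. The sketch below is hypothetical. `SessionDir` and `dedupe_per_home` are invented names, whereas the code under review takes a flat `&[PathBuf]`. The key becomes `(home, relative path)`. An active/archived pair inside one home still collapses to the active copy, but the same relative path under two different homes is counted twice.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// A directory to scan, tagged with the CODEX_HOME root it came from.
/// Hypothetical type used only for this sketch.
pub struct SessionDir {
    pub home: PathBuf,
    pub dir: PathBuf,
    pub files: Vec<PathBuf>,
}

/// Sketch of a home-scoped dedupe key. Directories must be ordered with
/// each home's `sessions/` before its `archived_sessions/`.
pub fn dedupe_per_home(dirs: Vec<SessionDir>) -> Vec<(PathBuf, Vec<PathBuf>)> {
    let mut seen: HashSet<(PathBuf, PathBuf)> = HashSet::new();
    let mut out = Vec::with_capacity(dirs.len());
    for SessionDir { home, dir, files } in dirs {
        let kept: Vec<PathBuf> = files
            .into_iter()
            .filter(|file| {
                let rel = file.strip_prefix(&dir).unwrap_or(file.as_path()).to_path_buf();
                // Same relative path in a different home yields a different key.
                seen.insert((home.clone(), rel))
            })
            .collect();
        out.push((dir, kept));
    }
    out
}
```

There is a trade-off. If the same Codex home is listed twice under different spellings, a per-home key would count it twice, so canonicalizing the roots first would be a reasonable companion change.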

        grouped_files
    }

    fn codex_relative_session_path(sessions_dir: &Path, path: &Path) -> String {

P2: Non-UTF-8 path components are silently discarded in deduplication keys, which can cause false duplicate detection and undercounted usage.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rust/crates/ccusage/src/adapter/codex/paths.rs, line 72:

<comment>Non-UTF-8 path components are silently discarded in deduplication keys, which can cause false duplicate detection and undercounted usage.</comment>

<file context>
@@ -31,3 +44,72 @@ pub(super) fn codex_home_paths() -> Result<Vec<PathBuf>> {
+    grouped_files
+}
+
+fn codex_relative_session_path(sessions_dir: &Path, path: &Path) -> String {
+    path.strip_prefix(sessions_dir)
+        .unwrap_or(path)
</file context>
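One way to address P2 is to avoid converting the relative path to a `String` at all. The sketch below keeps the key as a `PathBuf` instead. `relative_session_key` is a hypothetical name; it is not the reviewed `codex_relative_session_path`, whose full body is not shown here. Paths compare and hash by their components, which wrap the raw OS strings, so a non-UTF-8 directory name is never dropped. Falling back to `to_string_lossy` would not fully solve the problem either, because two different invalid names could both map to U+FFFD and collide.

```rust
use std::path::{Path, PathBuf};

/// Sketch of a lossless dedupe key: the path below the session directory,
/// kept as an OS-native `PathBuf` rather than a UTF-8 `String`.
pub fn relative_session_key(sessions_dir: &Path, path: &Path) -> PathBuf {
    path.strip_prefix(sessions_dir).unwrap_or(path).to_path_buf()
}
```

A `HashSet<PathBuf>` of these keys gives the same active-wins behavior as the string version for ordinary UTF-8 paths.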

ryoppippi reopened this May 29, 2026
@ryoppippi (Owner)

@yashau looks reasonable. thank you for this pr.
i'll take a look in a couple of days!
