
Show expected vs actual worldview content in read eval reports #12

Merged: tcdent merged 2 commits into main from claude/github-auth-setup-LdWTm on Jan 19, 2026

tcdent commented on Jan 19, 2026

- Add generated_worldview_content field to EvalResult to capture CLI output
- Update report to display both expected (predefined) and actual (CLI-generated)
  worldview content when CLI mode is used
- Include generated content in JSON results for analysis
- Update agent TASK_INSTRUCTIONS to encode only explicitly stated facts,
  avoiding supplementary knowledge from training data for token efficiency
- Create evals/common/ for shared code (config, llm_clients)
- Create evals/read_eval/ for read evaluation (tests LLM response to context)
- Keep evals/write_eval/ for write evaluation (tests document generation)
- Update all imports to use new module paths
- Remove expected worldview content from report (only show CLI-generated)
- Add proper __init__.py exports for all submodules

This structure prepares for adding additional evaluation types in the future.
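
As a rough illustration of the first and third bullets together with the final report behavior, here is a minimal sketch. Only the names `EvalResult` and `generated_worldview_content` come from this PR; the other fields (`test_name`, `passed`) and the helper functions `result_to_json` and `render_report_entry` are hypothetical and may not match the actual code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvalResult:
    # Only generated_worldview_content is named in this PR; the other
    # fields are hypothetical placeholders for illustration.
    test_name: str
    passed: bool
    # Worldview content produced by the CLI; None when CLI mode is not used.
    generated_worldview_content: Optional[str] = None


def result_to_json(result: EvalResult) -> str:
    """Serialize a result, including any generated content, for later analysis."""
    return json.dumps(asdict(result), indent=2)


def render_report_entry(result: EvalResult) -> str:
    """Render one report entry, showing generated content only when present."""
    status = "PASS" if result.passed else "FAIL"
    lines = [f"{result.test_name}: {status}"]
    if result.generated_worldview_content is not None:
        lines.append("Generated worldview content:")
        lines.append(result.generated_worldview_content)
    return "\n".join(lines)
```

Defaulting the field to `None` keeps results from non-CLI runs valid without changes, and the report can skip the section when nothing was generated.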

tcdent merged commit 10dff82 into main on Jan 19, 2026
1 of 2 checks passed