
Add real stdin source #27

Merged
simonsmallchua merged 3 commits into main from work/hungry-fermat-533b02
May 9, 2026
Conversation

@simonsmallchua
Contributor

@simonsmallchua simonsmallchua commented May 9, 2026

Summary

  • Replaces the stdin source stub with a real implementation that yields lines from sys.stdin (or an injected test stream). Supports format / format_keys with the same JSON-only conflict rule as flyctl and file. Intended for piping pre-captured logs into a one-shot paperbark monitor run; analyse and search read existing run artefacts and never consume stdin.
  • A piped stdin is single-use: the first iteration drains it, subsequent iterations yield nothing rather than re-raising. There is intentionally no encoding knob in v0.2 — use the file source if byte-level robustness is needed.
  • Drops the third remaining v0.2 stub (wrangler / kubectl / cloudwatch still pending).

Drive-by

  • README's Plain files row was still labelled "stub (interface only, post-v1)" despite landing in v0.1.4 — fixed in the same source table.
  • docs/ROADMAP.md "Last verified" bumped to today.
  • uv.lock had paperbark 0.1.3 pinned despite three subsequent releases; uv sync reconciled it. The auto-release workflow doesn't refresh uv.lock when bumping pyproject.toml — worth a follow-up to fix at source rather than carrying the drift.

CHANGELOG.md carries an Added entry under [Unreleased] so the changelog-check gate is satisfied; this is a patch bump.

Test plan

  • uv run pytest -q — 419 passed (was 407, +12).
  • uv run ruff check . and uv run ruff format --check . — clean.
  • uv run mypy src/paperbark tests — clean.
  • Live smoke: printf '<JSON line with leading TS>\n' | paperbark monitor --iterations 1 writes a canonical run dir, captures the line, renders analysis.md with the severity rollup, exits 0.
  • Live smoke with format = "apache-combined" and Apache combined log lines: both lines captured via the format-aware cursor; HTTP status probe correctly identified the 200 / 500.


Summary by CodeRabbit

  • New Features

    • stdin source now reads piped log lines, supports format patterns and JSON key-mapping, and is single-use per run (subsequent reads yield nothing).
  • Documentation

    • Updated changelog, README and docs with quickstart, examples, supported options, one-shot behaviour and intentional lack of encoding option.
  • Tests

    • Added focused tests for stdin behaviour, option validation and one-shot semantics.


@coderabbitai

coderabbitai Bot commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: ae3c85fd-eb33-4be3-b609-810694c85c3e

📥 Commits

Reviewing files that changed from the base of the PR and between 79d3b2f and 8157918.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • docs/CONFIG.md
  • docs/SOURCES.md
  • src/paperbark/sources/stdin.py

📝 Walkthrough

Walkthrough

This pull request implements a real stdin source that replaces the previous stub implementation. The StdinSource class now reads lines from sys.stdin with test injection support, validates configuration options (format and format_keys) through the dispatcher, and includes comprehensive documentation and test coverage for the feature.

Changes

Real stdin Source Implementation

| Layer / File(s) | Summary |
|---|---|
| Core stdin Implementation (src/paperbark/sources/stdin.py) | StdinSource replaces NotImplementedError with a working implementation that yields lines from sys.stdin or an injected stream; adds __init__ accepting optional format_keys, line_format, and stream parameters; capture() discards since and lazily yields lines until EOF. |
| Dispatcher Configuration (src/paperbark/dispatcher.py) | build_source() for stdin now validates allowed options (format, format_keys), resolves preset formats via registry, rejects format combined with format_keys, and constructs StdinSource with derived line_format and format_keys instead of an unconfigured instance. |
| Source Tests (tests/test_sources.py) | New test section covers StdinSource with injected stream yielding, empty stream handling, sys.stdin default, and draining behaviour (second capture after EOF yields nothing); removes StdinSource from stub test coverage. |
| Dispatcher Tests (tests/test_dispatcher.py) | Added stdin-specific tests for default construction, unknown-option rejection, format preset wiring, format_keys parsing, and format+format_keys conflict validation; removes stdin from generic stub test. |
| Documentation (CHANGELOG.md, README.md, docs/CONFIG.md, docs/SOURCES.md, docs/ROADMAP.md) | Updated status tables and CHANGELOG entry to mark stdin as implemented; added dedicated stdin configuration section in CONFIG.md and reference section in SOURCES.md; reorganised stub source sections; updated ROADMAP milestone and v0.2 status tracking. |
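
The dispatcher validation summarised in the walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in src/paperbark/dispatcher.py: the function name resolve_stdin_options, the FORMAT_PRESETS registry and its placeholder pattern, the error messages, and the choice of ValueError are all hypothetical.

```python
from typing import Any

ALLOWED_STDIN_OPTIONS = {"format", "format_keys"}

# Hypothetical preset registry; the real preset names and patterns may differ.
FORMAT_PRESETS = {"apache-combined": "<pattern for apache-combined>"}


def resolve_stdin_options(options: dict[str, Any]) -> dict[str, Any]:
    """Validate stdin source options and derive constructor arguments."""
    unknown = set(options) - ALLOWED_STDIN_OPTIONS
    if unknown:
        raise ValueError(f"unknown stdin option(s): {sorted(unknown)}")

    # format and format_keys are mutually exclusive.
    if "format" in options and "format_keys" in options:
        raise ValueError("stdin: 'format' cannot be combined with 'format_keys'")

    line_format = None
    if "format" in options:
        name = options["format"]
        # Assumption: `format` names a preset that is looked up in a registry.
        if name not in FORMAT_PRESETS:
            raise ValueError(f"unknown format preset: {name!r}")
        line_format = FORMAT_PRESETS[name]

    return {"line_format": line_format, "format_keys": options.get("format_keys")}


default_args = resolve_stdin_options({})
preset_args = resolve_stdin_options({"format": "apache-combined"})
```

Validating before construction keeps the source class itself free of configuration parsing, which matches the split between the dispatcher and StdinSource described in the table.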

Possibly Related PRs

  • Good-Native/paperbark#21: Main PR adds a real StdinSource and updates dispatcher to accept/validate format and format_keys and pass a line_format into StdinSource — which directly uses the same format/line_format plumbing introduced in the retrieved PR, so they are related.
  • Good-Native/paperbark#23: Both PRs implement real built-in sources and make parallel, code-level changes to dispatcher.build_source (adding per-type option validation and constructing concrete Source instances), so they are related.
  • Good-Native/paperbark#15: Both PRs modify dispatcher.py’s build_source logic (including parsing/validation of per-source options like format/format_keys) — they touch the same function and related option-handling code, so they are related.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add real stdin source' directly and accurately summarises the primary change: replacing the stdin source stub with a functional implementation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions
Contributor

github-actions Bot commented May 9, 2026

Release Preview

patch: v0.1.6 → v0.1.7

Changelog

Added

  • The stdin source is now a real implementation: capture() yields
    lines from sys.stdin rather than raising NotImplementedError.
    Supports format / format_keys with the same conflict rules as
    flyctl and file. Intended for piping pre-captured logs into a
    one-shot paperbark monitor run, e.g. cat app.log | paperbark monitor --iterations 1 (analyse and search read existing run
    artefacts and never consume stdin). A piped stdin is single-use:
    the first iteration drains it, subsequent iterations yield nothing.
    There is intentionally no encoding knob — use the file source
    if you need byte-level robustness or a custom encoding. See
    docs/SOURCES.md and docs/CONFIG.md for the matrix.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/CONFIG.md`:
- Around line 292-294: Add a shell language tag to the stdin usage code fence
that currently contains "cat my.log | paperbark analyse --run latest" so the
fenced block is marked as shell/sh; locate the fenced block with that exact
command in CONFIG.md and change the opening triple backticks to include the
language identifier (e.g., ```sh) to satisfy markdown linting.

In `@docs/ROADMAP.md`:
- Line 8: Update the "**Last verified:**" entry in docs/ROADMAP.md from
2026-05-10 to 2026-05-09 so the recorded verification date is not in the future;
locate the bold "Last verified:" text and replace the date accordingly.

In `@docs/SOURCES.md`:
- Around line 159-161: The Markdown fenced code block containing the pipeline
command "cat app.log | paperbark monitor --iterations 1" is missing a language
tag; update that fence to include a shell language identifier (e.g., change the
opening "```" to "```sh") so the block becomes a shell snippet and satisfies
markdownlint MD040.

In `@README.md`:
- Around line 69-70: Update the README Quickstart section to include a short
non-Fly example using the newly-implemented stdin (or file) source so the docs
match the source table change; locate the "Quickstart" heading in README.md and
add one concise snippet demonstrating how to run the project with stdin (or a
file) as the data source (showing the minimal install/run commands and the
stdin/file invocation), and keep it parallel to the existing Fly quickstart
wording so both Fly and a non-Fly path are present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1574135b-a4fb-47fe-b88f-a73f5f2443eb

📥 Commits

Reviewing files that changed from the base of the PR and between b57b1fb and 92c45c8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • CHANGELOG.md
  • README.md
  • docs/CONFIG.md
  • docs/ROADMAP.md
  • docs/SOURCES.md
  • src/paperbark/dispatcher.py
  • src/paperbark/sources/stdin.py
  • tests/test_dispatcher.py
  • tests/test_sources.py

Comment thread docs/CONFIG.md Outdated
Comment thread docs/ROADMAP.md
## Current state

-- **Last verified:** 2026-05-05
+- **Last verified:** 2026-05-10

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Last verified is set to a future date.

Line 8 says 2026-05-10, but today is 2026-05-09. Please set it to 2026-05-09 now, or only update to 2026-05-10 after that date is actually verified.


Comment thread docs/SOURCES.md Outdated
Comment thread README.md
@simonsmallchua
Contributor Author

Pushed 79d3b2f addressing the CodeRabbit findings:

  • docs/CONFIG.md:292 — added sh language tag to the stdin code fence.
  • docs/SOURCES.md:159 — same MD040 fix.
  • README.md — added a stdin quickstart snippet after the Fly block (with a one-liner pointer to the file source as the on-disk equivalent), per the README.md maintenance standard in CLAUDE.md. Smoke-tested the minimal [[sources]] config locally — cat app.log | paperbark monitor --iterations 1 produces a clean run dir with the default [paperbark] / [monitor] settings.
  • ⏭️ docs/ROADMAP.md:8 — skipped. The Last verified: 2026-05-10 date is correct as of the commit time; the bot's clock appears to be a day behind.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/CONFIG.md`:
- Around line 292-294: The example pipes stdin into the wrong CLI command:
`paperbark analyse` does not consume source input, so replace the piped command
usage with the capture flow using `paperbark monitor` (e.g., change `cat my.log
| paperbark analyse --run latest` to use `cat my.log | paperbark monitor
--iterations 1`) so stdin is actually captured; update the example in CONFIG.md
to reference `paperbark monitor` and the `--iterations` flag accordingly.

In `@docs/SOURCES.md`:
- Around line 155-157: Update the documentation text that currently states
"Reads lines from `sys.stdin` and yields them" to limit stdin usage to the
capture step: clarify that `sys.stdin` input is consumed only by `paperbark
monitor` (or the monitor capture mode) for one-shot piping of pre-captured logs,
and that `analyse` and `search` operate on existing run artifacts rather than
reading from `sys.stdin`; keep references to `paperbark monitor`, `analyse`,
`search`, and `sys.stdin` so readers can locate the relevant sentences to edit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 12900ad1-1838-4238-a0f7-9b35173e4b5b

📥 Commits

Reviewing files that changed from the base of the PR and between 92c45c8 and 79d3b2f.

📒 Files selected for processing (3)
  • README.md
  • docs/CONFIG.md
  • docs/SOURCES.md

Comment thread docs/CONFIG.md
Comment thread docs/SOURCES.md Outdated
@simonsmallchua
Contributor Author

Pushed 8157918 — both findings valid, fixed at the root:

  • docs/CONFIG.md:294 — example now uses paperbark monitor --iterations 1 (was incorrectly paperbark analyse --run latest).
  • docs/SOURCES.md:155-157 — wording now scopes stdin capture to monitor and explicitly notes that analyse / search never consume stdin.
  • ➕ Same misleading wording also appeared in CHANGELOG.md (the Added entry) and src/paperbark/sources/stdin.py (module docstring) — fixed both for consistency.

@simonsmallchua simonsmallchua merged commit 187ad8f into main May 9, 2026
6 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 10, 2026
5 tasks