Skip to content

Next#495

Merged
mohanagy merged 63 commits into
mainfrom
next
Jun 3, 2026
Merged

Next#495
mohanagy merged 63 commits into
mainfrom
next

Conversation

@mohanagy
Copy link
Copy Markdown
Owner

@mohanagy mohanagy commented Jun 3, 2026

Summary

Testing

  • npm run test:run
  • npm run typecheck
  • npm run build
  • npm pack --dry-run (if packaging or install behavior changed)

Checklist

  • I updated docs for any user-visible change
  • I added or updated tests when behavior changed
  • This PR does not include private corpora, secrets, credentials, proprietary prompts, sensitive raw logs, or accidental generated artifacts
  • I kept this PR focused on a single change or tightly related set of changes

Related issues

Summary by CodeRabbit

Release Notes

  • New Features

    • Added madar try command for instant one-command local proof generation
    • Graph freshness status reporting with --require-fresh-graph and --require-fresh-context enforcement flags
    • Telemetry clear and report commands for local spool management
    • Design partner receipt workflow for creating reproducible, share-safe evaluations
    • Benchmark suite expanded with Python and Go service fixtures
  • Documentation

    • New agent quickstarts guide, launch checklist, and design-partner workflow documentation
    • CLI/MCP reference expanded with freshness contract details and telemetry updates
  • Bug Fixes

    • Windows prompt hook generation now uses generated scripts instead of inline commands
  • Chores

    • Version bumped to 0.27.9-next.3
    • Repository metadata and MCP registry updated

mohanagy and others added 30 commits June 2, 2026 16:50
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expose overall and selected-context freshness across pack, prompt, handoff, doctor/status, and MCP surfaces; add the scoped strict flag; refresh cached context_pack freshness; and document the new contract.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address the current PR blockers by making prompt graph strictness fail fast, aligning handoff types, hardening freshness selection, making freshness assertions Windows-safe, and applying the small CodeRabbit cleanup fixes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement scoped graph freshness surfaces
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
docs: clarify legacy graphify-ts rename guidance
Closes #470

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
feat: add public benchmark language fixtures
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ck-loop

feat: add design partner reporting loop
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mohanagy and others added 14 commits June 3, 2026 08:48
chore: prepare 0.27.9-next.0 release
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Windows cmd.exe has a hard limit of 8,191 characters for command lines.
The previous implementation generated a 11,801+ character command that
exceeded this limit, causing 'unexpected eof while looking for matching'
errors on Windows.

This commit aggressively minifies the hook's JavaScript source code by:
- Using single-letter variable names
- Removing whitespace and comments
- Condensing function definitions
- Using terse property access patterns
- Simplifying error handling

The resulting command is now 8,049 characters (142 chars under the limit),
allowing the hook to execute successfully on Windows systems.

All 2,432 tests pass, confirming functionality is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The debug flag regex had a malformed escape sequence. Changed from
/^(1|true|yes)$/i to /^(1|true|yes)$/i using template substitution
to properly match the end-of-string anchor.
fix: reduce hook command to fit Windows cmd.exe 8191 char limit
Updates:
- CHANGELOG.md: Add entry for Windows cmd.exe length limit fix
- README.md: Update highlights and version references
- package.json/package-lock.json: Bump to 0.27.9-next.2
- docs/mcp-registry/server.json: Sync registry metadata

This release fixes the Windows cmd.exe command-line length limit issue by
aggressively minifying the hook source code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chore: prepare 0.27.9-next.2 release
Move the Claude UserPromptSubmit hook out of an inline node -e command
and into a generated project-local .claude/madar-user-prompt-submit.cjs
script so Windows shell wrapping cannot truncate the command.

Also tighten Claude uninstall and native-agent install verification to
track the generated hook script lifecycle.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: replace Claude prompt hook with generated script
Document the generated Claude prompt hook script fix, bump the package
version to 0.27.9-next.3, and sync the MCP registry metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chore: prepare 0.27.9-next.3 release
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Warning

Review limit reached

@mohanagy, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 10 minutes and 17 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b82d662f-6d1c-4dfe-9231-fe7f8d0e7428

📥 Commits

Reviewing files that changed from the base of the PR and between 38d152d and 1f2bb15.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (17)
  • CHANGELOG.md
  • README.md
  • docs/mcp-registry/server.json
  • docs/reference/cli-and-mcp.md
  • docs/tutorials/getting-started.md
  • package.json
  • src/infrastructure/benchmark/runner.ts
  • src/infrastructure/benchmark/suite.ts
  • src/infrastructure/compare.ts
  • src/infrastructure/review-compare.ts
  • src/shared/shell.ts
  • tests/unit/benchmark-suite.test.ts
  • tests/unit/benchmark.test.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/compare.test.ts
  • tests/unit/implementation-pack.test.ts
  • tests/unit/shell.test.ts
📝 Walkthrough

Walkthrough

Adds graph-context freshness analysis and enforcement across CLI/MCP surfaces, migrates telemetry to a v2 command/stage schema with clear/report, introduces a “madar try” flow, switches Claude hooks to a generated script, expands benchmarks with Go/Python fixtures, and updates extensive docs/tests and metadata.

Changes

Core runtime, CLI, MCP, and installer changes

Layer / File(s) Summary
Graph-context freshness model and enforcement
src/runtime/freshness.ts, src/infrastructure/context-*.ts, src/runtime/stdio/*, src/contracts/context-pack.ts, src/cli/parser.ts
Adds analyzer/types, surfaces freshness in pack/prompt/handoff/doctor, enforces require_fresh_* flags, and updates schemas/renderers.
Telemetry v2 schema and CLI/server wiring
src/shared/telemetry.ts, src/cli/main.ts, src/runtime/stdio-server.ts
Moves to v2 command/stage spool, adds clear/report, emits stage/buckets from CLI and MCP server.
New 'madar try' command and parser wiring
src/cli/parser.ts, src/cli/main.ts, src/infrastructure/try-command.ts
Implements try flow (reuse/rebuild/fallback) with install recommendation and help text.
Claude hook script generation and compare/install checks
src/infrastructure/install.ts, src/runtime/task-applicability.ts, src/infrastructure/compare.ts
Generates local hook script, updates install/uninstall and compare validation, and payload handling.
Graph build freshness metadata and git helpers
src/shared/git.ts, src/shared/graph-build-freshness.ts, src/infrastructure/generate.ts, src/pipeline/*
Adds git snapshot/diff and graph_build_freshness; exports/imports metadata during build/export.

Benchmarks, docs, metadata, and package updates

Layer / File(s) Summary
Benchmark fixtures: Go and Python login session services
docs/benchmarks/suite/fixtures/*, docs/benchmarks/suite/*
Adds Go/Python fixtures, updates suite docs/manifests/tasks to “ready”.
Docs: freshness contract, CLI/MCP reference, quickstarts, getting started
docs/concepts/context-packs.md, docs/reference/cli-and-mcp.md, docs/tutorials/*
Documents freshness receipt/enforcement, strict profile, quickstarts, and try-first flow.
Docs: design partners, launch checklist, release/roadmap/claims
docs/design-partners.md, docs/launch-checklist.md, docs/release.md, docs/roadmap.md, docs/claims-and-evidence.md
Adds design-partner workflow, launch checklist, release steps, roadmap, and positioning.
Meta/README/Changelog/MCP registry/package
README.md, CHANGELOG.md, docs/mcp-registry/server.json, .github/repo-metadata.json, package.json
Updates positioning, next-track notes, registry metadata, repo description/topics, and version/keywords.

Test updates aligned to features

Layer / File(s) Summary
Freshness surfaces across CLI/MCP/doctor/status and enforcement
tests/unit/freshness-surfaces.test.ts, tests/unit/context-prompt-command.test.ts
Validates statuses, drift, strict guards, and consistent outputs.
CLI parser/dispatch, help, try, and flag passthrough
tests/unit/cli.test.ts
Covers try routing, telemetry args, help, and require-fresh passthrough.
Telemetry v2 spool, clear/report, CLI/server emission
tests/unit/telemetry*.test.ts, tests/unit/stdio-server.test.ts
Asserts v2 schema, clear/report, funnels, and stage/buckets.
Hook generation/compare integration and uninstall behavior
tests/unit/install*.test.ts, tests/unit/compare-native-agent.test.ts
Validates generated script install/uninstall and compare checks.
Benchmark suite docs/manifests and existence checks
tests/unit/benchmark-suite*.test.ts
Ensures fixtures are ready and prompts wired with files present.
Docs contract tests (README, tutorials, reference, telemetry, roadmap, design partners, launch)
tests/unit/*docs*.test.ts, tests/unit/design-partners-doc.test.ts
Checks wording/order/links across updated docs.
Generate unsupported corpus typed errors and helpers
tests/unit/generate.test.ts, tests/unit/graph-build-freshness.test.ts
Confirms typed errors and helper determinism.
Answer-ready and handoff governance freshness fixtures
tests/unit/*pack*.test.ts, tests/unit/handoff-command.test.ts
Expands fixtures with new governance.graph_freshness fields.
Native-agent compare timeout and Windows script missing check
tests/unit/compare-native-agent.test.ts
Adjusts timeout and verifies missing script invalidates state.

Sequence Diagram(s)

(skipped)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • mohanagy/madar#478 — Introduces the same graph-context freshness enforcement and receipt surfaces used here across CLI/MCP.
  • mohanagy/madar#484 — Refactors telemetry to the command/stage v2 model with CLI wiring similar to this PR’s changes.
  • mohanagy/madar#493 — Switches Claude UserPromptSubmit to a generated local script and adjusts installer/validator code paths.

Poem

A rabbit compiles a proof so spry,
Fresh graphs blink green beneath the sky.
Hooks hop to scripts, no shells to fear,
Telemetry hums in stages clear.
Go and Python join the run,
“madar try” — one hop, you’re done. 🐇✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch next

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/infrastructure/context-pack-command.ts (1)

2096-2120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Compute selected-context freshness for task=impact before returning.

This branch exits before the post-retrieval freshness pass, so --require-fresh-context is silently ignored for impact, and the emitted governance receipt reports whole-graph freshness instead of impact-specific freshness. Either reject the option here like review does, or reuse the analyzeGraphContextFreshness(...selected_source_files...) path before building the response.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/infrastructure/context-pack-command.ts` around lines 2096 - 2120, The
impact branch currently returns before computing selected-source freshness so
--require-fresh-context is ignored and graphFreshness uses
initialGraphFreshness; update the branch in context-pack-command (the block
using dependencies.retrieveContext, pickImpactTarget,
dependencies.analyzeImpact, dependencies.compactImpactResult) to either reject
options.retrievalLevel/retrievalStrategy like the review path does or call
analyzeGraphContextFreshness (or the existing analyzeGraphContextFreshness-like
utility) on the selected_source_files from the retrieval result and use that
value when building baseResponse/buildPackSchemaV1 (replace
initialGraphFreshness with the computed selected-context freshness); keep
existing inclusion of routing debug via buildRoutingDebug(retrieval) and ensure
impactMetadata still receives the same args.
🧹 Nitpick comments (4)
tests/unit/implementation-pack.test.ts (1)

636-757: ⚡ Quick win

Add a negative-intent regression test for plain “server” wording.

The new test validates the positive path well. Please add one case where the prompt mentions "server" but not "server action" to ensure server-action preference is not applied unintentionally.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/implementation-pack.test.ts` around lines 636 - 757, Add a
negative-intent regression test that mirrors the existing positive case but uses
a retrieval.question that contains the word "server" (e.g., "move dashboard
owner filter persistence into the server and keep the client component
presentational") rather than the phrase "server action"; reuse
buildServerActionPreferenceGraph() to create the graph and call
buildImplementationPackGuidance(graph, retrieval, { budget: 2200, taskIntent:
'implement', limit: 4 }); then assert that guidance.workflow_centers[0].path is
NOT 'app/dashboard/actions.ts' (or that likely_edit_files[0] does not point to
actions.ts) and that the cautions include treating app/dashboard/page.tsx as
primary for runtime-boundary prompts to ensure the server-action preference is
not applied.
docs/benchmarks/suite/fixtures/go-service/internal/service/session_service_test.go (1)

19-21: ⚡ Quick win

Assert recorded audit payload, not only count.

Right now this only proves one event exists. Add a value assertion so formatting/content regressions are caught.

Proposed test enhancement
 	if len(loginAudit.Events) != 1 {
 		t.Fatalf("expected 1 audit event, got %d", len(loginAudit.Events))
 	}
+	if loginAudit.Events[0] != "u-123:"+session.ID {
+		t.Fatalf("expected audit event for user and session id, got %q", loginAudit.Events[0])
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/benchmarks/suite/fixtures/go-service/internal/service/session_service_test.go`
around lines 19 - 21, The test currently only checks the count of
loginAudit.Events; add an assertion that the recorded event's payload/fields
match expected values to catch formatting/content regressions. After the
existing len(loginAudit.Events) check, inspect loginAudit.Events[0] (or the
single event) and assert specific fields (e.g. event.Type/Event.Action,
event.UserID, event.Timestamp format, or event.Payload JSON string) against
known expected values or a canonical expected struct; use t.Fatalf or t.Errorf
to fail with a clear message on mismatch. Ensure you reference the same
loginAudit and Events slice used in the test and compare only stable fields (or
normalize timestamps) to avoid flaky failures.
docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py (1)

24-25: ⚡ Quick win

Avoid duplicating the audit event literal.

The route hardcodes "login_session_created" independently of AuditLog.record_login_session(). If that string changes in one place, the API response will drift from the event actually recorded. Please source this from a shared constant or the service result.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py`
around lines 24 - 25, The route in session_routes.py duplicates the
"login_session_created" audit-event literal; change it to use the canonical
source (either a shared constant on AuditLog, e.g.
AuditLog.LOGIN_SESSION_CREATED, or the audit/event field returned by
service.create_login_session) so LoginSessionResponse(...) uses that shared
value instead of a hardcoded string; update the call site around
service.create_login_session and LoginSessionResponse to pull the audit event
from the service result or AuditLog constant and remove the duplicated literal.
src/infrastructure/context-prompt-command.ts (1)

74-86: ⚡ Quick win

Avoid double-running freshness analysis on the default prompt path.

analyzeGraphContextFreshness() runs once before retrieval and again after retrieval, but the first result is only used when requireFreshGraph is enabled. That means every normal madar prompt call pays for an extra repo freshness scan with no user-visible benefit.

♻️ Proposed fix
   const graph = dependencies.loadGraph(options.graphPath)
-  const initialGraphFreshness = analyzeGraphContextFreshness(options.graphPath, graph)
   if (options.requireFreshGraph === true) {
+    const initialGraphFreshness = analyzeGraphContextFreshness(options.graphPath, graph)
     requireFreshGraph(initialGraphFreshness)
   }
   const retrieval = dependencies.retrieveContext(graph, {
     question: options.prompt,
     budget: DEFAULT_PROMPT_BUDGET,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/infrastructure/context-prompt-command.ts` around lines 74 - 86,
analyzeGraphContextFreshness is being called unconditionally before
retrieveContext, causing an unnecessary extra repo scan; only run it when
required by the requireFreshGraph flag. Change the flow so you call
analyzeGraphContextFreshness(...) and pass its result into
requireFreshGraph(...) only inside an if (options.requireFreshGraph === true)
block, keep the retrieval via retrieveContext(...) as-is, and continue to
compute graphFreshness after retrieval using
selectedContextSourceFilesFromRetrieveResult(retrieval) for
requireFreshSelectedContext(...); reference analyzeGraphContextFreshness,
requireFreshGraph, retrieveContext,
selectedContextSourceFilesFromRetrieveResult, and requireFreshSelectedContext to
locate the affected code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/benchmarks/suite/fixtures/go-service/internal/service/session_service.go`:
- Around line 13-17: NewSessionService currently permits nil repository or audit
which leads to panics later in CreateLoginSession; update NewSessionService to
fail fast by validating its inputs (repository *repository.SessionRepository and
audit *audit.LoginAudit) and refusing to construct an invalid
SessionService—either return an error from NewSessionService (change signature
to (*SessionService, error)) or explicitly panic with a clear message if either
dependency is nil; ensure callers of NewSessionService are updated to handle the
new error or accept the panic and include descriptive messages identifying the
missing dependency.

In `@docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py`:
- Around line 9-11: LoginSessionRequest currently allows empty strings for
user_id and device_id; update the model (LoginSessionRequest) to validate that
both fields are non-empty (e.g., use pydantic validators or constrained types
like conint/constr with min_length=1, or add `@validator` methods for user_id and
device_id) and raise a clear validation error so callers are rejected at the API
boundary before invoking the session creation service.

In
`@docs/benchmarks/suite/fixtures/python-service/app/services/session_service.py`:
- Around line 16-21: get_session_service currently reuses the process-wide
_default_audit_log causing unbounded event accumulation; change it to
instantiate a fresh AuditLog for each call (or swap the default sink to a
stateless implementation) so SessionService is created with a per-request audit
log. Update get_session_service to construct a new AuditLog() and pass it to
SessionService(repository=_default_repository, audit_log=new_audit_log) instead
of using _default_audit_log; you can keep _default_repository but do not reuse
_default_audit_log.

In `@docs/integrations/compatibility.md`:
- Line 5: Replace the branch-pinned GitHub URL in the sentence pointing to the
agent quickstarts with a repo-relative Markdown link so it doesn't hardcode
"next"; update the link target from
"https://github.com/mohanagy/madar/blob/next/docs/tutorials/agent-quickstarts.md"
to a relative path like "docs/tutorials/agent-quickstarts.md" while keeping the
link text "[agent quickstarts]" unchanged to preserve readability and
portability.

In `@docs/tutorials/agent-quickstarts.md`:
- Line 5: The link in the agent-quickstarts.md content uses a branch-pinned
GitHub URL
("https://github.com/.../blob/next/docs/tutorials/getting-started.md") which can
break after merges; replace that absolute branch-specific URL with a relative
docs link (e.g., "../getting-started.md" or "./getting-started.md") for the
"[getting started tutorial]" link so it resolves correctly across branches,
verify the Markdown link text remains unchanged and test local navigation in the
docs site after updating.

In `@src/cli/main.ts`:
- Around line 907-914: The 'try' branch in main.ts exits before emitting any
telemetry so the trial/onboarding flow never appears in reports; wrap the
parseTryArgs -> dependencies.runTry call to emit telemetry.started before
running, telemetry.succeeded after a successful run (include outcome/option
metadata and whether output was produced), and telemetry.failed in the catch
block (include the error) and then re-use the existing return codes. Use the
same telemetry API available on the dependencies object (e.g.
dependencies.telemetry.track or similar), reference the existing symbols command
=== 'try', parseTryArgs, dependencies.runTry and io.log, and ensure you still
log output and return 0 on success and non-zero on failure.

In `@src/infrastructure/compare.ts`:
- Around line 2970-2981: The current Windows cleanup treats any
spawnSync('taskkill', ...) invocation as success because it returns immediately
on no throw; change it to inspect the spawnSync result (const res =
spawnSync(...)) and only return when res.status === 0 (or res.signal indicates
success); if res.status is non-zero or res.error exists, fall through to call
child.kill() to ensure the timed-out child is terminated; keep the existing
catch block to call child.kill() on spawn errors. Ensure references:
useProcessGroup, child.pid, spawnSync('taskkill', ...), and child.kill().

In `@src/infrastructure/doctor.ts`:
- Around line 365-368: computeNextCommands() only adds the regenerate hint for
'possibly_stale' and 'stale', but
buildDoctorReport()/analyzeGraphContextFreshness() treat any non-'fresh'
(including 'partially_stale') as unhealthy, so the doctor can report attention
needed without a repair suggestion; update the conditional that checks
report.graph.freshness in computeNextCommands() to include 'partially_stale'
alongside 'possibly_stale' and 'stale' so nextCommands.add('madar generate .
--update') is also added when report.graph.freshness === 'partially_stale'.

In `@src/infrastructure/generate.ts`:
- Around line 425-430: The error branches currently pass detected.total_files
which misclassifies repos with only unsupported files; change the logic to use
detected.extractableFiles: when !graph throw
missingCodeExtractionError(detected.extractableFiles) (since no extractable
files produced a graph), and in the options.clusterOnly && graph.numberOfNodes()
=== 0 branch inspect detected.extractableFiles — if it is 0 throw
missingCodeExtractionError(detected.extractableFiles) (NO_SUPPORTED_FILES),
otherwise throw missingCodeExtractionError(detected.total_files) to indicate
supported inputs existed but produced NO_GRAPH_NODES. Ensure you update the
checks around graph, options.clusterOnly, and graph.numberOfNodes() accordingly.

In `@src/pipeline/build.ts`:
- Around line 120-122: Currently the code accepts any object because it guards
with isRecord; change the guard to validate the shape with
isGraphBuildFreshnessMetadata and only assign when that returns true: replace
the isRecord(extraction.graph_build_freshness) check with
isGraphBuildFreshnessMetadata(extraction.graph_build_freshness) and then set
graph.graph.graph_build_freshness = extraction.graph_build_freshness inside that
branch so only validated freshness metadata is propagated.

In `@src/runtime/freshness.ts`:
- Around line 149-160: The code collapses matches by label into a single entry
via matchedByLabel, so when a slice anchor omits node_id and labels are
duplicated across files you only check one node; change matchedByLabel from
Map<label, node> to Map<label, Node[]>, populate arrays (push each node) instead
of overwriting, and where you resolve an anchor (the block using
matchedById.get(...) or matchedByLabel.get(...)) handle the label-only case by
collecting all matching nodes' source_file values (e.g., build a list of source
files) and add every matching source_file to whatever freshness set/logic is
used (also update the similar logic referenced around lines 175-178 to iterate
the array of nodes rather than assuming a single node).

In `@src/runtime/implementation-pack.ts`:
- Line 37: The current EXPLICIT_SERVER_ACTION_TARGET_PATTERN (used by
promptExplicitlyTargetsServerAction) incorrectly matches lone "server"; update
the regex to only match explicit phrases like "server action", "server-action",
or "server side action" (e.g. replace /\bserver(?:\s+action)?\b/i with something
like /\bserver(?:[-\s]action|\s+side\s+action)\b/i) and apply the same change to
the other occurrence referenced at lines 240-242; ensure
promptExplicitlyTargetsServerAction and any consumers reference the updated
EXPLICIT_SERVER_ACTION_TARGET_PATTERN constant so plain "server" no longer
triggers.

In `@src/runtime/stdio-server.ts`:
- Around line 721-741: The current recordContextPackTelemetry function may throw
while calling buildGraphSummary(loadGraphCached(graphPath)) and thereby skip
emitting the base failed/succeeded telemetry; change it so recordTelemetryEvent
is called immediately with the core fields (command, stage, version, os,
nodeMajor) and include failureBucket from
classifyToolTelemetryFailure(toolResponse.error.message,
toolResponse.error.code) when toolResponse.error exists, then separately try to
load/build the graph summary (using loadGraphCached and buildGraphSummary) and,
if successful, call recordTelemetryEvent again (or update/send an enriched
event) with repoSizeBucket and graphSizeBucket; ensure the try/catch only wraps
the summary lookup/enrichment so graph-read failures do not prevent the initial
telemetry from being emitted.

In `@src/runtime/stdio/tools.ts`:
- Around line 287-314: In selectedContextSourceFilesFromCachedExplainPayload,
don't return early when expandableFocusFiles exists; instead compute the file
list from the pack (using payload.pack -> selection built from
pack.matched_nodes, pack.execution_slice, pack.slice) via
selectedContextSourceFilesFromRetrieveResult and then union (dedupe) that result
with expandableFocusFiles so the final returned array contains both the
pack-derived files and any expandable.follow_up.focus_files; use the existing
symbols expandableFocusFiles, payload.pack, selection, and
selectedContextSourceFilesFromRetrieveResult to locate and implement the union.

In `@src/shared/graph-build-freshness.ts`:
- Around line 135-147: The runtime type guard is too loose: tighten checks in
the git branch (candidate.strategy === 'git') to ensure
candidate.git.dirty_files is an array of strings (use
Array.isArray(candidate.git.dirty_files) && candidate.git.dirty_files.every(i =>
typeof i === 'string')) and ensure candidate.git.dirty_file_fingerprints is a
plain object whose values are strings
(Object.values(candidate.git.dirty_file_fingerprints).every(v => typeof v ===
'string')). Do the same stricter value-type check for
candidate.filesystem.file_fingerprints (ensure it's an object and all values are
strings) so malformed payloads like dirty_files: [123] are rejected.
- Around line 31-42: normalizeSourceFile currently only boundary-checks absolute
inputs, letting relative inputs like "../outside.ts" escape rootPath; change the
logic to resolve relative inputs against rootPath first. Specifically, use
resolve(rootPath, trimmed) (or compute a resolvedPath variable via resolve when
isAbsolute(trimmed) is false) and then compute relativePath =
relative(resolve(rootPath), resolvedPath); keep the existing checks (empty/'.'
=> '', '..' or startsWith(`..${sep}`) or isAbsolute(relativePath) => '') and
finally return the normalized path with backslashes replaced; reference symbols:
normalizeSourceFile, trimmed, rootPath, resolve, relative, sep.

In `@tests/unit/install.test.ts`:
- Line 201: The regex used in current.matchAll (the line with
/(?:['\"])?([A-Za-z0-9+/=]{40,})(?:['\"])?/g) is too permissive and will match
arbitrary long alphanumeric runs; narrow it to only match intended base64-like
tokens by requiring context and stricter structure — for example, require a
prefix or surrounding quotes (e.g. lookbehind for "token[:=]\s*" or require
quotes), restrict the length range (e.g. {40,88}), and require base64 padding
rules (use [A-Za-z0-9+/]{40,88}={0,2}) so only realistic base64 payloads are
captured; update the regex used in current.matchAll accordingly.

In `@tests/unit/rebrand-surface.test.ts`:
- Line 110: The slice that computes renameSection uses readme.indexOf('\n## ',
headingIndex + RENAME_NOTE_HEADING.length) which can return -1 for the final
section and cause slice(headingIndex, -1) to drop the last character; fix by
computing an endIndex variable (e.g., const endIndex = readme.indexOf('\n## ',
headingIndex + RENAME_NOTE_HEADING.length); if endIndex === -1 set endIndex =
readme.length) and then use readme.slice(headingIndex, endIndex) when building
renameSection so the final-section case is guarded; update references around
renameSection, headingIndex and RENAME_NOTE_HEADING.

---

Outside diff comments:
In `@src/infrastructure/context-pack-command.ts`:
- Around line 2096-2120: The impact branch currently returns before computing
selected-source freshness so --require-fresh-context is ignored and
graphFreshness uses initialGraphFreshness; update the branch in
context-pack-command (the block using dependencies.retrieveContext,
pickImpactTarget, dependencies.analyzeImpact, dependencies.compactImpactResult)
to either reject options.retrievalLevel/retrievalStrategy like the review path
does or call analyzeGraphContextFreshness (or the existing
analyzeGraphContextFreshness-like utility) on the selected_source_files from the
retrieval result and use that value when building baseResponse/buildPackSchemaV1
(replace initialGraphFreshness with the computed selected-context freshness);
keep existing inclusion of routing debug via buildRoutingDebug(retrieval) and
ensure impactMetadata still receives the same args.

---

Nitpick comments:
In
`@docs/benchmarks/suite/fixtures/go-service/internal/service/session_service_test.go`:
- Around line 19-21: The test currently only checks the count of
loginAudit.Events; add an assertion that the recorded event's payload/fields
match expected values to catch formatting/content regressions. After the
existing len(loginAudit.Events) check, inspect loginAudit.Events[0] (or the
single event) and assert specific fields (e.g. event.Type/Event.Action,
event.UserID, event.Timestamp format, or event.Payload JSON string) against
known expected values or a canonical expected struct; use t.Fatalf or t.Errorf
to fail with a clear message on mismatch. Ensure you reference the same
loginAudit and Events slice used in the test and compare only stable fields (or
normalize timestamps) to avoid flaky failures.

In `@docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py`:
- Around line 24-25: The route in session_routes.py duplicates the
"login_session_created" audit-event literal; change it to use the canonical
source (either a shared constant on AuditLog, e.g.
AuditLog.LOGIN_SESSION_CREATED, or the audit/event field returned by
service.create_login_session) so LoginSessionResponse(...) uses that shared
value instead of a hardcoded string; update the call site around
service.create_login_session and LoginSessionResponse to pull the audit event
from the service result or AuditLog constant and remove the duplicated literal.

In `@src/infrastructure/context-prompt-command.ts`:
- Around line 74-86: analyzeGraphContextFreshness is being called
unconditionally before retrieveContext, causing an unnecessary extra repo scan;
only run it when required by the requireFreshGraph flag. Change the flow so you
call analyzeGraphContextFreshness(...) and pass its result into
requireFreshGraph(...) only inside an if (options.requireFreshGraph === true)
block, keep the retrieval via retrieveContext(...) as-is, and continue to
compute graphFreshness after retrieval using
selectedContextSourceFilesFromRetrieveResult(retrieval) for
requireFreshSelectedContext(...); reference analyzeGraphContextFreshness,
requireFreshGraph, retrieveContext,
selectedContextSourceFilesFromRetrieveResult, and requireFreshSelectedContext to
locate the affected code.

In `@tests/unit/implementation-pack.test.ts`:
- Around line 636-757: Add a negative-intent regression test that mirrors the
existing positive case but uses a retrieval.question that contains the word
"server" (e.g., "move dashboard owner filter persistence into the server and
keep the client component presentational") rather than the phrase "server
action"; reuse buildServerActionPreferenceGraph() to create the graph and call
buildImplementationPackGuidance(graph, retrieval, { budget: 2200, taskIntent:
'implement', limit: 4 }); then assert that guidance.workflow_centers[0].path is
NOT 'app/dashboard/actions.ts' (or that likely_edit_files[0] does not point to
actions.ts) and that the cautions include treating app/dashboard/page.tsx as
primary for runtime-boundary prompts to ensure the server-action preference is
not applied.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3c559b6b-da0e-4710-adfb-8e00aea731a3

📥 Commits

Reviewing files that changed from the base of the PR and between 5370422 and 38d152d.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (95)
  • .github/ISSUE_TEMPLATE/design_partner_report.yml
  • .github/repo-metadata.json
  • CHANGELOG.md
  • README.md
  • docs/agent-governance.md
  • docs/benchmarks/suite/README.md
  • docs/benchmarks/suite/fixtures/go-service/README.md
  • docs/benchmarks/suite/fixtures/go-service/cmd/api/main.go
  • docs/benchmarks/suite/fixtures/go-service/go.mod
  • docs/benchmarks/suite/fixtures/go-service/internal/audit/login_audit.go
  • docs/benchmarks/suite/fixtures/go-service/internal/httpapi/session_handler.go
  • docs/benchmarks/suite/fixtures/go-service/internal/repository/session_repository.go
  • docs/benchmarks/suite/fixtures/go-service/internal/service/session_service.go
  • docs/benchmarks/suite/fixtures/go-service/internal/service/session_service_test.go
  • docs/benchmarks/suite/fixtures/python-service/README.md
  • docs/benchmarks/suite/fixtures/python-service/app/__init__.py
  • docs/benchmarks/suite/fixtures/python-service/app/audit.py
  • docs/benchmarks/suite/fixtures/python-service/app/main.py
  • docs/benchmarks/suite/fixtures/python-service/app/repositories/__init__.py
  • docs/benchmarks/suite/fixtures/python-service/app/repositories/session_repository.py
  • docs/benchmarks/suite/fixtures/python-service/app/routes/__init__.py
  • docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py
  • docs/benchmarks/suite/fixtures/python-service/app/services/__init__.py
  • docs/benchmarks/suite/fixtures/python-service/app/services/session_service.py
  • docs/benchmarks/suite/fixtures/python-service/requirements.txt
  • docs/benchmarks/suite/fixtures/python-service/tests/test_session_service.py
  • docs/benchmarks/suite/repos.json
  • docs/benchmarks/suite/tasks.json
  • docs/claims-and-evidence.md
  • docs/concepts/context-packs.md
  • docs/design-partners.md
  • docs/integrations/compatibility.md
  • docs/launch-checklist.md
  • docs/mcp-registry/server.json
  • docs/reference/cli-and-mcp.md
  • docs/release.md
  • docs/roadmap.md
  • docs/telemetry.md
  • docs/tutorials/agent-quickstarts.md
  • docs/tutorials/getting-started.md
  • package.json
  • sbom.cdx.json
  • src/cli/main.ts
  • src/cli/parser.ts
  • src/contracts/context-pack.ts
  • src/infrastructure/compare.ts
  • src/infrastructure/context-pack-command.ts
  • src/infrastructure/context-prompt-command.ts
  • src/infrastructure/doctor.ts
  • src/infrastructure/generate.ts
  • src/infrastructure/handoff-command.ts
  • src/infrastructure/install.ts
  • src/infrastructure/try-command.ts
  • src/pipeline/build.ts
  • src/pipeline/export.ts
  • src/runtime/context-pack-governance.ts
  • src/runtime/freshness.ts
  • src/runtime/implementation-pack.ts
  • src/runtime/stdio-server.ts
  • src/runtime/stdio/definitions.ts
  • src/runtime/stdio/tools.ts
  • src/runtime/task-applicability.ts
  • src/shared/git.ts
  • src/shared/graph-build-freshness.ts
  • src/shared/telemetry.ts
  • tests/unit/answer-ready-explain-pack.test.ts
  • tests/unit/benchmark-suite-docs.test.ts
  • tests/unit/benchmark-suite.test.ts
  • tests/unit/cli.test.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/context-pack-command.test.ts
  • tests/unit/context-prompt-command.test.ts
  • tests/unit/design-partners-doc.test.ts
  • tests/unit/freshness-surfaces.test.ts
  • tests/unit/generate.test.ts
  • tests/unit/getting-started-docs.test.ts
  • tests/unit/github-repo-metadata.test.ts
  • tests/unit/graph-build-freshness.test.ts
  • tests/unit/handoff-command.test.ts
  • tests/unit/implementation-pack.test.ts
  • tests/unit/install-compatibility.test.ts
  • tests/unit/install-docs.test.ts
  • tests/unit/install-templates.test.ts
  • tests/unit/install.test.ts
  • tests/unit/launch-checklist-doc.test.ts
  • tests/unit/mcp-registry-metadata.test.ts
  • tests/unit/package-metadata.test.ts
  • tests/unit/rebrand-surface.test.ts
  • tests/unit/roadmap-docs.test.ts
  • tests/unit/stdio-server.test.ts
  • tests/unit/stdio-tool-profile.test.ts
  • tests/unit/telemetry-doc.test.ts
  • tests/unit/telemetry.test.ts
  • tests/unit/try-command.test.ts
  • tests/unit/why-madar-doc.test.ts

Comment on lines +13 to +17
func NewSessionService(repository *repository.SessionRepository, audit *audit.LoginAudit) *SessionService {
return &SessionService{
repository: repository,
audit: audit,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on nil dependencies in constructor.

Line 13 currently allows nil repository/audit, which causes runtime panics later in CreateLoginSession. Guarding at construction prevents invalid service state.

Proposed fix
 func NewSessionService(repository *repository.SessionRepository, audit *audit.LoginAudit) *SessionService {
+	if repository == nil {
+		panic("session repository is required")
+	}
+	if audit == nil {
+		panic("login audit is required")
+	}
 	return &SessionService{
 		repository: repository,
 		audit:      audit,
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func NewSessionService(repository *repository.SessionRepository, audit *audit.LoginAudit) *SessionService {
return &SessionService{
repository: repository,
audit: audit,
}
func NewSessionService(repository *repository.SessionRepository, audit *audit.LoginAudit) *SessionService {
if repository == nil {
panic("session repository is required")
}
if audit == nil {
panic("login audit is required")
}
return &SessionService{
repository: repository,
audit: audit,
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/benchmarks/suite/fixtures/go-service/internal/service/session_service.go`
around lines 13 - 17, NewSessionService currently permits nil repository or
audit which leads to panics later in CreateLoginSession; update
NewSessionService to fail fast by validating its inputs (repository
*repository.SessionRepository and audit *audit.LoginAudit) and refusing to
construct an invalid SessionService—either return an error from
NewSessionService (change signature to (*SessionService, error)) or explicitly
panic with a clear message if either dependency is nil; ensure callers of
NewSessionService are updated to handle the new error or accept the panic and
include descriptive messages identifying the missing dependency.

Comment on lines +9 to +11
class LoginSessionRequest(BaseModel):
user_id: str
device_id: str
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty identifiers at the API boundary.

LoginSessionRequest currently accepts "" for both fields, which lets the service create session_id=":" and an audit event with unusable identifiers. Please validate user_id and device_id before calling the service.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/benchmarks/suite/fixtures/python-service/app/routes/session_routes.py`
around lines 9 - 11, LoginSessionRequest currently allows empty strings for
user_id and device_id; update the model (LoginSessionRequest) to validate that
both fields are non-empty (e.g., use pydantic validators or constrained types
like conint/constr with min_length=1, or add `@validator` methods for user_id and
device_id) and raise a clear validation error so callers are rejected at the API
boundary before invoking the session creation service.

Comment on lines +16 to +21
_default_repository = SessionRepository()
_default_audit_log = AuditLog()


def get_session_service() -> SessionService:
return SessionService(repository=_default_repository, audit_log=_default_audit_log)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not reuse a process-wide in-memory audit log here.

get_session_service() reuses _default_audit_log, so every request appends to the same events list forever. In a benchmark fixture that leaks state across runs and makes memory/latency drift upward with request volume. Build the default service with a fresh AuditLog per request, or swap the default sink to a stateless implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/benchmarks/suite/fixtures/python-service/app/services/session_service.py`
around lines 16 - 21, get_session_service currently reuses the process-wide
_default_audit_log causing unbounded event accumulation; change it to
instantiate a fresh AuditLog for each call (or swap the default sink to a
stateless implementation) so SessionService is created with a per-request audit
log. Update get_session_service to construct a new AuditLog() and pass it to
SessionService(repository=_default_repository, audit_log=new_audit_log) instead
of using _default_audit_log; you can keep _default_repository but do not reuse
_default_audit_log.


This guide is the user-facing install matrix for the current CLI contract. It focuses on the commands users actually run, the files each path writes, how to verify the result, and the most important behavior or limitation to know before choosing a surface.

If you want one copy/paste walkthrough per main agent target, use the [agent quickstarts](https://github.com/mohanagy/madar/blob/next/docs/tutorials/agent-quickstarts.md) after you pick a row from this matrix.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a repo-relative doc link instead of a branch-pinned GitHub URL.

This hardcodes next and can go stale or break in forks/releases. Prefer a relative link so docs stay portable.

Suggested fix
-If you want one copy/paste walkthrough per main agent target, use the [agent quickstarts](https://github.com/mohanagy/madar/blob/next/docs/tutorials/agent-quickstarts.md) after you pick a row from this matrix.
+If you want one copy/paste walkthrough per main agent target, use the [agent quickstarts](../tutorials/agent-quickstarts.md) after you pick a row from this matrix.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
If you want one copy/paste walkthrough per main agent target, use the [agent quickstarts](https://github.com/mohanagy/madar/blob/next/docs/tutorials/agent-quickstarts.md) after you pick a row from this matrix.
If you want one copy/paste walkthrough per main agent target, use the [agent quickstarts](../tutorials/agent-quickstarts.md) after you pick a row from this matrix.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/integrations/compatibility.md` at line 5, Replace the branch-pinned
GitHub URL in the sentence pointing to the agent quickstarts with a
repo-relative Markdown link so it doesn't hardcode "next"; update the link
target from
"https://github.com/mohanagy/madar/blob/next/docs/tutorials/agent-quickstarts.md"
to a relative path like "docs/tutorials/agent-quickstarts.md" while keeping the
link text "[agent quickstarts]" unchanged to preserve readability and
portability.


Use this page when you already know which agent you want to wire and you want one verified install path plus one copy/paste Smoke test.

Before any agent-specific install, start with the [getting started tutorial](https://github.com/mohanagy/madar/blob/next/docs/tutorials/getting-started.md) and its one-command trial flow so the repo has a graph and you know Madar itself is behaving locally:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid branch-pinned GitHub docs links in user docs.

This hardcodes /blob/next/, which can drift after merges/releases. Use a relative docs link so navigation stays correct across branches.

Proposed fix
-Before any agent-specific install, start with the [getting started tutorial](https://github.com/mohanagy/madar/blob/next/docs/tutorials/getting-started.md) and its one-command trial flow so the repo has a graph and you know Madar itself is behaving locally:
+Before any agent-specific install, start with the [getting started tutorial](./getting-started.md) and its one-command trial flow so the repo has a graph and you know Madar itself is behaving locally:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before any agent-specific install, start with the [getting started tutorial](https://github.com/mohanagy/madar/blob/next/docs/tutorials/getting-started.md) and its one-command trial flow so the repo has a graph and you know Madar itself is behaving locally:
Before any agent-specific install, start with the [getting started tutorial](./getting-started.md) and its one-command trial flow so the repo has a graph and you know Madar itself is behaving locally:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/tutorials/agent-quickstarts.md` at line 5, The link in the
agent-quickstarts.md content uses a branch-pinned GitHub URL
("https://github.com/.../blob/next/docs/tutorials/getting-started.md") which can
break after merges; replace that absolute branch-specific URL with a relative
docs link (e.g., "../getting-started.md" or "./getting-started.md") for the
"[getting started tutorial]" link so it resolves correctly across branches,
verify the Markdown link text remains unchanged and test local navigation in the
docs site after updating.

Comment on lines +287 to +314
function selectedContextSourceFilesFromCachedExplainPayload(payload: CachedExplainContextPackPayload): string[] {
const expandableFocusFiles = (payload.expandable ?? [])
.flatMap((entry) => entry.follow_up.focus_files)
.filter((sourceFile, index, all) => sourceFile.trim().length > 0 && all.indexOf(sourceFile) === index)
if (expandableFocusFiles.length > 0) {
return expandableFocusFiles
}

if (!isObjectRecord(payload.pack)) {
return []
}

const pack = payload.pack
const selection: {
matched_nodes: RetrieveResult['matched_nodes']
execution_slice?: NonNullable<RetrieveResult['execution_slice']>
slice?: NonNullable<RetrieveResult['slice']>
} = {
matched_nodes: Array.isArray(pack.matched_nodes) ? pack.matched_nodes as RetrieveResult['matched_nodes'] : [],
}
if (isObjectRecord(pack.execution_slice)) {
selection.execution_slice = pack.execution_slice as unknown as NonNullable<RetrieveResult['execution_slice']>
}
if (isObjectRecord(pack.slice)) {
selection.slice = pack.slice as unknown as NonNullable<RetrieveResult['slice']>
}
return selectedContextSourceFilesFromRetrieveResult(selection as Pick<RetrieveResult, 'matched_nodes' | 'execution_slice' | 'slice'>)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let cached expandable handles replace the actual selected context.

Returning early on expandable.follow_up.focus_files under-approximates the cached selection: those files describe follow-up handles, not the full set of files still present in pack.matched_nodes / execution_slice / slice. On cache hits, require_fresh_context can therefore miss stale files that are still visible in the payload. Please union the expandable focus files with the pack-derived files instead of treating them as a replacement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 287 - 314, In
selectedContextSourceFilesFromCachedExplainPayload, don't return early when
expandableFocusFiles exists; instead compute the file list from the pack (using
payload.pack -> selection built from pack.matched_nodes, pack.execution_slice,
pack.slice) via selectedContextSourceFilesFromRetrieveResult and then union
(dedupe) that result with expandableFocusFiles so the final returned array
contains both the pack-derived files and any expandable.follow_up.focus_files;
use the existing symbols expandableFocusFiles, payload.pack, selection, and
selectedContextSourceFilesFromRetrieveResult to locate and implement the union.

Comment on lines +31 to +42
if (!isAbsolute(trimmed)) {
return trimmed.replaceAll('\\', '/')
}

const relativePath = relative(resolve(rootPath), resolve(trimmed))
if (relativePath === '' || relativePath === '.') {
return ''
}
if (relativePath === '..' || relativePath.startsWith(`..${sep}`) || isAbsolute(relativePath)) {
return ''
}
return relativePath.replaceAll('\\', '/')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject relative paths that escape rootPath.

normalizeSourceFile() only root-checks absolute inputs. A relative source_file like ../outside.ts currently survives normalization, and the filesystem fallback will hash a file outside the workspace. Resolve relative inputs against rootPath before applying the boundary check.

Proposed fix
 function normalizeSourceFile(rootPath: string, sourceFile: string): string {
   const trimmed = sourceFile.trim()
   if (trimmed.length === 0) {
     return ''
   }

-  if (!isAbsolute(trimmed)) {
-    return trimmed.replaceAll('\\', '/')
-  }
-
-  const relativePath = relative(resolve(rootPath), resolve(trimmed))
+  const resolvedRootPath = resolve(rootPath)
+  const candidatePath = isAbsolute(trimmed) ? resolve(trimmed) : resolve(resolvedRootPath, trimmed)
+  const relativePath = relative(resolvedRootPath, candidatePath)
   if (relativePath === '' || relativePath === '.') {
     return ''
   }
   if (relativePath === '..' || relativePath.startsWith(`..${sep}`) || isAbsolute(relativePath)) {
     return ''
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!isAbsolute(trimmed)) {
return trimmed.replaceAll('\\', '/')
}
const relativePath = relative(resolve(rootPath), resolve(trimmed))
if (relativePath === '' || relativePath === '.') {
return ''
}
if (relativePath === '..' || relativePath.startsWith(`..${sep}`) || isAbsolute(relativePath)) {
return ''
}
return relativePath.replaceAll('\\', '/')
const resolvedRootPath = resolve(rootPath)
const candidatePath = isAbsolute(trimmed) ? resolve(trimmed) : resolve(resolvedRootPath, trimmed)
const relativePath = relative(resolvedRootPath, candidatePath)
if (relativePath === '' || relativePath === '.') {
return ''
}
if (relativePath === '..' || relativePath.startsWith(`..${sep}`) || isAbsolute(relativePath)) {
return ''
}
return relativePath.replaceAll('\\', '/')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/graph-build-freshness.ts` around lines 31 - 42,
normalizeSourceFile currently only boundary-checks absolute inputs, letting
relative inputs like "../outside.ts" escape rootPath; change the logic to
resolve relative inputs against rootPath first. Specifically, use
resolve(rootPath, trimmed) (or compute a resolvedPath variable via resolve when
isAbsolute(trimmed) is false) and then compute relativePath =
relative(resolve(rootPath), resolvedPath); keep the existing checks (empty/'.'
=> '', '..' or startsWith(`..${sep}`) or isAbsolute(relativePath) => '') and
finally return the normalized path with backslashes replaced; reference symbols:
normalizeSourceFile, trimmed, rootPath, resolve, relative, sep.

Comment on lines +135 to +147
if (candidate.strategy === 'git') {
return !!candidate.git
&& typeof candidate.git.head_sha === 'string'
&& Array.isArray(candidate.git.dirty_files)
&& !!candidate.git.dirty_file_fingerprints
&& typeof candidate.git.dirty_file_fingerprints === 'object'
&& !Array.isArray(candidate.git.dirty_file_fingerprints)
}

return !!candidate.filesystem
&& !!candidate.filesystem.file_fingerprints
&& typeof candidate.filesystem.file_fingerprints === 'object'
&& !Array.isArray(candidate.filesystem.file_fingerprints)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten the nested checks in the runtime type guard.

This returns true for payloads like { dirty_files: [123], dirty_file_fingerprints: { a: 1 } }. That defeats the new runtime contract and makes downstream freshness handling trust malformed metadata.

Proposed fix
   if (candidate.strategy === 'git') {
     return !!candidate.git
       && typeof candidate.git.head_sha === 'string'
       && Array.isArray(candidate.git.dirty_files)
+      && candidate.git.dirty_files.every((filePath) => typeof filePath === 'string')
       && !!candidate.git.dirty_file_fingerprints
       && typeof candidate.git.dirty_file_fingerprints === 'object'
       && !Array.isArray(candidate.git.dirty_file_fingerprints)
+      && Object.values(candidate.git.dirty_file_fingerprints).every((fingerprint) => typeof fingerprint === 'string')
   }

   return !!candidate.filesystem
     && !!candidate.filesystem.file_fingerprints
     && typeof candidate.filesystem.file_fingerprints === 'object'
     && !Array.isArray(candidate.filesystem.file_fingerprints)
+    && Object.values(candidate.filesystem.file_fingerprints).every((fingerprint) => typeof fingerprint === 'string')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/graph-build-freshness.ts` around lines 135 - 147, The runtime type
guard is too loose: tighten checks in the git branch (candidate.strategy ===
'git') to ensure candidate.git.dirty_files is an array of strings (use
Array.isArray(candidate.git.dirty_files) && candidate.git.dirty_files.every(i =>
typeof i === 'string')) and ensure candidate.git.dirty_file_fingerprints is a
plain object whose values are strings
(Object.values(candidate.git.dirty_file_fingerprints).every(v => typeof v ===
'string')). Do the same stricter value-type check for
candidate.filesystem.file_fingerprints (ensure it's an object and all values are
strings) so malformed payloads like dirty_files: [123] are rejected.

}

for (const match of current.matchAll(/'([A-Za-z0-9+/=]{40,})'/g)) {
for (const match of current.matchAll(/(?:['\"])?([A-Za-z0-9+/=]{40,})(?:['\"])?/g)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Constrain base64 matching to avoid decoding arbitrary long tokens.

At Line 201, the new pattern can match any long alphanumeric chunk (not just intentional payloads), which can add noisy decodes and make downstream assertions flaky.

Suggested tightening
-    for (const match of current.matchAll(/(?:['\"])?([A-Za-z0-9+/=]{40,})(?:['\"])?/g)) {
+    for (const match of current.matchAll(/(?:^|[^A-Za-z0-9+/=])(?:['"])?([A-Za-z0-9+/]{40,}={0,2})(?:['"])?(?=$|[^A-Za-z0-9+/=])/g)) {
       const value = match[1]
       if (typeof value !== 'string' || seen.has(value)) {
         continue
       }
+
+      // Skip non-canonical base64-like matches.
+      const normalized = value.replace(/=+$/, '')
+      const roundTrip = Buffer.from(value, 'base64').toString('base64').replace(/=+$/, '')
+      if (roundTrip !== normalized) {
+        continue
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/install.test.ts` at line 201, The regex used in current.matchAll
(the line with /(?:['\"])?([A-Za-z0-9+/=]{40,})(?:['\"])?/g) is too permissive
and will match arbitrary long alphanumeric runs; narrow it to only match
intended base64-like tokens by requiring context and stricter structure — for
example, require a prefix or surrounding quotes (e.g. lookbehind for
"token[:=]\s*" or require quotes), restrict the length range (e.g. {40,88}), and
require base64 padding rules (use [A-Za-z0-9+/]{40,88}={0,2}) so only realistic
base64 payloads are captured; update the regex used in current.matchAll
accordingly.


expect(headingIndex).toBeGreaterThanOrEqual(0)

const renameSection = readme.slice(headingIndex, readme.indexOf('\n## ', headingIndex + RENAME_NOTE_HEADING.length))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard the “last section in file” case when slicing the rename block.

If readme.indexOf('\n## ', ...) returns -1, slice(headingIndex, -1) drops the last character and can make this assertion block flaky for end-of-file sections.

Proposed fix
-    const renameSection = readme.slice(headingIndex, readme.indexOf('\n## ', headingIndex + RENAME_NOTE_HEADING.length))
+    const nextHeadingIndex = readme.indexOf('\n## ', headingIndex + RENAME_NOTE_HEADING.length)
+    const renameSection =
+      nextHeadingIndex === -1 ? readme.slice(headingIndex) : readme.slice(headingIndex, nextHeadingIndex)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const renameSection = readme.slice(headingIndex, readme.indexOf('\n## ', headingIndex + RENAME_NOTE_HEADING.length))
const nextHeadingIndex = readme.indexOf('\n## ', headingIndex + RENAME_NOTE_HEADING.length)
const renameSection =
nextHeadingIndex === -1 ? readme.slice(headingIndex) : readme.slice(headingIndex, nextHeadingIndex)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/rebrand-surface.test.ts` at line 110, The slice that computes
renameSection uses readme.indexOf('\n## ', headingIndex +
RENAME_NOTE_HEADING.length) which can return -1 for the final section and cause
slice(headingIndex, -1) to drop the last character; fix by computing an endIndex
variable (e.g., const endIndex = readme.indexOf('\n## ', headingIndex +
RENAME_NOTE_HEADING.length); if endIndex === -1 set endIndex = readme.length)
and then use readme.slice(headingIndex, endIndex) when building renameSection so
the final-section case is guarded; update references around renameSection,
headingIndex and RENAME_NOTE_HEADING.

mohanagy and others added 13 commits June 3, 2026 22:58
Write separate baseline and Madar prompt files for native-agent compare
runs, preserve the legacy prompt path as the Madar prompt for compatibility,
and update report paths/tests to track both artifacts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: split compare prompt artifacts by run mode
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chore: prepare 0.27.9-next.4 release
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t-followups

fix: harden native-agent compare prompts and timeouts
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chore: prepare 0.27.9-next.5 release
Switch Windows compare, benchmark, and review exec runners to cmd.exe-compatible shell selection and quoting, and keep benchmark suite workspace wrapping on the same contract.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make quote-shape assertions explicit about POSIX behavior or portable across the new cmd.exe Windows contract so Windows CI checks the intended behavior instead of the old PowerShell quoting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: align Windows exec runners with cmd.exe
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mohanagy mohanagy merged commit 01268c3 into main Jun 3, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant