Align spec structure and fix health report findings #2941

Open
xrajesh wants to merge 2 commits into openshift:main from xrajesh:xav/spec-first

@xrajesh
Contributor

@xrajesh xrajesh commented May 29, 2026

Summary

  • Align .ai/spec/ structure with spec-first conventions: absorb layer READMEs into main README, add cross-reference table, complete conventions section, add spec pointer in CLAUDE.md
  • Create ARCHITECTURE.md with Mermaid diagrams for human onboarding
  • Fix all findings from spec health evaluation:
    • Update how/query-pipeline.md for the LLMExecutionAgent refactoring
    • Remove [PLANNED] markers from shipped Google Vertex providers (Gemini + Anthropic/Claude)
    • Create how/auth.md and how/quota.md for undocumented implementation patterns
    • Add LLMExecutionAgent to how/project-structure.md module map
    • Fix minor boundary violation in what/observability.md

Test plan

  • Verify all file paths referenced in spec files exist in the codebase
  • Confirm Mermaid diagrams in ARCHITECTURE.md render correctly on GitHub
  • Review behavioral rules in what/ files match current code behavior
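
The first item in the test plan could be automated roughly as follows. This is a minimal sketch, assuming spec files cite repository paths inside backticks; the regular expression, the function name `find_missing_paths`, and the directory layout passed in are illustrative assumptions, not part of this PR.

```python
import re
from pathlib import Path

# Assumption: paths are written in backticks and contain a slash and a file
# extension, e.g. `ols/app/main.py`. Adjust the pattern to the real convention.
PATH_PATTERN = re.compile(r"`([\w./-]+/[\w./-]+\.\w+)`")


def find_missing_paths(spec_dir: Path, repo_root: Path) -> dict[str, list[str]]:
    """Map each spec file to the referenced paths that do not exist under repo_root."""
    missing: dict[str, list[str]] = {}
    for spec_file in sorted(spec_dir.rglob("*.md")):
        text = spec_file.read_text(encoding="utf-8")
        absent = sorted(
            {ref for ref in PATH_PATTERN.findall(text) if not (repo_root / ref).exists()}
        )
        if absent:
            missing[str(spec_file.relative_to(repo_root))] = absent
    return missing
```

Calling `find_missing_paths(Path(".ai/spec"), Path("."))` from the repository root would return an empty dict when every cited path resolves.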

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Enhanced specification structure with comprehensive cross-references and updated guidelines
    • Added detailed architecture documentation for authentication, quota systems, and request pipelines
    • Published spec health report identifying current coverage gaps
    • Updated roadmap including Anthropic provider support and additional platform initiatives

xrajesh and others added 2 commits May 29, 2026 15:33
- Absorb what/README.md and how/README.md indexes into main .ai/spec/README.md
- Add cross-reference table mapping what/ to how/ files
- Complete conventions section (rule numbering, authority, constraints, new-file guidance)
- Add ## Specs pointer in CLAUDE.md/AGENTS.md
- Create ARCHITECTURE.md with Mermaid diagrams for human onboarding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stale:
- how/query-pipeline.md: document LLMExecutionAgent refactoring (tool-calling
  loop moved from DocsSummarizer to separate class)
- what/system-overview.md: remove [PLANNED] from shipped Google Vertex providers
  (Gemini and Anthropic/Claude)
- what/llm-providers.md: clarify Anthropic is available via Vertex, direct
  provider still planned

Missing:
- Create how/auth.md: strategy pattern, K8sClientSingleton, TokenReview/SAR flow
- Create how/quota.md: limiter abstraction, PostgreSQL schema, scheduler daemon
- how/project-structure.md: add LLMExecutionAgent to module map and request flow

Structural:
- what/observability.md: remove constants.py file reference (boundary violation)
- README.md: add new how/ files to index and cross-reference table
- health-report.md: initial health evaluation results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci openshift-ci Bot requested review from blublinsky and tisnik May 29, 2026 20:42
@openshift-ci

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xrajesh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

This PR adds comprehensive architecture documentation for OpenShift LightSpeed (OLS) and reorganizes existing specifications to document a new architectural split: LLMExecutionAgent now handles multi-round tool-calling while DocsSummarizer orchestrates pipeline stages 1–5. New auth and quota architecture specifications are introduced, and all specs are cross-referenced and structured under a health-assessed directory.

Changes

Architecture and Specification Documentation

| Layer | File(s) | Summary |
|---|---|---|
| High-level architecture overview and entry points | ARCHITECTURE.md, AGENTS.md | New ARCHITECTURE.md provides system context, component relationships, request-flow walkthrough for /v1/query, key abstractions (AppConfig, LLM registry, budget tracking, cache), and deployment constraints. AGENTS.md updated to point to .ai/spec/ for specifications. |
| Specification organization and health assessment | .ai/spec/README.md, .ai/spec/health-report.md | .ai/spec/README.md expanded with Structure table enumerating what/how specs, new Cross-Reference mapping, and updated Conventions. .ai/spec/health-report.md added with dated status assessment covering stale items, missing coverage (Auth/Quota docs), structural concerns, and alignment confirmations. |
| LLMExecutionAgent and query pipeline architecture | .ai/spec/how/project-structure.md, .ai/spec/how/query-pipeline.md | project-structure module map updated to introduce LLMExecutionAgent and describe DocsSummarizer delegation; POST /v1/query diagram revised. query-pipeline documentation reorganized to split responsibility between DocsSummarizer (stages 1–5, including RAG and history) and LLMExecutionAgent (tool-calling loop), with Stage 6 explicitly documenting streaming interleaving, exit conditions, and round handling. |
| New architecture specifications: auth and quota | .ai/spec/how/auth.md, .ai/spec/how/quota.md | auth.md introduces FastAPI strategy-pattern auth factory, K8s TokenReview+SubjectAccessReview flow, return tuple semantics, virtual-path RBAC, and error mappings. quota.md documents PostgreSQL-backed per-user and per-cluster token limits, limiter factory, request-time enforcement, background replenishment, schema, and integration points. |
| Behavioral specifications and planned work | .ai/spec/what/llm-providers.md, .ai/spec/what/observability.md, .ai/spec/what/system-overview.md | llm-providers.md updated for OLS-2776 direct Anthropic provider. observability.md clarified header redaction constraint. system-overview.md rephrased LLM Providers section and reorganized Planned Changes table with expanded Jira items (OLS-1680 onward, including BYOK, MCP enhancements, compliance policies, etc.). |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title "Align spec structure and fix health report findings" accurately reflects the main objectives: restructuring the .ai/spec/ directory, consolidating documentation, and addressing health report findings. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/health-report.md:
- Line 35: The sentence "All 8 provider files in code match the spec (openai,
azure_openai, watsonx, rhoai_vllm, rhelai_vllm, google_vertex, fake_provider)"
is inconsistent: either change the count from 8 to 7 or add the missing provider
name to the parenthetical list; edit the string in .ai/spec/health-report.md so
the numeric count and the listed provider identifiers (openai, azure_openai,
watsonx, rhoai_vllm, rhelai_vllm, google_vertex, fake_provider) match exactly.

In @.ai/spec/how/auth.md:
- Around line 19-26: The fenced code blocks showing the authentication
pseudocode (including symbols like get_auth_dependency, k8s.AuthDependency,
noop.AuthDependency, noop_with_token.AuthDependency, _extract_bearer_token,
get_user_info, kubernetes.AuthenticationV1Api.create_token_review,
V1TokenReview, SubjectAccessReview,
kubernetes.AuthorizationV1Api.create_subject_access_review, and
V1SubjectAccessReview) are missing language identifiers; update both fenced
blocks to include a language tag (e.g., ```text) so the blocks become ```text
... ``` and comply with MD040 and the markdown linter.

In @.ai/spec/how/query-pipeline.md:
- Around line 129-151: The fenced code block showing the DocsSummarizer pipeline
is missing a language identifier causing MD040 lint errors; update the opening
fence to include a language token (e.g., change ``` to ```text) for the block
that begins with "DocsSummarizer delegates to self._llm_agent.execute():" so the
markdown linter recognizes it, leaving the block content (including references
to _iterate_with_tools, _collect_round_llm_chunks,
_process_tool_calls_for_round, and Stage 7: Finalization) unchanged.

In @.ai/spec/how/quota.md:
- Around line 22-39: The fenced flow diagrams (e.g., the blocks showing
process_request(), start_quota_scheduler(), quota_scheduler(),
token_usage_history.consume_tokens(), limiter.ensure_available_quota(),
limiter.consume_tokens(), and limiter.available_quota()) are missing language
identifiers and trigger MD040; update each triple-backtick fence to include a
language tag such as "text" (or "mermaid" if appropriate) so they comply with
the linter (e.g., change ``` to ```text for the blocks around process_request()
and start_quota_scheduler()), and apply the same fix to the other affected
fences referenced in the comment (lines 43-56).

In `@ARCHITECTURE.md`:
- Around line 77-82: The loop in the sequence diagram incorrectly shows
DocsSummarizer performing the multi-round tool-calling loop; move ownership to
LLMExecutionAgent by changing the loop block to surround LLMExecutionAgent and
update the message flow so the LLM sends a tool-call request to
LLMExecutionAgent (e.g., "LLM-->>LLMExecutionAgent: Tool call request"),
LLMExecutionAgent executes the tool via DocsSummarizer
("LLMExecutionAgent->>DocsSummarizer: Execute tool"), DocsSummarizer returns
results to LLMExecutionAgent ("DocsSummarizer-->>LLMExecutionAgent: Tool
result"), and LLMExecutionAgent feeds results back to LLM and re-invokes the
loop (loop up to max_iterations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 49df8348-ceee-445c-9afa-74fbf0baa3dd

📥 Commits

Reviewing files that changed from the base of the PR and between 79c7197 and 8b21e29.

📒 Files selected for processing (13)
  • .ai/spec/README.md
  • .ai/spec/health-report.md
  • .ai/spec/how/README.md
  • .ai/spec/how/auth.md
  • .ai/spec/how/project-structure.md
  • .ai/spec/how/query-pipeline.md
  • .ai/spec/how/quota.md
  • .ai/spec/what/README.md
  • .ai/spec/what/llm-providers.md
  • .ai/spec/what/observability.md
  • .ai/spec/what/system-overview.md
  • AGENTS.md
  • ARCHITECTURE.md
💤 Files with no reviewable changes (2)
  • .ai/spec/what/README.md
  • .ai/spec/how/README.md

Comment thread .ai/spec/health-report.md

## No issues

- All 8 provider files in code match the spec (openai, azure_openai, watsonx, rhoai_vllm, rhelai_vllm, google_vertex, fake_provider)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Provider count and list are inconsistent.

Line 35 says “8 provider files” but the list includes 7 entries. Please either update the count or include the missing provider name.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/health-report.md at line 35, The sentence "All 8 provider files in
code match the spec (openai, azure_openai, watsonx, rhoai_vllm, rhelai_vllm,
google_vertex, fake_provider)" is inconsistent: either change the count from 8
to 7 or add the missing provider name to the parenthetical list; edit the string
in .ai/spec/health-report.md so the numeric count and the listed provider
identifiers (openai, azure_openai, watsonx, rhoai_vllm, rhelai_vllm,
google_vertex, fake_provider) match exactly.
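
The mismatch flagged here is simple arithmetic, and a check like the one below would catch it. This is a small sketch, assuming the sentence keeps the shape "N provider files ... (a, b, c)"; the helper name `count_matches_list` is hypothetical.

```python
import re


def count_matches_list(sentence: str) -> tuple[int, int]:
    """Return (stated count, number of names in the parenthetical list)."""
    stated = int(re.search(r"\b(\d+) provider files", sentence).group(1))
    listed = re.search(r"\(([^)]*)\)", sentence).group(1)
    names = [name.strip() for name in listed.split(",") if name.strip()]
    return stated, len(names)


sentence = (
    "All 8 provider files in code match the spec (openai, azure_openai, "
    "watsonx, rhoai_vllm, rhelai_vllm, google_vertex, fake_provider)"
)
stated, listed = count_matches_list(sentence)
print(stated, listed)  # prints "8 7": the count and the list disagree
```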

Comment thread .ai/spec/how/auth.md
Comment on lines +19 to +26
```
olsconfig.yaml: authentication_config.module = "k8s" | "noop" | "noop-with-token"
  -> get_auth_dependency(ols_config, virtual_path="/ols-access")
       match module:
         "k8s"             -> k8s.AuthDependency(virtual_path)
         "noop"            -> noop.AuthDependency(virtual_path)
         "noop-with-token" -> noop_with_token.AuthDependency(virtual_path)
```
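
The dispatch quoted above is a strategy-pattern factory. A minimal Python sketch of that shape follows. The three dataclasses are hypothetical stand-ins for the real `AuthDependency` classes, the function takes the module name directly instead of `ols_config` to stay self-contained, and the `ValueError` for an unknown module is an assumption, since the spec excerpt does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class K8sAuthDependency:
    """Stand-in for k8s.AuthDependency (TokenReview + SubjectAccessReview)."""
    virtual_path: str


@dataclass
class NoopAuthDependency:
    """Stand-in for noop.AuthDependency."""
    virtual_path: str


@dataclass
class NoopWithTokenAuthDependency:
    """Stand-in for noop_with_token.AuthDependency."""
    virtual_path: str


# Maps authentication_config.module values to a strategy class.
_STRATEGIES = {
    "k8s": K8sAuthDependency,
    "noop": NoopAuthDependency,
    "noop-with-token": NoopWithTokenAuthDependency,
}


def get_auth_dependency(module: str, virtual_path: str = "/ols-access"):
    """Return the auth strategy instance selected by the configured module name."""
    try:
        strategy = _STRATEGIES[module]
    except KeyError:
        # Assumption: unknown modules are rejected at startup.
        raise ValueError(f"unsupported authentication module: {module!r}") from None
    return strategy(virtual_path)
```

A dictionary lookup keeps the factory open to new strategies without touching the dispatch code, which is the main reason to prefer it over a chain of conditionals.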

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

Both fenced blocks are missing a language tag, which triggers MD040 and breaks consistent markdown lint compliance.

Suggested diff
-```
+```text
 olsconfig.yaml: authentication_config.module = "k8s" | "noop" | "noop-with-token"
   -> get_auth_dependency(ols_config, virtual_path="/ols-access")
        match module:
          "k8s"             -> k8s.AuthDependency(virtual_path)
          "noop"            -> noop.AuthDependency(virtual_path)
          "noop-with-token" -> noop_with_token.AuthDependency(virtual_path)

@@
-```
+```text
 HTTP request with Authorization: Bearer
   -> _extract_bearer_token(header) -> token string
   -> get_user_info(token)
        kubernetes.AuthenticationV1Api.create_token_review(V1TokenReview(token))
          -> if authenticated: return V1TokenReviewStatus (uid, username, groups)
          -> if not authenticated: raise HTTPException(403)
   -> SubjectAccessReview
        kubernetes.AuthorizationV1Api.create_subject_access_review(
          V1SubjectAccessReview(user, groups, non_resource_attributes={
            path: virtual_path,  # "/ols-access" or "/ols-metrics-access"
            verb: "get"
          })
        )
          -> if allowed: return (uid, username, False, token)
          -> if denied: raise HTTPException(403)

Also applies to: 32-48

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/how/auth.md around lines 19 - 26, The fenced code blocks showing
the authentication pseudocode (including symbols like get_auth_dependency,
k8s.AuthDependency, noop.AuthDependency, noop_with_token.AuthDependency,
_extract_bearer_token, get_user_info,
kubernetes.AuthenticationV1Api.create_token_review, V1TokenReview,
SubjectAccessReview, kubernetes.AuthorizationV1Api.create_subject_access_review,
and V1SubjectAccessReview) are missing language identifiers; update both fenced
blocks to include a language tag (e.g., ```text) so the blocks become ```text
... ``` and comply with MD040 and the markdown linter.
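
For readers unfamiliar with MD040, the rule flags an opening code fence that carries no language tag. The sketch below approximates that check for backtick fences only. It is a simplified illustration, not markdownlint's implementation: it ignores tilde fences and CommonMark's indentation rules, and the function name `find_unlabelled_fences` is hypothetical.

```python
def find_unlabelled_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening backtick fences with no language tag."""
    fence_marker = "`" * 3
    offenders: list[int] = []
    open_fence = None  # backtick run that opened the current block, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if open_fence is None:
            if stripped.startswith(fence_marker):
                ticks = len(stripped) - len(stripped.lstrip("`"))
                open_fence = "`" * ticks
                if not stripped[ticks:].strip():
                    offenders.append(lineno)
        elif stripped.startswith(open_fence) and not stripped.strip("`"):
            # A line made only of backticks, at least as long as the opener, closes the block.
            open_fence = None
    return offenders
```

Running it over `.ai/spec/how/auth.md` before the fix would be expected to report the opening fences named in this comment.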

Comment thread .ai/spec/how/query-pipeline.md
Comment on lines +129 to 151
```
DocsSummarizer delegates to self._llm_agent.execute():

Stage 6: Tool-calling loop (_iterate_with_tools)
  for round 1..max_rounds:
    _collect_round_llm_chunks() -> text/reasoning chunks + tool_call chunks
    yield text/reasoning StreamedChunks
    exit conditions:
      - LLM emits finish_reason="stop" (should_stop flag)
      - final round reached (i == max_rounds)
      - no tool calls in response
      - tool execution exception
    _process_tool_calls_for_round():
      _resolve_tool_call_definitions() (skip duplicates, missing, ambiguous)
      yield TOOL_CALL chunks
      execute tools within round budget (enforce_tool_token_budget)
      yield TOOL_RESULT chunks
      append AI message + tool messages to prompt for next round
      charge AI_ROUND and TOOL_RESULT

Stage 7: Finalization
  yield END chunk with rag_chunks, truncated flag, token_counter
```
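
The control flow in the quoted block can be sketched as a Python generator. This is an illustration under stated assumptions, not the `LLMExecutionAgent` code: `call_llm` and `run_tool` are hypothetical callables, chunks are plain `(kind, payload)` tuples, a tool exception is assumed to end the loop silently, and token budgeting and charging are left out.

```python
from typing import Callable, Iterator

# Hypothetical shape of one LLM round: (text, tool call names, finish_reason).
Round = tuple[str, list[str], str]


def iterate_with_tools(
    call_llm: Callable[[list[str]], Round],
    run_tool: Callable[[str], str],
    max_rounds: int,
) -> Iterator[tuple[str, str]]:
    """Yield (kind, payload) chunks for at most max_rounds LLM rounds."""
    messages: list[str] = []
    for i in range(1, max_rounds + 1):
        text, tool_calls, finish_reason = call_llm(messages)
        if text:
            yield ("TEXT", text)
        # Exit conditions: stop signal, final round, or no tool calls.
        if finish_reason == "stop" or i == max_rounds or not tool_calls:
            break
        for name in tool_calls:
            yield ("TOOL_CALL", name)
        try:
            results = [run_tool(name) for name in tool_calls]
        except Exception:
            # Exit condition: tool execution exception.
            break
        for result in results:
            yield ("TOOL_RESULT", result)
        # Append the AI message and tool outputs for the next round.
        messages.extend([text, *results])
    # Stage 7: finalization marker.
    yield ("END", "")
```

Because the function is a generator, text chunks reach the caller before any tool runs, which matches the streaming interleaving the spec describes for Stage 6.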

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

This block is missing a fenced-code language tag (MD040).

Suggested fix
-```
+```text
 DocsSummarizer delegates to self._llm_agent.execute():
 ...
 Stage 7: Finalization
   yield END chunk with rag_chunks, truncated flag, token_counter

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 129-129: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/how/query-pipeline.md around lines 129 - 151, The fenced code block
showing the DocsSummarizer pipeline is missing a language identifier causing
MD040 lint errors; update the opening fence to include a language token (e.g.,
change ``` to ```text) for the block that begins with "DocsSummarizer delegates
to self._llm_agent.execute():" so the markdown linter recognizes it, leaving the
block content (including references to _iterate_with_tools,
_collect_round_llm_chunks, _process_tool_calls_for_round, and Stage 7:
Finalization) unchanged.

Comment thread .ai/spec/how/quota.md
Comment on lines +22 to +39
```
process_request()
  -> check_tokens_available(config.quota_limiters, user_id)
       for each limiter:
         limiter.ensure_available_quota(subject_id=user_id)
           SELECT available FROM quota_limits WHERE id=user_id AND subject='u'
           if available <= 0: raise QuotaExceedError -> HTTP 500
  ... LLM processing ...
  -> consume_tokens(config.quota_limiters, config.token_usage_history, ...)
       token_usage_history.consume_tokens(user_id, provider, model, in, out)  [if enabled]
         INSERT ... ON CONFLICT DO UPDATE SET input_tokens=input_tokens+N
       for each limiter:
         limiter.consume_tokens(input_tokens, output_tokens, subject_id=user_id)
           UPDATE quota_limits SET available=available-(in+out) WHERE id=user_id AND subject='u'
  -> get_available_quotas(config.quota_limiters, user_id)
       for each limiter: limiter.available_quota(user_id) -> int
       returns {"UserQuotaLimiter": 450, "ClusterQuotaLimiter": 8950}
```
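
The check, consume, report sequence above can be illustrated with an in-memory stand-in. This sketch assumes nothing about the real PostgreSQL-backed classes beyond the three method names in the quoted flow. `InMemoryLimiter`, its `_key` hook, the simplified `process_request` signature, and keying the result by class name are all assumptions made for illustration.

```python
class QuotaExceedError(Exception):
    """Raised when no quota is left for a subject."""


class InMemoryLimiter:
    """Hypothetical in-memory replacement for a database-backed limiter."""

    def __init__(self, initial_quota: int) -> None:
        self._initial_quota = initial_quota
        self._available: dict[str, int] = {}

    def _key(self, subject_id: str) -> str:
        return subject_id

    def available_quota(self, subject_id: str) -> int:
        return self._available.setdefault(self._key(subject_id), self._initial_quota)

    def ensure_available_quota(self, subject_id: str) -> None:
        if self.available_quota(subject_id) <= 0:
            raise QuotaExceedError(f"no quota left for {subject_id!r}")

    def consume_tokens(self, input_tokens: int, output_tokens: int, subject_id: str) -> None:
        remaining = self.available_quota(subject_id) - (input_tokens + output_tokens)
        self._available[self._key(subject_id)] = remaining


class UserQuotaLimiter(InMemoryLimiter):
    """Keeps one balance per user."""


class ClusterQuotaLimiter(InMemoryLimiter):
    """Keeps one balance shared by every user in the cluster."""

    def _key(self, subject_id: str) -> str:
        return ""


def process_request(limiters, user_id: str, input_tokens: int, output_tokens: int) -> dict[str, int]:
    """Check quota, account for the tokens used, and report what is left."""
    for limiter in limiters:
        limiter.ensure_available_quota(subject_id=user_id)
    # ... LLM processing would happen here ...
    for limiter in limiters:
        limiter.consume_tokens(input_tokens, output_tokens, subject_id=user_id)
    return {type(limiter).__name__: limiter.available_quota(user_id) for limiter in limiters}
```

With `[UserQuotaLimiter(500), ClusterQuotaLimiter(9000)]` and a request that uses 30 input and 20 output tokens, the returned mapping has the same shape as the example in the quoted flow. Note that quota is checked before the call and charged after it, so a single request can push a balance below zero.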

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify fence languages for flow diagrams.

These code fences are missing language identifiers and trigger MD040.

Suggested diff
-```
+```text
 process_request()
   -> check_tokens_available(config.quota_limiters, user_id)
        for each limiter:
          limiter.ensure_available_quota(subject_id=user_id)
            SELECT available FROM quota_limits WHERE id=user_id AND subject='u'
            if available <= 0: raise QuotaExceedError -> HTTP 500
   ... LLM processing ...
   -> consume_tokens(config.quota_limiters, config.token_usage_history, ...)
        token_usage_history.consume_tokens(user_id, provider, model, in, out)  [if enabled]
          INSERT ... ON CONFLICT DO UPDATE SET input_tokens=input_tokens+N
        for each limiter:
          limiter.consume_tokens(input_tokens, output_tokens, subject_id=user_id)
            UPDATE quota_limits SET available=available-(in+out) WHERE id=user_id AND subject='u'
   -> get_available_quotas(config.quota_limiters, user_id)
        for each limiter: limiter.available_quota(user_id) -> int
        returns {"UserQuotaLimiter": 450, "ClusterQuotaLimiter": 8950}

@@
-```
+```text
 start_quota_scheduler(config)  [called from runner.py at startup]
   -> Thread(target=quota_scheduler, daemon=True).start()
        while True:
          for each limiter in config:
            quota_revocation(connection, name, limiter)
              if increase_by configured:
                UPDATE quota_limits SET available=available+N
                  WHERE subject='u'|'c' AND revoked_at < NOW() - INTERVAL period
              if initial_quota configured:
                UPDATE quota_limits SET available=initial_quota
                  WHERE subject='u'|'c' AND revoked_at < NOW() - INTERVAL period
          sleep(config.scheduler.period)

Also applies to: 43-56

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 22-22: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/how/quota.md around lines 22 - 39, The fenced flow diagrams (e.g.,
the blocks showing process_request(), start_quota_scheduler(),
quota_scheduler(), token_usage_history.consume_tokens(),
limiter.ensure_available_quota(), limiter.consume_tokens(), and
limiter.available_quota()) are missing language identifiers and trigger MD040;
update each triple-backtick fence to include a language tag such as "text" (or
"mermaid" if appropriate) so they comply with the linter (e.g., change ``` to
```text for the blocks around process_request() and start_quota_scheduler()),
and apply the same fix to the other affected fences referenced in the comment
(lines 43-56).

Comment thread ARCHITECTURE.md
Comment on lines +77 to +82
loop Tool-calling rounds (up to max_iterations)
LLM-->>DocsSummarizer: Tool call request
DocsSummarizer->>MCP: Execute tool
MCP-->>DocsSummarizer: Tool result
DocsSummarizer->>LLM: Feed result, re-invoke
end

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move tool-calling loop ownership from DocsSummarizer to LLMExecutionAgent in this diagram.

This sequence still shows DocsSummarizer executing the multi-round tool loop, which conflicts with the documented refactor where that loop lives in LLMExecutionAgent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ARCHITECTURE.md` around lines 77 - 82, The loop in the sequence diagram
incorrectly shows DocsSummarizer performing the multi-round tool-calling loop;
move ownership to LLMExecutionAgent by changing the loop block to surround
LLMExecutionAgent and update the message flow so the LLM sends a tool-call
request to LLMExecutionAgent (e.g., "LLM-->>LLMExecutionAgent: Tool call
request"), LLMExecutionAgent executes the tool via DocsSummarizer
("LLMExecutionAgent->>DocsSummarizer: Execute tool"), DocsSummarizer returns
results to LLMExecutionAgent ("DocsSummarizer-->>LLMExecutionAgent: Tool
result"), and LLMExecutionAgent feeds results back to LLM and re-invokes the
loop (loop up to max_iterations).

@openshift-ci

openshift-ci Bot commented May 29, 2026

@xrajesh: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
