Skip to content

test(e2e): add e2e test suite for knowledge and skills components#232

Draft
Iftach-Shoham wants to merge 3 commits into
mainfrom
test/206-e2e-test-for-knowledge-and-skills
Draft

test(e2e): add e2e test suite for knowledge and skills components#232
Iftach-Shoham wants to merge 3 commits into
mainfrom
test/206-e2e-test-for-knowledge-and-skills

Conversation

@Iftach-Shoham
Copy link
Copy Markdown
Collaborator

@Iftach-Shoham Iftach-Shoham commented May 17, 2026

Summary

Closes #206.

Adds a new tests/e2e/ directory with full end-to-end test coverage for the knowledge and skills components as requested in the issue. Tests are structured in two tiers per component: Tier 1 exercises the component API directly (no LLM, no graph — fast and surgical), Tier 2 runs the real compiled CugaLiteGraph with a hermetic mock LLM and asserts on what actually reached the model.


What's covered

Knowledge (test_knowledge_e2e.py) — 12 tests

Class What it tests
TestKnowledgeEngineLifecycle ingest → search → delete → re-ingest (stale chunk purge)
TestKnowledgeAwareness get_knowledge_summary() output; format_knowledge_context() naming and special-char sanitization
TestKnowledgeRagPath real knowledge_search_knowledge tool via RealSearchKnowledgeToolProvider calls engine.search() and returns ingested content; stale chunk not returned after replace
TestKnowledgeCugaLiteIntegration agent-scoped knowledge in system prompt; session-scoped knowledge in system prompt; knowledge + skills coexistence — all assert both filename and document content appear

Skills (test_skills_e2e.py) — 15 tests + 1 skip

Class What it tests
TestSkillDiscovery discover_skills() finds single/multiple skills, preserves description, parses pip and npm requirements
TestSkillRegistry load_skill() body order, uv pip install / npm install commands, mixed deps (STEP 1 before STEP 2), normalization hint
TestSkillToolsAndBlock create_skill_tools() returns load_skill; block contains names and descriptions
TestSkillsCugaLiteIntegration skill name in system prompt; load_skill bound to model in native tool-calling mode
TestSkillsBlockedPaths test_skill_executed_via_sub_agent — skipped, blocked on #199

Shared infrastructure (conftest.py)

  • CaptureChatModel — hermetic mock LLM: records every message list and tool binding, replays scripted responses, raises on queue exhaustion (catches unexpected graph loops)
  • KnowledgeToolProvider — stub knowledge_search_knowledge tool (triggers the startswith("knowledge_") detection in cuga_lite_graph.py:1493) without requiring a running MCP server
  • RealSearchKnowledgeToolProvider — real knowledge_search_knowledge tool backed by engine.search(), covers the RAG retrieval path
  • knowledge_engine fixture — isolated fastembed + sqlite-vec engine in tmp_path, monkeypatched off any real DB

Test plan

Summary by CodeRabbit

  • Tests
    • Added end-to-end tests for knowledge ingestion, search, deletion, re-ingest, summaries, and RAG retrieval via tool boundaries.
    • Added end-to-end tests for skills discovery, requirement parsing, load-order, registry/tool behavior, and graph integration asserting system prompts include discovered skills.
    • Added shared e2e fixtures/helpers for isolated engine setup, task polling, skill file creation, and a mock chat model to capture system prompts.

Review Change Stack

Implements the full test coverage requested in issue #206.

**What's covered**

Knowledge (`tests/e2e/test_knowledge_e2e.py`):
- Tier 1 – KnowledgeEngine lifecycle: ingest → search → delete → replace cycle
- Tier 1 – Awareness path: get_knowledge_summary() output and format_knowledge_context() collection naming / sanitization
- Tier 1 – RAG retrieval path: RealSearchKnowledgeToolProvider calls engine.search() directly, asserts ingested content is returned by the tool
- Tier 2 – CugaLiteGraph integration: agent-scoped and session-scoped knowledge summaries injected into the system prompt; knowledge + skills coexistence

Skills (`tests/e2e/test_skills_e2e.py`):
- Tier 1 – Discovery: discover_skills() finds all skills, preserves description, parses pip and npm requirements
- Tier 1 – Registry: load_skill() body ordering, uv/npm install commands, mixed deps, normalization hint
- Tier 1 – Tools/block: create_skill_tools() returns load_skill; format_available_skills_block() lists names and descriptions
- Tier 2 – CugaLiteGraph integration: skills block appears in system prompt; load_skill bound to model in native tool-calling mode
- Blocked: test_skill_executed_via_sub_agent skipped pending #199

**Shared infrastructure (`tests/e2e/conftest.py`)**
- CaptureChatModel: hermetic mock LLM that records inputs and replays scripted responses
- KnowledgeToolProvider: stub search tool (triggers awareness detection path)
- RealSearchKnowledgeToolProvider: real search tool backed by KnowledgeEngine (covers RAG path)
- knowledge_engine fixture: isolated fastembed + sqlite-vec engine in tmp_path

Closes #206
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 17, 2026

📝 Walkthrough

Walkthrough

Adds an end-to-end test harness (mock chat model, tool providers, helpers, fixture) and comprehensive E2E tests for knowledge ingest/search and skills discovery/integration, exercising both component behaviors and CugaLite graph prompt injection.

Changes

Knowledge and Skills E2E Tests

Layer / File(s) Summary
E2E test harness: mock LLM and tool providers
tests/e2e/conftest.py
Adds CaptureChatModel to record/replay LLM messages and tool-provider classes (MinimalToolProvider, KnowledgeToolProvider, RealSearchKnowledgeToolProvider) for stubbed and real knowledge search tooling.
E2E test utilities and fixture
tests/e2e/conftest.py
Adds write_skill, poll_task, extract_system_content, and an async knowledge_engine pytest fixture that provisions an isolated sqlite-backed KnowledgeEngine and performs warmup/teardown.
Knowledge component end-to-end tests
tests/e2e/test_knowledge_e2e.py
Tier 1 tests for ingest/search/delete/replace, knowledge-summary and collection formatting; RAG/tool-boundary tests via RealSearchKnowledgeToolProvider; Tier 2 CugaLite integration tests asserting agent- and session-scoped knowledge (and combined knowledge+skill) are injected into system prompts.
Skills component end-to-end tests
tests/e2e/test_skills_e2e.py
Tier 1 discovery and registry tests for .cuga skills, requirement parsing (pip/npm), load_skill output and formatting; Tier 2 CugaLite integration asserting skill names and native load_skill bindings appear in captured system prompts; includes a skipped placeholder test.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through tests with tiny paws,

Mocked replies and neat applause,
Ingested notes and skills that sing,
Prompts arranged for everything,
A cheerful rabbit signs the clause.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title clearly summarizes the main change: adding end-to-end tests for knowledge and skills components, matching the changeset's three test files and shared fixtures.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #206: hermetic e2e test suites for knowledge and skills, knowledge lifecycle testing with isolated fixture, RAG retrieval validation, skill discovery/loading verification, shared test infrastructure, and handling of blocked features.
Out of Scope Changes check ✅ Passed All changes are within scope of issue #206: test infrastructure (conftest.py) and two test modules (knowledge and skills e2e tests) with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/206-e2e-test-for-knowledge-and-skills

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/e2e/test_knowledge_e2e.py (2)

4-7: ⚡ Quick win

Replace EN DASH characters to clear Ruff warnings.

Line 4, Line 6, Line 53, Line 107, Line 150, and Line 204 use , which triggers RUF002/RUF003. Replace with plain - to keep lint clean.

Proposed diff
-  Tier 1 – component-level (no LLM, no graph): exercises KnowledgeEngine's public
+  Tier 1 - component-level (no LLM, no graph): exercises KnowledgeEngine's public
-  Tier 2 – graph-level (CaptureChatModel): runs CugaLite with a mock LLM and asserts
+  Tier 2 - graph-level (CaptureChatModel): runs CugaLite with a mock LLM and asserts
-# Tier 1 – KnowledgeEngine lifecycle
+# Tier 1 - KnowledgeEngine lifecycle
-# Tier 1 – Knowledge awareness (prompt-injection path)
+# Tier 1 - Knowledge awareness (prompt-injection path)
-# Tier 1 – RAG retrieval path (tool boundary)
+# Tier 1 - RAG retrieval path (tool boundary)
-# Tier 2 – CugaLite graph integration
+# Tier 2 - CugaLite graph integration

Also applies to: 53-53, 107-107, 150-150, 204-204

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/test_knowledge_e2e.py` around lines 4 - 7, Replace all EN DASH
characters (–) with ASCII hyphens (-) in the test docstrings/descriptions to
silence Ruff RUF002/RUF003 warnings: update the strings such as "Tier 1 –
component-level (no LLM, no graph)" and "Tier 2 – graph-level
(CaptureChatModel)" and any other occurrences on the noted lines so they use "-"
instead of "–". Search for the literal "–" in tests/e2e/test_knowledge_e2e.py
(e.g., the Tier lines and the other flagged lines) and perform a simple
character substitution, then run the linter to confirm warnings are cleared.

172-175: ⚡ Quick win

Avoid positional tool selection in RAG tests.

Using tools[0] makes these tests order-dependent. Select the knowledge tool by name to prevent brittle failures if provider ordering changes.

Proposed diff
         provider = RealSearchKnowledgeToolProvider(engine, _COLLECTION)
         tools = await provider.get_all_tools()
-        knowledge_tool = tools[0]
+        knowledge_tool = next(
+            (t for t in tools if getattr(t, "name", "") == "knowledge_search_knowledge"),
+            None,
+        )
+        assert knowledge_tool is not None, "knowledge_search_knowledge tool not found"
@@
         provider = RealSearchKnowledgeToolProvider(engine, _COLLECTION)
         tools = await provider.get_all_tools()
-        knowledge_tool = tools[0]
+        knowledge_tool = next(
+            (t for t in tools if getattr(t, "name", "") == "knowledge_search_knowledge"),
+            None,
+        )
+        assert knowledge_tool is not None, "knowledge_search_knowledge tool not found"

Also applies to: 193-195

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/test_knowledge_e2e.py` around lines 172 - 175, The test currently
selects a knowledge tool by position (tools[0]) which is order-dependent;
instead, locate the correct tool returned by
RealSearchKnowledgeToolProvider.get_all_tools() by matching its name/identifier
(e.g., filter for tool.name or tool.id that matches the expected knowledge tool)
and assign that result to knowledge_tool; make the same change for the other
occurrence that uses positional selection so both tests pick the tool by
explicit name rather than by index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/conftest.py`:
- Around line 236-238: The teardown should guarantee engine.shutdown() runs even
if await engine.aclose() raises: wrap the await engine.aclose() call in a
try/finally (or call engine.shutdown() from a finally block) so that
engine.shutdown() is always invoked; if you need to preserve the original
exception, re-raise it after shutdown or log the aclose() error before
re-raising so resources are not leaked (refer to the engine.aclose() and
engine.shutdown() calls).

---

Nitpick comments:
In `@tests/e2e/test_knowledge_e2e.py`:
- Around line 4-7: Replace all EN DASH characters (–) with ASCII hyphens (-) in
the test docstrings/descriptions to silence Ruff RUF002/RUF003 warnings: update
the strings such as "Tier 1 – component-level (no LLM, no graph)" and "Tier 2 –
graph-level (CaptureChatModel)" and any other occurrences on the noted lines so
they use "-" instead of "–". Search for the literal "–" in
tests/e2e/test_knowledge_e2e.py (e.g., the Tier lines and the other flagged
lines) and perform a simple character substitution, then run the linter to
confirm warnings are cleared.
- Around line 172-175: The test currently selects a knowledge tool by position
(tools[0]) which is order-dependent; instead, locate the correct tool returned
by RealSearchKnowledgeToolProvider.get_all_tools() by matching its
name/identifier (e.g., filter for tool.name or tool.id that matches the expected
knowledge tool) and assign that result to knowledge_tool; make the same change
for the other occurrence that uses positional selection so both tests pick the
tool by explicit name rather than by index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 364c908c-72b0-4424-b9c4-177311503a5e

📥 Commits

Reviewing files that changed from the base of the PR and between 5a38799 and bc13f53.

📒 Files selected for processing (4)
  • tests/e2e/__init__.py
  • tests/e2e/conftest.py
  • tests/e2e/test_knowledge_e2e.py
  • tests/e2e/test_skills_e2e.py

Comment thread tests/e2e/conftest.py Outdated
Iftach Shoham added 2 commits May 17, 2026 14:16
- conftest.py: wrap engine.aclose() in try/finally so shutdown() is always
  called even if aclose() raises (resource leak fix)
- test_knowledge_e2e.py: replace EN dash characters with ASCII hyphens in
  docstrings/comments to silence Ruff RUF002/RUF003
- test_knowledge_e2e.py: select knowledge_search_knowledge tool by name instead
  of positional index tools[0] to avoid order-dependent failures
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/test_skills_e2e.py`:
- Line 99: Replace hardcoded "/tmp" string literals in the test fixtures with a
temporary directory fixture; locate instances like
source=f"/tmp/{name}/SKILL.md" in tests/e2e/test_skills_e2e.py and change them
to use the pytest tmp_path (or tempfile.gettempdir/tmp_path_factory) to
construct paths (e.g., str(tmp_path / name / "SKILL.md")), and update the other
occurrences on the same file (lines referencing "/tmp/...") similarly so all
test paths are built from the tmp_path/tempfile API instead of a hardcoded
"/tmp".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba2ee880-4eb2-40ad-af65-317306217a57

📥 Commits

Reviewing files that changed from the base of the PR and between f433c8b and 7e2734f.

📒 Files selected for processing (3)
  • tests/e2e/conftest.py
  • tests/e2e/test_knowledge_e2e.py
  • tests/e2e/test_skills_e2e.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/e2e/test_knowledge_e2e.py
  • tests/e2e/conftest.py

name=name,
description=f"{name} description",
body=body,
source=f"/tmp/{name}/SKILL.md",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace hardcoded /tmp literals to avoid Ruff S108 CI failures.

Line 99, Line 164, Line 176, Line 177, and Line 187 use hardcoded /tmp/... paths in test fixtures. Ruff flags these as errors (S108), which can block CI even though these are test-only strings.

Proposed patch
-            source=f"/tmp/{name}/SKILL.md",
+            source=f"skills/{name}/SKILL.md",
-            source="/tmp/SKILL.md",
+            source="skills/SKILL.md",
-            SkillEntry("alpha", "Alpha skill", "## Body", "/tmp/a/SKILL.md", ()),
-            SkillEntry("beta", "Beta skill", "## Body", "/tmp/b/SKILL.md", ()),
+            SkillEntry("alpha", "Alpha skill", "## Body", "skills/a/SKILL.md", ()),
+            SkillEntry("beta", "Beta skill", "## Body", "skills/b/SKILL.md", ()),
-        entry = SkillEntry("gamma", "Gamma makes reports", "## Body", "/tmp/g/SKILL.md", ())
+        entry = SkillEntry("gamma", "Gamma makes reports", "## Body", "skills/g/SKILL.md", ())

Also applies to: 164-164, 176-177, 187-187

🧰 Tools
🪛 Ruff (0.15.12)

[error] 99-99: Probable insecure usage of temporary file or directory: "/tmp/"

(S108)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/test_skills_e2e.py` at line 99, Replace hardcoded "/tmp" string
literals in the test fixtures with a temporary directory fixture; locate
instances like source=f"/tmp/{name}/SKILL.md" in tests/e2e/test_skills_e2e.py
and change them to use the pytest tmp_path (or
tempfile.gettempdir/tmp_path_factory) to construct paths (e.g., str(tmp_path /
name / "SKILL.md")), and update the other occurrences on the same file (lines
referencing "/tmp/...") similarly so all test paths are built from the
tmp_path/tempfile API instead of a hardcoded "/tmp".

@sami-marreed sami-marreed marked this pull request as draft May 17, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Test] Add e2e tests for skills components

1 participant