test(e2e): add e2e test suite for knowledge and skills components#232
Iftach-Shoham wants to merge 3 commits into
Implements the full test coverage requested in issue #206.

**What's covered**

Knowledge (`tests/e2e/test_knowledge_e2e.py`):
- Tier 1 – KnowledgeEngine lifecycle: ingest → search → delete → replace cycle
- Tier 1 – Awareness path: get_knowledge_summary() output and format_knowledge_context() collection naming / sanitization
- Tier 1 – RAG retrieval path: RealSearchKnowledgeToolProvider calls engine.search() directly, asserts ingested content is returned by the tool
- Tier 2 – CugaLiteGraph integration: agent-scoped and session-scoped knowledge summaries injected into the system prompt; knowledge + skills coexistence

Skills (`tests/e2e/test_skills_e2e.py`):
- Tier 1 – Discovery: discover_skills() finds all skills, preserves description, parses pip and npm requirements
- Tier 1 – Registry: load_skill() body ordering, uv/npm install commands, mixed deps, normalization hint
- Tier 1 – Tools/block: create_skill_tools() returns load_skill; format_available_skills_block() lists names and descriptions
- Tier 2 – CugaLiteGraph integration: skills block appears in system prompt; load_skill bound to model in native tool-calling mode
- Blocked: test_skill_executed_via_sub_agent skipped pending #199

**Shared infrastructure (`tests/e2e/conftest.py`)**
- CaptureChatModel: hermetic mock LLM that records inputs and replays scripted responses
- KnowledgeToolProvider: stub search tool (triggers awareness detection path)
- RealSearchKnowledgeToolProvider: real search tool backed by KnowledgeEngine (covers RAG path)
- knowledge_engine fixture: isolated fastembed + sqlite-vec engine in tmp_path

Closes #206
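To make the "records inputs and replays scripted responses" idea concrete, here is a minimal sketch of a capture-style mock model. It is not the PR's `CaptureChatModel`: the class name `ScriptedChatModel`, its `invoke` method, and the tuple message shape are assumptions chosen so the snippet runs without any LLM framework installed.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScriptedChatModel:
    """Hypothetical stand-in for a capture-style mock LLM (not the PR's class)."""

    responses: list[str]  # scripted replies, consumed in order
    calls: list[list[Any]] = field(default_factory=list)  # every message list seen

    def invoke(self, messages: list[Any]) -> str:
        # Record a copy so later mutation by the caller cannot alter the log.
        self.calls.append(list(messages))
        if not self.responses:
            # An empty queue means the caller made more LLM calls than scripted.
            raise AssertionError("no scripted responses left; unexpected extra LLM call")
        return self.responses.pop(0)


model = ScriptedChatModel(responses=["final answer"])
reply = model.invoke([("system", "You have skills: alpha"), ("user", "hi")])
```

A test can then assert on `model.calls` to check what reached the model, for example that the system message contains the expected skills block. Failing on an empty queue turns an unexpected extra model call into an immediate test failure.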
📝 Walkthrough
Adds an end-to-end test harness (mock chat model, tool providers, helpers, fixture) and comprehensive E2E tests for knowledge ingest/search and skills discovery/integration, exercising both component behaviors and CugaLite graph prompt injection.
Changes
Knowledge and Skills E2E Tests
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/e2e/test_knowledge_e2e.py (2)
4-7: ⚡ Quick win: Replace EN DASH characters to clear Ruff warnings.
Line 4, Line 6, Line 53, Line 107, Line 150, and Line 204 use `–`, which triggers `RUF002`/`RUF003`. Replace with plain `-` to keep lint clean.
Proposed diff
- Tier 1 – component-level (no LLM, no graph): exercises KnowledgeEngine's public
+ Tier 1 - component-level (no LLM, no graph): exercises KnowledgeEngine's public
- Tier 2 – graph-level (CaptureChatModel): runs CugaLite with a mock LLM and asserts
+ Tier 2 - graph-level (CaptureChatModel): runs CugaLite with a mock LLM and asserts
-# Tier 1 – KnowledgeEngine lifecycle
+# Tier 1 - KnowledgeEngine lifecycle
-# Tier 1 – Knowledge awareness (prompt-injection path)
+# Tier 1 - Knowledge awareness (prompt-injection path)
-# Tier 1 – RAG retrieval path (tool boundary)
+# Tier 1 - RAG retrieval path (tool boundary)
-# Tier 2 – CugaLite graph integration
+# Tier 2 - CugaLite graph integration
Also applies to: 53-53, 107-107, 150-150, 204-204
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/test_knowledge_e2e.py` around lines 4 - 7, Replace all EN DASH characters (–) with ASCII hyphens (-) in the test docstrings/descriptions to silence Ruff RUF002/RUF003 warnings: update the strings such as "Tier 1 – component-level (no LLM, no graph)" and "Tier 2 – graph-level (CaptureChatModel)" and any other occurrences on the noted lines so they use "-" instead of "–". Search for the literal "–" in tests/e2e/test_knowledge_e2e.py (e.g., the Tier lines and the other flagged lines) and perform a simple character substitution, then run the linter to confirm warnings are cleared.
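The character substitution described above can be sketched in a few lines. The helper name `replace_en_dashes` is hypothetical and the sample string is taken from the flagged comment lines; in practice an editor search-and-replace does the same job.

```python
EN_DASH = "\u2013"  # the character Ruff flags under RUF002/RUF003


def replace_en_dashes(text: str) -> str:
    """Swap every EN DASH for an ASCII hyphen-minus."""
    return text.replace(EN_DASH, "-")


sample = "# Tier 1 \u2013 KnowledgeEngine lifecycle"
cleaned = replace_en_dashes(sample)
```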
172-175: ⚡ Quick win: Avoid positional tool selection in RAG tests.
Using `tools[0]` makes these tests order-dependent. Select the knowledge tool by name to prevent brittle failures if provider ordering changes.
Proposed diff
  provider = RealSearchKnowledgeToolProvider(engine, _COLLECTION)
  tools = await provider.get_all_tools()
- knowledge_tool = tools[0]
+ knowledge_tool = next(
+     (t for t in tools if getattr(t, "name", "") == "knowledge_search_knowledge"),
+     None,
+ )
+ assert knowledge_tool is not None, "knowledge_search_knowledge tool not found"
@@
  provider = RealSearchKnowledgeToolProvider(engine, _COLLECTION)
  tools = await provider.get_all_tools()
- knowledge_tool = tools[0]
+ knowledge_tool = next(
+     (t for t in tools if getattr(t, "name", "") == "knowledge_search_knowledge"),
+     None,
+ )
+ assert knowledge_tool is not None, "knowledge_search_knowledge tool not found"
Also applies to: 193-195
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/test_knowledge_e2e.py` around lines 172 - 175, The test currently selects a knowledge tool by position (tools[0]) which is order-dependent; instead, locate the correct tool returned by RealSearchKnowledgeToolProvider.get_all_tools() by matching its name/identifier (e.g., filter for tool.name or tool.id that matches the expected knowledge tool) and assign that result to knowledge_tool; make the same change for the other occurrence that uses positional selection so both tests pick the tool by explicit name rather than by index.
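Since the same lookup is needed in two tests, it could be factored into a small helper. The sketch below assumes only that each tool exposes a `name` attribute; `find_tool` is a hypothetical name, and `SimpleNamespace` objects stand in for real tool instances.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_tool(tools: Iterable[Any], name: str) -> Any:
    """Return the first tool whose .name equals `name`, or fail loudly."""
    tool = next((t for t in tools if getattr(t, "name", "") == name), None)
    assert tool is not None, f"{name} tool not found"
    return tool


# Stand-in tools; the knowledge tool is deliberately not first in the list.
tools = [
    SimpleNamespace(name="load_skill"),
    SimpleNamespace(name="knowledge_search_knowledge"),
]
knowledge_tool = find_tool(tools, "knowledge_search_knowledge")
```

With this shape, a change in provider ordering no longer affects which tool the test exercises, and a missing tool fails with a clear message instead of an unrelated error later on.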
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/conftest.py`:
- Around line 236-238: The teardown should guarantee engine.shutdown() runs even
if await engine.aclose() raises: wrap the await engine.aclose() call in a
try/finally (or call engine.shutdown() from a finally block) so that
engine.shutdown() is always invoked; if you need to preserve the original
exception, re-raise it after shutdown or log the aclose() error before
re-raising so resources are not leaked (refer to the engine.aclose() and
engine.shutdown() calls).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 364c908c-72b0-4424-b9c4-177311503a5e
📒 Files selected for processing (4)
- tests/e2e/__init__.py
- tests/e2e/conftest.py
- tests/e2e/test_knowledge_e2e.py
- tests/e2e/test_skills_e2e.py
- conftest.py: wrap engine.aclose() in try/finally so shutdown() is always called even if aclose() raises (resource leak fix)
- test_knowledge_e2e.py: replace EN dash characters with ASCII hyphens in docstrings/comments to silence Ruff RUF002/RUF003
- test_knowledge_e2e.py: select knowledge_search_knowledge tool by name instead of positional index tools[0] to avoid order-dependent failures
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/test_skills_e2e.py`:
- Line 99: Replace hardcoded "/tmp" string literals in the test fixtures with a
temporary directory fixture; locate instances like
source=f"/tmp/{name}/SKILL.md" in tests/e2e/test_skills_e2e.py and change them
to use the pytest tmp_path (or tempfile.gettempdir/tmp_path_factory) to
construct paths (e.g., str(tmp_path / name / "SKILL.md")), and update the other
occurrences on the same file (lines referencing "/tmp/...") similarly so all
test paths are built from the tmp_path/tempfile API instead of a hardcoded
"/tmp".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ba2ee880-4eb2-40ad-af65-317306217a57
📒 Files selected for processing (3)
- tests/e2e/conftest.py
- tests/e2e/test_knowledge_e2e.py
- tests/e2e/test_skills_e2e.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/e2e/test_knowledge_e2e.py
- tests/e2e/conftest.py
    name=name,
    description=f"{name} description",
    body=body,
    source=f"/tmp/{name}/SKILL.md",
Replace hardcoded /tmp literals to avoid Ruff S108 CI failures.
Line 99, Line 164, Line 176, Line 177, and Line 187 use hardcoded /tmp/... paths in test fixtures. Ruff flags these as errors (S108), which can block CI even though these are test-only strings.
Proposed patch
- source=f"/tmp/{name}/SKILL.md",
+ source=f"skills/{name}/SKILL.md",
- source="/tmp/SKILL.md",
+ source="skills/SKILL.md",
- SkillEntry("alpha", "Alpha skill", "## Body", "/tmp/a/SKILL.md", ()),
- SkillEntry("beta", "Beta skill", "## Body", "/tmp/b/SKILL.md", ()),
+ SkillEntry("alpha", "Alpha skill", "## Body", "skills/a/SKILL.md", ()),
+ SkillEntry("beta", "Beta skill", "## Body", "skills/b/SKILL.md", ()),
- entry = SkillEntry("gamma", "Gamma makes reports", "## Body", "/tmp/g/SKILL.md", ())
+ entry = SkillEntry("gamma", "Gamma makes reports", "## Body", "skills/g/SKILL.md", ())
Also applies to: 164-164, 176-177, 187-187
🧰 Tools
🪛 Ruff (0.15.12)
[error] 99-99: Probable insecure usage of temporary file or directory: "/tmp/"
(S108)
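An alternative to the relative `skills/...` paths in the proposed patch is to build each path from a temporary directory, so no `/tmp` literal appears in the source. The sketch below uses a hypothetical helper, `skill_source`; inside a pytest test the base directory would typically be the built-in `tmp_path` fixture, and `tempfile.TemporaryDirectory` stands in for it here so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def skill_source(base_dir: Path, name: str) -> str:
    """Build a SKILL.md path under a caller-supplied directory instead of /tmp."""
    return str(base_dir / name / "SKILL.md")


# TemporaryDirectory plays the role of pytest's tmp_path in this standalone sketch.
with tempfile.TemporaryDirectory() as tmp:
    base = tmp
    source = skill_source(Path(tmp), "alpha")
```

Either approach removes the hardcoded literal. The relative-path version is simpler when the path is only a label, while the temporary-directory version fits tests that actually touch the filesystem.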
Summary
Closes #206.
Adds a new `tests/e2e/` directory with full end-to-end test coverage for the knowledge and skills components as requested in the issue. Tests are structured in two tiers per component: Tier 1 exercises the component API directly (no LLM, no graph; fast and surgical), Tier 2 runs the real compiled `CugaLiteGraph` with a hermetic mock LLM and asserts on what actually reached the model.

What's covered

Knowledge (`test_knowledge_e2e.py`), 12 tests:
- `TestKnowledgeEngineLifecycle`
- `TestKnowledgeAwareness`: `get_knowledge_summary()` output; `format_knowledge_context()` naming and special-char sanitization
- `TestKnowledgeRagPath`: `knowledge_search_knowledge` tool via `RealSearchKnowledgeToolProvider` calls `engine.search()` and returns ingested content; stale chunk not returned after replace
- `TestKnowledgeCugaLiteIntegration`

Skills (`test_skills_e2e.py`), 15 tests + 1 skip:
- `TestSkillDiscovery`: `discover_skills()` finds single/multiple skills, preserves description, parses pip and npm requirements
- `TestSkillRegistry`: `load_skill()` body order, `uv pip install` / `npm install` commands, mixed deps (STEP 1 before STEP 2), normalization hint
- `TestSkillToolsAndBlock`: `create_skill_tools()` returns `load_skill`; block contains names and descriptions
- `TestSkillsCugaLiteIntegration`: `load_skill` bound to model in native tool-calling mode
- `TestSkillsBlockedPaths`: `test_skill_executed_via_sub_agent`, skipped, blocked on #199

Shared infrastructure (`conftest.py`)
- `CaptureChatModel`: hermetic mock LLM that records every message list and tool binding, replays scripted responses, and raises on queue exhaustion (catches unexpected graph loops)
- `KnowledgeToolProvider`: stub `knowledge_search_knowledge` tool (triggers the `startswith("knowledge_")` detection in `cuga_lite_graph.py:1493`) without requiring a running MCP server
- `RealSearchKnowledgeToolProvider`: real `knowledge_search_knowledge` tool backed by `engine.search()`, covers the RAG retrieval path
- `knowledge_engine` fixture: isolated fastembed + sqlite-vec engine in `tmp_path`, monkeypatched off any real DB

Test plan
- `uv run pytest tests/e2e/test_knowledge_e2e.py -v`: 12 passed
- `uv run pytest tests/e2e/test_skills_e2e.py -v`: 15 passed, 1 skipped (expected, blocked on "[Feature] Skills should be able to spawn sub-agents at runtime via use_sub_agents" #199)

Summary by CodeRabbit