feat(mcp): Add Kapa knowledge search tool #1033
Aaron ("AJ") Steers (aaronsteers) wants to merge 2 commits into
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.
Testing This PyAirbyte Version
You can test this version of PyAirbyte using the following:
# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1779842438-kapa-replication-mcp' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1779842438-kapa-replication-mcp'
Community Support
Questions? Join the #pyairbyte channel in our Slack workspace.
os.getenv("KAPA_API_KEY")
or os.getenv("KAPA_DOCS_MCP_BEARER_TOKEN")
or os.getenv("KAPA_BEARER_TOKEN")
We already have proper helpers and patterns for getting secrets from env vars (and other sources). Use existing code paths.
Updated in commit 7ed7611 to route Kapa config reads through PyAirbyte's existing secret helper path instead of direct os.getenv() access.
def _kapa_auth_headers() -> dict[str, str]:
    api_key = (os.getenv("KAPA_API_KEY") or "").strip()
    if api_key:
        return {"X-API-KEY": api_key}

    bearer_token = (
        (os.getenv("KAPA_DOCS_MCP_BEARER_TOKEN") or os.getenv("KAPA_BEARER_TOKEN")) or ""
    ).strip()
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}

    raise ValueError(
        "Kapa docs search is not configured. Set KAPA_API_KEY, "
        "KAPA_DOCS_MCP_BEARER_TOKEN, or KAPA_BEARER_TOKEN."
    )
Keep this DRY and call the other helper, or take its output.
Updated in commit 7ed7611. _kapa_auth_headers() now uses the same _kapa_config_value() helper as registration and payload construction, so the credential lookup and fallback order are centralized.
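A minimal sketch of the centralized lookup this reply describes. The actual body of `_kapa_config_value()` is not shown in this thread, so the `lookup` callable below is a hypothetical stand-in for PyAirbyte's secret helper, and the function names are illustrative. The env var names, header shapes, and error message come from the snippet under review.

```python
from typing import Callable, Optional

# Bearer-token names in fallback order, as listed in the reviewed snippet.
BEARER_TOKEN_NAMES = ("KAPA_DOCS_MCP_BEARER_TOKEN", "KAPA_BEARER_TOKEN")

# Hypothetical signature for whatever secret helper the project uses.
Lookup = Callable[[str], Optional[str]]


def kapa_config_value(name: str, lookup: Lookup) -> Optional[str]:
    """Return the stripped value for `name`, or None if unset or blank."""
    value = (lookup(name) or "").strip()
    return value or None


def kapa_auth_headers(lookup: Lookup) -> dict:
    """Build auth headers, preferring the API key over bearer tokens."""
    api_key = kapa_config_value("KAPA_API_KEY", lookup)
    if api_key:
        return {"X-API-KEY": api_key}

    for name in BEARER_TOKEN_NAMES:
        token = kapa_config_value(name, lookup)
        if token:
            return {"Authorization": f"Bearer {token}"}

    raise ValueError(
        "Kapa docs search is not configured. Set KAPA_API_KEY, "
        "KAPA_DOCS_MCP_BEARER_TOKEN, or KAPA_BEARER_TOKEN."
    )
```

Passing the lookup in as a parameter keeps the fallback order in one place and lets tests supply a plain `dict.get` instead of patching the environment.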
📝 Walkthrough
Adds a Kapa-backed MCP tool that reads credentials and project configuration from environment variables, builds auth headers and a POST payload, calls Kapa’s retrieval API with a fixed timeout, normalizes the JSON response to a list of {source_url, content} records, and conditionally registers the tool with the MCP server. Tests cover auth, request/response, and registration behavior.
Changes
Kapa Search Tool
Kapa Tool Test Suite
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
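The normalization step named in the walkthrough could look roughly like the sketch below. The real response shape of the Kapa retrieval API is not shown in this thread, so the handling of a bare list versus a `"results"` wrapper key is an assumption, as is the function name `normalize_kapa_results`.

```python
from typing import Any


def normalize_kapa_results(payload: Any) -> list:
    """Reduce a decoded JSON response to {source_url, content} records.

    Assumption: results arrive either as a bare list or under a "results"
    key. Anything else yields an empty list.
    """
    if isinstance(payload, dict):
        payload = payload.get("results", [])
    if not isinstance(payload, list):
        return []

    records = []
    for item in payload:
        if not isinstance(item, dict):
            continue  # Skip malformed entries rather than failing the search.
        records.append(
            {
                # Missing or null fields become empty strings.
                "source_url": str(item.get("source_url") or ""),
                "content": str(item.get("content") or ""),
            }
        )
    return records
```

Dropping every field except the two documented keys keeps the tool's output stable even if the upstream response gains extra metadata.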
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit_tests/test_mcp_kapa.py (1)
88-98: ⚡ Quick win
Could we parametrize the positive registration test across all supported credential env vars (Line 88), not just KAPA_API_KEY, to lock in the full contract from _kapa_credentials_configured()? That would catch regressions if one credential path stops enabling registration. wdyt?
Suggested patch
-def test_register_kapa_tools_registers_when_credentials_are_configured(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
+@pytest.mark.parametrize(
+    "env_name",
+    ["KAPA_API_KEY", "KAPA_DOCS_MCP_BEARER_TOKEN", "KAPA_BEARER_TOKEN"],
+)
+def test_register_kapa_tools_registers_when_credentials_are_configured(
+    monkeypatch: pytest.MonkeyPatch,
+    env_name: str,
+) -> None:
     """Test that Kapa tools are visible when credentials are configured."""
     app = MagicMock()
-    monkeypatch.setenv("KAPA_API_KEY", "secret")
+    monkeypatch.setenv(env_name, "secret")
     with patch("airbyte.mcp.kapa.register_mcp_tools") as register:
         kapa.register_kapa_tools(app)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit_tests/test_mcp_kapa.py` around lines 88 - 98, Update the test_register_kapa_tools_registers_when_credentials_are_configured to parametrize over all credential environment variable names used by the module instead of only KAPA_API_KEY: call or import the helper that lists required env vars (e.g. _kapa_credentials_configured or its underlying constant) and use pytest.mark.parametrize on those names, then for each param set monkeypatch.setenv(var, "secret") before calling kapa.register_kapa_tools(app) and assert register_mcp_tools was called with (app, mcp_module="airbyte.mcp.kapa"); this ensures register_kapa_tools and register_mcp_tools are exercised for every supported credential env var.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@airbyte/mcp/kapa.py`:
- Around line 109-112: The register_kapa_tools function currently registers Kapa
tools when _kapa_credentials_configured() is true but still errors later if
KAPA_PROJECT_ID is missing; change register_kapa_tools to require both
credentials and a configured project id (check KAPA_PROJECT_ID via the same
config/env accessor used elsewhere or os.getenv("KAPA_PROJECT_ID")/a helper)
before calling register_mcp_tools; reference register_kapa_tools,
_kapa_credentials_configured, and _kapa_retrieval_url so you add the extra
project-id guard in the same function to avoid registering a tool that will fail
at runtime.
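A minimal sketch of the combined guard this comment asks for. The real `register_kapa_tools` body and config accessor are not shown here, so `lookup` and `register` are hypothetical injected stand-ins and both function names are illustrative. Only the env var names come from the thread.

```python
from typing import Any, Callable, Optional

CREDENTIAL_NAMES = ("KAPA_API_KEY", "KAPA_DOCS_MCP_BEARER_TOKEN", "KAPA_BEARER_TOKEN")

# Hypothetical signature for the project's config/secret accessor.
Lookup = Callable[[str], Optional[str]]


def kapa_is_fully_configured(lookup: Lookup) -> bool:
    """True only when a credential AND KAPA_PROJECT_ID are both set."""

    def is_set(name: str) -> bool:
        return bool((lookup(name) or "").strip())

    has_credentials = any(is_set(name) for name in CREDENTIAL_NAMES)
    return has_credentials and is_set("KAPA_PROJECT_ID")


def register_kapa_tools_sketch(
    app: Any,
    lookup: Lookup,
    register: Callable[[Any], None],
) -> bool:
    """Register the tool only if it can actually run; report whether it did."""
    if not kapa_is_fully_configured(lookup):
        return False  # Leave the tool hidden instead of failing at call time.
    register(app)
    return True
```

Checking both conditions at registration time means a tool that appears in the tool list is one that has the configuration it needs to build a retrieval URL.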
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 07354652-62f8-4b0c-9c09-e18d218656ea
📒 Files selected for processing (3)
- airbyte/mcp/kapa.py
- airbyte/mcp/server.py
- tests/unit_tests/test_mcp_kapa.py
CodeRabbit's remaining docstring coverage item is a non-blocking warning from its optional pre-merge checks, not a concrete behavioral issue. I’m leaving the helper functions terse to match the repo’s style and avoid adding boilerplate docstrings.
Correction to the local verification list: after the review feedback update, the targeted Kapa suite has 8 tests due to the credential-registration parametrization. The PR description was edited independently, so I did not overwrite it.
Addressed the CodeRabbit nitpick about registration coverage in commit 7ed7611. The positive registration test is now parametrized over all supported Kapa credential env vars. I couldn’t reply directly to the outdated inline thread because GitHub no longer exposes the parent comment.
Summary
Adds the `search_airbyte_knowledge_sources(query: str)` MCP tool to the Airbyte Replication MCP server using the strict one-argument Kapa MCP-compatible signature approved by AJ Steers.
The tool is registered only when both a Kapa credential env var is present (`KAPA_API_KEY`, `KAPA_DOCS_MCP_BEARER_TOKEN`, or `KAPA_BEARER_TOKEN`) and `KAPA_PROJECT_ID` is configured. It wraps the Kapa Retrieval REST API, optionally passes `KAPA_INTEGRATION_ID`, and normalizes responses to `[{"source_url": ..., "content": ...}]`.
The Kapa configuration reads through PyAirbyte's existing secret helper path rather than direct environment access, so env vars, `.env`, and registered secret managers follow the same lookup behavior as the rest of PyAirbyte.
Review & Testing Checklist for Human
- With `KAPA_PROJECT_ID` configured, verify `search_airbyte_knowledge_sources` appears in the Replication MCP tool list with only the `query` parameter.
- With `KAPA_PROJECT_ID` unset, verify the tool is absent from the Replication MCP tool list.
Notes
Local verification run:
- `uv run ruff format airbyte/mcp/kapa.py tests/unit_tests/test_mcp_kapa.py`
- `uv run pytest tests/unit_tests/test_mcp_kapa.py` (8 passed)
- `uv run poe check`
Requested by AJ Steers.
Link to Devin session: https://app.devin.ai/sessions/8ed989bc34a840a081d0c94eae01d26c
Requested by: Aaron ("AJ") Steers (@aaronsteers)