A small, independent Python CLI for searching public research sources from the shell.
It does not require a paid API, cloud account, hosted LLM, or project-specific backend. The CLI uses provider-specific adapters for supported research websites and normalizes results into a consistent text or JSON format.
- Python 3.10 or newer
- `uv` for installation as a standalone tool
After the package is published to PyPI, install the command-line tool with:

    uv tool install query-kit

Confirm the executable is available:

    query-cli --help

Use it as a Python library in another project with:

    uv add query-kit

The PyPI distribution is named `query-kit`; the import package remains `query_cli`, and the console command remains `query-cli`.
`uv tool install --force .` installs the project from the current directory only. If you do not already have the repository locally, clone it first, then run the install command from inside the checkout.

    git clone git@github.com:neeraj1909/query-kit.git
    cd query-kit
    uv tool install --force .

`uv` does not have a `uv tool reinstall` command. To refresh an existing local install after pulling changes, run the same install command again from the repository root:

    uv tool install --force .

After installation, confirm the executable is available:
    query-cli --help

Search ACL Anthology directly:

    query-cli search "xai driven nlp" --provider acl --limit 5

Search the arXiv Atom API directly:

    query-cli search "explainable NLP" --provider arxiv --limit 5

Search arXiv's public web search page when the Atom API is unavailable or rate-limited:

    query-cli search "Devanagari OCR" --provider arxiv-web --limit 5

Search PubMed directly:

    query-cli search "explainable NLP" --provider pubmed --limit 5

Search Semantic Scholar directly through the official Graph API:

    query-cli search "explainable NLP" --provider semantic-scholar --limit 5

Search Semantic Scholar's public web search endpoint as an ordinary-HTTP browser-as-API probe:

    query-cli search "Hindi OCR" --provider semantic-scholar-web --limit 5

Search OpenReview directly:

    query-cli search "explainable NLP" --provider openreview --limit 5

Search more than one provider by repeating `--provider`:

    query-cli search "explainable NLP" --provider acl --provider arxiv --provider pubmed --limit 10

Search all launch-ready providers:

    query-cli search "explainable NLP" --provider all --limit 10

Filter by publication year when the provider supports it:

    query-cli search "explainable NLP" --provider all --since-year 2024 --limit 10

Return normalized JSON results:

    query-cli search "explainable NLP" --provider acl --limit 5 --format json

The earlier ACL Anthology query should be run with `search`:

    query-cli search "list down xai driven nlp research papers in last 1 year" --provider acl --limit 5

Results are normalized into the same shape across providers, deduplicated by title/link, and limited after merging.
Synchronous Python callers can use the same service that powers the CLI:

    from query_cli.application.services import search_research
    from query_cli.bootstrap import get_search_providers

    providers = get_search_providers(["arxiv", "pubmed"], timeout=30)
    results = search_research("explainable nlp", providers, limit=10)

Async applications should use the native async service instead of calling the sync wrapper inside an existing event loop:

    import asyncio

    from query_cli.application.services import search_research_async
    from query_cli.bootstrap import get_search_providers


    async def main() -> None:
        providers = get_search_providers(["arxiv", "pubmed"], timeout=30)
        # Await the native async service from inside the running event loop.
        results = await search_research_async("explainable nlp", providers, limit=10)


    asyncio.run(main())

Built-in providers expose both `search(...)` and `search_async(...)`. The async service calls provider `search_async(...)` methods concurrently and falls back to running sync-only third-party providers in a worker thread.
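The concurrency model described above can be sketched roughly as follows. This is a minimal illustration and not query-kit's actual service code: `run_providers`, `FastAsyncProvider`, and `SyncOnlyProvider` are hypothetical stand-ins, the method signatures are simplified, and only the `search` / `search_async` method names come from this README.

```python
import asyncio


class FastAsyncProvider:
    """Hypothetical provider with a native async search method."""

    name = "async-demo"

    async def search_async(self, query: str) -> list[str]:
        await asyncio.sleep(0)  # stand-in for non-blocking network I/O
        return [f"{self.name}: {query}"]


class SyncOnlyProvider:
    """Hypothetical third-party provider that only offers a blocking search."""

    name = "sync-demo"

    def search(self, query: str) -> list[str]:
        return [f"{self.name}: {query}"]


async def run_providers(query: str, providers: list) -> list[list[str]]:
    """Run all providers concurrently; results keep the input provider order."""
    tasks = []
    for provider in providers:
        if hasattr(provider, "search_async"):
            tasks.append(provider.search_async(query))
        else:
            # Sync-only provider: run the blocking call in a worker thread.
            tasks.append(asyncio.to_thread(provider.search, query))
    return await asyncio.gather(*tasks)


results = asyncio.run(
    run_providers("explainable nlp", [FastAsyncProvider(), SyncOnlyProvider()])
)
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, which is one simple way to preserve provider order for a later merge step.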
When you run:

    query-cli search "<keyword>" --provider <provider-name>

The CLI passes through these phases:

- Parse command: `argparse` reads the query text, provider names, limit, timeout, format, and verbose flag.
- Resolve configuration: the CLI resolves `--timeout` or `QUERY_CLI_TIMEOUT`, then defaults to `30` seconds if neither is set.
- Select providers: `src/query_cli/bootstrap.py` maps provider names such as `acl`, `arxiv`, `pubmed`, `semantic-scholar`, `semantic-scholar-web`, `openreview`, or `all` to concrete provider adapters.
- Build domain query: the application service creates a `SearchQuery` with the keyword text, optional `--since-year`, and result limit, then validates that the query is not empty and the limit is positive.
- Call provider adapters: each selected adapter performs provider-specific HTTP and parsing work. The sync CLI enters one service-level event loop, then the service runs async-capable providers concurrently while preserving provider order for merging:
  - `acl` downloads ACL Anthology's public BibTeX export with abstracts and matches query terms against paper metadata.
  - `arxiv` calls the public arXiv Atom API and parses the XML feed.
  - `arxiv-web` calls arXiv's public HTML search page with ordinary HTTP and parses visible result metadata/full abstracts where the page provides them.
  - `pubmed` calls NCBI E-utilities ESearch and EFetch for PubMed records.
  - `semantic-scholar` calls the Semantic Scholar Graph API paper search endpoint.
  - `semantic-scholar-web` calls the public Semantic Scholar web search endpoint with ordinary HTTP and parses the browser-visible JSON response when the endpoint is not WAF-challenged.
  - `openreview` calls the OpenReview API 2 notes search endpoint.
- Normalize results: provider-specific records are converted into shared `SearchResult` objects with fields such as title, URL, source, authors, year, venue, and abstract.
- Merge and deduplicate: the service merges results from all selected providers, deduplicates by normalized title/link, and applies the global `--limit`.
- Format output: the CLI prints readable text by default, or normalized JSON when `--format json` is passed.
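The merge-and-deduplicate phase could look roughly like the sketch below. The exact normalization rules query-kit applies are not documented here, so the case-folding and whitespace-collapsing in `_normalize`, the trailing-slash handling for URLs, and the `Result` and `merge_results` names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a normalized search result."""

    title: str
    url: str
    source: str


def _normalize(text: str) -> str:
    # Assumed rule: case-fold and collapse whitespace before comparing titles.
    return " ".join(text.casefold().split())


def merge_results(per_provider: list[list[Result]], limit: int) -> list[Result]:
    """Merge provider lists in order, drop duplicates, then apply the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    seen_titles: set[str] = set()
    seen_urls: set[str] = set()
    merged: list[Result] = []
    for results in per_provider:
        for result in results:
            title_key = _normalize(result.title)
            url_key = result.url.strip().rstrip("/")
            if title_key in seen_titles or (url_key and url_key in seen_urls):
                continue  # already returned by an earlier provider
            seen_titles.add(title_key)
            if url_key:
                seen_urls.add(url_key)
            merged.append(result)
    # The limit is applied once, after merging, not per provider.
    return merged[:limit]


acl = [Result("Explainable NLP", "https://example.org/a", "acl")]
arxiv = [
    Result("explainable   nlp", "https://example.org/b", "arxiv"),
    Result("Another Paper", "https://example.org/c", "arxiv"),
]
print([r.source for r in merge_results([acl, arxiv], limit=10)])
```

Because earlier providers win ties, the order in which providers are passed decides which copy of a duplicate survives.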
Provider failures are isolated. If one provider fails but another returns results, the CLI still prints the successful results; if all selected providers fail, it returns a clear error.
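One way to get this kind of isolation is to collect exceptions per provider instead of letting the first one abort the whole search. The sketch below assumes that approach; `search_isolated`, `ProviderError`, and the two demo providers are hypothetical names, and query-kit's real error types and messages may differ.

```python
import asyncio


class ProviderError(Exception):
    """Hypothetical error type for a failed provider request."""


async def ok_provider(query: str) -> list[str]:
    return [f"result for {query}"]


async def failing_provider(query: str) -> list[str]:
    raise ProviderError("upstream returned HTTP 429")


async def search_isolated(query: str, providers: dict) -> tuple[list[str], dict[str, str]]:
    """Return merged results plus a map of provider name to error message."""
    names = list(providers)
    outcomes = await asyncio.gather(
        *(providers[name](query) for name in names),
        return_exceptions=True,  # capture failures instead of raising the first one
    )
    results: list[str] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = str(outcome)
        else:
            results.extend(outcome)
    if names and len(errors) == len(names):
        # Every selected provider failed: surface one clear error.
        raise ProviderError(f"all providers failed: {errors}")
    return results, errors


results, errors = asyncio.run(
    search_isolated("explainable nlp", {"good": ok_provider, "bad": failing_provider})
)
print(results, errors)
```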
| Provider | Status | Notes |
|---|---|---|
| `acl` | Supported | Searches ACL Anthology using its public BibTeX export, preferring the abstracts export and falling back to the plain BibTeX export. Best for NLP and computational linguistics papers. |
| `arxiv` | Supported | Searches the public arXiv Atom API, sorted by last updated date. Best for broad CS, AI, ML, and NLP preprints. |
| `arxiv-web` | Supported | Browser-as-API style provider over arXiv's public HTML search page using normal HTTP. Useful fallback when the Atom API is rate-limited; preserves full public abstracts present in the page. |
| `pubmed` | Supported | Searches PubMed through NCBI E-utilities. Best for biomedical and clinical NLP queries. |
| `semantic-scholar` | Supported | Searches Semantic Scholar's official Graph API. Best for broad academic metadata. |
| `semantic-scholar-web` | Conditional | Browser-as-API style provider over Semantic Scholar's public web search endpoint using normal HTTP. It preserves public abstracts/TLDRs when available, but Semantic Scholar may WAF-challenge non-browser HTTP; query-kit reports that as a provider limitation instead of bypassing it. |
| `openreview` | Supported | Searches public OpenReview API 2 notes. Best for ML conference and workshop submissions visible through public search. |
- The CLI does not use paid APIs by default.
- Public providers may rate-limit requests or change response formats.
- Live search results can vary because they come from external websites.
- `--timeout` applies per provider request.
- Use `--verbose` to print selected providers and result counts to stderr.
- `arxiv` and `arxiv-web` enforce a 3-second minimum interval between repeated arXiv requests in the same process.
- `arxiv-web` and `semantic-scholar-web` follow the safe browser-as-API pattern: ordinary HTTP over public pages/endpoints, no CDP/Chrome runtime dependency, no copied cookies/auth headers, and no forged browser fingerprint headers. Set `QUERY_CLI_USER_AGENT` to your project-specific User-Agent if needed.
- `pubmed` enforces NCBI's default 3 requests/second limit without an API key and 10 requests/second with an API key.
- NCBI asks software developers to register a tool name and email with NCBI; passing `QUERY_CLI_NCBI_TOOL` and `QUERY_CLI_NCBI_EMAIL` is not a substitute for registration.
- `semantic-scholar` may return HTTP 429 without an API key. Set `QUERY_CLI_SEMANTIC_SCHOLAR_API_KEY` if you have one. Error messages include safe upstream diagnostics such as JSON `message`/`code`, `Retry-After`, `x-amzn-errortype`, or `x-amzn-waf-action` when present.
- `semantic-scholar` queries replace hyphens with spaces before calling the Graph API because the official docs say hyphenated query terms yield no matches.
- `semantic-scholar-web` uses the browser-observed public search payload shape, but only through ordinary HTTP. If Semantic Scholar returns `x-amzn-waf-action: challenge` to non-browser HTTP, query-kit surfaces that safe diagnostic and relies on other providers for partial results; it does not copy browser cookies or try to bypass the challenge.
- `openreview` uses API 2 public search. Older API 1 venue-specific retrieval is not implemented in this generic search adapter.
- The HTTP client only sends a custom User-Agent when `QUERY_CLI_USER_AGENT` is set.
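A minimum-interval limit like the 3-second arXiv spacing noted above can be enforced with a small in-process throttle. The following is a sketch under assumptions: `MinIntervalThrottle` is a hypothetical name, and the injectable `clock` and `sleep` parameters exist only so the behavior can be checked without real waiting. It does not describe how query-kit's adapters are actually implemented.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Wait until at least `interval` seconds have passed since the previous call."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            elapsed = now - self._last_call
            delay = max(0.0, self._interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record when the request is allowed to go out, after any sleep.
        self._last_call = now + delay
        return delay


throttle = MinIntervalThrottle(interval=3.0)
throttle.wait()  # the first call never sleeps; call wait() before each request
```

Sharing one throttle instance between adapters that hit the same host is what makes the spacing apply across providers within a process.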
Flags take precedence over environment variables.
| Flag | Environment variable | Description |
|---|---|---|
| `--timeout` | `QUERY_CLI_TIMEOUT` | Optional request timeout in seconds. Defaults to 30. |
| `--since-year` | N/A | Optional publication-year lower bound. Providers apply it through source-specific filters or post-filtering when year metadata is available. |
| `--format` | N/A | Output format: `text` or `json`. Defaults to `text`. |
| `--verbose` | N/A | Print request diagnostics to stderr. |
| N/A | `QUERY_CLI_USER_AGENT` | Optional User-Agent value sent with provider HTTP requests. |
| N/A | `QUERY_CLI_NCBI_API_KEY` | Optional NCBI API key for PubMed E-utilities. Raises the default NCBI limit from 3 requests/second to 10 requests/second. |
| N/A | `QUERY_CLI_NCBI_TOOL` | Optional NCBI tool parameter. Register this value with NCBI for production use. |
| N/A | `QUERY_CLI_NCBI_EMAIL` | Optional NCBI email parameter. Register this value with NCBI for production use. |
| N/A | `QUERY_CLI_SEMANTIC_SCHOLAR_API_KEY` | Optional Semantic Scholar API key. |
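The flag-over-environment precedence for the timeout can be expressed in a few lines. This is an illustrative sketch only: `resolve_timeout` is a hypothetical helper, and rejecting non-numeric or non-positive values is an assumption, since this README does not say how the CLI validates the timeout.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 30.0


def resolve_timeout(
    flag_value: Optional[float],
    env: Mapping[str, str] = os.environ,
) -> float:
    """Prefer the --timeout flag, then QUERY_CLI_TIMEOUT, then the default."""
    raw_env = env.get("QUERY_CLI_TIMEOUT")
    if flag_value is not None:
        timeout = float(flag_value)
    elif raw_env:
        try:
            timeout = float(raw_env)
        except ValueError:
            raise ValueError("QUERY_CLI_TIMEOUT must be a number") from None
    else:
        timeout = DEFAULT_TIMEOUT_SECONDS
    if timeout <= 0:
        # Assumed validation; the real CLI may behave differently.
        raise ValueError("timeout must be positive")
    return timeout


print(resolve_timeout(5, {"QUERY_CLI_TIMEOUT": "60"}))     # flag wins
print(resolve_timeout(None, {"QUERY_CLI_TIMEOUT": "60"}))  # environment variable
print(resolve_timeout(None, {}))                           # default
```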
Normal tests use mocked HTTP responses. After installing the CLI, you can run live smoke checks manually:

    query-cli search "explainable nlp" --provider arxiv --limit 2 --format json
    query-cli search "Devanagari OCR" --provider arxiv-web --limit 2 --format json
    query-cli search "explainable nlp" --provider pubmed --limit 2 --format json
    query-cli search "explainable nlp" --provider semantic-scholar --limit 2 --format json
    query-cli search "Hindi OCR" --provider semantic-scholar-web --limit 2 --format json
    query-cli search "explainable nlp" --provider openreview --limit 2 --format json
    query-cli search "explainable nlp" --provider all --limit 10 --format json

Provider integrations follow a small ports-and-adapters shape:
- Implement the `SearchProvider` protocol from `src/query_cli/application/ports.py`.
- Implement `AsyncSearchProvider.search_async(...)` for providers that perform network I/O. Sync-only providers still work through the service compatibility bridge, but native async implementations avoid blocking event-loop-owned applications.
- Return normalized `SearchResult` objects from `src/query_cli/domain/model.py`.
- Register the provider in `src/query_cli/bootstrap.py`.
- Add mocked HTTP adapter tests for both `search(...)` and `search_async(...)`, plus CLI/service tests.
- Document provider limits and examples here.
Keep website-specific HTTP and parsing code inside src/query_cli/adapters/ so the CLI and service layer stay reusable.
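To make the ports-and-adapters shape concrete, here is a self-contained toy version. Everything in it is a stand-in: `Result`, `Provider`, `StaticProvider`, `REGISTRY`, and `register` are hypothetical, and the real `SearchProvider` protocol, `SearchResult` fields, and registration mechanism in `bootstrap.py` may have different names and signatures. Check the source files listed above before writing an actual adapter.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for query-kit's SearchResult."""

    title: str
    url: str
    source: str
    year: Optional[int] = None


class Provider(Protocol):
    """Hypothetical stand-in for the SearchProvider port."""

    name: str

    def search(self, query: str, limit: int) -> list[Result]: ...


class StaticProvider:
    """Toy adapter that serves canned records instead of calling a website."""

    name = "static-demo"

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def search(self, query: str, limit: int) -> list[Result]:
        terms = query.casefold().split()
        matches = [
            # Normalize each provider-specific record into the shared shape.
            Result(title=r["title"], url=r["url"], source=self.name, year=r.get("year"))
            for r in self._records
            if all(term in r["title"].casefold() for term in terms)
        ]
        return matches[:limit]


# Hypothetical registry mapping provider names to adapter instances.
REGISTRY: dict[str, Provider] = {}


def register(provider: Provider) -> None:
    REGISTRY[provider.name] = provider


register(StaticProvider([{"title": "Explainable NLP Survey", "url": "https://example.org/1", "year": 2024}]))
print(REGISTRY["static-demo"].search("explainable nlp", limit=5))
```

The service layer only depends on the protocol, so a new adapter can be tested with canned records before any HTTP code is written.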
`uv tool reinstall --force .` is not a valid `uv` command. Use this instead from the repository root:

    uv tool install --force .

If `query-cli --help` only shows `{ask}`, the installed executable is stale. Pull the latest code and reinstall from the repository root:

    git pull
    uv tool install --force .
    query-cli --help

The help output should include the `search` command.

If `uv tool install --force .` reports success but `query-cli --help` still shows old commands, force `uv` to refresh its cached build:

    uv tool install --force --reinstall --refresh .
    query-cli --help

| Code | Meaning |
|---|---|
| `0` | Success. |
| `1` | User/configuration error, such as an invalid provider or limit. |
| `2` | Network or provider request error. |