Cognitor

Cognitor is an open-source semantic search engine and vector database which automatically chunks, embeds and indexes the entire content of a target folder (and its subfolders), making it easily searchable by both AI agents and humans. It provides a simple API to query the indexed data via natural language, and can be used as a standalone semantic search engine, a vector database, or as a backend for your applications.

Cognitor runs in a Docker container, making it easy to use and deploy on any system, including your local machine for maximum privacy and control over your data.

How does it work?

Cognitor consists of two main components:

Search engine (this repository): a vector database which stores document embeddings, full text and metadata, and provides a simple REST API to query the indexed information.
Worker: a background process that monitors a specified folder for changes, automatically chunks and embeds the content of the files, and updates the vector database accordingly.

How to use

Similarly to other vector databases, Cognitor organizes data into documents and collections.

document: a piece of content that you want to be searchable. It usually corresponds to a chunk of text extracted from a file (not the entire file). Each chunk extracted by the worker is stored as a separate document in the database, along with its embedding and metadata.
collection: a group of related documents. Collections help organize and manage your data within Cognitor. Think of a collection as a table in a traditional database, or as a folder in a file system.

Clone the repo

git clone https://github.com/tanaos/cognitor.git
cd cognitor

Use search engine + worker

Configure the following environment variables in your .env file (at the root of the project):

DOCS_FOLDER: folder that the worker will keep synchronized with a Cognitor collection.
COGNITOR_COLLECTION_NAME: name of the collection that the worker will use to store the indexed documents.

# Absolute path on your host machine to ingest
DOCS_FOLDER=/path/to/your/docs
# Name of the collection in which the worker will store the indexed documents
COGNITOR_COLLECTION_NAME=cognitor-worker-documents

Start both the search engine and the worker with

docker compose --profile worker up -d

Once the search engine's GET /health/ready endpoint returns "ready" (indicating that the initial setup is complete), the worker will automatically start indexing the content of the specified folder and keep it up to date with any changes. Use docker logs cognitor-worker to check the indexing status and see which files have been processed.

Note

Check out the worker repository to see which file types are currently supported (we will be adding more soon). Keep in mind that file types that are not supported will be ignored by the worker, but you can still index their content manually through the API.

You can interact with the search engine's REST API through the Swagger UI, the Python or TypeScript SDKs, or directly through HTTP requests.

Stop both the search engine and the worker with

docker compose --profile worker down --remove-orphans

Use the search engine only

If you prefer to index documents manually through the API instead of using the worker, you can simply start the search engine without the worker:

docker compose up -d

Keep in mind that in this case, document chunking, embedding and indexing will not happen automatically, and you will need to handle that yourself (e.g. by using the SDKs or implementing your own background process).

Stop the search engine with:

docker compose down

Integrate with your applications

When its docker container is running, Cognitor exposes a REST API at http://localhost:7530 which you can use to query the indexed data, manage collections and index more documents. You can visit the Swagger UI at http://localhost:7530/docs. We provide client libraries for

Below is an example of how to search for documents in a collection using the Python SDK:

Install the SDK:

pip install cognitor

Use it in your code:

from cognitor import Cognitor

with Cognitor("http://localhost:7530") as client:
    # Check if the search engine is ready to accept requests
    print(client.health_ready())  # "ready" or "loading"

    # Search by text query
    response = client.search("my-collection", query_text="Hello", top_k=10)
    print(response)

See the Python SDK page for more examples and documentation.

No data? No problem.

If you don't have your own data to test with, you can use the included script to seed the database with a sample e-commerce products collection:

python scripts/dev/seed_ecommerce.py

Contributing

We welcome contributions of any kind! If you want to contribute, please read our contributing guidelines and feel free to open an issue or a pull request.

Security & Privacy

Telemetry

By default, we gather a small amount of anonymous usage data which helps us improve Cognitor. This does not include any personally identifiable information (PII) or sensitive data. You can inspect the exact fields we collect from this file.

If you wish to opt out of telemetry, you can do so by setting the TELEMETRY_ENABLED=false environment variable.

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
.github		.github
assets		assets
scripts		scripts
src		src
stubs/faiss		stubs/faiss
.dockerignore		.dockerignore
.gitignore		.gitignore
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE.md		LICENSE.md
README.md		README.md
SECURITY.md		SECURITY.md
docker-compose.worker-local.yml		docker-compose.worker-local.yml
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run.py		run.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Cognitor

How does it work?

How to use

Clone the repo

Use search engine + worker

Use the search engine only

Integrate with your applications

No data? No problem.

Contributing

Security & Privacy

Telemetry

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Cognitor

How does it work?

How to use

Clone the repo

Use search engine + worker

Use the search engine only

Integrate with your applications

No data? No problem.

Contributing

Security & Privacy

Telemetry

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages