A small Python workspace for machine learning and document-processing experiments. The project includes notebooks for loading documents, sample text/PDF data, and dependencies for LangChain, ChromaDB, FAISS, and sentence-transformers.
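FAISS and ChromaDB both answer nearest-neighbour queries over embedding vectors, such as those produced by sentence-transformers. The sketch below illustrates the idea with an exact, brute-force cosine-similarity search in plain Python. It is not either library's API, and the vectors are made-up toy values; the real libraries add indexing so the same query stays fast over many documents.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real sentence-transformer vectors have hundreds of dimensions.
store = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(top_k([1.0, 0.1, 0.0], store, k=2))
```

For this query, vector 0 ranks first and vector 2 second.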
.
+-- data/
| +-- pdf/ # Sample PDF files
| +-- text_files/ # Sample text documents
+-- notebook/ # Jupyter notebooks and local ChromaDB data
+-- src/ # Python package source
+-- main.py # Basic Python entry point
+-- pyproject.toml # Project metadata and dependencies
+-- requirements.txt # Dependency list
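Create and activate a virtual environment:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
The activation command above is for Windows PowerShell; on macOS or Linux, run source .venv/bin/activate instead.
Install dependencies: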
pip install -r requirements.txt
Or, if you use uv (which installs from pyproject.toml):
uv sync
Run the basic entry point:
python main.py
Open the notebooks in Jupyter or VS Code:
jupyter notebook
The main notebook currently in use is:
notebook/pdf_loader.ipynb
- Keep API keys and local secrets in .env.
- Generated vector database files are stored under notebook/chroma_db/.
- Sample documents live under data/.