Feature/5 embedding and indexing#6
Merged
Added Python commands to run the chunker and embedding modules, plus other relevant information about these modules, to their sections in the README
Modified package path from `atlas.core.ingest` to `atlas.core.ingester`
Added a configuration file for the encoder, which allows changing the encoder used and its settings. Added the associated dataclass and configuration-loading function.
Converted certain abstract methods in the abstract base class to concrete implementations and removed those methods from the implementation class.
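A minimal sketch of how a dataclass plus a loading function can fit together for an encoder configuration. The file format (JSON), the class name `EncoderConfig`, the function name `load_encoder_config`, and every field name are assumptions made for illustration; the PR's actual config file and dataclass may look different.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical encoder settings; the field names are assumptions."""

    model_name: str = "all-MiniLM-L6-v2"
    batch_size: int = 32
    normalize: bool = True


def load_encoder_config(path: Path) -> EncoderConfig:
    """Read a JSON file and map its keys onto EncoderConfig.

    Keys missing from the file fall back to the dataclass defaults.
    Unknown keys raise TypeError, so typos in the file fail loudly.
    """
    with path.open(encoding="utf-8") as fh:
        raw = json.load(fh)
    return EncoderConfig(**raw)


# Usage: write a small config file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "encoder.json"
    cfg_path.write_text(
        json.dumps({"model_name": "all-MiniLM-L6-v2", "batch_size": 8}),
        encoding="utf-8",
    )
    config = load_encoder_config(cfg_path)
```

Keeping the defaults on the dataclass means a config file only needs to list the settings it overrides.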
Exploring SentenceTransformer's `all-MiniLM-L6-v2` encoder model
Introduced a BaseEmbedder interface to standardize the embedding workflow and added a concrete SentenceTransformerEmbedder implementation. This provides a clean abstraction for embedding generation and simplifies future embedder extensions.
…ests Added conftest.py to hold global pytest fixtures. Updated existing unit tests accordingly. Added unit tests for the newly added embedding module.
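The shape of such an interface can be sketched as follows. Only the name `BaseEmbedder` comes from the PR; the method names `embed` and `_encode`, the batching logic, and the toy `CharCountEmbedder` subclass are assumptions for illustration. The toy subclass stands in for a real model so that the sketch runs without downloading anything. It also shows shared logic living in the base class while only the model-specific step stays abstract.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class BaseEmbedder(ABC):
    """Sketch of an embedder interface (method names are assumptions)."""

    @abstractmethod
    def _encode(self, texts: Sequence[str]) -> List[List[float]]:
        """Model-specific step: turn a batch of texts into vectors."""

    def embed(self, texts: Sequence[str], batch_size: int = 2) -> List[List[float]]:
        """Shared step, implemented once in the base class: batch the inputs."""
        vectors: List[List[float]] = []
        for start in range(0, len(texts), batch_size):
            vectors.extend(self._encode(texts[start:start + batch_size]))
        return vectors


class CharCountEmbedder(BaseEmbedder):
    """Toy stand-in for a real model: [length, vowel count] per text."""

    def _encode(self, texts: Sequence[str]) -> List[List[float]]:
        return [
            [float(len(t)), float(sum(c in "aeiou" for c in t.lower()))]
            for t in texts
        ]


vectors = CharCountEmbedder().embed(["chunk", "embedding", "ai"])
```

A new embedder then only has to implement `_encode`; callers keep using `embed` regardless of which model sits behind it.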
Set the device to `cuda` or `cpu` based on the presence of PyTorch. Fixes a CUDA-related error on GitHub CI.
Made minor updates to the architecture diagram
Added the FAISS library as a dependency installed via conda. Changed numpy to be installed via conda instead of pip, because the pip version of numpy does not play well with the conda build of FAISS.
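One common way to make this kind of device selection safe on CPU-only CI runners is sketched below. The function name `pick_device` is hypothetical, and the exact check used in the PR is not shown on this page, so treat this as an assumption about the general approach and not as the PR's code.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch is importable and sees a GPU, else "cpu"."""
    try:
        import torch  # optional dependency; may be absent on some machines
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is False on runners without a usable GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```

The resulting string can then be passed to whatever model constructor accepts a device argument.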
Added missing docstrings and logging statements
FAISS vector store class: builds the index, searches the index for an input query, and saves and loads the index and metadata file
This just reads the JSON file with all the chunk embeddings, builds the index, and saves it
Made the architecture diagram path branch-invariant
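The four responsibilities listed above (build, search, save, load) can be illustrated with a small brute-force stand-in. This sketch deliberately does not use FAISS, so it runs with only the standard library. The class name `TinyVectorStore`, its method signatures, the use of L2 distance, and the single-file JSON layout are all assumptions, and the PR's class presumably delegates indexing and search to FAISS instead.

```python
import json
import math
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


class TinyVectorStore:
    """Brute-force stand-in for a FAISS-backed store (not FAISS itself)."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.metadata: List[Dict] = []

    def build(self, vectors: Sequence[Sequence[float]], metadata: Sequence[Dict]) -> None:
        """Store the vectors with one metadata record per vector."""
        if len(vectors) != len(metadata):
            raise ValueError("one metadata record per vector is required")
        self.vectors = [list(v) for v in vectors]
        self.metadata = list(metadata)

    def search(self, query: Sequence[float], k: int = 1) -> List[Tuple[float, Dict]]:
        """Return the k nearest entries by Euclidean (L2) distance."""
        scored = [(math.dist(query, v), m) for v, m in zip(self.vectors, self.metadata)]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]

    def save(self, path: Path) -> None:
        """Persist vectors and metadata together as one JSON file (an assumption)."""
        payload = {"vectors": self.vectors, "metadata": self.metadata}
        path.write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "TinyVectorStore":
        """Rebuild a store from a file written by save()."""
        payload = json.loads(path.read_text(encoding="utf-8"))
        store = cls()
        store.build(payload["vectors"], payload["metadata"])
        return store


# Usage: build, round-trip through disk, then query the restored store.
store = TinyVectorStore()
store.build(
    [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]],
    [{"chunk_id": 0}, {"chunk_id": 1}, {"chunk_id": 2}],
)
with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "store.json"
    store.save(store_path)
    restored = TinyVectorStore.load(store_path)
hits = restored.search([0.9, 0.9], k=2)
```

Keeping metadata in the same order as the vectors is what lets a search result be mapped back to its chunk.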
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master       #6      +/-   ##
==========================================
+ Coverage   82.14%   89.14%   +7.00%
==========================================
  Files           7       15       +8
  Lines         280      525     +245
  Branches       36       49      +13
==========================================
+ Hits          230      468     +238
- Misses         41       43       +2
- Partials        9       14       +5
#5
Add embedding and indexing module
Added