fix(qdrant): add pandas dep, fix cross-repo ID collisions, fix comment indexing #2323
base: main
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -445,7 +445,7 @@ def _update_index_with_issues(self, issues_list, repo_name_for_index, upsert=Fal | |
| if len(comment_body) < 8000 or \ | ||
| self.token_handler.count_tokens(comment_body) < MAX_TOKENS[MODEL]: | ||
| comment_record = Record( | ||
| - id=issue_key + ".comment_" + str(j + 1), | ||
| + id=issue_key + ".comment_" + str(j), | ||
| text=comment_body, | ||
| metadata=Metadata(repo=repo_name_for_index, | ||
| username=username, # use issue username for all comments | ||
|
|
@@ -541,7 +541,7 @@ def _update_table_with_issues(self, issues_list, repo_name_for_index, ingest=Fal | |
| if len(comment_body) < 8000 or \ | ||
| self.token_handler.count_tokens(comment_body) < MAX_TOKENS[MODEL]: | ||
| comment_record = Record( | ||
| - id=issue_key + ".comment_" + str(j + 1), | ||
| + id=issue_key + ".comment_" + str(j), | ||
| text=comment_body, | ||
| metadata=Metadata(repo=repo_name_for_index, | ||
| username=username, # use issue username for all comments | ||
|
|
@@ -639,7 +639,7 @@ def _update_qdrant_with_issues(self, issues_list, repo_name_for_index, ingest=Fa | |
| if len(comment_body) < 8000 or \ | ||
| self.token_handler.count_tokens(comment_body) < MAX_TOKENS[MODEL]: | ||
| comment_record = Record( | ||
| - id=issue_key + ".comment_" + str(j + 1), | ||
| + id=issue_key + ".comment_" + str(j), | ||
| text=comment_body, | ||
| metadata=Metadata(repo=repo_name_for_index, | ||
| username=username, | ||
|
|
@@ -673,7 +673,7 @@ def _update_qdrant_with_issues(self, issues_list, repo_name_for_index, ingest=Fa | |
| points = [] | ||
| for row in df.to_dict(orient="records"): | ||
| points.append( | ||
| - PointStruct(id=uuid.uuid5(uuid.NAMESPACE_DNS, row["id"]).hex, vector=row["vector"], payload={"id": row["id"], "text": row["text"], "metadata": row["metadata"]}) | ||
| + PointStruct(id=uuid.uuid5(uuid.NAMESPACE_DNS, f"{repo_name_for_index}:{row['id']}").hex, vector=row["vector"], payload={"id": row["id"], "text": row["text"], "metadata": row["metadata"]}) | ||
|
Contributor
3. Qdrant upsert creates duplicates
Changing the Qdrant PointStruct.id generation to include repo_name_for_index alters point IDs for existing repos, so re-ingesting into an existing collection will add a second copy of each point instead of overwriting the old ones. This can reduce result diversity (duplicates consume top_k) because querying/parsing uses payload["id"], not the Qdrant point ID.
|
||
| ) | ||
| self.qdrant.upsert(collection_name=self.index_name, points=points) | ||
| get_logger().info('Done') | ||
|
|
||
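The trade-off raised in the review comment above can be reproduced with the standard library alone. The following is a minimal sketch, not the project's code: the helper names `old_point_id` and `new_point_id`, the repo names, and the record ID value are hypothetical and chosen only for illustration. It shows that the old scheme maps the same record ID from two repos to one point ID (the cross-repo collision), while the new scheme separates them but no longer matches the IDs already stored for an existing repo.

```python
import uuid


def old_point_id(row_id: str) -> str:
    """Point ID before the change: hashed from the record ID only."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, row_id).hex


def new_point_id(repo_name_for_index: str, row_id: str) -> str:
    """Point ID after the change: the repo name is part of the hashed name."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, f"{repo_name_for_index}:{row_id}").hex


# Hypothetical values, for illustration only.
row_id = "issue_1.comment_0"
repo_a = "org/repo-a"
repo_b = "org/repo-b"

# Old scheme: the repo is not part of the input, so both repos share one point ID.
cross_repo_collision_old = old_point_id(row_id) == old_point_id(row_id)

# New scheme: the same record ID in two repos yields two distinct point IDs.
cross_repo_collision_new = new_point_id(repo_a, row_id) == new_point_id(repo_b, row_id)

# Re-ingesting an existing repo: the new ID differs from the stored one,
# so an upsert keyed on point ID would add a point rather than overwrite.
reingest_overwrites = old_point_id(row_id) == new_point_id(repo_a, row_id)

print(cross_repo_collision_old, cross_repo_collision_new, reingest_overwrites)
```

Because `uuid.uuid5` is deterministic, re-ingesting with the new scheme stays idempotent from then on; only points written under the old scheme are affected. How those older points should be cleaned up or re-keyed is left open here, since that depends on the collection's contents.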
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,6 +39,7 @@ giteapy==1.0.8 | |
| # pinecone-datasets @ git+https://github.com/mrT23/pinecone-datasets.git@main | ||
| # lancedb==0.5.1 | ||
| # qdrant-client==1.15.1 | ||
| + # pandas # required by qdrant indexing path | ||
|
Contributor
1. requirements.txt adds commented pandas
The PR introduces a new commented-out dependency line for pandas, which is inactive code and violates the no-commented-out-code requirement. This can also cause runtime failures for users who enable Qdrant without actually installing pandas.
|
||
| # uncomment this to support language LangChainOpenAIHandler | ||
| # langchain==0.2.0 | ||
| # langchain-core==0.2.28 | ||
|
|
||
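The runtime-failure risk mentioned in the review comment above could be softened with an explicit check before the Qdrant path runs. This is a hedged sketch only: `require_module` is a hypothetical helper that does not exist in the project, and where such a check would be called is an assumption. It uses `importlib.util.find_spec` from the standard library to detect a missing optional dependency and fail with an actionable message.

```python
import importlib.util


def require_module(module_name: str, feature: str) -> None:
    """Raise a clear ImportError if an optional dependency is not installed.

    Hypothetical helper for illustration; not part of the project.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for {feature}; "
            f"install it with 'pip install {module_name}'."
        )


# A module that ships with Python passes the check silently.
require_module("json", "JSON parsing")

# A missing module fails early with an actionable message.
try:
    require_module("surely_not_an_installed_module_xyz", "Qdrant indexing")
except ImportError as exc:
    missing_message = str(exc)
    print(missing_message)
```

A check like this does not replace declaring the dependency; it only turns a late failure inside the indexing path into an early, readable one.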