GSOC-Week1-Module_A#920
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
Summary by CodeRabbit
WalkthroughAdds a harvester configuration subsystem: Pydantic schemas, YAML loader with error mapping, semantic validators for duplicates, package exports, example repo configs and exclude patterns, and pytest fixtures/tests covering valid and many invalid cases. ChangesRepository Configuration Harvester Framework
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
ParthAggarwal16#3 (comment) |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
application/tests/harvester_test/test_config_loader.py (1)
43-45: ⚡ Quick winUse Path object for consistency.
Line 45 passes a string literal while all other tests use
Pathobjects fromFIXTURES_DIR. For consistency and clarity, consider usingFIXTURES_DIR / "does_not_exist.yaml"or another Path-based approach.♻️ Proposed fix for consistency
def test_missing_config_file(): - with pytest.raises(FileNotFoundError): - load_repo_config("does_not_exist.yaml") + config_path = FIXTURES_DIR / "does_not_exist.yaml" + with pytest.raises(FileNotFoundError): + load_repo_config(config_path)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/tests/harvester_test/test_config_loader.py` around lines 43 - 45, Replace the string literal path in test_missing_config_file with a Path-based fixture to match the rest of the tests: call load_repo_config with FIXTURES_DIR / "does_not_exist.yaml" instead of "does_not_exist.yaml" so the test uses a Path object (refer to test_missing_config_file, load_repo_config and FIXTURES_DIR).application/tests/harvester_test/test_repos_validator.py (1)
14-45: ⚡ Quick winConsider testing case-insensitive duplicate detection.
The validator uses
casefold()for case-insensitive comparison (per the relevant code snippet), but there's no test verifying that repository IDs like"ASVS"and"asvs"are correctly identified as duplicates. Adding such a test would verify this important edge case.🔤 Proposed case-insensitive duplicate test
Create a fixture
duplicate_repo_ids_case_insensitive.yaml:repositories: - id: ASVS type: github owner: OWASP repo: ASVS paths: include: - "**/*.md" chunking: strategy: markdown max_tokens: 1200 polling: mode: incremental interval_minutes: 60 - id: asvs type: github owner: OWASP repo: CheatSheetSeries paths: include: - "**/*.md" chunking: strategy: markdown max_tokens: 1200 polling: mode: incremental interval_minutes: 60Then add the test:
def test_duplicate_repository_ids_case_insensitive(): config_path = FIXTURES_DIR / "duplicate_repo_ids_case_insensitive.yaml" config = load_repo_config(config_path) with pytest.raises( RepositoryValidationError, match="Duplicate repository id", ): validate_repositories(config)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/tests/harvester_test/test_repos_validator.py` around lines 14 - 45, Add a case-insensitive duplicate-id test: create the fixture duplicate_repo_ids_case_insensitive.yaml with two repositories whose ids differ only by case (e.g., "ASVS" and "asvs"), then add a test function test_duplicate_repository_ids_case_insensitive() that loads the fixture via load_repo_config and asserts validate_repositories(config) raises RepositoryValidationError with a "Duplicate repository id" match; this ensures the validate_repositories code path that uses casefold() is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/tests/harvester_test/test_repos_validator.py`:
- Around line 14-45: Add a new success-case test to
application/tests/harvester_test/test_repos_validator.py that ensures
validate_repositories(config) does not raise for a known-good config: create a
test function (e.g., test_validate_valid_repositories) that loads FIXTURES_DIR /
"valid_repos.yaml" via load_repo_config and calls validate_repositories(config)
(no exception assertion needed); reference existing helpers load_repo_config and
the function validate_repositories so the test mirrors the failure tests' style.
In `@application/utils/harvester/__init__.py`:
- Around line 17-27: The __all__ export list is unsorted; reorder the entries
alphabetically so it meets linting standards by sorting the names:
ChunkingConfig, ConfigLoaderError, PathRules, PollingConfig, RepositoryConfig,
RepositoriesValidationError (note actual names below), ReposFile,
load_repo_config, RepositoryValidationError, validate_repositories —
specifically sort the existing symbols ("ChunkingConfig", "ConfigLoaderError",
"PathRules", "PollingConfig", "RepositoryConfig", "ReposFile",
"load_repo_config", "RepositoryValidationError", "validate_repositories") into
ascending alphabetical order and replace the current __all__ definition with
that sorted list.
---
Nitpick comments:
In `@application/tests/harvester_test/test_config_loader.py`:
- Around line 43-45: Replace the string literal path in test_missing_config_file
with a Path-based fixture to match the rest of the tests: call load_repo_config
with FIXTURES_DIR / "does_not_exist.yaml" instead of "does_not_exist.yaml" so
the test uses a Path object (refer to test_missing_config_file, load_repo_config
and FIXTURES_DIR).
In `@application/tests/harvester_test/test_repos_validator.py`:
- Around line 14-45: Add a case-insensitive duplicate-id test: create the
fixture duplicate_repo_ids_case_insensitive.yaml with two repositories whose ids
differ only by case (e.g., "ASVS" and "asvs"), then add a test function
test_duplicate_repository_ids_case_insensitive() that loads the fixture via
load_repo_config and asserts validate_repositories(config) raises
RepositoryValidationError with a "Duplicate repository id" match; this ensures
the validate_repositories code path that uses casefold() is exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9f37ad90-960e-469e-bfbe-c6dda95e7b54
📒 Files selected for processing (21)
application/tests/harvester_test/__init__.pyapplication/tests/harvester_test/fixtures/duplicate_include_paths.yamlapplication/tests/harvester_test/fixtures/duplicate_repo_ids.yamlapplication/tests/harvester_test/fixtures/duplicate_repositories.yamlapplication/tests/harvester_test/fixtures/empty_include_paths.yamlapplication/tests/harvester_test/fixtures/empty_owner.yamlapplication/tests/harvester_test/fixtures/invalid_chunk_size.yamlapplication/tests/harvester_test/fixtures/invalid_chunking_strategy.yamlapplication/tests/harvester_test/fixtures/invalid_missing_id.yamlapplication/tests/harvester_test/fixtures/invalid_polling_interval.yamlapplication/tests/harvester_test/fixtures/invalid_polling_mode.yamlapplication/tests/harvester_test/fixtures/invalid_yaml.yamlapplication/tests/harvester_test/fixtures/valid_repos.yamlapplication/tests/harvester_test/test_config_loader.pyapplication/tests/harvester_test/test_repos_validator.pyapplication/utils/harvester/__init__.pyapplication/utils/harvester/config_loader.pyapplication/utils/harvester/exclude_patterns.txtapplication/utils/harvester/repos.yamlapplication/utils/harvester/repos_validator.pyapplication/utils/harvester/schemas.py
|
@coderabbitai |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
gh api repos/OWASP/OpenCRE/commits/0be6df87c7642908c76dc915c48b54c30227acff --jq '.commit.message + "\n---\n" + (.files // [] | map(" " + .filename + " [+" + (.additions|tostring) + "/-" + (.deletions|tostring) + "]\n patch: " + (.patch // "")) | join("\n"))'Length of output: 1054
Both look good. Thanks for the fix! (^‿^)🐇 If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
Summary
This PR adds the initial repository configuration loading and validation layer for the harvester pipeline.
Added
Validation Coverage
Config Loader Validation
Repository Semantic Validation
Test Plan
Executed:
bash python3 -m pytest application/tests/harvester_test -v python3 -m coverage report python3 -m black application/utils/harvester application/tests/harvester_test
All tests passing.
Notes for Reviewers
Validation is intentionally split into:
Repository uniqueness checks are case-insensitive.
exclude_patterns.txt is currently scaffolded for future integration into harvesting logic.
ParthAggarwal16#3 (comment)
this comment has some good coderabbit diagrams if you guys wanna have a look at it