Problem
First of all — thank you for building CocoIndex. It's a fantastic piece of engineering and we're using it extensively as the foundation for our enterprise code search and analysis platform. The performance, the incremental processing, and the Tree-sitter integration are excellent.
One thing we're running into: `excluded_patterns` always overrides `included_patterns` (documented behavior). This makes it impossible to express:
"Exclude all dot-directories, but include `.github/workflows`"
CI/CD workflow files (`.github/workflows/*.yml`) contain valuable context for code analysis and AI agents. But the only way to exclude all other dot-directories (`.git`, `.vscode`, `.idea`, `.ruff_cache`, etc.) while keeping `.github` is to enumerate every single dot-directory explicitly: currently ~30 patterns, and the list grows with every new tool.
Current workaround
Replace the single `**/.*` exclusion with an explicit list of every dot-directory to exclude:
```python
excluded_patterns=[
    "**/.git",
    "**/.vscode",
    "**/.idea",
    "**/.ruff_cache",
    "**/.pytest_cache",
    # ... 25+ more entries, grows over time
]
```
This is fragile and requires updating whenever a new dot-directory convention appears.
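Until negation exists, one way to keep the explicit list from going stale is to regenerate it when the flow is defined instead of maintaining it by hand. The sketch below is only an illustration: `build_dot_dir_excludes` is a hypothetical helper (not part of CocoIndex), and it assumes, as the workaround above does, that a `**/<name>` pattern is enough to exclude a directory and its contents.

```python
import os
from typing import Iterable, List


def build_dot_dir_excludes(root: str, keep: Iterable[str] = (".github",)) -> List[str]:
    """Hypothetical helper: one '**/<name>' pattern per dot-directory name under root.

    Directory names listed in `keep` are left out of the result (and still walked).
    """
    keep_set = set(keep)
    names = set()
    for _dirpath, dirnames, _filenames in os.walk(root):
        # Record every dot-directory at this level that we are not keeping.
        names.update(d for d in dirnames if d.startswith(".") and d not in keep_set)
        # Prune the walk in place: no need to descend into directories we exclude anyway.
        dirnames[:] = [d for d in dirnames if not d.startswith(".") or d in keep_set]
    return sorted(f"**/{name}" for name in names)
```

The result could be passed as `excluded_patterns=build_dot_dir_excludes("/path/to/repo")`. It only sees directories that exist when it runs, so a dot-directory created later stays indexed until the list is rebuilt, which is part of why native negation would be more robust.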
Desired behavior
Support a `!` negation prefix in `excluded_patterns`, following `.gitignore` semantics (patterns are evaluated in order, and the last match wins):
```python
import cocoindex

cocoindex.sources.LocalFile(
    path="/path/to/repo",
    included_patterns=["*.py", "*.yml", "*.yaml"],
    excluded_patterns=[
        "**/.*",           # exclude all dot-directories (and dotfiles)
        "!**/.github/**",  # but allow .github through (proposed '!' syntax)
    ],
)
```
Other use cases
This pattern is useful for any "exclude a category but keep specific exceptions":
```python
# Exclude all test directories except integration tests
excluded_patterns=["**/test/**", "!**/test/integration/**"]

# Exclude vendor but keep a specific vendored library
excluded_patterns=["**/vendor/**", "!**/vendor/internal-lib/**"]
```
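To make the proposed ordering concrete, here is a minimal sketch of "last match wins, `!` re-includes" in plain Python. It is independent of CocoIndex's actual matcher, whose glob dialect and directory-walking behavior we have not checked. Assumptions, also noted in the code: `*` does not cross `/`, `**/` matches zero or more directories, and, as in git, an excluded directory hides its whole subtree. The names `glob_to_regex`, `last_match_excludes` and `is_excluded` are hypothetical.

```python
import re
from functools import lru_cache
from typing import Sequence


@lru_cache(maxsize=None)
def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob into a regex (assumed, git-like rules):
    '**/' -> zero or more leading directories, '**' -> anything incl. '/',
    '*' -> anything except '/', '?' -> one character except '/'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def last_match_excludes(path: str, patterns: Sequence[str]) -> bool:
    """Evaluate patterns in order; the last matching one decides, '!' re-includes."""
    excluded = False
    for raw in patterns:
        negated = raw.startswith("!")
        glob = raw[1:] if negated else raw
        if glob_to_regex(glob).fullmatch(path):
            excluded = not negated
    return excluded


def is_excluded(path: str, patterns: Sequence[str]) -> bool:
    """Walker-style check (assumed, as in git): if any ancestor directory is
    excluded, everything below it is excluded too."""
    parts = path.split("/")
    return any(
        last_match_excludes("/".join(parts[:i]), patterns)
        for i in range(1, len(parts) + 1)
    )


if __name__ == "__main__":
    as_proposed = ["**/.*", "!**/.github/**"]
    with_dir = ["**/.*", "!**/.github", "!**/.github/**"]
    print(is_excluded(".github/workflows/ci.yml", as_proposed))  # True
    print(is_excluded(".github/workflows/ci.yml", with_dir))     # False
```

Under these git-like rules, the patterns exactly as written above still exclude `.github/workflows/ci.yml`: the `.github` directory itself matches `**/.*`, while `!**/.github/**` only matches paths inside it. Re-including the directory (`!**/.github`) fixes that, and the same applies to `!**/test/integration/**` and `!**/vendor/internal-lib/**`. git's documentation states that a file cannot be re-included if a parent directory is excluded, so whichever semantics CocoIndex adopts, spelling out this case would help users.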