
Support negation in excluded_patterns to allow exceptions #1778

@petrarca


Problem

First of all, thank you for building CocoIndex. It's a fantastic piece of engineering and we're using it extensively as the foundation for our enterprise code search and analysis platform. The performance, the incremental processing, and the Tree-sitter integration are excellent.

One thing we're running into: excluded_patterns always overrides included_patterns (documented behavior). This makes it impossible to express:

"Exclude all dot-directories, but include .github/workflows"

CI/CD workflow files (.github/workflows/*.yml) contain valuable context for code analysis and AI agents. Today, the only way to exclude all other dot-directories (.git, .vscode, .idea, .ruff_cache, etc.) while keeping .github is to enumerate every dot-directory explicitly: currently ~30 patterns, and growing with every new tool.

Current workaround

Replace the single **/.* exclusion with an explicit list of every dot-directory to exclude:

excluded_patterns=[
    "**/.git",
    "**/.vscode",
    "**/.idea",
    "**/.ruff_cache",
    "**/.pytest_cache",
    # ... 25+ more entries, grows over time
]

This is fragile and requires updating whenever a new dot-directory convention appears.
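
Until negation is available, the explicit list could at least be generated instead of maintained by hand. The following is a minimal sketch under stated assumptions: `dot_dir_exclusions` is a hypothetical helper (not part of CocoIndex), it only looks at dot-directories (not dot-files), and it assumes the `**/<name>` pattern form used in the workaround above.

```python
import os


def dot_dir_exclusions(root: str, keep: frozenset[str] = frozenset({".github"})) -> list[str]:
    """Collect '**/<name>' patterns for every dot-directory under `root`, except names in `keep`.

    Hypothetical helper for illustration; not a CocoIndex API.
    """
    names: set[str] = set()
    for _dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name.startswith(".") and name not in keep:
                names.add(name)
        # Prune in place: do not descend into dot-directories that will be excluded anyway.
        dirnames[:] = [d for d in dirnames if not d.startswith(".") or d in keep]
    return sorted(f"**/{name}" for name in names)
```

The result could be passed as `excluded_patterns`, but it still has to be regenerated whenever the repository gains a new dot-directory, so it only softens the maintenance problem.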

Desired behavior

Support a ! negation prefix in excluded_patterns, following .gitignore semantics (patterns are evaluated in order, and the last match wins):

cocoindex.sources.LocalFile(
    path="/path/to/repo",
    included_patterns=["*.py", "*.yml", "*.yaml"],
    excluded_patterns=[
        "**/.*",           # exclude all dot-directories
        "!**/.github/**",  # but allow .github through
    ],
)
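
To make the requested evaluation order concrete, here is a minimal sketch of "last match wins" in plain Python. It is not CocoIndex's implementation: `glob_to_regex` and `is_excluded` are hypothetical names, the glob translation handles only `**/`, a trailing `/**`, and `*`, and it assumes a pattern applies to a path when it matches the path itself or any of its ancestor directories. One caveat: real .gitignore does not allow re-including a file whose parent directory is excluded, so the example above would need either this looser rule or an extra `!**/.github` entry.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a compiled regex (illustrative only)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            out.append(r"/.*")  # everything below a directory
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # any run of characters within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if `path` ends up excluded; patterns run in order, last match wins."""
    parts = path.split("/")
    # The path itself plus every ancestor, e.g. "a/b/c" -> "a", "a/b", "a/b/c".
    candidates = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    excluded = False
    for raw in patterns:
        negated = raw.startswith("!")
        regex = glob_to_regex(raw[1:] if negated else raw)
        if any(regex.fullmatch(c) for c in candidates):
            # A plain pattern excludes; a "!" pattern re-includes.
            excluded = not negated
    return excluded


patterns = ["**/.*", "!**/.github/**"]
print(is_excluded(".git/config", patterns))               # True
print(is_excluded(".github/workflows/ci.yml", patterns))  # False
print(is_excluded("src/main.py", patterns))               # False
```

Because only the last matching pattern decides, the order of entries in `excluded_patterns` would become significant, which is a behavior change compared with an unordered exclusion set.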

Other use cases

This pattern is useful for any "exclude a category but keep specific exceptions" case:

# Exclude all test directories except integration tests
excluded_patterns=["**/test/**", "!**/test/integration/**"]

# Exclude vendor but keep a specific vendored library
excluded_patterns=["**/vendor/**", "!**/vendor/internal-lib/**"]
