Proposal: introduce A-* area labels with auto-labeler #6986

@wjones127

Goal

Make it easier to filter open PRs by code area (e.g., "show me all open encoding PRs"). Today the area-ish labels in this repo (vector, indexes, file-storage, format, java, python, etc.) are unprefixed, inconsistent, and partly overlap — there's no clean filter for "PRs in my area."

Proposal

Adopt an A-* area-label convention (rust-lang style) plus an auto-labeler that applies labels based on changed paths.

Labels

| Label | Description | Paths |
| --- | --- | --- |
| A-java | Java bindings + JNI | java/**, rust/lance-jni/** |
| A-namespace | Namespace impls | rust/lance-namespace*/** |
| A-index | Vector index, linalg, tokenizer | rust/lance-index/**, rust/lance-linalg/**, rust/lance-tokenizer/** |
| A-encoding | Encoding, IO, file format | rust/lance-encoding/**, rust/lance-io/**, rust/lance-file/**, protos/** |
| A-python | Python bindings | python/** (excluding lockfiles) |
| A-docs | Documentation | docs/**, **/*.md |
| A-ci | CI / build workflows | .github/** |
| A-deps | Dependency updates | **/Cargo.lock, **/uv.lock, deny.toml, **/pyproject.toml |
| A-core | General Rust core (catch-all) | other rust/**, Cargo.toml |

A PR can carry multiple A-* labels (e.g. a Java + namespace PR gets both).

Auto-labeler

Add .github/labeler.yml + a workflow using actions/labeler@v5 that runs on PR open/sync and applies labels based on changed paths.
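To make the intended path-to-label behavior concrete, here is a minimal Python sketch that models the table above. It is not how actions/labeler evaluates its config; it only illustrates the semantics the proposal seems to want. Two readings are assumptions rather than anything stated above: "excluding lockfiles" is taken to mean Cargo.lock and uv.lock never count toward A-python, and A-core is applied only when no other area claims a file. The helper names and the example paths are hypothetical.

```python
import re

# Path globs copied from the label table in this proposal.
AREA_GLOBS = {
    "A-java": ["java/**", "rust/lance-jni/**"],
    "A-namespace": ["rust/lance-namespace*/**"],
    "A-index": ["rust/lance-index/**", "rust/lance-linalg/**", "rust/lance-tokenizer/**"],
    "A-encoding": ["rust/lance-encoding/**", "rust/lance-io/**", "rust/lance-file/**", "protos/**"],
    "A-python": ["python/**"],
    "A-docs": ["docs/**", "**/*.md"],
    "A-ci": [".github/**"],
    "A-deps": ["**/Cargo.lock", "**/uv.lock", "deny.toml", "**/pyproject.toml"],
}
# Assumption: "excluding lockfiles" means these never count toward A-python.
LOCKFILE_GLOBS = ["**/Cargo.lock", "**/uv.lock"]
# Assumption: A-core applies only when no other area claims the file.
CORE_GLOBS = ["rust/**", "Cargo.toml"]


def glob_to_regex(glob):
    """Translate a small glob subset: '**/' (zero or more dirs), '**', and '*'."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")  # a single '*' does not cross directory boundaries
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(path, globs):
    return any(glob_to_regex(g).match(path) for g in globs)


def labels_for_file(path):
    labels = set()
    for label, globs in AREA_GLOBS.items():
        if not matches(path, globs):
            continue
        if label == "A-python" and matches(path, LOCKFILE_GLOBS):
            continue  # lockfiles under python/ fall through to A-deps only
        labels.add(label)
    if not labels and matches(path, CORE_GLOBS):
        labels.add("A-core")  # catch-all for otherwise unclaimed Rust paths
    return labels


def labels_for_pr(changed_paths):
    """Union of the labels of every changed file, sorted for stable output."""
    result = set()
    for path in changed_paths:
        result |= labels_for_file(path)
    return sorted(result)


# Hypothetical file names, chosen only to exercise the globs.
print(labels_for_pr(["java/pom.xml", "rust/lance-namespace-impls/src/lib.rs"]))
# prints ['A-java', 'A-namespace']
```

Under this reading, a change to python/pyproject.toml would carry both A-python and A-deps, while a Markdown file under rust/ would get only A-docs. Whether that is the desired outcome is worth settling before writing the real labeler.yml.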

Migration of existing labels

Rename in place via gh label edit. GitHub preserves the label on existing issues/PRs through rename, so nothing breaks:

  • vector, indexes → A-index (merge)
  • file-storage, format → A-encoding
  • java → A-java
  • python → A-python
  • ci → A-ci
  • dependencies, python:uv → A-deps
  • rust → unclear; possibly drop in favor of A-core

Labels not on the map (arrow, c++, wasm, PyTorch/Tensorflow, duckdb, ray, OS labels, etc.) stay as cross-cutting tags.
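The rename mapping above could be turned into a reviewable plan with a short script. The sketch below only prints gh label edit commands and does not run them. It assumes that, because two labels in a repo cannot share a name, only one source label per target can be renamed in place; the remaining sources in a merge (indexes, format, python:uv) are listed as follow-up work instead. Picking the first-listed label as the one to rename is an arbitrary choice, and plan_migration is a hypothetical helper name.

```python
import shlex

# Old labels grouped by their new A-* name, as listed in the migration above.
# The "rust" label is left out because its target is still undecided.
MIGRATION = {
    "A-index": ["vector", "indexes"],
    "A-encoding": ["file-storage", "format"],
    "A-java": ["java"],
    "A-python": ["python"],
    "A-ci": ["ci"],
    "A-deps": ["dependencies", "python:uv"],
}


def plan_migration(migration):
    """Return (rename commands, leftover (old, new) pairs that need a manual merge)."""
    renames = []
    leftovers = []
    for new_label, old_labels in migration.items():
        first, *rest = old_labels
        # Rename the first source label in place.
        renames.append(["gh", "label", "edit", first, "--name", new_label])
        # Assumption: the other sources cannot be renamed onto an existing name.
        leftovers.extend((old, new_label) for old in rest)
    return renames, leftovers


renames, leftovers = plan_migration(MIGRATION)
for cmd in renames:
    print(shlex.join(cmd))
for old, new in leftovers:
    print(f"# TODO: move items labeled {old!r} onto {new!r}, then retire {old!r}")
```

Printing the plan first keeps the migration a dry run, so the list of renames can be pasted into this thread for review before anything changes.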

Why this matters

Anyone can bookmark a filter like is:pr is:open label:A-encoding to see only PRs in their area, instead of scanning the whole open-PR list.

Open questions

  1. Should A-core be an explicit catch-all label, or should an unlabeled PR simply mean core?
  2. Is A-python worth having given Python-binding PRs are usually reviewed by whoever owns the underlying Rust change? (Filter ergonomics say yes; routing says no.)
  3. Are A-docs / A-ci / A-deps overkill, or do reviewers want to filter them in/out separately?

Feedback welcome before I roll this out.
