Skip to content

fix(python-extractor): capture type-annotated *args/**kwargs parameters#444

Open
tirth8205 wants to merge 2 commits into
Egonex-AI:mainfrom
tirth8205:fix/python-extractor-typed-varargs
Open

fix(python-extractor): capture type-annotated *args/**kwargs parameters#444
tirth8205 wants to merge 2 commits into
Egonex-AI:mainfrom
tirth8205:fix/python-extractor-typed-varargs

Conversation

@tirth8205

Copy link
Copy Markdown
Contributor

Problem

In extractParams (python-extractor.ts), the typed_parameter case assumed the parameter name was a direct identifier child: const ident = findChild(child, "identifier"). For type-annotated variadic parameters such as def f(*args: int, **kwargs: str), the tree-sitter-python grammar wraps the splat in a typed_parameter whose first child is a list_splat_pattern (for *args) or dictionary_splat_pattern (for **kwargs) — the identifier is nested inside that pattern, not a direct child. Because findChild is shallow (only scans direct children), it returned null and the parameter was silently dropped.

Verified by running the extractor: def f(a: int, *args: str, b: int = 0, **kwargs: bool) returned params ["a", "b"] instead of ["a", "*args", "b", "**kwargs"]. The untyped forms (def f(*args, **kwargs)) already worked because they parse as bare list_splat_pattern/dictionary_splat_pattern children of parameters. The gap was specifically typed variadics — a very common pattern in typed Python (*args: Any, **kwargs: object).

Fix

In the typed_parameter case, check for a nested list_splat_pattern / dictionary_splat_pattern first and prefix the captured name with * / ** respectively, falling back to the existing plain-identifier lookup otherwise. The change is confined to understand-anything-plugin/packages/core/src/plugins/extractors/python-extractor.ts.

Testing

  • Added test extracts type-annotated *args and **kwargs to python-extractor.test.ts, asserting def f(a: int, *args: str, b: int = 0, **kwargs: bool) yields params ["a", "*args", "b", "**kwargs"].
  • The new test fails before the fix (expected [ 'a', 'b' ] to deeply equal [ 'a', '*args', 'b', '**kwargs' ]) and passes after.
  • Full core vitest suite is green (693 passing, +1 from this test) with no regressions.
  • tsc --noEmit (core) exits 0 and eslint is clean on both changed files.

🤖 Generated with Claude Code

The `typed_parameter` case in extractParams assumed the parameter name
was a direct `identifier` child. For type-annotated variadic parameters
such as `def f(*args: int, **kwargs: str)`, tree-sitter-python wraps the
splat in a `typed_parameter` whose first child is a `list_splat_pattern`
(`*args`) or `dictionary_splat_pattern` (`**kwargs`), with the identifier
nested inside. Since findChild is shallow, it returned null and the
parameter was silently dropped.

Now the `typed_parameter` case checks for a nested splat pattern first and
prefixes the name with `*`/`**` accordingly, falling back to the plain
identifier lookup otherwise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns before this lands.

1. findChild is shallow — fix is grammar-version-fragile. The new branches assume list_splat_pattern / dictionary_splat_pattern are direct children of typed_parameter. That holds in current tree-sitter-python, but if a future grammar revision wraps the splat (e.g., inside a pattern node) the lookup silently returns null again and we regress to the original bug with no test signal. Worth either a recursive search or a comment pinning the assumption to a grammar version.

2. Adjacent gaps still silently dropped. Same extractParams switch has no arm for positional_separator (/), keyword_separator (bare *,), or PEP 695 type_parameter lists (def f[T](x: T) — the [T] lives on function_definition, not parameters, so untouched here, but the bare *// separators do appear in the parameters node and are dropped). Untyped *args followed by keyword-only args already exercises this. Probably out of scope for the fix, but a TODO would help — and the same shape recurs for Dart's {named}/[positional] markers in #435.

3. Test gaps. The new test exercises only a top-level def. Not covered: (a) method form (def m(self, *args: int)) — the new splat branches skip the self/cls filter the plain-identifier branch applies, so a typed *self would leak through (degenerate, but documents intent); (b) async def — should be equivalent but no assertion; (c) typed **kwargs without a preceding typed *args in the same signature, to lock the dict-splat path independently.

Nit: the function-level JSDoc on extractParams (lines 5–10) still reads as if untyped splats are the only splat path — worth one line noting typed variadics are now handled too.

…t-splat path

Address review on Egonex-AI#444:
- Expand extractParams JSDoc to note typed variadics (*args: T, **kwargs: T)
  are resolved inside the typed_parameter case, and that bare */ separators
  and PEP 695 type params are intentionally not emitted as params.
- Pin the grammar assumption (tree-sitter-python ^0.25.0: splat patterns are
  direct children of typed_parameter) with a comment above the splat lookups.
- Add tests locking the dict-splat and list-splat typed paths independently,
  plus an async def variant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tirth8205

Copy link
Copy Markdown
Contributor Author

Addressed point by point.

1. Grammar-version fragility of the shallow findChild. Added a comment above the splat lookups pinning the assumption: in tree-sitter-python ^0.25.0 the list_splat_pattern / dictionary_splat_pattern is a direct child of typed_parameter, which is what the findChild relies on. I kept the shallow lookup rather than switching to a recursive search — a recursive search risks matching an identifier nested inside the type annotation rather than the splat target. The comment also notes that the extracts type-annotated *args and **kwargs test would fail loudly if a future grammar revision wraps the splat, so the regression you describe would not be silent.

2. Bare * / / separators and PEP 695 type params. Documented in the extractParams JSDoc rather than changed. The bare * / / separators are syntactic boundary markers, not named parameters, so excluding them from the param-name list is the desired output (you would not want * or / appearing in params[]). PEP 695 [T] lists live on function_definition, not on the parameters node this function walks, so they are genuinely out of scope here. The JSDoc now states both explicitly so the omission is intentional, not an oversight.

3a. Typed *self / **self leaking through. Not changed. self is never a variadic, so *self / **self is degenerate, non-meaningful Python. The pre-existing untyped splat arms (case "list_splat_pattern" / case "dictionary_splat_pattern") also omit the self/cls filter, so the new typed branches are consistent with established behavior in the same file — adding a guard would be dead code defending against syntactically invalid input.

3b. async def. Added a test (async def fetch(a: int, *args: str, **kwargs: bool)) asserting ["a", "*args", "**kwargs"]. As expected the path is identical — async def produces a function_definition node and the splat code does not branch on sync vs async — but it is now locked.

3c. Typed **kwargs without a preceding typed *args. Added a test for def g(x: int, **kwargs: bool)["x", "**kwargs"] and a companion def h(*args: int)["*args"], so the dict-splat and list-splat arms are each exercised independently of the other.

Nit (JSDoc). Updated the extractParams JSDoc to state that typed variadics (*args: T, **kwargs: T) are now resolved inside the typed_parameter case via the nested splat patterns.

Full core suite green (696 passed), tsc --noEmit clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants