[PERF] Pytest Discovery - highly parameterized tests lead to inefficient test node creation in vscode_pytest

I'm raising this as a new issue, since #25348 didn't address the majority of the performance concern and I dug deeper to identify a variety of specific action items that I'm confident will help. Thanks to @eleanorjboyd for working with me on this in the past!

---
Unfortunately, #25658 doesn't appear to have made much improvement for my suite of tests. Possibly a little bit, but not the order of magnitude improvement I was expecting to see.

**Note that an effective reproduction will require the presence of highly parameterized test functions (e.g. 10,000+ parameterized cases per test, 100,000+ total test cases in the suite).**

Collecting at the command line (without `vscode_pytest`):
```
328987 tests collected in 10.43s
```

Collecting using VSCode Test Explorer (tested with `ms-python.python` versions `2026.0.0` and `2026.1.2026010901`):
```
328987 tests collected in 66.07s (0:01:06)
```

Is there any logging/tracing within the `build_test_tree` logic that I can turn on to help you investigate the sources of delay that aren't scaling well to hundreds of thousands of tests?

I can see from #25658 that you suspected the use of `list` and duplicate checks to be a significant contributor, so I'll point out that there are other such uses (`"children"` key) that may also require attention. E.g.

https://github.com/microsoft/vscode-python/blob/e2681d5925fb8ef6cb810d191048bd56f56b3e3e/python_files/vscode_pytest/__init__.py#L648-L649
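
To illustrate why this pattern scales poorly, here is a minimal sketch (not the actual `vscode_pytest` code). The helper names `append_unique_slow`, `append_unique_fast`, and `make_nodes`, and the assumption that each node dict carries a unique `"id_"` key, are mine for illustration only.

```python
from typing import Any

Node = dict[str, Any]


def append_unique_slow(children: list[Node], node: Node) -> None:
    # `in` on a list compares the new dict against each existing element,
    # so every call is O(n) and building n children is O(n^2) overall.
    if node not in children:
        children.append(node)


def append_unique_fast(children: list[Node], seen_ids: set[str], node: Node) -> None:
    # A companion set of IDs gives an average O(1) membership check.
    # Assumption: "id_" uniquely identifies a node.
    node_id = node["id_"]
    if node_id not in seen_ids:
        seen_ids.add(node_id)
        children.append(node)


def make_nodes(count: int) -> list[Node]:
    # Hypothetical parameterized test nodes for a single test function.
    return [
        {"id_": f"test_mod.py::test_fn[{i}]", "name": f"test_fn[{i}]"}
        for i in range(count)
    ]
```

Both helpers produce the same `children` list; only the cost of the duplicate check differs. If pytest already guarantees unique item IDs within a function, the check could arguably be dropped entirely.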

---
I did some deeper testing with `vscode_pytest` locally and found that the key inefficiencies all seem to stem from either A) avoiding duplicates in a `list` (as you suspected) or B) performing redundant computations for every parameterized `Item` of a single function. Here's what I would suggest (prioritized by the impact on runtime performance):

**Key Performance Issues:**
1. Continue avoiding inefficient deduplication any time a `list` is used, esp. for the test cases of a parameterized function (just remove the first duplicate check in `process_parameterized_test`?).
2. `create_test_node` redundantly extracts the line number from every parameterized instance of a test function.
    - This could be cached by the parent function ID rather than the parameterized item ID.
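
As a rough sketch of the caching idea in item 2 (again, not the real `vscode_pytest` implementation): `parent_function_id` and `LineNumberCache` are hypothetical names, and the `lookup` callable stands in for whatever actually resolves a line number. The sketch assumes the parameter suffix starts at the first `[` in the item ID.

```python
from typing import Callable


def parent_function_id(item_id: str) -> str:
    # "tests/test_mod.py::test_fn[a-1]" -> "tests/test_mod.py::test_fn"
    # Assumption: the first "[" in the ID opens the parameter suffix.
    return item_id.partition("[")[0]


class LineNumberCache:
    """Resolve the line number once per test function, not once per parameter set."""

    def __init__(self, lookup: Callable[[str], int]) -> None:
        self._lookup = lookup  # hypothetical expensive line-number resolver
        self._cache: dict[str, int] = {}
        self.lookups = 0  # counts how often the expensive resolver ran

    def get(self, item_id: str) -> int:
        key = parent_function_id(item_id)
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]
```

With a cache like this, 10,000 parameterized cases of one function would trigger a single lookup instead of 10,000.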

**Performance-improving Refactors:**
3. Although this will be more effort, I think that the payload size could be drastically reduced if you changed the JSON schema sent back to the `TEST_RUN_PIPE` to avoid duplicating the absolute path so many times.
    - If you store the root package folder once and allow all test paths/IDs to remain relative it would save both computation time and payload size (memory and copy/transfer speed).
    - If you store path, class+function name, and parameterized ID separately, then many fields of the test nodes themselves will be much smaller.
4. It might help to use `cached_fspath` any time that you need `os.fspath` to optimize for caching.
5. I would suggest extracting and consolidating the parent/module/file node creation logic that's duplicated for multiple branches of `build_test_tree` and `process_parameterized_test`
    - The way I see it, the first two conditional branches of `build_test_tree` should result in the identification of a top-level function/class node (and the creation of all its children class/test nodes). Then, as a separate logical section within the `for` loop of `build_test_tree`, the file-level node can be added to `file_nodes_dict` (if not present) and the top-level item added to its `children`.
    - Once you do that, I think some additional short-circuiting opportunities will become obvious since many functions will share the same parent module/file.
6. Generally speaking, I think there may be some unnecessary extraction/computation/conversion of file paths (and possibly some IO cost that could be avoided).
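
To make item 3 more concrete, here is one possible shape for a more compact payload. This is purely a sketch: the schema, the `compact_payload` name, and the `"root"`/`"files"` keys are hypothetical and do not reflect the current `vscode_pytest` payload format. It assumes item IDs look like `<absolute path>::<class/function>[<param id>]`.

```python
import json
import posixpath


def compact_payload(root: str, absolute_ids: list[str]) -> dict:
    """Store the root once; nest parameter IDs under relative path and function name."""
    files: dict[str, dict[str, list[str]]] = {}
    for abs_id in absolute_ids:
        path, _, rest = abs_id.partition("::")
        name, bracket, params = rest.partition("[")
        # posixpath keeps this example deterministic across platforms;
        # real code would more likely use os.path or pathlib.
        rel_path = posixpath.relpath(path, root)
        param_ids = files.setdefault(rel_path, {}).setdefault(name, [])
        if bracket:
            param_ids.append(params[:-1])  # drop the trailing "]"
    return {"root": root, "files": files}


ids = [f"/home/user/project/tests/test_mod.py::test_fn[{i}]" for i in range(1000)]
payload = compact_payload("/home/user/project", ids)
full_size = len(json.dumps(ids))
compact_size = len(json.dumps(payload))
```

In this toy case the absolute path and function name are serialized once rather than 1,000 times, so `compact_size` comes out well below `full_size`. The consumer would need to reassemble full IDs from the three parts.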


Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

	if test_node not in function_test_node["children"]:
		function_test_node["children"].append(test_node)