[PERF] Pytest Discovery - highly parameterized tests lead to inefficient test node creation in vscode_pytest #25948

@tboddyspargo

I'm raising this as a new issue, since #25348 didn't address the majority of the performance concern and I dug deeper to identify a variety of specific action items that I'm confident will help. Thanks to @eleanorjboyd for working with me on this in the past!


Unfortunately, #25658 doesn't appear to have made much improvement for my suite of tests. Possibly a little bit, but not the order of magnitude improvement I was expecting to see.

Note that an effective reproduction will require the presence of highly parameterized test functions (e.g. 10,000+ parameterize cases per test, 100,000+ total test cases in the suite).

Collecting at the command line (without vscode_pytest):

328987 tests collected in 10.43s

Collecting using VSCode Test Explorer (tested with ms-python.python versions 2026.0.0 and 2026.1.2026010901):

328987 tests collected in 66.07s (0:01:06)

Is there any logging/tracing within the build_test_tree logic that I can turn on to help you investigate the sources of delay that aren't scaling well to hundreds of thousands of tests?

I can see from #25658 that you suspected the use of list and duplicate checks to be a significant contributor, so I'll point out that there are other such uses ("children" key) that may also require attention. E.g.

    if test_node not in function_test_node["children"]:
        function_test_node["children"].append(test_node)
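
To illustrate why this pattern scales poorly, here is a minimal sketch comparing the list-membership check with a set of already-seen IDs. The helper names, the `seen_ids` set, and the `"id_"` key are assumptions made for this example, not the actual vscode_pytest code.

```python
from __future__ import annotations

from typing import Any


def add_child_list_check(parent: dict[str, Any], child: dict[str, Any]) -> None:
    """Pattern from the snippet above: scans the whole list, so N inserts cost O(N^2)."""
    if child not in parent["children"]:
        parent["children"].append(child)


def add_child_set_check(
    parent: dict[str, Any], child: dict[str, Any], seen_ids: set[str]
) -> None:
    """Alternative: O(1) average membership test against a set of child IDs."""
    if child["id_"] not in seen_ids:
        seen_ids.add(child["id_"])
        parent["children"].append(child)


def make_cases(count: int) -> list[dict[str, Any]]:
    """Build stand-in parameterized test nodes, each emitted twice."""
    cases = [{"id_": f"test_add[{i}]", "name": f"test_add[{i}]"} for i in range(count)]
    return cases + cases


slow_parent: dict[str, Any] = {"children": []}
fast_parent: dict[str, Any] = {"children": []}
seen: set[str] = set()
for node in make_cases(500):
    add_child_list_check(slow_parent, node)
    add_child_set_check(fast_parent, node, seen)
```

Both parents end up with the same children, but only the list version re-compares every existing dict on each insert. If each parameterized item is only ever visited once, dropping the check entirely would be cheaper still.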


I did some deeper testing with vscode_pytest locally and found that the key inefficiencies all seem to stem from either A) avoiding duplicates in a list (as you suspected) or B) performing redundant computations for every parameterized Item of a single function. Here's what I would suggest (prioritized by impact on runtime performance):

Key Performance Issues:

  1. Continue removing inefficient deduplication anywhere a list is used, especially for the test cases of a parameterized function (perhaps by just removing the first duplicate check in process_parameterized_test?).
  2. create_test_node redundantly extracts the line number from every parameterized instance of a test function.
    • This could be cached by the parent function ID rather than the parameterized item ID.
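
A rough sketch of the caching idea in item 2, assuming the line number is the same for every parameterized case of a function. The cache, `parent_function_id`, `cached_lineno`, and `expensive_lookup` are hypothetical names for illustration; the real extraction in create_test_node may look different.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical cache: parent function ID -> line number.
_lineno_by_function: dict[str, int] = {}


def parent_function_id(nodeid: str) -> str:
    """Drop the "[param-id]" suffix (simplified: assumes no "[" earlier in the ID)."""
    return nodeid.split("[", 1)[0]


def cached_lineno(nodeid: str, compute_lineno: Callable[[], int]) -> int:
    """Return the line number for a test, computing it once per parent function."""
    key = parent_function_id(nodeid)
    if key not in _lineno_by_function:
        _lineno_by_function[key] = compute_lineno()
    return _lineno_by_function[key]


calls = 0


def expensive_lookup() -> int:
    """Stand-in for the real line-number extraction; counts how often it runs."""
    global calls
    calls += 1
    return 42


# 10,000 parameterized cases of one function trigger a single lookup.
for i in range(10_000):
    cached_lineno(f"tests/test_math.py::test_add[{i}]", expensive_lookup)
```

Keying on the parent function rather than the item is what makes the cache effective here, since every parameterized item has a unique ID and would otherwise never produce a cache hit.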

Performance-improving Refactors:
  3. Although this will be more effort, I think the payload size could be drastically reduced by changing the JSON schema sent back to the TEST_RUN_PIPE so that the absolute path isn't duplicated so many times.
    • If you store the root package folder once and allow all test paths/IDs to remain relative, it would save both computation time and payload size (memory and copy/transfer speed).
    • If you store path, class+function name, and parameterized ID separately, then many fields of the test nodes themselves will be much smaller.
  4. It might help to use cached_fspath anywhere os.fspath is needed, so that repeated conversions of the same path hit the cache.
  5. I would suggest extracting and consolidating the parent/module/file node creation logic that's duplicated across multiple branches of build_test_tree and process_parameterized_test.
    • The way I see it, the first two conditional branches of build_test_tree should result in the identification of a top-level function/class node (and the creation of all its child class/test nodes). Then, as a separate logical section within the for loop of build_test_tree, the file-level node can be added to file_nodes_dict (if not present) and the top-level item added to its children.
    • Once you do that, I think some additional short-circuiting opportunities will become obvious, since many functions will share the same parent module/file.
  6. Generally speaking, I think there may be some unnecessary extraction/computation/conversion of file paths (and possibly some IO cost that could be avoided).
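
To give a sense of the payload savings suggested in item 3, here is a small sketch comparing a per-test layout that repeats the absolute path with a layout that stores the root once and keeps the path, function name, and parameter IDs separate. The field names, `ROOT`, and both payload shapes are invented for this example and are not the current vscode_pytest schema.

```python
import json

ROOT = "/home/user/project"  # hypothetical workspace root
REL_PATH = "tests/test_math.py"
FUNCTION = "test_add"


def absolute_payload(count: int) -> list:
    """Every node repeats the absolute path in both its path and its ID."""
    return [
        {
            "name": f"{FUNCTION}[{i}]",
            "path": f"{ROOT}/{REL_PATH}",
            "id_": f"{ROOT}/{REL_PATH}::{FUNCTION}[{i}]",
        }
        for i in range(count)
    ]


def relative_payload(count: int) -> dict:
    """Root, relative path, and function name are stored once; only param IDs repeat."""
    return {
        "root": ROOT,
        "path": REL_PATH,
        "function": FUNCTION,
        "params": [str(i) for i in range(count)],
    }


def expand_id(payload: dict, param: str) -> str:
    """Rebuild the full test ID on the receiving side when it is needed."""
    return f'{payload["root"]}/{payload["path"]}::{payload["function"]}[{param}]'


absolute = absolute_payload(10_000)
relative = relative_payload(10_000)
absolute_size = len(json.dumps(absolute))
relative_size = len(json.dumps(relative))
```

The receiving side can still reconstruct any full ID with something like `expand_id`, so no information is lost; the trade-off is that the consumer of the payload would need to change as well.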
