Skip to content

Enable test concurrency by default#4313

Draft
ScottDugas wants to merge 54 commits into
FoundationDB:mainfrom
ScottDugas:test-concurrency-all
Draft

Enable test concurrency by default#4313
ScottDugas wants to merge 54 commits into
FoundationDB:mainfrom
ScottDugas:test-concurrency-all

Conversation

@ScottDugas

Copy link
Copy Markdown
Collaborator

No description provided.

JUnit 6.1 introduced a new executor service to be used for parallel
testing that does not use a ForkJoinPool. The issue with the
ForkJoinPool, was that tasks waiting to join futures do not count
towards the concurrency. This means that as soon as a test waited
on a future, another test would start, causing more load than FDB
could handle. The new service does not suffer from this issue.
This protects multiple instances in the same JVM from trying to
initialize the keyspace/domain concurrently, which could conflict.
This shouldn't be a real problem in production environments, but is
low cost, so adding it in support of concurrent tests provides
real value.

There are probably additional issues that need to be resolved
around multiple FRLs registering a driver in the same instance...
And it may make sense to have some sort of registry so that we can
better control for this, and reduce the number of tests depending on
the registered driver.
This allows for running multiple tests in parallel within the same
JVM, without risk of having multiple threads trying to start multiple
servers against the same port. This replaces the previous
per-ExternalServer approach to protecting port conflicts.
This does not protect as well for concurrent ExternalServers in
different processes, but currently we don't run concurrently in
multiple JVMs, so this should be sufficient.
SnakeYaml is not threadsafe, so reusing the same parser when we are
running tests in parallel causes bugs
This is temporary. Notably, the reason we have to lock the schema
setup is because dropping a database calls deleteStore which always
bumps the metadata version, even though the stores that we dropping
don't cache the store header. This means that deleting a database/schema
will conflict with creating a database/schema (or any other operation
involving the catalog, I think).
This should be irrelevant for production, but can be quite a
problem when tests are running in parallel.
Running more tests in parallel, on the same JVM we can run into
cache evictions, which causes the tests to fail when the expect
plans to be cached.
JUnit 6.1 introduced a new executor service to be used for parallel
testing that does not use a ForkJoinPool. The issue with the
ForkJoinPool, was that tasks waiting to join futures do not count
towards the concurrency. This means that as soon as a test waited
on a future, another test would start, causing more load than FDB
could handle. The new service does not suffer from this issue.
This protects multiple instances in the same JVM from trying to
initialize the keyspace/domain concurrently, which could conflict.
This shouldn't be a real problem in production environments, but is
low cost, so adding it in support of concurrent tests provides
real value.

There are probably additional issues that need to be resolved
around multiple FRLs registering a driver in the same instance...
And it may make sense to have some sort of registry so that we can
better control for this, and reduce the number of tests depending on
the registered driver.
This allows for running multiple tests in parallel within the same
JVM, without risk of having multiple threads trying to start multiple
servers against the same port. This replaces the previous
per-ExternalServer approach to protecting port conflicts.
This does not protect as well for concurrent ExternalServers in
different processes, but currently we don't run concurrently in
multiple JVMs, so this should be sufficient.
SnakeYaml is not threadsafe, so reusing the same parser when we are
running tests in parallel causes bugs
TODO move this to testing.gradle so all modules take advantage
TODO do this for other modules that are creating using the default
scheduled executor
In parallel runs, cachedVersionMaintenanceOnReadsTest fails because
the previous version may be more than 2s ago
Since indexing tests do a fair amount of constant work (building indexes),
and do their work at batch priority, they are more susceptible to
load on the system due to concurrent transactions. Give all of them
a ResourceLock so that we don't have multiple tests of this style
running concurrently.
TODO check if we can change this to the same resource lock as the
indexing tests instead
Since createPath now evaluates the path, it can bump the cached
read version, which breaks the expectation of the test

TODO can we decrease the 60s back to 2s like it was before
This still runs into DeadlineExceededException, more investigation
is probably needed, but this gets the tests to pass better.
JUnit 6.1 introduced a new executor service to be used for parallel
testing that does not use a ForkJoinPool. The issue with the
ForkJoinPool, was that tasks waiting to join futures do not count
towards the concurrency. This means that as soon as a test waited
on a future, another test would start, causing more load than FDB
could handle. The new service does not suffer from this issue.
This protects multiple instances in the same JVM from trying to
initialize the keyspace/domain concurrently, which could conflict.
This shouldn't be a real problem in production environments, but is
low cost, so adding it in support of concurrent tests provides
real value.

There are probably additional issues that need to be resolved
around multiple FRLs registering a driver in the same instance...
And it may make sense to have some sort of registry so that we can
better control for this, and reduce the number of tests depending on
the registered driver.
This allows for running multiple tests in parallel within the same
JVM, without risk of having multiple threads trying to start multiple
servers against the same port. This replaces the previous
per-ExternalServer approach to protecting port conflicts.
This does not protect as well for concurrent ExternalServers in
different processes, but currently we don't run concurrently in
multiple JVMs, so this should be sufficient.
SnakeYaml is not threadsafe, so reusing the same parser when we are
running tests in parallel causes bugs
These overwhelm FDB, so mark them as SAME-THREAD
This is temporary. Notably, the reason we have to lock the schema
setup is because dropping a database calls deleteStore which always
bumps the metadata version, even though the stores that we dropping
don't cache the store header. This means that deleting a database/schema
will conflict with creating a database/schema (or any other operation
involving the catalog, I think).
This should be irrelevant for production, but can be quite a
problem when tests are running in parallel.
TODO move this to testing.gradle so all modules take advantage
Registering and de-regestering FRL causes issues with running the
tests in parallel. Instead have the relational extension provide a
driver for tests and other extensions to use.
The catalog tests have to wipe and recreate the catalog, so those
must be isolated from all other relational tests.
Tests that assert about logs must be run by themselves, because
the log is shared.
This ends up being a lot, it may make more sense to fix the production
behavior so that deleting a schema doesn't cause problems.
Or, change our tests so that they don't clean up anything until after
all tests are done (or ever).
The switch to not registering the driver caused it to directly call
connect() on the driver, but the contract for that is different,
namely, the driver is supposed to return null for something it
doesn't understand, and the DriverManager converts that to an
exception. Update the test to expect the actual contract.
@ScottDugas ScottDugas added the build improvement Improvement to the build system label Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build improvement Improvement to the build system

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant