HDFS-17927. Make TestDiskError.testShutdown use deterministic volume failure injection #8517
smengcl wants to merge 2 commits
Pull request overview
This PR makes TestDiskError.testShutdown() deterministic by replacing repeated file creation with direct DataNode volume failure injection and a bounded wait for shutdown.
Changes:
- Uses DataNodeTestUtils.injectDataDirFailure() on both DataNode storage dirs.
- Calls dn.checkDiskError() directly and waits up to 30 seconds for the DataNode to report down.
- Cleans up injected failures in a finally block and updates imports accordingly.
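
The inject-then-clean-up pattern in the list above can be illustrated without a cluster. The sketch below is a standalone illustration, not the Hadoop helper itself: it assumes that a data dir failure can be simulated by parking the real directory under another name and putting a regular file in its place. The class name, the `.origin` suffix, and the `injectFailure`/`restore` methods are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DirFailureInjectionSketch {

  // Hypothetical suffix for the parked directory; an assumption for this sketch.
  private static final String PARKED_SUFFIX = ".origin";

  /** Makes dir unusable as a directory: park the real one, put a regular file in its place. */
  static void injectFailure(Path dir) throws IOException {
    Path parked = dir.resolveSibling(dir.getFileName() + PARKED_SUFFIX);
    Files.move(dir, parked);
    Files.createFile(dir);
  }

  /** Undoes injectFailure; does nothing if no parked directory exists. */
  static void restore(Path dir) throws IOException {
    Path parked = dir.resolveSibling(dir.getFileName() + PARKED_SUFFIX);
    if (Files.exists(parked)) {
      Files.deleteIfExists(dir);
      Files.move(parked, dir);
    }
  }

  /** Returns {usable before injection, usable during injection, usable after restore}. */
  static boolean[] demo() {
    try {
      Path root = Files.createTempDirectory("inject-sketch");
      Path dataDir = Files.createDirectory(root.resolve("data1"));
      boolean before = Files.isDirectory(dataDir);
      boolean during;
      try {
        injectFailure(dataDir);
        during = Files.isDirectory(dataDir);
      } finally {
        // Cleanup runs even if the check above throws, so later tests see a healthy dir.
        restore(dataDir);
      }
      boolean after = Files.isDirectory(dataDir);
      Files.delete(dataDir);
      Files.delete(root);
      return new boolean[] {before, during, after};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] states = demo();
    System.out.println("before=" + states[0] + " during=" + states[1] + " after=" + states[2]);
  }
}
```

Putting the restore call in `finally` is the point of the design: a failed assertion or timeout in the middle of the test should not leave a broken directory behind for whatever runs next.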
    return;
    }
    // Bring up two more datanodes
    cluster.startDataNodes(conf, 2, true, null, null);
Are we sure the 2 additional DNs are not required, either in this test case or to keep the following test cases stable? (In the setup phase we create a single-DN cluster.)
    DataNodeTestUtils.injectDataDirFailure(dir1, dir2);
    dn.checkDiskError();
    GenericTestUtils.waitFor(() -> !dn.isDatanodeUp(), 100, 30000);
    assertFalse(dn.isDatanodeUp(),
Is it possible to reach this line with the DN still up?
I mean, if dn.isDatanodeUp() is still true, GenericTestUtils.waitFor() will throw an exception when it times out. So if GenericTestUtils.waitFor() did not throw, the DN is already down and the assertion cannot fail.
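
The reasoning in this comment can be checked with a small standalone sketch. The `waitFor` below is a simplified stand-in modelled on the call shape used in the diff (condition, poll interval, timeout), not a copy of the Hadoop implementation; it is assumed to throw `TimeoutException` when the deadline passes. `BoundedWaitSketch` and `completes` are hypothetical names for this example.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

class BoundedWaitSketch {

  /** Polls check every intervalMs; throws TimeoutException if it is still false after timeoutMs. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
    // Reaching this point means the last evaluation of check returned true.
  }

  /** Returns true if waitFor returned normally, false if it timed out or was interrupted. */
  static boolean completes(BooleanSupplier check, long timeoutMs) {
    try {
      waitFor(check, 5, timeoutMs);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Condition becomes true after roughly 20 ms, well inside the 1000 ms bound.
    System.out.println(completes(() -> System.currentTimeMillis() - start >= 20, 1000));
    // Condition never becomes true, so the wait ends with a timeout.
    System.out.println(completes(() -> false, 50));
  }
}
```

Under these assumptions, a normal return from the wait already implies the condition held at the last poll. An assertion on the same condition right afterwards can only fail if the condition flips back in between, which does not apply to a DataNode that has shut down.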
Hi @smengcl,
Description of PR
TestDiskError.testShutdown() currently drives DataNode shutdown by repeatedly creating files until dn.isDatanodeUp() becomes false. If shutdown does not happen promptly, the test can loop for a long time and generate excessive logs.

This change replaces the file-create loop with deterministic failure injection for both DataNode data dirs, then explicitly runs dn.checkDiskError() and waits, with a bounded timeout, for the DataNode to exit.

How was this patch tested?
TestDiskError.testShutdown itself.

For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling
If an AI tool was used:
where is the name of the AI tool used.
https://www.apache.org/legal/generative-tooling.html