Found and fixed 8 flaky tests in `hbase-handler` by yesh385 · Pull Request #2 · yesh385/hive

yesh385 · 2023-11-14T18:22:15Z

Created this PR to fix 8 flaky tests in TestHBaseSerDe which can be found here.

How were these tests identified as flaky?
These tests were identified as flaky using NonDex, an open-source research tool for detecting and debugging wrong assumptions about under-determined Java APIs, such as the iteration order of a HashMap.
What do the tests do?

testHBaseSerDeWithTimestamp
Tests the serialization and deserialization of data with timestamps. It involves creating a test scenario with specific column families, qualifiers, and data types, then sorting and comparing the results. The test checks if the serialized and deserialized data matches the expected fields data.
testHBaseSerDeWithColumnPrefixes
Focuses on serialization and deserialization with column prefixes. It sets up a test scenario with specific column families, qualifiers, and data, then checks if the serialized and deserialized data matches the expected fields data. The test also verifies the handling of unwanted columns and ensures that the column prefixes are appropriately considered in the process.
testHBaseSerDeCompositeKeyWithoutSeparator
Focuses on serialization and deserialization of data with a composite key that lacks separators. It sets up a scenario with a composite key, a specific column family, qualifier, and test data. The test checks if the serialized and deserialized data match the expected fields, taking into account the absence of separators in the composite key.
testHBaseSerDeCustomStructValue
Focuses on the serialization and deserialization of data with a custom struct value. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct TestStruct. The test checks if the serialized and deserialized data match the expected fields, taking into account automatic insertion of separators between different fields in the struct during serialization.
testHBaseSerDeII
Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks if the serialized and deserialized data match the expected fields data. The test covers a range of data types including byte, short, int, long, float, double, string, and boolean.
testHBaseSerDeCompositeKeyWithSeparator
Focuses on the serialization and deserialization of data with a composite key that includes separators. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct TestStruct. The test checks if the serialized and deserialized data match the expected fields, considering the automatic insertion of separators between different fields in the struct during serialization.
testHBaseSerDeI
Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks if the serialized and deserialized data match the expected fields data. The test covers a range of data types, including byte, short, int, long, float, double, string, and boolean. The scenario includes different configurations, verifying the SerDe functionality under various property settings.
testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
Focuses on mapping Hive columns to HBase column families. It sets up a test scenario with specific HBase column families, qualifiers, and test data. The test checks if the serialized and deserialized data match the expected fields data and if the Hive columns are correctly mapped to the specified HBase column families.

Why do the tests fail?
All of the above tests fail because they compare the string representations of 2 Put objects, i.e. p.toString() and put.toString(). The fields can appear in a different order in the two strings, which causes the assertions to fail.

The mismatch in the order of the fields happens because the toString() method of Put creates a Map<String, Object>, which is then converted to a string using a JSONMapper. This Map<String, Object> does not guarantee the same field order every time, so the string comparison can fail even when both Put objects hold the same data.
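
To illustrate the failure mode without depending on HBase, here is a minimal sketch that uses only the JDK. The class name `MapOrderDemo`, the helper methods, and the keys are hypothetical. Two `LinkedHashMap` instances stand in for the map that `Put.toString()` builds, with the insertion order deliberately swapped to imitate a change in iteration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: not HBase code, just two maps with the same entries.
class MapOrderDemo {

    // Builds a map whose iteration order follows the order of the given keys.
    static Map<String, Object> fields(String firstKey, String secondKey) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(firstKey, valueFor(firstKey));
        m.put(secondKey, valueFor(secondKey));
        return m;
    }

    // Illustrative values only; the real map holds the Put's row, families, etc.
    static Object valueFor(String key) {
        return key.equals("row") ? "test-row1" : Integer.valueOf(1);
    }

    public static void main(String[] args) {
        Map<String, Object> a = fields("row", "totalColumns");
        Map<String, Object> b = fields("totalColumns", "row");

        // Same entries, so Map.equals reports equality regardless of order.
        System.out.println(a.equals(b));                       // prints true
        // Different iteration order, so the string forms differ.
        System.out.println(a.toString().equals(b.toString())); // prints false
    }
}
```

The data is identical in both maps; only the rendered string differs, which is exactly the situation an assertion on toString() output cannot tolerate.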

For example, in the test testHBaseSerDeCompositeKeyWithoutSeparator, the assertion that causes the test to fail is shown below:

hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java

Line 1052 in 17525f1

assertEquals("Serialized put:", p.toString(), put.toString());

How did I fix these tests?

This PR fixes the above tests by comparing the individual fields of the Put object instead of the strings of the Put objects.
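
A rough sketch of the idea, using only the JDK: `SimplePut` is a hypothetical stand-in for HBase's Put, not the real class, and `sameFields` is an illustrative helper rather than the code in this PR. In the real test, the corresponding values would come from accessors on Put such as getRow() and getFamilyCellMap().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Put: a row key plus column -> value entries.
final class SimplePut {
    final byte[] row;
    final Map<String, String> columns = new LinkedHashMap<>();

    SimplePut(String row) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
    }

    SimplePut add(String column, String value) {
        columns.put(column, value);
        return this;
    }

    // The string form depends on the order in which columns were added.
    @Override
    public String toString() {
        return new String(row, StandardCharsets.UTF_8) + columns;
    }

    // Order-independent comparison of the individual fields.
    static boolean sameFields(SimplePut x, SimplePut y) {
        return Arrays.equals(x.row, y.row)    // byte[] needs Arrays.equals
            && x.columns.equals(y.columns);   // Map.equals ignores iteration order
    }

    public static void main(String[] args) {
        SimplePut p = new SimplePut("test-row1").add("cola:abc", "1").add("colb:def", "2");
        SimplePut put = new SimplePut("test-row1").add("colb:def", "2").add("cola:abc", "1");

        System.out.println(p.toString().equals(put.toString())); // prints false
        System.out.println(sameFields(p, put));                  // prints true
    }
}
```

In a JUnit test the same comparison would typically be written as separate assertions per field, for example assertArrayEquals on the row keys, so that a failure message points at the field that actually differs.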

You can run the following commands to run the tests using the NonDex tool:

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII

(Optional) You can also run the following commands to run the tests without NonDex:

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII

Test Environment:

java version "1.8.0_202"
Apache Maven 3.6.3

Kindly let me know if this fix is acceptable.

Thank you

Found and fixed 8 flaky tests

c881734

yesh385 wants to merge 1 commit into master from flaky-fix
