Found and fixed 8 flaky tests in hbase-handler#2

Open
yesh385 wants to merge 1 commit into master from flaky-fix

yesh385 commented Nov 14, 2023

Created this PR to fix 8 flaky tests in TestHBaseSerDe which can be found here.

  1. How were these tests identified as flaky?
    These tests were identified as flaky using NonDex, an open-source research tool that detects and helps debug wrong assumptions about under-determined Java APIs, such as relying on the iteration order of a HashMap.

  2. What do the tests do?

  • testHBaseSerDeWithTimestamp
    Tests the serialization and deserialization of data with timestamps. It involves creating a test scenario with specific column families, qualifiers, and data types, then sorting and comparing the results. The test checks if the serialized and deserialized data matches the expected fields data.
  • testHBaseSerDeWithColumnPrefixes
    Focuses on serialization and deserialization with column prefixes. It sets up a test scenario with specific column families, qualifiers, and data, then checks if the serialized and deserialized data matches the expected fields data. The test also verifies the handling of unwanted columns and ensures that the column prefixes are appropriately considered in the process.
  • testHBaseSerDeCompositeKeyWithoutSeparator
    Focuses on serialization and deserialization of data with a composite key that lacks separators. It sets up a scenario with a composite key, a specific column family, qualifier, and test data. The test checks if the serialized and deserialized data match the expected fields, taking into account the absence of separators in the composite key.
  • testHBaseSerDeCustomStructValue
    Focuses on the serialization and deserialization of data with a custom struct value. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct TestStruct. The test checks if the serialized and deserialized data match the expected fields, taking into account automatic insertion of separators between different fields in the struct during serialization.
  • testHBaseSerDeII
    Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks if the serialized and deserialized data match the expected fields data. The test covers a range of data types including byte, short, int, long, float, double, string, and boolean.
  • testHBaseSerDeCompositeKeyWithSeparator
    Focuses on the serialization and deserialization of data with a composite key that includes separators. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct TestStruct. The test checks if the serialized and deserialized data match the expected fields, considering the automatic insertion of separators between different fields in the struct during serialization.
  • testHBaseSerDeI
    Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks if the serialized and deserialized data match the expected fields data. The test covers a range of data types, including byte, short, int, long, float, double, string, and boolean. The scenario includes different configurations, verifying the SerDe functionality under various property settings.
  • testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
    Focuses on mapping Hive columns to HBase column families. It sets up a test scenario with specific HBase column families, qualifiers, and test data. The test checks if the serialized and deserialized data match the expected fields data and if the Hive columns are correctly mapped to the specified HBase column families.
  3. Why do the tests fail?
    All of the above tests fail because they compare the string forms of two Put objects, p.toString() and put.toString(). The fields appear in a different order in the two strings, so the assertions fail even when both objects hold the same data.

The field order differs because the toString() method of Put builds a Map<String, Object>, which is then converted to a string using a JSONMapper. The iteration order of this Map<String, Object> is not guaranteed to be the same every time, so the resulting strings can differ.

For example, in the test testHBaseSerDeCompositeKeyWithoutSeparator, the assertion that causes the test to fail is shown below:

assertEquals("Serialized put:", p.toString(), put.toString());
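The effect can be reproduced without HBase. The following is a minimal sketch, not code from this PR: MapOrderDemo and fingerprint are hypothetical names, and a LinkedHashMap with two different insertion orders stands in for a map whose iteration order varies between runs. It shows two maps that are equal by content but produce different strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: same entries, different iteration order.
class MapOrderDemo {

    // Builds a map whose iteration order follows the order of the given keys.
    static Map<String, Object> fingerprint(String... keys) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String key : keys) {
            map.put(key, key.length());
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> a = fingerprint("row", "families", "ts");
        Map<String, Object> b = fingerprint("ts", "row", "families");

        // Map.equals ignores iteration order, so the contents compare equal.
        System.out.println(a.equals(b)); // prints "true"

        // The string form lists entries in iteration order, so the strings differ.
        System.out.println(a.toString().equals(b.toString())); // prints "false"
    }
}
```

An assertion on the string forms would fail here even though the underlying data is identical, which is the same pattern as the failing assertion above.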

  4. How did I fix these tests?

This PR fixes the above tests by comparing the individual fields of the Put object instead of the strings of the Put objects.
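As an illustration of the idea only, here is a minimal sketch of a field-by-field comparison. It is not the diff from this PR: FakePut, PutFieldComparison and sameFields are hypothetical stand-ins so that the example runs without HBase on the classpath. The real tests would read the corresponding values from the Put objects themselves (for example through accessors such as getRow() and getFamilyCellMap(), assuming they are available in the HBase version in use).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an HBase Put: a row key plus column -> value bytes.
class FakePut {
    final byte[] row;
    final Map<String, byte[]> cells = new HashMap<>();

    FakePut(String row) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
    }

    FakePut add(String column, String value) {
        cells.put(column, value.getBytes(StandardCharsets.UTF_8));
        return this;
    }
}

class PutFieldComparison {

    // Compares two FakePut objects field by field, never through toString().
    static boolean sameFields(FakePut expected, FakePut actual) {
        if (!Arrays.equals(expected.row, actual.row)) {
            return false;
        }
        if (!expected.cells.keySet().equals(actual.cells.keySet())) {
            return false;
        }
        for (Map.Entry<String, byte[]> e : expected.cells.entrySet()) {
            // byte[] has identity-based equals, so compare contents explicitly.
            if (!Arrays.equals(e.getValue(), actual.cells.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same data, added in a different order.
        FakePut p = new FakePut("key").add("cf:a", "1").add("cf:b", "2");
        FakePut put = new FakePut("key").add("cf:b", "2").add("cf:a", "1");
        System.out.println(sameFields(p, put)); // prints "true"
    }
}
```

Because the comparison looks up each column by key, the result does not depend on the order in which the map happens to iterate.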

You can run the following commands to run the tests with the NonDex tool:

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII

(Optional) You can also run the following commands to run the tests without NonDex:

mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII

Test Environment:

java version "1.8.0_202"
Apache Maven 3.6.3

Kindly let me know if this fix is acceptable.

Thank you
