Created this PR to fix 8 flaky tests in `TestHBaseSerDe`, which can be found here.
How were these tests identified as flaky?
The tests were identified as flaky using NonDex, an open-source research tool that finds and diagnoses non-deterministic runtime exceptions in Java programs.
What do the tests do?
`testHBaseSerDeWithTimestamp`: Tests the serialization and deserialization of data with timestamps. It creates a test scenario with specific column families, qualifiers, and data types, then sorts and compares the results. The test checks that the serialized and deserialized data match the expected fields data.
`testHBaseSerDeWithColumnPrefixes`: Focuses on serialization and deserialization with column prefixes. It sets up a test scenario with specific column families, qualifiers, and data, then checks that the serialized and deserialized data match the expected fields data. The test also verifies the handling of unwanted columns and ensures that the column prefixes are taken into account.
`testHBaseSerDeCompositeKeyWithoutSeparator`: Focuses on serialization and deserialization of data with a composite key that lacks separators. It sets up a scenario with a composite key, a specific column family, qualifier, and test data. The test checks that the serialized and deserialized data match the expected fields, taking into account the absence of separators in the composite key.
`testHBaseSerDeCustomStructValue`: Focuses on the serialization and deserialization of data with a custom struct value. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct `TestStruct`. The test checks that the serialized and deserialized data match the expected fields, taking into account the automatic insertion of separators between the fields of the struct during serialization.
`testHBaseSerDeII`: Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks that the serialized and deserialized data match the expected fields data. The test covers a range of data types, including byte, short, int, long, float, double, string, and boolean.
`testHBaseSerDeCompositeKeyWithSeparator`: Focuses on the serialization and deserialization of data with a composite key that includes separators. It sets up a scenario with a specific column family, qualifier, and test data represented by a custom struct `TestStruct`. The test checks that the serialized and deserialized data match the expected fields, taking into account the automatic insertion of separators between the fields of the struct during serialization.
`testHBaseSerDeI`: Focuses on the serialization and deserialization of data with various data types and values. It sets up a test scenario with specific column families, qualifiers, and test data, then checks that the serialized and deserialized data match the expected fields data. The test covers a range of data types, including byte, short, int, long, float, double, string, and boolean. The scenario includes different configurations, verifying the SerDe functionality under various property settings.
`testHBaseSerDeWithHiveMapToHBaseColumnFamilyII`: Focuses on mapping Hive columns to HBase column families. It sets up a test scenario with specific HBase column families, qualifiers, and test data. The test checks that the serialized and deserialized data match the expected fields data and that the Hive columns are correctly mapped to the specified HBase column families.
All of the above tests fail because they compare the string representations of two `Put` objects, i.e. `p.toString()` and `put.toString()`. The order of the fields in the strings returned by `toString()` can differ between the two `Put` objects, which causes the assertions to fail.
The mismatch in field order happens because the `toString()` method of `Put` creates a `Map<String, Object>`, which is then converted to a string using a JSONMapper. This `Map<String, Object>` does not guarantee the same field order every time, so two `Put` objects with identical contents can produce different strings.
For example, in the test `testHBaseSerDeCompositeKeyWithoutSeparator`, the assertion that causes the test to fail is at:
hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
Line 1052 in 17525f1
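To illustrate the failure mode without depending on HBase, here is a minimal standalone sketch. It is not the code from `TestHBaseSerDe` or from `Put`. The class `MapOrderDemo`, the helper `fingerprint`, and the keys `row` and `totalColumns` are hypothetical names chosen for illustration. A `LinkedHashMap` is used only to force two different iteration orders deterministically, which mimics what NonDex does when it shuffles an unordered map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: it stands in for the map that a toString()
// implementation might build before rendering it as text.
class MapOrderDemo {

    // Builds a map with the same two entries, inserted in the given key order.
    // The keys and values are illustrative only.
    static Map<String, Object> fingerprint(String... keyOrder) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (String k : keyOrder) {
            m.put(k, k.equals("row") ? (Object) "key1" : (Object) Integer.valueOf(1));
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> a = fingerprint("row", "totalColumns");
        Map<String, Object> b = fingerprint("totalColumns", "row");

        // Comparing rendered strings depends on iteration order.
        System.out.println(a.toString().equals(b.toString())); // prints "false"

        // Comparing contents entry by entry does not depend on order.
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

The two maps hold the same entries, so a content-based comparison succeeds while the string comparison fails. This is the same reason an assertion on `toString()` output can pass in one run and fail in another.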
This PR fixes the above tests by comparing the individual fields of the `Put` objects instead of their string representations.
You can run the following commands to run the tests using the NonDex tool:
(Optional) You can also run the following command to run the test:
Test Environment:
Kindly let me know if this fix is acceptable.
Thank you