Performance
Though it may be controversial to say, performance is not a primary goal for Joinery at this point. First and foremost, our focus is on providing a complete and correct data frame implementation for Java. That said, we want our programs to complete in a timely fashion just like everyone else, so it is important to keep an eye on which operations are expensive and why.
Joinery includes several performance tests for common operations such as appending data, sorting, and aggregation. These tests run as integration tests during builds. However, simply running the tests doesn't tell us very much; at best, it proves that the memory overhead imposed by a data frame is small enough that out-of-memory errors don't occur.
To make the performance tests more useful, Joinery supports building with metrics enabled. Many data frame methods are decorated with annotations from the metrics package. When metrics are enabled for a build, AspectJ is used to wire up the appropriate tracking bits and print a report on exit. To see for yourself, run:
$ mvn -Pbuild-metrics integration-test
As part of the output, you will have text metrics reports like the one below to review.
joinery.DataFrame.groupBy(Object[]).timer
count = 10
mean rate = 0.71 calls/second
1-minute rate = 1.84 calls/second
5-minute rate = 1.97 calls/second
15-minute rate = 1.99 calls/second
min = 32.14 milliseconds
max = 189.64 milliseconds
mean = 48.99 milliseconds
stddev = 46.09 milliseconds
median = 32.58 milliseconds
75% <= 33.78 milliseconds
95% <= 189.64 milliseconds
98% <= 189.64 milliseconds
99% <= 189.64 milliseconds
99.9% <= 189.64 milliseconds
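To make the idea of annotation-driven timing more concrete, here is a minimal sketch of the general technique. It is not Joinery's implementation: `Timed`, `Frame`, `TimingSketch`, and `summarize` are hypothetical names invented for this example, and a plain `java.lang.reflect.Proxy` stands in for the AspectJ weaving that the real build uses. The summary is also simplified and may not match how the metrics package samples calls or computes its figures.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker annotation; a stand-in, not the annotation Joinery uses.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Timed {}

// Hypothetical interface standing in for a data frame operation.
interface Frame {
    @Timed
    int groupBy(int buckets);
}

class TimingSketch {
    // Recorded call durations in milliseconds, keyed by method name.
    static final Map<String, List<Double>> SAMPLES = new HashMap<>();

    // Wrap target in a proxy that records how long each @Timed call takes.
    // (Assumption: a dynamic proxy here plays the role AspectJ plays in the build.)
    static <T> T timed(Class<T> iface, T target) {
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            (self, method, args) -> {
                long start = System.nanoTime();
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow whatever the target threw
                } finally {
                    if (method.isAnnotationPresent(Timed.class)) {
                        double millis = (System.nanoTime() - start) / 1e6;
                        SAMPLES.computeIfAbsent(method.getName(), k -> new ArrayList<>())
                               .add(millis);
                    }
                }
            });
        return iface.cast(proxy);
    }

    // Returns {count, min, max, mean, median} for the given samples.
    static double[] summarize(List<Double> samples) {
        double[] sorted = samples.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int n = sorted.length;
        if (n == 0) {
            return new double[] { 0, Double.NaN, Double.NaN, Double.NaN, Double.NaN };
        }
        double mean = Arrays.stream(sorted).average().getAsDouble();
        double median = (n % 2 == 1)
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new double[] { n, sorted[0], sorted[n - 1], mean, median };
    }

    public static void main(String[] args) {
        // A toy implementation that does some busy work on every call.
        Frame frame = timed(Frame.class, buckets -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % buckets;
            }
            return sum;
        });
        for (int call = 0; call < 10; call++) {
            frame.groupBy(7);
        }
        double[] s = summarize(SAMPLES.get("groupBy"));
        System.out.printf(
            "count = %.0f%nmin = %.2f ms%nmax = %.2f ms%nmean = %.2f ms%nmedian = %.2f ms%n",
            s[0], s[1], s[2], s[3], s[4]);
    }
}
```

Reporting both the mean and the median is useful when reading a report like the one above: there the mean (48.99 ms) sits well above the median (32.58 ms), which is consistent with a single slow call (the 189.64 ms maximum) pulling the average up. With only 10 samples, the upper percentiles all coincide with that maximum.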