Bryan Cardillo edited this page Mar 17, 2015 · 1 revision

Data frame performance

Though it may be controversial to say, performance is not a primary goal for Joinery at this point. First and foremost, our focus is on providing a complete and correct data frame implementation for Java. That said, we want our programs to complete in a timely fashion just like everyone else, so it is important to keep an eye on which operations are expensive and why.

Performance tests

Joinery includes several performance tests for common operations such as appending data, sorting, and aggregation. These tests run as integration tests during builds. However, simply running the tests doesn't tell us very much. At most, I suppose, a passing run shows that the memory overhead imposed by a data frame is small enough that out-of-memory errors don't occur.

Profiling builds

To make the performance tests more useful, Joinery supports building with metrics enabled. Many data frame methods are decorated with annotations from the metrics package. When metrics are enabled for a build, AspectJ is used to wire up the appropriate tracking bits and print a report on exit. To see for yourself:

$ mvn -Pbuild-metrics integration-test
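To give a feel for what annotation-driven timing involves, here is a minimal sketch using only the Java standard library. It is not how Joinery does it: the build described above weaves the timing in with AspectJ, whereas this sketch swaps in a `java.lang.reflect.Proxy`. The `Timed` annotation, the `Frame` interface, `SimpleFrame`, and the `TIMINGS` map are all hypothetical names made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class TimedProxySketch {
    /** Hypothetical marker annotation; not the metrics package's own type. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Timed {}

    /** Hypothetical stand-in for a class with some timed operations. */
    interface Frame {
        @Timed int groupBy(int rows);
        int size(); // not annotated, so never timed
    }

    static final class SimpleFrame implements Frame {
        @Override public int groupBy(int rows) { return rows / 2; }
        @Override public int size() { return 0; }
    }

    /** Recorded durations in nanoseconds, keyed by method name. */
    static final Map<String, List<Long>> TIMINGS = new ConcurrentHashMap<>();

    /** Wraps target so that calls to @Timed methods record their duration. */
    static Frame instrument(Frame target) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean timed = method.isAnnotationPresent(Timed.class);
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the wrapped method's own exception
            } finally {
                if (timed) {
                    TIMINGS.computeIfAbsent(method.getName(),
                            k -> new CopyOnWriteArrayList<>())
                           .add(System.nanoTime() - start);
                }
            }
        };
        return (Frame) Proxy.newProxyInstance(
                Frame.class.getClassLoader(), new Class<?>[] { Frame.class }, handler);
    }

    public static void main(String[] args) {
        Frame frame = instrument(new SimpleFrame());
        for (int i = 0; i < 10; i++) {
            frame.groupBy(1000);
        }
        frame.size();
        System.out.println("groupBy calls timed: " + TIMINGS.get("groupBy").size());
    }
}
```

The appeal of either approach is that the timed methods themselves contain no timing code; the annotation only marks them, and the instrumentation is attached from outside.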

The build output will include text metrics reports like the one below for you to review.

joinery.DataFrame.groupBy(Object[]).timer
             count = 10
         mean rate = 0.71 calls/second
     1-minute rate = 1.84 calls/second
     5-minute rate = 1.97 calls/second
    15-minute rate = 1.99 calls/second
               min = 32.14 milliseconds
               max = 189.64 milliseconds
              mean = 48.99 milliseconds
            stddev = 46.09 milliseconds
            median = 32.58 milliseconds
              75% <= 33.78 milliseconds
              95% <= 189.64 milliseconds
              98% <= 189.64 milliseconds
              99% <= 189.64 milliseconds
            99.9% <= 189.64 milliseconds
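For readers unfamiliar with timer reports, the sketch below shows one way the duration fields (count, min, max, mean, stddev, median, and the percentile lines) could be derived from a list of recorded call durations. It makes several assumptions: `TimerSummary` is a hypothetical class, the standard deviation is the sample (n - 1) form, and quantiles use linear interpolation between sorted samples. The metrics package may sample and interpolate differently, so this will not necessarily reproduce its numbers exactly.

```java
import java.util.Arrays;

/** Hypothetical summary of recorded call durations, in milliseconds. */
final class TimerSummary {
    final int count;
    final double min;
    final double max;
    final double mean;
    final double stddev;
    final double median;
    private final double[] sorted;

    TimerSummary(double[] millis) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        sorted = millis.clone();
        Arrays.sort(sorted);
        count = sorted.length;
        min = sorted[0];
        max = sorted[count - 1];

        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        mean = sum / count;

        // Sample standard deviation (assumption: n - 1 in the denominator).
        double squares = 0.0;
        for (double v : sorted) {
            squares += (v - mean) * (v - mean);
        }
        stddev = count > 1 ? Math.sqrt(squares / (count - 1)) : 0.0;
        median = quantile(0.5);
    }

    /** Quantile by linear interpolation between neighbouring sorted samples. */
    double quantile(double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double pos = q * (count - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        // Made-up durations, not taken from a real Joinery run.
        TimerSummary s = new TimerSummary(new double[] { 10, 20, 30, 40 });
        System.out.println(String.format(
                "count = %d, min = %.2f, max = %.2f, mean = %.2f, median = %.2f",
                s.count, s.min, s.max, s.mean, s.median));
    }
}
```

In the report above, the max and every percentile from 95% upward share the same value while the median sits close to the min, which is the pattern you would expect when a single slow call dominates a small sample.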
