feat: Linear sum assignment #85
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   98.77%   93.61%   -5.17%
==========================================
  Files          17       15       -2
  Lines        2703     3414     +711
==========================================
+ Hits         2670     3196     +526
- Misses         33      218     +185
Merging this PR will improve performance by 22.57%
Warning: Please fix the performance issues or acknowledge them on CodSpeed.
Hey,
I thought it would be cool to have some assignment functionality for when we need to match predictions to ground truths (GTs), for example, or to assign boxes between the current frame and the previous frame.
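To make the matching use case concrete, here is a minimal pure-Python sketch of the kind of cost matrix an assignment solver would consume: `cost[i][j] = 1 - IoU(pred_i, gt_j)`. The helper names `iou` and `iou_distance_matrix` are hypothetical and only for illustration; they are not this crate's API.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_distance_matrix(preds, gts):
    """Hypothetical helper: cost[i][j] = 1 - IoU(preds[i], gts[j])."""
    return [[1.0 - iou(p, g) for g in gts] for p in preds]


# Two predictions and two ground truths, listed in swapped order.
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(21, 21, 31, 31), (1, 1, 11, 11)]
cost = iou_distance_matrix(preds, gts)
```

Non-overlapping pairs get the maximum cost of 1.0, so a minimum-cost assignment prefers pairing `preds[0]` with `gts[1]` and `preds[1]` with `gts[0]`.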
Anyway, I started with Hungarian matching, but it ended up being too slow when the number of boxes is above 1000. So I took inspiration from `scipy.optimize.linear_sum_assignment` to implement the shortest augmenting path algorithm.

Side note/digression: it's funny how all these ML papers claim to use Hungarian matching while in fact they just use `linear_sum_assignment` in their code, which is a different algorithm...

To speed up the computation on dense cost matrices, I asked Claude to help me write a SIMD implementation (using the `pulp` crate). However, when the cost matrix is sparse, in the sense that most boxes don't overlap, the non-SIMD (aka scalar) function is much faster due to the SIMD overhead of having to move the data back and forth between memory and vector registers.

Also, I noticed that `parallel_iou_distance_slice` was missing; it was only implemented under the `ndarray` feature. So I modified this to be able to use it in `lsap_iou_slice`.

I did a lot of testing to verify correctness and benchmarking against scipy, but also lapjv and lap, and we are faster in many cases. When not faster, the speed is similar.
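For readers who want a feel for the augmenting-path approach mentioned above, here is a small, unoptimized pure-Python sketch of a textbook O(n^3) solver that keeps row/column potentials and grows one cheapest augmenting path per row. It is an illustration only: the function name `lsap` is hypothetical, it assumes a rectangular cost matrix with no more rows than columns, and it is neither the Rust code in this PR nor scipy's implementation.

```python
def lsap(cost):
    """Return col_of_row, where row i is assigned to column col_of_row[i].

    Assumes len(cost) <= len(cost[0]). Indices are 1-based internally,
    with column 0 acting as a virtual start node for each search.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    assert n <= m, "sketch assumes rows <= columns"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)   # predecessor column on the current path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)      # cheapest reduced cost to reach each column
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta, j1 = inf, 0
            for j in range(1, m + 1):
                if used[j]:
                    continue
                cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                if cur < minv[j]:
                    minv[j] = cur
                    way[j] = j0
                if minv[j] < delta:
                    delta, j1 = minv[j], j
            # Shift potentials so the next closest column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:          # reached a free column: path found
                break
        # Flip the matching along the path back to the virtual column 0.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    col_of_row = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col_of_row[p[j] - 1] = j - 1
    return col_of_row


example = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
assignment = lsap(example)
total = sum(example[i][j] for i, j in enumerate(assignment))
```

On this 3x3 example the only optimal assignment is rows 0, 1, 2 to columns 1, 0, 2, for a total cost of 5, which can be checked by enumerating all six permutations.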
Using the `_random_xyxy_boxes` function from `test_torch.py`, we can generate a very dense case or a very sparse case, and compare the results of the different libs.
The bigger the image size, the more "sparse" the cost matrix is, i.e. more boxes don't overlap.
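The relationship between image size and sparsity can be illustrated with a rough stdlib-only sketch. The generator `random_xyxy_boxes` below is a hypothetical stand-in written for this example (it is not the `_random_xyxy_boxes` helper from `test_torch.py`, whose signature is not shown here), and the box counts and sizes are arbitrary assumptions.

```python
import random


def random_xyxy_boxes(n, image_size, max_side, rng):
    """Hypothetical generator: n boxes (x1, y1, x2, y2) inside a square image."""
    boxes = []
    for _ in range(n):
        w = rng.uniform(1.0, max_side)
        h = rng.uniform(1.0, max_side)
        x1 = rng.uniform(0.0, image_size - w)
        y1 = rng.uniform(0.0, image_size - h)
        boxes.append((x1, y1, x1 + w, y1 + h))
    return boxes


def overlaps(a, b):
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlap_fraction(boxes_a, boxes_b):
    """Share of (a, b) pairs with IoU > 0, i.e. cost entries below 1."""
    hits = sum(overlaps(a, b) for a in boxes_a for b in boxes_b)
    return hits / (len(boxes_a) * len(boxes_b))


rng = random.Random(0)  # fixed seed so runs are repeatable

# Same number and size of boxes; only the image size changes.
dense = overlap_fraction(
    random_xyxy_boxes(200, 100.0, 50.0, rng),
    random_xyxy_boxes(200, 100.0, 50.0, rng),
)
sparse = overlap_fraction(
    random_xyxy_boxes(200, 10_000.0, 50.0, rng),
    random_xyxy_boxes(200, 10_000.0, 50.0, rng),
)
print(f"dense: {dense:.4f}, sparse: {sparse:.6f}")
```

With the box sizes held fixed, spreading the boxes over a larger image leaves far fewer overlapping pairs, so most entries of an IoU-distance cost matrix sit at the maximum value.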