Conversation
Contributor
Author
|
Ok maybe I'm not getting exactly the same results. Gonna do some investigating. |
Contributor
Author
|
Ok fixed the bug (had multiplication instead of addition) and realized I forgot to put in one of the changes, which is now in there. The results match now and this is ready to review. New speed up number:
|
Contributor
|
Looks good to me |
samueleronchini
approved these changes
May 30, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
On roar collab, since we're using above the 4 GB allowed on a single basic core we might as well request 2 cores at 4 GB each (or 3 cores for ofov jobs since it uses more memory). To take advantage of the extra cores numba functions can easily be parallelized inside of for loops. I've done this in
get_rate_dpis_from_photon_fluxes,multiply_resp_trans_dpis, and made a new functionadd_3d_arraysby changing the longest for loop to a parallel loop.get_rate_dpis_from_photon_fluxesis the most computationally expensive function and took ~40 ms to run. This is done twice for each spectral template (9 are used for each position). Using 2 cores with the new parallel form cuts this in half. Similar gains in speed are seen from the other changed functions.When tested on the full processing of a single square seed (140 positions) including the finer scan with a single time seed there was a ~30% speed up (399 s down to 280 s).
I also tested this on a full analysis where the results can be seen here https://guano.swift.psu.edu/trigger_report?id=770209140 . The results are the same besides an error making the skymap (requested too little memory for the manager). The time comparison from data found to last update is not reliable since it seems like for the original config 0 it didn't count uploading the skymap as an update, but it did for config 99 for some reason when it updated the infov results before trying to make the skymap.
But here's the actual times it took from job submission to the initial analysis being done and to the skymap scan being done
For why it's actually more than a 30% speed up may just be due to differences in queue times or different hardware, but it could also be due to other functions that can inherently use the extra core.
These changes should result in the same runtime when using a single core or at least they did when testing in a notebook.