Response speed-ups using multiple cores #92

Merged
jjd330 merged 2 commits into main from numba_par on May 30, 2025

jjd330 commented May 30, 2025

Contributor

On Roar Collab, our jobs already use more than the 4 GB allowed on a single basic core, so we might as well request 2 cores at 4 GB each (or 3 cores for ofov jobs, since they use more memory). To take advantage of the extra cores, numba functions can easily be parallelized by turning a for loop into a parallel loop. I've done this in get_rate_dpis_from_photon_fluxes and multiply_resp_trans_dpis, and added a new function, add_3d_arrays, in each case by changing the longest for loop to a parallel loop. get_rate_dpis_from_photon_fluxes is the most computationally expensive function, taking ~40 ms per call, and it is called twice for each spectral template (9 are used for each position). Using 2 cores with the new parallel form cuts that time in half. The other changed functions show similar speed gains.
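For readers unfamiliar with the pattern, here is a minimal sketch of what turning the longest loop into a parallel loop can look like with numba's prange. It is not the code from this PR: the name add_3d_arrays_sketch, the argument shapes, and the assumption that the first axis is the longest one are all hypothetical, and the small fallback exists only so the snippet still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (assumption for illustration): run as plain Python without numba.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def add_3d_arrays_sketch(a, b):
    """Element-wise sum of two equally shaped 3D arrays."""
    n0, n1, n2 = a.shape
    out = np.empty_like(a)
    # Outer loop over the (assumed) longest axis is split across threads.
    # Each iteration writes only to out[i], so iterations are independent.
    for i in prange(n0):
        for j in range(n1):
            for k in range(n2):
                out[i, j, k] = a[i, j, k] + b[i, j, k]
    return out
```

With parallel=True, numba distributes the prange iterations across the available threads while the inner loops stay serial within each thread. Because no two iterations write to the same element, no reduction or locking is needed.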

When tested on the full processing of a single square seed (140 positions), including the finer scan, with a single time seed, there was a ~30% speed-up (399 s down to 280 s).

I also tested this on a full analysis; the results can be seen here: https://guano.swift.psu.edu/trigger_report?id=770209140 . The results are the same apart from an error making the skymap (I requested too little memory for the manager). The time comparison from data found to last update is not reliable: for the original config 0 it apparently didn't count uploading the skymap as an update, but for config 99 it did for some reason, when it updated the infov results before trying to make the skymap.

But here are the actual times from job submission to the initial analysis being done and to the skymap scan being done:

  • Initial analysis (36% faster)
    • Current (1 core per job): 44 minutes
    • New Branch (2 cores per ifov job): 28 minutes
  • Full sky scan (35% faster)
    • Current (1 core per job): 48 minutes
    • New Branch (2 cores per ifov job): 31 minutes
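The percentages quoted above appear to be reductions in wall-clock time, i.e. (old - new) / old, rather than throughput ratios. A quick check under that assumption (percent_faster is a hypothetical helper, not part of the repo):

```python
def percent_faster(old, new):
    """Percentage reduction in wall-clock time going from old to new."""
    if old <= 0:
        raise ValueError("old time must be positive")
    return 100.0 * (old - new) / old


# Timings quoted in this PR description: (current, new branch)
timings = {
    "single seed (s)": (399, 280),
    "initial analysis (min)": (44, 28),
    "full sky scan (min)": (48, 31),
}

for label, (old, new) in timings.items():
    print(f"{label}: {percent_faster(old, new):.0f}% faster")
```

Under this definition the three cases come out at roughly 30%, 36%, and 35%, matching the figures in the description.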

Why it's actually more than a 30% speed-up may just be down to differences in queue times or hardware, but it could also be due to other functions that can inherently use the extra core.

These changes should result in the same runtime when using a single core, or at least they did when testing in a notebook.
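One way to keep single-core and multi-core jobs on the same code path is to derive numba's thread count from the scheduler allocation. The sketch below assumes the jobs run under Slurm with --cpus-per-task set (so that SLURM_CPUS_PER_TASK is defined) and relies on numba's NUMBA_NUM_THREADS environment variable. The helper names are hypothetical and nothing here is taken from the PR.

```python
import os


def threads_for_job(env=None, default=1):
    """Return the number of cores allocated to this job, or `default`."""
    env = os.environ if env is None else env
    try:
        n = int(env.get("SLURM_CPUS_PER_TASK", ""))
    except ValueError:
        # Variable missing or not an integer: assume a single-core job.
        return default
    return max(n, 1)


def configure_numba_threads(env=None):
    """Set NUMBA_NUM_THREADS from the allocation; call before importing numba."""
    env = os.environ if env is None else env
    n = threads_for_job(env)
    env["NUMBA_NUM_THREADS"] = str(n)
    return n
```

Calling configure_numba_threads() at the top of a job script, before numba is imported, would give 2 threads on a 2-core ifov job and 1 thread on a basic single-core job.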

jjd330 requested a review from samueleronchini on May 30, 2025 15:04

jjd330 commented May 30, 2025

Contributor Author

Ok maybe I'm not getting exactly the same results. Gonna do some investigating.

jjd330 commented May 30, 2025

Contributor Author

OK, fixed the bug (I had multiplication instead of addition) and realized I had forgotten to include one of the changes, which is now in.

The results match now and this is ready to review.

New speed-up numbers:

  • Initial analysis (40% faster)
    • Current (1 core per job): 44 minutes
    • New Branch (2 cores per ifov job): 26 minutes
  • Full sky scan (42% faster)
    • Current (1 core per job): 48 minutes
    • New Branch (2 cores per ifov job): 28 minutes

@samueleronchini

Contributor

Looks good to me

jjd330 merged commit 246f09f into main on May 30, 2025
4 checks passed
jjd330 deleted the numba_par branch on May 30, 2025 19:48