response speed ups using multiple cores by jjd330 · Pull Request #92 · Swift-BAT/NITRATES

jjd330 · 2025-05-30T15:04:21Z

On Roar Collab, our jobs already use more than the 4 GB allowed on a single basic core, so we might as well request 2 cores at 4 GB each (or 3 cores for ofov jobs, since they use more memory). To take advantage of the extra cores, numba functions can easily be parallelized by turning a for loop into a parallel loop. I've done this in get_rate_dpis_from_photon_fluxes and multiply_resp_trans_dpis, and made a new function, add_3d_arrays, in each case by changing the longest for loop to a parallel loop. get_rate_dpis_from_photon_fluxes is the most computationally expensive function, taking ~40 ms per call, and it is called twice for each spectral template (9 are used for each position). Using 2 cores with the new parallel form cuts that time in half. The other changed functions see similar speed gains.
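As a rough illustration of the pattern (not the actual NITRATES code), here is a minimal sketch of turning the outer loop of a numba function into a parallel loop with `prange`. The function names, the array shapes, and the choice of the first axis as the parallel loop are all assumptions made for this example. The `try`/`except` fallback is only there so the sketch still runs where numba isn't installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without numba (serial, uncompiled).
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=False)
def add_3d_arrays_serial_sketch(a, b):
    # Hypothetical serial version: plain nested loops over all three axes.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                out[i, j, k] = a[i, j, k] + b[i, j, k]
    return out


@njit(parallel=True)
def add_3d_arrays_parallel_sketch(a, b):
    # Same loops, but the outer loop uses prange so numba can split its
    # iterations across threads. Each iteration writes a disjoint slice
    # of `out`, so there is no race between threads.
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                out[i, j, k] = a[i, j, k] + b[i, j, k]
    return out


rng = np.random.default_rng(0)
a = rng.random((8, 16, 4))  # assumed shape, for illustration only
b = rng.random((8, 16, 4))
serial = add_3d_arrays_serial_sketch(a, b)
parallel = add_3d_arrays_parallel_sketch(a, b)
print(np.allclose(serial, parallel))
```

With numba installed, the number of threads can be limited to the cores requested for the job, for example through the `NUMBA_NUM_THREADS` environment variable or `numba.set_num_threads(2)`.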

When tested on the full processing of a single square seed (140 positions), including the finer scan, with a single time seed, there was a ~30% speed-up (399 s down to 280 s).

I also tested this on a full analysis; the results can be seen here: https://guano.swift.psu.edu/trigger_report?id=770209140 . The results are the same apart from an error making the skymap (requested too little memory for the manager). The time comparison from data found to last update is not reliable: for the original config 0 it apparently didn't count uploading the skymap as an update, but for config 99 it did, for some reason, when it updated the infov results before trying to make the skymap.

But here are the actual times from job submission to the initial analysis being done and to the skymap scan being done:

Initial analysis (36% faster)
- Current (1 core per job): 44 minutes
- New Branch (2 cores per ifov job): 28 minutes
Full sky scan (35% faster)
- Current (1 core per job): 48 minutes
- New Branch (2 cores per ifov job): 31 minutes

Why it's actually more than a 30% speed-up may just come down to differences in queue times or hardware, but it could also be that other functions can inherently use the extra core.

These changes should result in the same runtime when using a single core, or at least they did when testing in a notebook.

jjd330 · 2025-05-30T15:11:29Z

Ok maybe I'm not getting exactly the same results. Gonna do some investigating.

jjd330 · 2025-05-30T18:59:37Z

Ok fixed the bug (had multiplication instead of addition) and realized I forgot to put in one of the changes, which is now in there.

The results match now and this is ready to review.
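For context, a quick numerical comparison against a reference result is one way a bug like this (multiplying where the code should add) can be caught. The sketch below is a hypothetical check, not the test used in this PR; the helper name `results_match`, the tolerances, and the array shapes are assumptions.

```python
import numpy as np


def results_match(reference, candidate, rtol=1e-7, atol=0.0):
    # Hypothetical helper: True when the two arrays agree within tolerance.
    return bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))


rng = np.random.default_rng(0)
a = rng.random((3, 4, 5))  # assumed shapes, for illustration only
b = rng.random((3, 4, 5))

reference = a + b  # what the combination step should compute
buggy = a * b      # the kind of mistake described above
fixed = a + b

print(results_match(reference, fixed))
print(results_match(reference, buggy))
```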

New speed-up numbers:

Initial analysis (40% faster)
- Current (1 core per job): 44 minutes
- New Branch (2 cores per ifov job): 26 minutes
Full sky scan (42% faster)
- Current (1 core per job): 48 minutes
- New Branch (2 cores per ifov job): 28 minutes
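For reference, the "% faster" figures here read as the percentage reduction in wall-clock time. A small sketch of that arithmetic, using the rounded times quoted in this thread, is below. The function names are made up for this example, and because the inputs are rounded to whole minutes the results can differ from the quoted percentages by about a point.

```python
def percent_time_saved(before, after):
    """Percentage reduction in wall-clock time going from `before` to `after`."""
    return 100.0 * (before - after) / before


def throughput_ratio(before, after):
    """How many times more work per unit time the faster run achieves."""
    return before / after


# (before, after) pairs taken from the numbers quoted in this thread.
timings = {
    "Single square seed (s)": (399, 280),
    "Initial analysis (min)": (44, 26),
    "Full sky scan (min)": (48, 28),
}

for label, (before, after) in timings.items():
    saved = percent_time_saved(before, after)
    ratio = throughput_ratio(before, after)
    print(f"{label}: {saved:.1f}% less time, {ratio:.2f}x throughput")
```

Note that a ~41% reduction in time corresponds to roughly 1.7x throughput, so the two ways of quoting a speed-up are not interchangeable.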

samueleronchini · 2025-05-30T19:45:48Z

Looks good to me

response speed ups using multiple cores

fb2d861

jjd330 requested a review from samueleronchini May 30, 2025 15:04

forgot to make func par, and the add func should add, not mult

5786ae7

samueleronchini approved these changes May 30, 2025

jjd330 merged commit 246f09f into main May 30, 2025
4 checks passed

jjd330 deleted the numba_par branch May 30, 2025 19:48

jjd330 merged 2 commits into main from numba_par
