Hello!
I am using future_pmap in a multicore setting on a machine running Ubuntu to compute atmospheric trajectories of air parcels (using the HYSPLIT model and the related package splitr). I set up my code so that I map across each air parcel. I would add a reproducible example here, but I can't think of anything comparably intensive.
I'm seeing some very weird behavior in the CPU usage: after a period at 100% of the vCPUs, usage drops and keeps dropping until the end of the process. By the end of this domino effect, CPU usage is close to 0% and the computation of each air parcel trajectory becomes very sluggish.
This picture might give a hint of what is happening. It shows two processes (and only those, I am not running anything else) separated by a couple of hours. The first one runs to completion, but takes longer than it would if all the vCPUs were in use the whole time. The second one didn't finish, maybe because of that big slump around hour 20. At that time, the code automatically downloaded a couple of files from NOAA's website that were missing from the directory. That is, it added another process (wget) to what was being run.

Each of the processes above corresponds to one year, and for each year I run the following:
library(dplyr)      # relocate(), filter(), %>%
library(lubridate)  # year()
library(furrr)      # future_pmap(); also attaches future, which provides plan()
library(tictoc)     # tic(), toc()

run_model_year <- function(inp_data, yr) {
  #' Run the HYSPLIT model on a yearly basis
  #'
  #' @param inp_data `data.frame` containing date, time, lat, long, BCSMASS, OCCMASS
  #' @param yr `integer` of the year to run
  df_run <- inp_data %>%
    relocate(acq_date, acq_time, latitude, longitude, BCSMASS, OCCMASS) %>%
    filter(year(acq_date) == yr)

  tic() ## START
  plan(multicore)
  # One HYSPLIT run per row; ..1 to ..6 are the six relocated columns
  hysplit_runs <- future_pmap(df_run,
                              ~ hysplit_model(..1, ..2, ..3, ..4, ..5, ..6,
                                              HALF_LIFE_BC, HALF_LIFE_OC),
                              .progress = TRUE)
  toc() ## STOP
  hysplit_runs  # return the results rather than the value of toc()
}

run_model_year(df, yr = 2016)
Here hysplit_model is just a wrapper around the splitr pipeline create_trajectory_model() %>% add_trajectory_params(...) %>% run_model().
By the way, when I use top to check what the machine is running at a moment when future is not using 100% of the vCPUs, I see many of the hysplit executables in state "D" (uninterruptible sleep).
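In case it helps anyone reproduce that check without watching top, here is a minimal sketch of how the "D" processes could be listed from R. The helper names `filter_d_state` and `list_d_state` are made up for illustration, and the sketch assumes a Linux `ps` that accepts `-e` and `-o stat= -o comm=` (the trailing `=` suppresses the header). It is not part of my actual pipeline.

```r
# Hypothetical helper: keep only the `ps` lines whose state starts with "D".
# `ps_lines` is a character vector of "STAT COMMAND" lines.
filter_d_state <- function(ps_lines) {
  lines <- trimws(ps_lines)
  state <- sub("[[:space:]].*$", "", lines)  # first field is the process state
  lines[startsWith(state, "D")]
}

# Hypothetical wrapper: ask `ps` for the state and command name of every process.
# Assumes a Linux `ps`; this function is only defined here, not called.
list_d_state <- function() {
  out <- system2("ps", c("-e", "-o", "stat=", "-o", "comm="), stdout = TRUE)
  filter_d_state(out)
}
```

Calling `list_d_state()` every few seconds while the CPU usage is dropping should show whether the number of hysplit executables stuck in "D" grows over time.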
I'll add a picture here when I get to that point again.