Work is not distributed 'on the fly' #470

@frederikziebell

Consider the following example, in which every work package except the first takes the same amount of time.

library("tidyverse")
library("future")
library("future.apply")

future::plan("multisession", workers=4)
num_workers <- future::nbrOfWorkers()

f <- function(i){
  if(i==1){
    Sys.sleep(10)
  } else {
    Sys.sleep(1)
  }
  
  data.frame(
    i=i,
    pid=Sys.getpid(),
    timepoint=Sys.time()
  )
}

future_lapply(1:(5*num_workers), f)  %>% 
  bind_rows() %>% 
  mutate(worker_nr=map_dbl(pid, ~which(.x==unique(pid)))) %>% 
  arrange(timepoint)

The result is that all workers except the one evaluating f(1) finish almost simultaneously. Worker no. 1 lags behind, and its remaining work is not redistributed even though all other workers are idle:

[Screenshot from 2021-02-19 20-53-38]
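
The timings look consistent with the elements being split up front into one contiguous chunk per worker. The sketch below simulates that in plain R, without any futures. The contiguous, equal-sized chunking is an assumption about the default scheduling, not something confirmed in this report, and `cost`, `chunks` and `time_per_worker` are names introduced here for illustration only.

```r
# Simulated per-element cost in seconds: f(1) sleeps 10 s, every other call 1 s.
num_workers <- 4
n <- 5 * num_workers
cost <- ifelse(seq_len(n) == 1, 10, 1)

# Assumption: elements are pre-assigned in contiguous, equal-sized chunks,
# one chunk per worker.
chunks <- split(seq_len(n), rep(seq_len(num_workers), each = n / num_workers))

# Total time each worker needs for its pre-assigned chunk.
time_per_worker <- vapply(chunks, function(idx) sum(cost[idx]), numeric(1))
print(time_per_worker)

# The wall-clock time is set by the slowest worker.
max(time_per_worker)
```

Under that assumption, worker 1 needs 14 s for its chunk while the others need 5 s each and then sit idle. The 10 s call alone sets a floor of 10 s, so the gap between 14 s and 10 s is the work that could in principle have been handed to the idle workers.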

Is this an issue, or is there a way to specify that unfinished work should be distributed to all available workers? I observed the phenomenon with both the future.apply and furrr packages, so I think it is directly related to future.
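
One thing that may be worth trying, sketched here under assumptions rather than as a confirmed fix: `future_lapply()` has `future.chunk.size` and `future.scheduling` arguments that control how elements are grouped into futures. Setting `future.chunk.size = 1` should create one future per element, so that a worker that becomes free can take the next element, at the price of more per-future overhead. The sleep times are scaled down to keep the run short, and `res` is a name introduced for illustration.

```r
library("future")
library("future.apply")

future::plan("multisession", workers = 4)
num_workers <- future::nbrOfWorkers()

# Same shape as the example above, with shorter sleeps (assumption: 2 s vs 0.2 s).
f <- function(i) {
  Sys.sleep(if (i == 1) 2 else 0.2)
  data.frame(i = i, pid = Sys.getpid(), timepoint = Sys.time())
}

# One future per element instead of one chunk per worker.
res <- do.call(
  rbind,
  future_lapply(seq_len(5 * num_workers), f, future.chunk.size = 1)
)

# Number of elements handled by each worker process.
table(res$pid)

# Shut the background workers down again.
future::plan("sequential")
```

If the per-element dispatch works as assumed, the process that evaluates `f(1)` should end up handling fewer elements than the others, rather than holding on to a fixed fifth of the work.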
