Roberto Villegas-Diaz
Data Manager @ University of Liverpool
In reality:
“The goal of {furrr} is to combine {purrr}’s family of mapping functions with {future}’s parallel processing capabilities.”
Shout out to Tom Smith @ Nottingham University Hospitals NHS Trust:
Note: replacing a map() function with its equivalent future_map() does not auto-magically parallelise your code! 🥲 Unless a parallel plan is set, futures are resolved sequentially in the current R session.
# Set a "plan" for how the code should run.
future::plan(future::multisession, workers = 2)
# This does run in parallel!
furrr::future_map(c("hello", "{purrr}!"), ~.x)
[[1]]
[1] "hello"
[[2]]
[1] "{purrr}!"
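One way to see the note above in action is to ask each element for the process ID of the R session that evaluated it. This is a minimal sketch that assumes {future} and {furrr} are installed. The Sys.getpid() check is only an illustration and does not come from the original talk.

```r
library(future)
library(furrr)

# Sequential plan (the default): every element is evaluated in this R process
plan(sequential)
pids_seq <- unlist(future_map(1:4, ~ Sys.getpid()))

# Multisession plan: elements are sent to background R sessions
plan(multisession, workers = 2)
pids_par <- unlist(future_map(1:4, ~ Sys.getpid()))

# Reset so later code runs sequentially again
plan(sequential)

print(all(pids_seq == Sys.getpid()))  # TRUE: same process, no parallelism
print(any(pids_par == Sys.getpid()))  # FALSE: work ran in other sessions
```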
Other functions:
future_imap(), future_imap_chr(), …,
future_map2(), future_map2_chr(), …,
future_walk(), future_map_chr(), …, and more.
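A few of these variants in use, mirroring their {purrr} counterparts. This is a sketch with toy inputs that assumes {future} and {furrr} are installed. The typed suffixes (_chr, _dbl, …) fix the output type just as they do in {purrr}.

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

words <- c("hello", "{purrr}!")

# future_map_chr(): like purrr::map_chr(), always returns a character vector
upper <- future_map_chr(words, toupper)

# future_map2_dbl(): iterates over two inputs in lockstep, returns doubles
sums <- future_map2_dbl(1:3, c(10, 20, 30), ~ .x + .y)

# future_imap_chr(): .y is the index (or the name, if .x is named)
labelled <- future_imap_chr(words, ~ paste0(.y, ": ", .x))

# future_walk(): called for its side effects; returns .x invisibly
future_walk(words, ~ message(.x))

plan(sequential)
```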
Reference: https://furrr.futureverse.org/reference
Planning with future::plan():
sequential: uses the current R process
multisession: uses separate R sessions
multicore: uses separate forked R processes (not supported on Windows)
cluster: uses separate R sessions on one or more machines
Reference: https://future.futureverse.org/reference/plan.html
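Switching between these strategies is a single plan() call. When setting a new strategy, plan() invisibly returns the previous one, so it can be restored afterwards. This is a minimal sketch assuming {future} is installed. nbrOfWorkers() reports how many workers the active plan provides.

```r
library(future)

# plan() returns the previously active plan (invisibly) when setting a new one
oplan <- plan(multisession, workers = 2)
n_multi <- nbrOfWorkers()   # 2 background R sessions

plan(sequential)
n_seq <- nbrOfWorkers()     # 1: everything runs in the current process

plan(oplan)  # restore whatever plan was active before
```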
For testing at home:
To find the number of available CPU cores (i.e., the maximum number of workers for the plan() function):
future::availableCores()
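Feeding that number into plan() might look like the sketch below, assuming {future} is installed. Leaving one core free for the main session is a common convention rather than a rule. availableCores() also takes settings such as job-scheduler environment variables and R options into account, which makes it a safer upper bound than the raw hardware count.

```r
library(future)

n_cores <- availableCores()

# Optional convention: keep one core free for the main R session
n_workers <- max(1L, n_cores - 1L)

plan(multisession, workers = n_workers)
print(nbrOfWorkers())

plan(sequential)  # shut down the background sessions when done
```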
To add a progress bar, include .progress = TRUE in the function call:
furrr::future_map(x, fx, .progress = TRUE)
⚠️ The {furrr} documentation warns that .progress will be deprecated in favour of the progressr package.
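A sketch of the progressr pattern, assuming {future}, {furrr} and {progressr} are installed. The squaring task and x are toy placeholders. progressor() creates a function p() that signals one completed step each time it is called, and with_progress() renders those signals. In non-interactive sessions the bar may be suppressed by default, but the results are unaffected.

```r
library(future)
library(furrr)
library(progressr)

plan(multisession, workers = 2)

x <- 1:10

with_progress({
  # One step per element of x
  p <- progressor(steps = length(x))
  squares <- future_map(x, function(i) {
    Sys.sleep(0.1)  # simulate some work
    p()             # report progress for this element
    i^2
  })
})

plan(sequential)
```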
Imagine we want to compute some spatial indicator X at the UPRN (Unique Property Reference Number) level: how long would that take?
Some UPRN stats:
UPRNs are available under the Open Government Licence (OGL) from the Ordnance Survey Data Hub.
access_to_green_spaces <- function(uprn, ...) {
Sys.sleep(1E-3) # do your thing
return(uprn) # result
}
# Load datasets derived with the R/uprn_example.R script
ons_uprn_nw_cm_icb <-
readr::read_rds("../data/ons_uprn_nw_cm_icb.Rds")
sub_icb_boundaries_cm <-
readr::read_rds("../data/sub_icb_boundaries_cm.Rds")
Code: R/uprn_example.R
2672.98 sec elapsed
292.05 sec elapsed
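A scaled-down version of this comparison can be run without the datasets. The sketch below uses a hypothetical vector of 2,000 integer IDs in place of real UPRNs and base system.time() for timing. It assumes {purrr}, {future} and {furrr} are installed.

```r
library(future)
library(furrr)

access_to_green_spaces <- function(uprn, ...) {
  Sys.sleep(1E-3) # do your thing
  return(uprn)    # result
}

# Hypothetical stand-in for real UPRNs
uprns <- seq_len(2000)

# Sequential baseline with {purrr}
t_seq <- system.time(res_seq <- purrr::map(uprns, access_to_green_spaces))

# Same work spread over 2 background R sessions with {furrr}
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_map(uprns, access_to_green_spaces))
plan(sequential)

print(t_seq[["elapsed"]])
print(t_par[["elapsed"]])
```

With only 2 workers and a ~1 ms task, start-up and data-transfer overhead can absorb much of the gain. The speed-up should grow with the cost of each call and the number of workers.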
[1] 0.2080856 0.2080856 0.2080856
[1] 0.1552317 0.4877356 0.5330014
user system elapsed
0.024 0.001 0.282
user system elapsed
0.342 0.502 1.191
A possible solution: instead of using an anonymous function defined within the environment of the “large” object, define the function separately:
user system elapsed
0.297 0.055 0.590
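Why does this help? A plausible explanation is that an R closure carries its enclosing environment. Sending an anonymous function to the workers therefore also sends every object in the frame where it was created. The base-R sketch below uses serialize() as a stand-in for what is shipped to a multisession worker. make_closure() and double_it() are hypothetical names.

```r
# Hypothetical helper: builds a large object and returns an anonymous function
make_closure <- function() {
  big <- numeric(1e6)   # ~8 MB of doubles living in this function's frame
  function(i) i * 2     # enclosing environment is that frame, so 'big' comes along
}

# Same logic, defined separately at the top level. Its environment is the
# global environment, which serialize() stores as a reference, not a copy.
double_it <- function(i) i * 2

closure_bytes  <- length(serialize(make_closure(), NULL))
separate_bytes <- length(serialize(double_it, NULL))

print(closure_bytes)   # more than 8 million bytes
print(separate_bytes)  # orders of magnitude smaller
```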