adv-r.had.co.nz
privefl.github.io
In this post, I talk about loops in R, why they can be slow, and when it is okay to use them.

Don't grow objects

Let us generate a matrix of uniform values (with `max` changing for every column).

```r
gen_grow <- function(n = 1e3, max = 1:500) {
  mat <- NULL
  for (m in max) {
    mat <- cbind(mat, runif(n, max = m))
  }
  mat
}

set.seed(1)
system.time(mat1 <- gen_grow(max = 1:500))
##    user  system elapsed 
##   0.333   0.189   0.523 
system.time(mat2 <- gen_grow(max = 1:2000))
##    user  system elapsed 
##   6.183   7.603  13.803 
```

```r
gen_sapply <- function(n = 1e3, max = 1:500) {
  sapply(max, function(m) runif(n, max = m))
}

set.seed(1)
system.time(mat3 <- gen_sapply(max = 1:500))
##    user  system elapsed 
##   0.026   0.005   0.030 
identical(mat3, mat1)
## [1] TRUE
system.time(mat4 <- gen_sapply(max = 1:2000))
##    user  system elapsed 
##   0.108   0.014   0.122 
identical(mat4, mat2)
## [1] TRUE
```

Wow, sapply() is so much faster than loops! Don't get this wrong: sapply() or lapply() is nothing but a loop internally, so sapply() shouldn't be any faster than a loop. Here, the problem is not the loop itself, but what we do inside it. Indeed, in gen_grow(), at each iteration of the loop we reallocate a new matrix with one more column, which takes time.

Imagine you want to climb a flight of stairs, but you have to climb stair 1, go back to the bottom, then climb the first 2 stairs, go back to the bottom, then climb the first 3, and so on until you reach the top. This takes far more time than climbing all the stairs at once. This is basically what happens in gen_grow(), except that instead of climbing more stairs, it allocates more memory, which also takes time. You have at least two solutions to this problem.
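The same copying cost shows up in an even simpler setting. As a sketch (not code from the post), growing a plain vector one element at a time reallocates and copies it on every iteration, so building n elements copies on the order of n²/2 elements in total:

```r
# Sketch (not from the post): growing a vector copies the whole
# vector at each iteration, so building n elements copies roughly
# 1 + 2 + ... + n ~ n^2 / 2 elements in total.
grow_one_by_one <- function(n) {
  x <- NULL
  for (i in seq_len(n)) {
    x <- c(x, i)  # allocates a new vector of length i every time
  }
  x
}

identical(grow_one_by_one(5), 1:5)  # TRUE, but quadratic work for large n
```

The result is the same as seq_len(n), but the hidden reallocation is what makes this pattern slow, not the for loop itself.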
The first solution is to pre-allocate the whole result once (if you know its size in advance) and just fill it:

```r
gen_prealloc <- function(n = 1e3, max = 1:500) {
  mat <- matrix(0, n, length(max))
  for (i in seq_along(max)) {
    mat[, i] <- runif(n, max = max[i])
  }
  mat
}

set.seed(1)
system.time(mat5 <- gen_prealloc(max = 1:500))
##    user  system elapsed 
##   0.030   0.000   0.031 
identical(mat5, mat1)
## [1] TRUE
system.time(mat6 <- gen_prealloc(max = 1:2000))
##    user  system elapsed 
##   0.101   0.009   0.109 
identical(mat6, mat2)
## [1] TRUE
```

Another solution, really useful when you don't know the size of the result in advance, is to store the results in a list. A list, as opposed to a vector or a matrix, stores its elements in different places in memory (the elements don't have to be stored contiguously), so you can add one element to the list without copying the rest of it.

```r
gen_list <- function(n = 1e3, max = 1:500) {
  l <- list()
  for (i in seq_along(max)) {
    l[[i]] <- runif(n, max = max[i])
  }
  do.call("cbind", l)
}

set.seed(1)
system.time(mat7 <- gen_list(max = 1:500))
##    user  system elapsed 
##   0.028   0.000   0.028 
identical(mat7, mat1)
## [1] TRUE
system.time(mat8 <- gen_list(max = 1:2000))
##    user  system elapsed 
##   0.098   0.006   0.105 
identical(mat8, mat2)
## [1] TRUE
```

Vectorization, why?

I call a function vectorized when it takes vectors as arguments and operates on each element of these vectors in another (compiled) language (such as C++ and Fortran). So, let me repeat myself: sapply() is not a vectorized function.

Let's go back to vectorization: why is it so important in R? As an example, let's compute the sum of two vectors.

```r
add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
```
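The excerpt cuts off here. As a hedged sketch of the comparison it is setting up (my code, not the post's), the vectorized counterpart of that loop is just `+`, which iterates over the elements in compiled C code rather than in interpreted R:

```r
# Sketch (not from the excerpt): element-wise loop vs. the
# vectorized `+`, which does the per-element work in C.
add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
add_vectorized <- function(x, y) x + y

set.seed(1)
x <- runif(1e5)
y <- runif(1e5)
identical(add_loop_prealloc(x, y), add_vectorized(x, y))  # TRUE
```

Both compute exactly the same element-wise sums; the difference is only where the per-element loop runs (R bytecode vs. compiled code).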
aosmith.rbind.io
In this post I delve into the details of the R functions I've been using in my simulation examples, focusing on the replicate() function and the map family of functions from the purrr package. I spend a little time showing the parallels between the replicate() function and a for() loop.
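That parallel can be sketched in a few lines (my illustration, not code from the post): replicate(n, expr) evaluates expr n times, just like a for() loop filling a pre-allocated vector, and consumes the same RNG stream.

```r
# Sketch (not from the post): replicate() vs. an equivalent for() loop.
set.seed(42)
res_rep <- replicate(3, mean(rnorm(10)))

set.seed(42)
res_loop <- numeric(3)
for (i in 1:3) {
  res_loop[i] <- mean(rnorm(10))
}

identical(res_rep, res_loop)  # TRUE
```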
bruceeckel.com
pmig96.wordpress.com
Having recently found the Zig programming language, I asked myself: what would it take to code a PumpkinOS application in Zig? Now, before we start I should warn you that I had never programmed in Zig before. What I will show here is a proof of concept containing a pretty simple Zig program integrated with...