This repo was created with the purpose of studying performance of different functions that can be used to obtain the same result.
We are using "Melbourne Housing Market" public dataset provided by Kaggle with 60.7k observations and 13 columns to test some functions.
We run the selected function 1000 times over one column using microbenchmark R package and use summary() to analyse the difference in milliseconds between the functions.
Load a .csv file into an R object (data frame/tibble/etc.)
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| fread | 100 | ms | 38.250 | 52.078 | 37.182 | 244.789 | 41.062 | 48.074 |
| import | 100 | ms | 39.385 | 55.187 | 38.124 | 92.511 | 46.186 | 49.661 |
| read.csv | 100 | ms | 508.648 | 585.146 | 488.136 | 1173.685 | 532.987 | 570.243 |
| read_csv | 100 | ms | 161.373 | 191.777 | 133.286 | 678.702 | 173.806 | 189.244 |
Apply a specific function to a list or vector.
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| lapply | 1000 | ms | 12.205 | 13.952 | 11.714 | 212.913 | 12.494 | 14.894 |
| sapply | 1000 | ms | 15.154 | 17.356 | 14.404 | 56.467 | 15.541 | 17.657 |
| map | 1000 | ms | 26.620 | 37.616 | 25.654 | 223.574 | 28.354 | 33.111 |
| map_chr | 1000 | ms | 27.417 | 38.271 | 26.093 | 225.453 | 29.188 | 33.753 |
Find a simple pattern in a string, and return the subset
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| grep | 1000 | ms | 5.233 | 5.947 | 4.837 | 34.422 | 5.447 | 6.096 |
| regexpr | 1000 | ms | 8.224 | 9.052 | 7.715 | 52.361 | 8.467 | 9.568 |
| sqldf | 1000 | ms | 118.596 | 133.969 | 114.072 | 461.028 | 121.131 | 137.014 |
Subset column in dataframe
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| base1 | 1000 | ms | 0.017 | 0.033 | 0.015 | 0.105 | 0.021 | 0.026 |
| base2 | 1000 | ms | 0.017 | 0.032 | 0.015 | 0.203 | 0.021 | 0.026 |
| select | 1000 | ms | 1.506 | 1.637 | 1.432 | 33.489 | 1.554 | 1.704 |
Convert column to date format
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| lubridate | 1000 | ms | 7.370 | 9.153 | 6.963 | 202.276 | 7.705 | 9.787 |
| to.date | 1000 | ms | 14.633 | 16.840 | 14.011 | 57.368 | 15.083 | 17.058 |
| strptime | 1000 | ms | 212.153 | 236.698 | 205.226 | 464.125 | 216.453 | 234.386 |
Convert column as character
| function | times | unit | lq | uq | min | max | median | avg |
|---|---|---|---|---|---|---|---|---|
| as.character | 1000 | ms | 0.028 | 0.045 | 0.006 | 10.563 | 0.038 | 0.048 |
| formatC | 1000 | ms | 13.551 | 14.481 | 12.711 | 62.413 | 13.822 | 15.167 |
| paste | 1000 | ms | 14.609 | 15.289 | 14.030 | 207.284 | 14.801 | 16.277 |
| toString | 1000 | ms | 16.162 | 16.837 | 15.549 | 58.643 | 16.381 | 17.780 |
| sprintf | 1000 | ms | 18.049 | 18.847 | 17.456 | 58.991 | 18.274 | 19.786 |
r-functions-performance is MIT Licensed.