Hacker News

magrittr is part of the tidyverse, but I agree that data.table is a comparably powerful and sometimes faster alternative to dplyr.



magrittr existed before the tidyverse and can be used standalone perfectly well.
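A minimal sketch of magrittr on its own, with no tidyverse packages loaded. The vector `x` is a made-up example value.

```r
# Load only magrittr; no tidyverse packages are attached.
library(magrittr)

x <- c(1, 4, 9, 16)

# %>% passes the left-hand side as the first argument of the next call,
# so this is equivalent to sum(sqrt(x)).
total <- x %>% sqrt() %>% sum()
print(total)  # 10
```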

In all the benchmarks I've seen, data.table is faster than dplyr on every task. I'd be curious to see other results.


At the scale of what I'm doing the benchmarks don't sway me, but I do like the syntax of data.table - it feels a bit like relational algebra.
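For anyone who hasn't used it, a rough sketch of why the syntax can feel relational: data.table's general form is `DT[i, j, by]`, where `i` filters rows (much like selection), `j` picks or computes columns (like projection, plus aggregation), and `by` groups. The `visits` table and its values are hypothetical, made up purely for illustration.

```r
library(data.table)

# Hypothetical toy table (made-up values, not real study data).
visits <- data.table(
  site  = c("A", "A", "B", "B", "B"),
  age   = c(34, 51, 29, 62, 45),
  score = c(10, 14, 9, 15, 12)
)

# i:  keep rows with age >= 30           (row filter)
# j:  count rows and average the score   (computed columns)
# by: do it separately for each site     (grouping)
res <- visits[age >= 30, .(n = .N, mean_score = mean(score)), by = site]
print(res)
```

Groups come back in order of first appearance, so this yields one row for site A and one for site B.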


Then I'd assume you must be working with tables of fewer than 1,000 rows, because that's pretty much the only case where it doesn't matter. Beyond 1k rows, the differences are substantial.


Hundreds of rows is typical for me. I analyze clinical studies with human participants. Nothing too tricky; most of my munging runs in effectively zero time.



I was going to make this point, but yeah. The one thing I think people struggle with a bit is how operations are written in data.table. If you are coming from plyr/dplyr, the transition can be difficult. However, I've found that the more I use it, the more I prefer it, even though the main reason I use dt over the tidyverse is the phenomenal performance gain.
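One example of the adjustment coming from dplyr, as a rough sketch with made-up data: dplyr verbs return a new data frame, while data.table's `:=` modifies the table in place (by reference). Grouped summaries also look different. The object names here are illustrative only.

```r
library(dplyr)
library(data.table)

# Made-up example data.
df <- data.frame(g = c("x", "y", "x"), v = c(1, 2, 3))

# dplyr: mutate() returns a new data frame; df itself is unchanged.
df2 <- df %>% mutate(v2 = v * 2)

# data.table: as.data.table() makes a copy, then `:=` adds v2 to that
# copy in place, with no reassignment needed.
dt <- as.data.table(df)
dt[, v2 := v * 2]

# The same grouped sum in each style.
agg_dplyr <- df %>% group_by(g) %>% summarise(total = sum(v))
agg_dt    <- dt[, .(total = sum(v)), by = g]
```

The in-place update is a large part of why data.table avoids copying on big tables, but it also means a function that uses `:=` changes the caller's table, which can surprise dplyr users.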



