The `data.table` package implements faster `melt`/`dcast` functions (in C). It also adds features by allowing multiple columns to be melted and cast. Please see the new vignette "Efficient reshaping using data.tables" on GitHub.
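As a rough illustration of the basic round trip, the sketch below melts a small wide table to long form and casts it back. The table `wide` and its column names are made up for this example; only `melt()` and `dcast()` come from `data.table`.

```r
library(data.table)

# Hypothetical wide table: one row per id, one column per year.
wide <- data.table(id = 1:3,
                   y2020 = c(10, 20, 30),
                   y2021 = c(11, 21, 31))

# Wide to long: stack the two year columns into year/amount pairs.
long <- melt(wide, id.vars = "id",
             measure.vars = c("y2020", "y2021"),
             variable.name = "year", value.name = "amount")

# Long back to wide: one column per level of `year`.
back <- dcast(long, id ~ year, value.var = "amount")
```

No call to `library(reshape2)` is needed here; the `data.table` methods are dispatched because `wide` and `long` are data.tables.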
melt/dcast functions for data.table have been available since v1.9.0 and the features include:
- There is no need to load the `reshape2` package prior to casting. But if you want it loaded for other operations, please load it before loading `data.table`.
- `dcast` is also an S3 generic. No more `dcast.data.table()`. Just use `dcast()`.
- `melt`:
  - is capable of melting on columns of type ‘list’.
  - gains `variable.factor` and `value.factor`, which by default are `TRUE` and `FALSE` respectively for compatibility with `reshape2`. This allows for directly controlling the output type of the `variable` and `value` columns (as factors or not).
  - `melt.data.table`'s `na.rm = TRUE` parameter is internally optimised to remove NAs directly during melting and is therefore much more efficient.
  - NEW: `melt` can accept a list for `measure.vars`, and the columns specified in each element of the list will be combined together. This is facilitated further through the use of `patterns()`. See vignette or `?melt`.
- `dcast`:
  - accepts multiple `fun.aggregate` and multiple `value.var`. See vignette or `?dcast`.
  - use the `rowid()` function directly in the formula to generate an id-column, which is sometimes required to identify the rows uniquely. See `?dcast`.
- Old benchmarks: `melt`: 10 million rows and 5 columns, 61.3 seconds reduced to 1.2 seconds. `dcast`: 1 million rows and 4 columns, 192 seconds reduced to 3.6 seconds.
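The multi-column features of `melt` and `dcast` listed above can be sketched as follows. This is a minimal example under stated assumptions: the tables `DT`, `scores` and `obs` and all their column names are invented for illustration, and the vignette and `?melt` / `?dcast` remain the authoritative reference for the arguments used.

```r
library(data.table)

# Hypothetical table: two children per family, each with a date-of-birth
# column and a sex column (names are illustrative only).
DT <- data.table(
  family     = 1:3,
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05"),
  sex_child1 = c("M", "F", "M"),
  sex_child2 = c("F", NA, "F")
)

# Melt both column groups at once: each pattern yields one value column.
# na.rm = TRUE drops the missing second child of family 2 during melting.
long <- melt(DT, id.vars = "family",
             measure.vars = patterns("^dob_", "^sex_"),
             variable.name = "child", value.name = c("dob", "sex"),
             na.rm = TRUE)

# The same grouping spelled out as a list for measure.vars.
long_list <- melt(DT, id.vars = "family",
                  measure.vars = list(c("dob_child1", "dob_child2"),
                                      c("sex_child1", "sex_child2")),
                  variable.name = "child", value.name = c("dob", "sex"),
                  na.rm = TRUE)

# Cast several value columns back to wide form in a single call.
wide <- dcast(long, family ~ child, value.var = c("dob", "sex"))

# Several aggregation functions applied in one cast (hypothetical data).
scores <- data.table(g = c("a", "a", "b", "b"),
                     k = c("x", "x", "x", "y"),
                     v = c(1, 3, 5, 7))
agg <- dcast(scores, g ~ k, value.var = "v",
             fun.aggregate = list(sum, mean))

# rowid() in the formula numbers the rows within each group, giving each
# observation of `g` its own column.
obs <- data.table(g = c("a", "a", "b"), v = c(1, 2, 3))
by_obs <- dcast(obs, g ~ rowid(g, prefix = "obs"), value.var = "v")
```

With `patterns()`, the grouping follows the column names, so columns added later that match a pattern are picked up without editing the call; the explicit list form is the safer choice when names do not follow a regular scheme.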
Reminder of Cologne (Dec 2013) presentation slide 32: Why not submit a `dcast` pull request to `reshape2`?