Melt using patterns when variable names contain string information - avoid coercion to numeric

From data.table 1.14.1 (the development version at the time of writing; released on CRAN as 1.14.2), the new function measure makes it much easier to melt data with concatenated variable names into a desired format (see ?measure).

The sep argument splits the column names into groups of measure.vars. In the ... argument, we then specify what happens to the values of each group generated by sep.

In OP, the variable names are of the form species_number, e.g. dog_one. Thus, we need two symbols in ... to specify how groups before and after the separator should be treated, one for the species (dog or cat) and one for the numbers (one-three).
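The question's data are not reproduced here, so the following is only a sketch of what B might look like. It is reconstructed from the output printed further down; the exact values and the column order are assumptions.

```r
library(data.table)

# Hypothetical input, reconstructed from the printed results:
# one id column plus one column per species/number combination
B <- data.table(
  idcol     = 1:5,
  dog_one   = 1:5,
  dog_two   = 6:10,
  dog_three = 11:15,
  cat_one   = 101:105,
  cat_two   = 106:110,
  cat_three = 111:115
)

# Every measure column follows the species_number pattern
names(B)
```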

If a symbol in ... is set to value.name, then “melt returns multiple value columns (with names defined by the unique values in that group)”. Thus, because you want multiple columns for each species, the first group defined by the separator, the first symbol in ... should be value.name.

The second group, after the separator, contains the numbers, so it is specified as the second symbol in .... We want a single column for the numbers, so here we give the desired name of that output column, e.g. “nr”.

melt(B, measure.vars = measure(value.name, nr, sep = "_"))

#     idcol    nr dog cat
#  1:     1   one   1 101
#  2:     2   one   2 102
#  3:     3   one   3 103
#  4:     4   one   4 104
#  5:     5   one   5 105
#  6:     1   two   6 106
#  7:     2   two   7 107
#  8:     3   two   8 108
#  9:     4   two   9 109
# 10:     5   two  10 110
# 11:     1 three  11 111
# 12:     2 three  12 112
# 13:     3 three  13 113
# 14:     4 three  14 114
# 15:     5 three  15 115

Pre data.table 1.14.1

There might be easier ways, but this seems to work:

# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])

# melt
B2 <- melt(B, measure.vars = patterns("^dog", "^cat"), value.name = c("dog", "cat"))
   
# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)

B2
#     idcol variable dog cat
#  1:     1      one   1 101
#  2:     2      one   2 102
#  3:     3      one   3 103
#  4:     4      one   4 104
#  5:     5      one   5 105
#  6:     1      two   6 106
#  7:     2      two   7 107
#  8:     3      two   8 108
#  9:     4      two   9 109
# 10:     5      two  10 110
# 11:     1    three  11 111
# 12:     2    three  12 112
# 13:     3    three  13 113
# 14:     4    three  14 114
# 15:     5    three  15 115
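Another route that should also work on older versions, and that is not part of the original answer, is to melt everything to fully long form, split the names with tstrsplit, and cast the species back to columns. This is a minimal sketch under the assumption that B looks like the hypothetical table built below; note that dcast sorts the result, so the row and column order differs from the output above.

```r
library(data.table)

# Hypothetical input, reconstructed from the printed results
B <- data.table(
  idcol = 1:5,
  dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

# 1. melt all measure columns into one long 'value' column,
#    keeping 'variable' as character rather than factor
long <- melt(B, id.vars = "idcol", variable.factor = FALSE)

# 2. split e.g. "dog_one" into species ("dog") and number ("one")
long[, c("species", "nr") := tstrsplit(variable, "_", fixed = TRUE)]

# 3. cast the species back to separate value columns;
#    'nr' stays a character column, so nothing is coerced to numeric
B3 <- dcast(long, idcol + nr ~ species, value.var = "value")
B3
```

The extra melt/dcast round trip is more verbose than the patterns approach, but it avoids relying on the position of the factor levels in 'variable'.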

Two related data.table issues:

melt.data.table should offer variable to match on the name, rather than the number

FR: expansion of melt functionality for handling names of output.

This is one of the (rare) instances where I believe good ol' base::reshape is cleaner. Its sep argument comes in handy here: both the names of the ‘value’ columns and the levels of the ‘variable’ column are generated in one go:

reshape(data = B,
        varying = names(B[ , -1]),
        sep = "_",
        direction = "long")
