How to convert a factor to integer\numeric without loss of information?

See the Warning section of ?factor: In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)). The FAQ on R has similar advice. Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))? …
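
As a minimal sketch of the difference described above, the following compares the three conversions on a small factor. The factor f and its values are made up for illustration; only the calls themselves come from the excerpt.

```r
# Hypothetical factor whose labels are numbers stored as text.
f <- factor(c("10", "3.5", "10", "7"))

# as.numeric() on the factor itself returns the internal integer codes,
# not the numbers shown in the labels.
codes <- as.numeric(f)

# Recommended form: convert the levels to numeric once,
# then index that vector by the factor.
values <- as.numeric(levels(f))[f]

# Equivalent result: convert every element to character first.
values_chr <- as.numeric(as.character(f))

print(codes)
print(values)
print(identical(values, values_chr))
```

The levels-based form converts one string per level, while the character-based form converts one string per element, which is one plausible reading of the "slightly more efficient" remark.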

Collapse / concatenate / aggregate a column to a single comma separated string within each group

Here are some options using toString, a function that concatenates a vector of strings using comma and space to separate components. If you don’t want commas, you can use paste() with the collapse argument instead.

data.table

    # alternative using data.table
    library(data.table)
    as.data.table(data)[, toString(C), by = list(A, B)]

aggregate

This uses no packages:

    # alternative using …
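
A package-free sketch of the same idea is shown below. The column names A, B and C follow the excerpt; the data values, the formula-style call, and the ";" separator are assumptions made for illustration and are not necessarily what the original post uses.

```r
# Hypothetical data: A and B define the groups, C holds the strings.
data <- data.frame(
  A = c(111, 111, 222, 222),
  B = c(1, 1, 2, 2),
  C = c("x", "y", "p", "q"),
  stringsAsFactors = FALSE
)

# Collapse C within each (A, B) group into one "a, b" style string.
res <- aggregate(C ~ A + B, data = data, FUN = toString)

# Same grouping with a custom separator: extra arguments are passed to paste().
res_semicolon <- aggregate(C ~ A + B, data = data, FUN = paste, collapse = ";")

print(res)
print(res_semicolon)
```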

Split comma-separated strings in a column into separate rows

Several alternatives:

1) two ways with data.table:

    library(data.table)
    # method 1 (preferred)
    setDT(v)[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by = AB][!is.na(director)]
    # method 2
    setDT(v)[, strsplit(as.character(director), ",", fixed=TRUE), by = .(AB, director)][,.(director = V1, AB)]

2) a dplyr / tidyr combination:

    library(dplyr)
    library(tidyr)
    v %>%
      mutate(director = strsplit(as.character(director), ",")) %>%
      unnest(director)

3) with …
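
For readers without those packages, the underlying technique can be sketched in base R: split each string, then repeat the other column once per piece. This is an illustrative sketch, not one of the post's listed alternatives. The column names director and AB follow the excerpt; the sample rows are invented.

```r
# Hypothetical input: one row per AB value, directors separated by commas.
v <- data.frame(
  director = c("Aaron Blaise,Bob Walker", "Akira Kurosawa"),
  AB = c("A", "B"),
  stringsAsFactors = FALSE
)

# Split each string on literal commas; the result is a list with one
# character vector per input row.
parts <- strsplit(v$director, ",", fixed = TRUE)

# Flatten the pieces and repeat AB once for every piece from its row.
long <- data.frame(
  director = unlist(parts),
  AB = rep(v$AB, lengths(parts)),
  stringsAsFactors = FALSE
)

print(long)
```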

How to sum a variable by group

Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

    aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) …

(embedding @thelatemail comment), aggregate has a formula interface too

    aggregate(Frequency …
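
The calls above can be tried on a small data frame. The rows below are invented so that the group totals match the output shown in the excerpt; Metric2 and its values are likewise hypothetical, and the cbind call is only a sketch of how the truncated example might be completed.

```r
# Hypothetical data chosen so the sums are First = 30, Second = 5, Third = 34.
x <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Sum Frequency within each Category; the summed column is named "x".
by_cat <- aggregate(x$Frequency, by = list(Category = x$Category), FUN = sum)

# Sum several numeric columns at once by binding them into a matrix.
by_cat_multi <- aggregate(
  cbind(Frequency = x$Frequency, Metric2 = x$Metric2),
  by = list(Category = x$Category),
  FUN = sum
)

print(by_cat)
print(by_cat_multi)
```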

Why are these numbers not equal?

General (language agnostic) reason Since not all numbers can be represented exactly in IEEE floating point arithmetic (the standard that almost all computers use to represent decimal numbers and do math with them), you will not always get what you expected. This is especially true because some values which are simple, finite decimals (such as …
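
A short illustration of the effect, using a commonly cited pair of values (chosen here for illustration; the excerpt's own example is cut off). It also shows two tolerance-based comparisons; the 1e-9 threshold is an arbitrary assumption.

```r
x <- 0.1 + 0.2
y <- 0.3

# Exact comparison fails because neither side is stored exactly in binary.
naive_equal <- x == y

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a descriptive string rather than FALSE on mismatch.
tolerant_equal <- isTRUE(all.equal(x, y))

# A hand-rolled check with an explicit (arbitrarily chosen) tolerance.
manual_equal <- abs(x - y) < 1e-9

print(naive_equal)
print(tolerant_equal)
print(manual_equal)
```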

Reshaping data.frame from wide to long format

Three alternative solutions:

1) With data.table:

You can use the same melt function as in the reshape2 package (which is an extended & improved implementation). melt from data.table also has more parameters than the melt function from reshape2. You can, for example, also specify the name of the variable column:

    library(data.table)
    long <- melt(setDT(wide), id.vars = c("Code","Country"), …
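
To make the wide-to-long idea concrete without any packages, here is a base R sketch that does by hand roughly what a melt with id.vars = c("Code", "Country") does. It is not the post's method. The identifier columns follow the excerpt; the year columns, their values, and the output names year and value are assumptions.

```r
# Hypothetical wide table: one row per country, one column per year.
wide <- data.frame(
  Code    = c("AFG", "ALB"),
  Country = c("Afghanistan", "Albania"),
  `1950`  = c(20249, 8097),
  `1951`  = c(21352, 8986),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as a measurement.
id_cols <- c("Code", "Country")
year_cols <- setdiff(names(wide), id_cols)

# Stack the measurement columns one after another and repeat the
# identifiers once per stacked column.
long <- data.frame(
  Code    = rep(wide$Code, times = length(year_cols)),
  Country = rep(wide$Country, times = length(year_cols)),
  year    = rep(year_cols, each = nrow(wide)),
  value   = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```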