Deleting reversed duplicates with R

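The goal is to drop rows that repeat an earlier pair of genes, whether in the same order or reversed (row 5 below is row 1 with its columns swapped):

mydf <- read.table(text="gene_x    gene_y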
AT1       AT2
AT3       AT4
AT1       AT2
AT1       AT3
AT2       AT1", header=TRUE, stringsAsFactors=FALSE)

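Here’s one strategy using apply, sort, paste, and duplicated. Sort the values within each row, paste them into a single key, and keep only the rows whose key has not appeared before. The separator passed to collapse keeps pairs such as "AB", "C" and "A", "BC" from producing the same key:

mydf[!duplicated(apply(mydf,1,function(x) paste(sort(x),collapse="|"))),]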
  gene_x gene_y
1    AT1    AT2
2    AT3    AT4
4    AT1    AT3
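
To see why this works, the one-liner can be unpacked into its intermediate steps. The following is only a sketch: the variable names key and is_dup are made up for illustration, and a "|" separator is used in paste on the assumption that it never occurs in the gene names.

```r
# Same sample data as above
mydf <- read.table(text="gene_x    gene_y
AT1       AT2
AT3       AT4
AT1       AT2
AT1       AT3
AT2       AT1", header=TRUE, stringsAsFactors=FALSE)

# Sort the two values within each row and glue them into one key,
# so that AT2/AT1 and AT1/AT2 both become "AT1|AT2"
key <- apply(mydf, 1, function(x) paste(sort(x), collapse="|"))

# TRUE for every row whose key has already been seen further up
is_dup <- duplicated(key)
is_dup
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Keep only the first occurrence of each unordered pair
mydf[!is_dup, ]
```

Because duplicated flags only the later occurrences, the first row of each pair is the one that survives, which is why rows 1, 2 and 4 remain.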

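And here’s a slightly different solution that skips paste. Transposing the data frame turns each original row into a column, lapply sorts each of those columns, and duplicated compares the resulting list elements directly: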

mydf[!duplicated(lapply(as.data.frame(t(mydf), stringsAsFactors=FALSE), sort)),]
  gene_x gene_y
1    AT1    AT2
2    AT3    AT4
4    AT1    AT3
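
When there are exactly two columns, a vectorised variant can avoid looping over rows altogether. This is a sketch rather than one of the solutions above: it relies on the base R functions pmin and pmax, which also accept character vectors, and the names lo, hi and result are made up here.

```r
# Same sample data as above
mydf <- read.table(text="gene_x    gene_y
AT1       AT2
AT3       AT4
AT1       AT2
AT1       AT3
AT2       AT1", header=TRUE, stringsAsFactors=FALSE)

# Element-wise smaller and larger value of each pair,
# which puts every pair into a canonical order
lo <- pmin(mydf$gene_x, mydf$gene_y)
hi <- pmax(mydf$gene_x, mydf$gene_y)

# duplicated() on a data frame compares whole rows
result <- mydf[!duplicated(data.frame(lo, hi)), ]
result
#   gene_x gene_y
# 1    AT1    AT2
# 2    AT3    AT4
# 4    AT1    AT3
```

This only covers the two-column case; with more columns, the sort-based approaches above generalise more directly.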
