R strsplit with multiple unordered split arguments?

Actually strsplit uses grep patterns as well. (A comma is a regex metacharacter whereas a space is not; hence the need for double escaping the commas in the pattern argument. So the use of "\\s" would be more to improve readability than of necessity):

> strsplit(test_1, "\\, |\\,| ")  # three possibilities OR'ed
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_2, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

Without using both \\, and \\, (note extra space that SO does not show) you would have gotten some character(0) values. Might have been clearer if I had written:

> strsplit(test_2, "\\,\\s|\\,|\\s")
[[1]]
[1] "abc" "def" "ghi" "klm"

@Fojtasek is so right: Using character classes often simplifies the task because it creates an implicit logical OR:

> strsplit(test_2, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_1, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

Leave a Comment