For reasonably modern versions of sed, edit the standard input to yield the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος
If your vocabulary words are in files named lesson1 and lesson2, redirect sed’s standard output to the file all-vocab with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
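To try the multi-file form without touching real files, here is a minimal sketch that builds two throwaway lesson files first. The scratch directory and the sample words are made up for illustration, and it assumes GNU sed (for \n in the replacement) and a mktemp that supports -d.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'τέχνη βιβλίο\n' > "$workdir/lesson1"
printf 'γη\tκήπος\n' > "$workdir/lesson2"

# sed reads the files in the order given and writes one combined stream.
sed -E -e 's/[[:blank:]]+/\n/g' "$workdir/lesson1" "$workdir/lesson2" > "$workdir/all-vocab"
cat "$workdir/all-vocab"
```

Under those assumptions all-vocab should end up with one word per line, lesson1's words first.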
What it means:
- The character class [[:blank:]] matches either a single space character or a single tab character.
  - Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).
  - The + quantifier means match one or more of the previous pattern.
  - So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.
- The \n in the replacement is the newline that you want.
- The /g modifier on the end means perform the substitution as many times as possible rather than just once.
- The -E option tells sed to use POSIX extended regex syntax, and in particular for this case the + quantifier. Without -E, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g'. (Note the use of \+ rather than simple +; \+ is a GNU extension to basic regex syntax.)
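To see the + quantifier collapsing a mixed run of spaces and tabs into a single newline, a small sketch follows. The function name split_words is a hypothetical helper, and the example assumes GNU sed so that \n in the replacement is understood.

```shell
# split_words is a hypothetical helper name; assumes GNU sed.
split_words() {
  sed -E -e 's/[[:blank:]]+/\n/g'
}

# The input mixes double spaces and tabs between words.
printf 'alpha  \t beta\tgamma\n' | split_words
```

Each run of blanks becomes exactly one newline, so the three words should come out on three lines. Note that blanks at the very start of a line are also replaced, which leaves an empty first line in the output.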
Perl Compatible Regexes
If you are used to Perl-compatible regexes and your sed understands the \s shorthand (GNU sed does, as an extension), use \s+ to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old and write the result to a file named new in the current directory.
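Because \s is not available in every sed, a script may want to probe for it before relying on it. The following is only a sketch: sed_has_backslash_s and sed_ws are hypothetical names, and the final command still assumes a sed that accepts -E and \n in the replacement.

```shell
# sed_has_backslash_s is a hypothetical helper name, not part of sed.
sed_has_backslash_s() {
  # If \s is understood, the space in "a b" is replaced and we get "a_b".
  [ "$(printf 'a b\n' | sed -E -e 's/\s/_/' 2>/dev/null)" = 'a_b' ]
}

# Fall back to the POSIX character class when \s is not recognized.
if sed_has_backslash_s; then
  sed_ws='\s+'
else
  sed_ws='[[:space:]]+'
fi

printf 'one \t two\n' | sed -E -e "s/${sed_ws}/\\n/g"
```

The fallback uses [[:space:]] rather than [[:blank:]] because it is the closer match to what \s means.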
Maximum portability, maximum cruftiness
To support almost any version of sed going back to Version 7 Unix, the command invocation becomes a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\
/g'
τέχνη
βιβλίο
γη
κήπος
Notes:
- Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ \t]) followed by zero or more of them ([ \t]*).
  - Caveat: POSIX treats a backslash inside a bracket expression as a literal character, so a strictly traditional sed reads [ \t] as space, backslash, or the letter t. On such a sed, type a literal tab inside the brackets instead.
- Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.
  - The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
    - Note: There must be no whitespace between the backslash and the newline. That is, the end of the first line must be exactly backslash followed by end-of-line.
  - This error-prone process helps one appreciate why the world moved to visible characters, and you will want to exercise some care in trying out the command with copy-and-paste.
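One way to avoid depending on \t inside the brackets (some seds read it there as a literal backslash and the letter t) is to splice a real tab character in from a shell variable. This is a sketch, not the only approach; tab and split_words_portable are hypothetical names.

```shell
# Capture a literal tab; command substitution strips only trailing newlines.
tab=$(printf '\t')

# split_words_portable is a hypothetical helper name.
# The single-quoted pieces keep the backslash intact; "$tab" is spliced in
# between them. The first line must end in a backslash with nothing after it.
split_words_portable() {
  sed -e 's/[ '"$tab"'][ '"$tab"']*/\
/g'
}

printf 'cat \t dog\tbat\n' | split_words_portable
```

With this form the letter t in the sample words is left alone on any sed, since the bracket expression contains only a space and a tab.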
Note on backslashes and quoting
The commands above all used single quotes ('') rather than double quotes (""). Consider:
$ echo '\\\\' "\\\\"
\\\\ \\
That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.
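The same difference can be checked without relying on how a particular shell's echo treats backslashes. The sketch below stores both strings in variables (single and double are just illustrative names) and prints them and their lengths with printf.

```shell
single='\\\\'   # single quotes: all four backslashes are kept
double="\\\\"   # double quotes: each \\ pair collapses to one backslash

# Print the strings themselves, then their lengths in characters.
printf '%s\n' "$single" "$double"
printf '%s\n' "${#single}" "${#double}"
```

The single-quoted string keeps four characters while the double-quoted one keeps two, which is why regexes full of backslashes are easier to get right inside single quotes.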