What’s the difference between sed -E and sed -e?

In the source code, -E is an undocumented option for compatibility with BSD sed, handled together with -r:

    /* Undocumented, for compatibility with BSD sed. */
    case 'E':
    case 'r':
      if (extended_regexp_flags)
        usage(4);
      extended_regexp_flags = REG_EXTENDED;
      break;

According to the BSD sed manual, -E enables extended regular expressions. The -e option is unrelated: it adds a script expression to the commands to be executed.

How to delete rows from a CSV file based on a list of values from another file?

What about the following:

    awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv

How does this work? An awk program is a series of pattern-action pairs, written as:

    condition { action }
    condition { action }
    …

where condition is typically an expression and action is a series of commands. Here, the first condition-action pair reads:

    (NR==FNR){a[$1];next}

if …
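For comparison, the same two-pass idea can be sketched in Python. This is only an illustration of what the awk one-liner does, assuming plain comma-separated lines with no quoted fields that contain commas; the function name filter_blacklisted and the sample rows are hypothetical.

```python
def filter_blacklisted(blacklist_lines, candidate_lines):
    """Yield candidate lines whose first comma-separated field is not blacklisted."""
    # First pass (awk: NR==FNR { a[$1]; next }): remember each blacklist key.
    seen = set()
    for line in blacklist_lines:
        seen.add(line.rstrip("\n").split(",")[0])
    # Second pass (awk: !($1 in a)): keep lines whose first field was not seen.
    for line in candidate_lines:
        if line.rstrip("\n").split(",")[0] not in seen:
            yield line


# Hypothetical sample data standing in for blacklist.csv and candidates.csv.
blacklist = ["a,1\n", "b,2\n"]
candidates = ["a,x\n", "c,y\n", "b,z\n", "d,w\n"]
for kept in filter_blacklisted(blacklist, candidates):
    print(kept, end="")
```

With real files, open file objects can be passed in place of the two lists, since the function only iterates over lines.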

How to process huge text files that contain EOF / Ctrl-Z characters using Python on Windows?

It’s easy to use Python to delete the DOS EOF chars; for example (written for Python 3, where files opened in binary mode yield bytes):

    import sys

    def delete_eof(fin, fout):
        BUFSIZE = 2**15
        EOFCHAR = b"\x1a"  # Ctrl-Z (byte 26), the DOS EOF marker
        data = fin.read(BUFSIZE)
        while data:
            # bytes.translate(None, delete) drops every byte listed in delete
            fout.write(data.translate(None, EOFCHAR))
            data = fin.read(BUFSIZE)

    ipath = sys.argv[1]
    opath = ipath + ".new"
    with open(ipath, "rb") as fin, open(opath, "wb") as fout:
        delete_eof(fin, fout)

That takes …

Awk replace a column with its hash value

So, you don’t really want to be doing this with awk. Any of the popular high-level scripting languages (Perl, Python, Ruby, etc.) would do this in a way that was simpler and more robust. Having said that, something like this will work. Given input like this:

    this is a test

(E.g., a row …
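As a rough illustration of the high-level-language route mentioned above, here is a minimal Python sketch. The details of the original question are not shown here, so it assumes whitespace-separated fields and SHA-256 as the hash; the helper name hash_column and its index parameter are hypothetical.

```python
import hashlib


def hash_column(line, index=0):
    """Return the line with the field at `index` replaced by its SHA-256 hex digest."""
    fields = line.split()  # assumption: fields are separated by whitespace
    if index < len(fields):
        digest = hashlib.sha256(fields[index].encode("utf-8")).hexdigest()
        fields[index] = digest
    # Note: rejoining with single spaces collapses any repeated whitespace.
    return " ".join(fields)


# Replace the fourth field ("test") of the sample row with its hash.
print(hash_column("this is a test", index=3))
```

Lines with too few fields are passed through unchanged rather than raising an error; whether that is the right behavior depends on the data.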