Assuming your locale is set to UTF-8 (check the output of the locale command), this works well for finding invalid UTF-8 sequences:
grep -axv '.*' file.txt
Explanation (from the grep man page):
- -a, --text: treats the file as text, which essentially prevents grep from aborting once it finds an invalid (non-UTF-8) byte sequence
- -v, --invert-match: inverts the output, showing the lines that do not match
- -x '.*' (--line-regexp): matches only complete lines consisting entirely of valid UTF-8 characters
Hence, the output consists of exactly the lines that contain an invalid (non-UTF-8) byte sequence, since -v inverts the match.
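
To try this out, here is a minimal sketch that assumes GNU grep and that a locale named C.UTF-8 is installed (use any UTF-8 locale from locale -a otherwise). The file name sample.txt and the extra -n flag are my own additions for illustration, not part of the command above:

```shell
# Sample file: line 2 holds the byte 0xFF, which never occurs in valid UTF-8;
# line 3 holds a valid two-byte sequence (U+00E9).
sample=sample.txt
printf 'plain ascii\nbroken \377 byte\ncaf\303\251\n' > "$sample"

# Force a UTF-8 locale for this command only. C.UTF-8 is an assumption;
# substitute any UTF-8 locale that `locale -a` lists on your system.
# -n (not in the original command) prefixes each reported line with its number.
LC_ALL=C.UTF-8 grep -naxv '.*' "$sample"

# grep exits with status 0 when it selects at least one line, so -q turns
# the same command into a yes/no check.
if LC_ALL=C.UTF-8 grep -qaxv '.*' "$sample"; then
    echo "$sample contains invalid UTF-8"
fi
```

Under those assumptions, only line 2 of the sample should be reported. Setting LC_ALL on the command line avoids depending on whatever locale the current shell happens to use.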