How to extract data from html table in shell script?

Go with (g)awk, it’s capable :-). Here is a solution, but please note: it only works with the exact HTML table format you posted.

 awk -F "</*td>|</*tr>" '/<\/*t[rd]>.*[A-Z][A-Z]/ {print $3, $5, $7 }' FILE

Here you can see it in action: https://ideone.com/zGfLe
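The ideone link may not stay online, so here is a minimal self-contained sketch for trying the one-liner locally. The table is a hypothetical stand-in, because the original table from the question is not reproduced here. It puts each row on a single line, which is the layout the command depends on. The mixed-case header row is included to show the uppercase filter skipping it.

```shell
#!/bin/sh
# Hypothetical sample input (assumption: NOT the table from the question).
# Each <tr> row sits on one line, which is what the one-liner relies on.
sample_table() {
  cat <<'EOF'
<table>
<tr><td>Name</td><td>City</td><td>Code</td></tr>
<tr><td>ALICE</td><td>PARIS</td><td>FR</td></tr>
<tr><td>BOB</td><td>BERLIN</td><td>DE</td></tr>
</table>
EOF
}

# Same awk program as in the answer, reading stdin instead of FILE.
sample_table | awk -F "</*td>|</*tr>" '/<\/*t[rd]>.*[A-Z][A-Z]/ {print $3, $5, $7 }'
```

With this input it should print `ALICE PARIS FR` and `BOB BERLIN DE`. The `Name`/`City`/`Code` row is dropped because it never has two uppercase letters in a row.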

Some explanation:

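  1. -F sets the input field separator to a regexp: any opening or closing tr or td tag (<tr>, </tr>, <td>, </td>).

  2. The pattern then restricts processing to lines that contain one of those tags followed, somewhere later on the line, by two consecutive uppercase letters.

  3. Finally, it prints the needed fields ($3, $5 and $7).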
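To see why the values end up in $3, $5 and $7, this small sketch (again on a made-up row, not the question's data) prints every field that the -F regexp produces.

```shell
#!/bin/sh
# Hypothetical single table row, for illustration only.
row='<tr><td>ALICE</td><td>PARIS</td><td>FR</td></tr>'

# Same -F regexp as the one-liner; dump each field with its number.
printf '%s\n' "$row" | awk -F "</*td>|</*tr>" '{
  for (i = 1; i <= NF; i++)
    printf "$%d=[%s]\n", i, $i   # empty brackets: nothing between two tags
}'
```

Adjacent tags such as `<tr><td>` or `</td><td>` leave empty fields between them, and the separators at the very start and end of the line add empty first and last fields. That gives 9 fields in total, with the cell contents in the odd positions 3, 5 and 7.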

HTH
