Convert a HTML Table to JSON

Probably your data is something like:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

From which we can get your result as a list using this code:

from bs4 import BeautifulSoup
table_data = [[cell.text for cell in row("td")]
                         for row in BeautifulSoup(html_data)("tr")]

To convert the result to JSON, if you don’t care about the order:

import json
print json.dumps(dict(table_data))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder":
    "NAME", "Card number": "1234",
    "Card balance": "$18.30"
}

If you need the same order, use this:

from collections import OrderedDict
import json
print json.dumps(OrderedDict(table_data))

Which gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}

Leave a Comment