How to extract tables from websites in Python

Pandas can do this right out of the box, saving you from having to parse the html yourself. to_html() extracts all tables from your html and puts them in a list of dataframes. to_csv() can be used to convert each dataframe to a csv file. For the web page in your example, the relevant table is the last one, which is why I used df_list[-1] in the code below.

import requests
import pandas as pd

url="http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500"
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')

It’s simple enough to do in one line, if you prefer:

pd.read_html(requests.get(<url>).content)[-1].to_csv(<csv file>)

P.S. Just make sure you have lxml, html5lib, and BeautifulSoup4 packages installed in advance.

Leave a Comment