Google Search Web Scraping with Python

You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query> this will return the top 10 search results.

Then you can use lxml for example to parse the page. Depending on what you use, you can either query the resulting node tree via a CSS-Selector (.r a) or using a XPath-Selector (//h3[@class="r"]/a)

In some cases the resulting URL will redirect to Google. Usually it contains a query-parameter qwhich will contain the actual request URL.

Example code using lxml and requests:

from urllib.parse import urlencode, urlparse, parse_qs

from lxml.html import fromstring
from requests import get

raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)

for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)['q']
    print(url[0])

A note on google banning your IP: In my experience, google only bans
if you start spamming google with search requests. It will respond
with a 503 if Google thinks you are bot.

More Related Contents:

Leave a Comment Cancel reply