Extract the ‘src’ attribute from an ‘img’ tag using Beautiful Soup

You can use Beautiful Soup to extract the src attribute of an HTML img tag. In my example, the htmlText contains the img tag itself, but this can be used for a URL too, along with urllib2.

For URLs

from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2
page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # Print image source
    print(image['src'])
    # Print alternate text
    print(image['alt'])

For texts with the img tag

from BeautifulSoup import BeautifulSoup as BSHTML
htmlText = """<img src="https://src1.com/" <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    print(image['src'])

Python 3:

from bs4 import BeautifulSoup as BSHTML
import urllib

page = urllib.request.urlopen('https://github.com/abushoeb/emotag')
soup = BSHTML(page)
images = soup.findAll('img')

for image in images:
    # Print image source
    print(image['src'])
    # Print alternate text
    print(image['alt'])

Install modules if needed

# Python 3
pip install beautifulsoup4
pip install urllib3

Leave a Comment