Regex to extract URLs from href attribute in HTML with Python [duplicate]

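import re

# Sample HTML. Single quotes on the outside so the double-quoted href values need no escaping.
html = '<p>Hello World</p><a href="http://example.com">More Examples</a><a href="http://example2.com">Even More Examples</a>'

# Raw string keeps \w and \d as regex escapes. Matches http(s) URLs anywhere in the text.
urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', html)

>>> print(urls)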
['http://example.com', 'http://example2.com']
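
The pattern above matches any http(s) URL in the text, not only those inside an href attribute, and its character class has no "/", so a URL with a path would be cut off after the host name. If the goal is specifically the href values, one alternative is to parse the HTML instead of using a regex. The following is a minimal sketch using the standard library's html.parser module; the class name HrefCollector is a hypothetical helper introduced here for illustration and is not part of the original answer.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                # value is None for attributes written without a value.
                if name == "href" and value is not None:
                    self.hrefs.append(value)


html = '<p>Hello World</p><a href="http://example.com">More Examples</a><a href="http://example2.com">Even More Examples</a>'

collector = HrefCollector()
collector.feed(html)
print(collector.hrefs)
# ['http://example.com', 'http://example2.com']
```

Unlike the regex, this returns each href value in full, including any path or query string, and skips URLs that appear elsewhere in the page text.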

