How can I split a URL string up into separate parts in Python?

Use the urlparse module in Python 2.x, which was renamed to urllib.parse in Python 3.x. Its urlparse function splits a URL into named components:

>>> from urllib.parse import urlparse
>>> url = "http://example.com/random/folder/path.html"
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'
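
The result carries more components than the three shown above. As a minimal sketch, assuming a made-up URL with a port, query string and fragment (not part of the original question), the remaining pieces can be read like this:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL chosen for illustration only.
url = "https://example.com:8080/random/folder/path.html?lang=en&page=2#top"
parts = urlparse(url)

print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # top

# parse_qs turns the query string into a dict of lists.
print(parse_qs(parts.query))  # {'lang': ['en'], 'page': ['2']}
```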

If you want to do more work on the path portion of the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts together.
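
One way this might look, as a sketch: the file name "other.html" is a hypothetical replacement, and rebuilding the full URL with urlunparse is an extra step beyond what the answer describes.

```python
from posixpath import dirname, join
from urllib.parse import urlparse, urlunparse

url = "http://example.com/random/folder/path.html"
parse_object = urlparse(url)

# Keep the directory part and swap in a different file name.
# "other.html" is a made-up name for illustration.
new_path = join(dirname(parse_object.path), "other.html")
print(new_path)  # /random/folder/other.html

# The parse result is a named tuple, so _replace returns a modified copy
# that urlunparse can turn back into a URL string.
new_url = urlunparse(parse_object._replace(path=new_path))
print(new_url)  # http://example.com/random/folder/other.html
```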

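Note: use posixpath rather than os.path here. On Windows, os.path joins components with a backslash, which is wrong for URLs, whereas posixpath always uses forward slashes on every platform. The posixpath module documentation has a special reference to URL manipulation, so all’s good.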
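
The difference can be seen on any platform by importing ntpath, the module that os.path aliases on Windows. This is only an illustrative comparison, not something you would do in real code:

```python
import ntpath      # what os.path is on Windows
import posixpath   # what os.path is on Unix-like systems

directory = "/random/folder"

# posixpath always joins with a forward slash, as URLs require.
print(posixpath.join(directory, "path.html"))  # /random/folder/path.html

# ntpath inserts a backslash, which would corrupt a URL path.
print(ntpath.join(directory, "path.html"))     # /random/folder\path.html
```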
