Converting a String to a List of Words?

Try this:

import re

mystr="This is a string, with words!"
wordList = re.sub("[^\w]", " ",  mystr).split()

How it works:

From the docs :

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.

so in our case :

pattern is any non-alphanumeric character.

[\w] means any alphanumeric character and is equal to the character set
[a-zA-Z0-9_]

a to z, A to Z , 0 to 9 and underscore.

so we match any non-alphanumeric character and replace it with a space .

and then we split() it which splits string by space and converts it to a list

so ‘hello-world’

becomes ‘hello world’

with re.sub

and then [‘hello’ , ‘world’]

after split()

let me know if any doubts come up.

Leave a Comment