I think the safest approach here is to just replace any suspicious characters, meaning anything that isn’t alphanumeric, -, _, a space, or a period. Here’s how you do that:
import re
re.sub(r'[^\w_. -]', '_', filename)
The above replaces every character that’s not a letter, digit, '_', '-', '.' or space with an '_'. So, if you’re sanitizing an entire path rather than a single filename, you’ll want to add os.sep to the list of approved characters as well.
Here’s some sample output:
In [27]: re.sub(r'[^\w\-_\. ]', '_', r'some\*-file._n\\ame')
Out[27]: 'some__-file._n__ame'
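For the full-path case mentioned above, here is a minimal sketch of how os.sep could be added to the allowed set. The function name sanitize_path and its replacement parameter are my own, purely for illustration. The one thing to watch is that os.sep is a backslash on Windows, so it should go through re.escape before being placed inside the character class:

```python
import os
import re


def sanitize_path(path, replacement="_"):
    """Replace every character that is not a word character, '.', ' ',
    '-', or the platform's path separator. (Hypothetical helper.)"""
    # re.escape keeps a Windows backslash from being read as a regex escape.
    # The '-' goes last so it is taken literally rather than as a range.
    allowed = r"\w. " + re.escape(os.sep) + "-"
    return re.sub("[^" + allowed + "]", replacement, path)


# The separator survives; '*' and '?' are replaced.
print(sanitize_path(os.path.join("some dir", "fi*le?.txt")))
# on POSIX: some dir/fi_le_.txt
```

Note that names such as '..' pass through unchanged, since they consist only of allowed characters.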