Remove u202a from Python string

When you created your .py file, your text editor inserted an invisible, non-printing character into the path string.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let’s carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>> 

As you can see, there is a character with codepoint U+202A immediately before the H.

As someone else pointed out, the character at codepoint U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>> 

This further confirms that the first character in your string is not H, but the non-printing LEFT-TO-RIGHT EMBEDDING character.
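Checking characters one index at a time gets tedious on longer strings. A small helper built on the same `unicodedata` module can list every invisible character in one pass. This is a sketch: the function name `find_invisible` is hypothetical, and it assumes that "invisible" means Unicode general category `Cf` (format characters), which includes U+202A.

```python
import unicodedata

def find_invisible(s):
    """Return (index, codepoint, name) for every format (Cf) character in s."""
    hits = []
    for i, ch in enumerate(s):
        # General category "Cf" covers invisible format characters such as
        # U+202A LEFT-TO-RIGHT EMBEDDING and U+200B ZERO WIDTH SPACE.
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

# The \u202a escape makes the hidden character visible in the source.
path = "\u202aH:\\7 - Script\\teste.csv"
print(find_invisible(path))
# [(0, 'U+202A', 'LEFT-TO-RIGHT EMBEDDING')]
```

Writing the character as the `\u202a` escape, rather than pasting it, keeps the example itself free of hidden characters.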

I don’t know which text editor you used to create your program, and even if I did, I might not be an expert in it. Regardless, the editor you used inserted U+202A without your knowledge.
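If you would rather not depend on any particular editor, a short script can report where such characters are hiding in a source file. This is a minimal sketch, assuming UTF-8 source files (Python 3's default source encoding) and again treating category `Cf` as "invisible"; the names `scan_lines` and `main` are hypothetical.

```python
import sys
import unicodedata

def scan_lines(lines):
    """Yield (line_no, column, codepoint, name) for each Cf character."""
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            # Only invisible format characters (such as U+202A) are reported.
            if unicodedata.category(ch) == "Cf":
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")

def main(argv):
    for filename in argv[1:]:
        # Assumes UTF-8 source; other encodings would need a different argument.
        with open(filename, encoding="utf-8") as f:
            for lineno, col, cp, name in scan_lines(f):
                print(f"{filename}:{lineno}:{col}: {cp} {name}")

if __name__ == "__main__":
    main(sys.argv)
```

Saved as, say, `find_invisible.py` (a hypothetical name), running `python find_invisible.py yourscript.py` would print one `file:line:column` entry per hidden character, which tells you exactly where to delete it.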

One solution is to use a text editor that won’t insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

Using such an editor, simply delete the character between " and H.

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
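Fixing the source file is the real cure. If the string instead comes from somewhere you cannot edit by hand (for example, a path a user pasted or one read from a file), you could strip the character at runtime before opening the file. The sketch below shows two options; the helper name `strip_format_chars` is hypothetical.

```python
import unicodedata

def strip_format_chars(s):
    """Remove every Unicode format (Cf) character, such as U+202A, from s."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")

raw = "\u202aH:\\7 - Script\\teste.csv"

# Option 1: remove only the one offending character.
print(repr(raw.replace("\u202a", "")))
# 'H:\\7 - Script\\teste.csv'

# Option 2: remove all invisible format characters at once.
print(repr(strip_format_chars(raw)))
# 'H:\\7 - Script\\teste.csv'
```

Option 1 is the narrowest fix. Option 2 is broader, but be aware that `Cf` also includes characters that carry meaning in some text, such as U+200D ZERO WIDTH JOINER in emoji sequences, so it is best reserved for strings like file paths where such characters should never appear.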
