Python: How can I replace full-width characters with half-width characters?

The built-in unicodedata module can do it:

>>> import unicodedata
>>> foo = u'１２３４５６７８９０'
>>> unicodedata.normalize('NFKC', foo)
u'1234567890'

“NFKC” stands for “Normalization Form KC [Compatibility Decomposition, followed by Canonical Composition]”. It replaces full-width characters with their half-width counterparts, which Unicode treats as compatibility-equivalent.

Note that it also normalizes many other things at the same time: for example, it combines separate accent marks with their base letters and replaces Roman numeral symbols with plain letters.
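If only the full-width ASCII forms should change and the rest of the text should stay as it is, one possible alternative is a translation table. The following is a minimal sketch for Python 3, not part of the original answer; the names FULLWIDTH_TO_ASCII and to_halfwidth are made up for illustration. It relies on the fact that the full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their ASCII counterparts, and it additionally maps the ideographic space (U+3000) to an ordinary space.

```python
import unicodedata

# Hypothetical helper table: map the full-width ASCII variants
# (U+FF01..U+FF5E) onto U+0021..U+007E by subtracting 0xFEE0.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The ideographic (full-width) space is outside that range, so add it by hand.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def to_halfwidth(text):
    """Replace full-width ASCII variants only; leave all other characters alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


digits = '１２３４５６７８９０'
print(to_halfwidth(digits))

# NFKC rewrites the Roman numeral symbol U+2163 as two letters,
# whereas the translation table leaves it untouched.
print(unicodedata.normalize('NFKC', '\u2163'))
print(to_halfwidth('\u2163') == '\u2163')
```

Unlike NFKC, this approach does not touch accents, ligatures, or other compatibility characters, so it only suits cases where the width of ASCII-like characters is the sole concern.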
