Remove Unicode emoji using re in Python

You are not using the correct notation for non-BMP Unicode code points; you want the \U0001FFFF form, with a capital U and 8 hex digits:

    myre = re.compile(u'['
        u'\U0001F300-\U0001F5FF'
        u'\U0001F600-\U0001F64F'
        u'\U0001F680-\U0001F6FF'
        u'\u2600-\u26FF\u2700-\u27BF]+',
        re.UNICODE)

This can be reduced to:

    myre = re.compile(u'['
        u'\U0001F300-\U0001F64F'
        u'\U0001F680-\U0001F6FF'
        u'\u2600-\u26FF\u2700-\u27BF]+',
        re.UNICODE)

as your first two ranges are adjacent. Your version was specifying (with added …
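As a rough illustration of how the reduced pattern might be used to strip emoji from a string, here is a minimal sketch. It assumes Python 3 (or a wide Python 2 build), where \U escapes denote single code points; the names EMOJI_RE and strip_emoji are hypothetical, and the ranges are only the ones listed above, so they will not cover every emoji.

```python
import re

# Character class built from the ranges in the answer above.
# Assumption: Python 3, where str patterns are Unicode-aware by default,
# so re.UNICODE is not needed here.
EMOJI_RE = re.compile(
    u'['
    u'\U0001F300-\U0001F64F'      # first two original ranges, merged
    u'\U0001F680-\U0001F6FF'
    u'\u2600-\u26FF\u2700-\u27BF'
    u']+'
)


def strip_emoji(text):
    """Return text with every run of matching characters removed."""
    return EMOJI_RE.sub(u'', text)


if __name__ == '__main__':
    sample = u'hello \U0001F600 world \u2600'
    print(repr(strip_emoji(sample)))
```

Note that the spaces around a removed emoji are left in place, so callers may want to collapse whitespace afterwards.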