You’ve mixed a literal string \ud83d
in a json file on disk (six characters: \ u d 8 3 d
) and a single character u'\ud83d'
(specified using a string literal in Python source code) in memory. It is the difference between len(r'\ud83d') == 6
and len('\ud83d') == 1
on Python 3.
If you see '\ud83d\ude4f'
Python string (2 characters) then there is a bug upstream. Normally, you shouldn’t get such string. If you get one and you can’t fix upstream that generates it; you could fix it using surrogatepass
error handler:
>>> "\ud83d\ude4f".encode('utf-16', 'surrogatepass').decode('utf-16')
'🙏'
Note: even if your json file contains literal \ud83d\ude4f (12 characters); you shouldn’t get the surrogate pair:
>>> print(ascii(json.loads(r'"\ud83d\ude4f"')))
'\U0001f64f'
Notice: the result is 1 character ( '\U0001f64f'
), not the surrogate pair ('\ud83d\ude4f'
).