Using Objective-C/Cocoa to unescape Unicode characters, i.e. \u1234

It’s true that Cocoa does not offer a solution, but Core Foundation does: CFStringTransform.

CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it’s a little-known gem. It is the front end to Apple’s ICU-compatible string transformation engine. It can perform real magic like transliterations between Greek and Latin (or almost any other pair of known scripts), but it can also handle mundane tasks like unescaping strings from a crappy server:

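NSString *input = @"\\u5404\\u500b\\u90fd"; // literal backslashes, as the server sends them
NSMutableString *convertedString = [input mutableCopy]; // CFStringTransform needs a mutable string

// "Any-Hex/Java" turns characters into \uXXXX escapes; reverse = YES undoes that.
CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);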

NSLog(@"convertedString: %@", convertedString);

// prints: 各個都, tada!
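The snippet above ignores the Boolean that CFStringTransform returns. A minimal sketch of a wrapper that checks it, assuming ARC; the function name UnescapedString is hypothetical and simply returns nil when the transform fails:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): turns Java-style escapes such as \u5404
// back into the characters they encode. Returns nil if the transform fails.
static NSString *UnescapedString(NSString *input) {
    NSMutableString *buffer = [input mutableCopy];
    // "Any-Hex/Java" converts characters to \uXXXX escapes; passing true for
    // reverse runs it the other way, from escapes back to characters.
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)buffer,
                                   NULL,                 // NULL range = whole string
                                   CFSTR("Any-Hex/Java"),
                                   true);
    return ok ? [buffer copy] : nil;
}
```

Text that contains no escapes should pass through unchanged, so the helper can be applied to whole server responses.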

As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, such as case mappings, normalizations and Unicode character name conversion. You can even design your own transformations.
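A few of those transforms in action, as a sketch assuming ARC. The helper ApplyTransform is hypothetical; it accepts either one of the predefined kCFStringTransform… constants or an ICU transform ID string such as "Any-Upper" or "Any-NFD":

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): applies a transform to a copy of input.
// `transform` may be a predefined kCFStringTransform... constant or an ICU ID.
static NSString *ApplyTransform(NSString *input, CFStringRef transform, Boolean reverse) {
    NSMutableString *buffer = [input mutableCopy];
    if (!CFStringTransform((__bridge CFMutableStringRef)buffer, NULL, transform, reverse)) {
        return nil; // unknown transform ID or transform failure
    }
    return [buffer copy];
}

static void DemoTransforms(void) {
    // Case mapping via an ICU transform ID.
    NSLog(@"%@", ApplyTransform(@"hello", CFSTR("Any-Upper"), false));
    // Normalization: NFD splits é (U+00E9) into e + U+0301, so length becomes 2.
    NSLog(@"%lu", (unsigned long)[ApplyTransform(@"\u00E9", CFSTR("Any-NFD"), false) length]);
    // Unicode character name conversion (predefined constant).
    NSLog(@"%@", ApplyTransform(@"\u00E9", kCFStringTransformToUnicodeName, false));
    // Strip accents and other combining marks (predefined constant).
    NSLog(@"%@", ApplyTransform(@"caf\u00E9", kCFStringTransformStripCombiningMarks, false));
}
```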

I have no idea why Apple does not make it available from Cocoa.

Edit 2015:

OS X 10.11 and iOS 9 add the following method to Foundation:

- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;

So the example from above becomes…

NSString *input = @"\\u5404\\u500b\\u90fd";
NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java"
                                                     reverse:YES];

NSLog(@"convertedString: %@", convertedString);
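If you still have to support systems older than OS X 10.11 / iOS 9, one option is to prefer the new method when it exists and fall back to CFStringTransform otherwise. A sketch assuming ARC; StringByApplyingTransform is a hypothetical name. Passing NO for reverse runs the transform forward, which escapes characters instead of unescaping them:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): use -stringByApplyingTransform:reverse:
// where available (OS X 10.11 / iOS 9+), else fall back to CFStringTransform.
static NSString *StringByApplyingTransform(NSString *input, NSString *transform, BOOL reverse) {
    if ([input respondsToSelector:@selector(stringByApplyingTransform:reverse:)]) {
        return [input stringByApplyingTransform:transform reverse:reverse];
    }
    NSMutableString *buffer = [input mutableCopy];
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)buffer, NULL,
                                   (__bridge CFStringRef)transform, reverse);
    return ok ? [buffer copy] : nil; // nil mirrors the Foundation method on failure
}
```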

Thanks @nschmidt for the heads up.
