Swift countElements() returns an incorrect value when counting flag emoji

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta), grapheme clusters break after every
second regional indicator symbol, as mandated by the Unicode 9 standard:

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]

Also, String is (again) a collection of its characters, so one can
obtain the character count directly with str1.count.
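
As a small sketch of how the character count relates to the other string views, the following assumes Swift 4 or later and writes the flag with scalar escapes so that the example does not depend on the file encoding. The names flag and flags are only illustrative.

```swift
// A flag is a pair of regional indicator scalars; "DE" is U+1F1E9 U+1F1EA.
let flag = "\u{1F1E9}\u{1F1EA}"
let flags = String(repeating: flag, count: 5)

print(flags.count)                 // 5  (Characters, i.e. grapheme clusters)
print(flags.unicodeScalars.count)  // 10 (two scalars per flag)
print(flags.utf16.count)           // 20 (each scalar is a surrogate pair)
print(flags.utf8.count)            // 40 (each scalar takes four bytes)
```

Only the first number depends on the grapheme breaking rules; the scalar and code unit counts are the same in every Swift version.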


(Old answer for Swift 3 and older:)

From “3 Grapheme Cluster Boundaries”
in “Unicode Standard Annex #29: Unicode Text Segmentation”
(emphasis added):

A legacy grapheme cluster is defined as a base (such as A or カ)
followed by zero or more continuing characters. One way to think of
this is as a sequence of characters that form a “stack”.

The base can be single characters, or be any sequence of Hangul Jamo
characters that form a Hangul Syllable, as defined by D133 in The
Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji
national flag symbols corresponding to ISO country codes. Sequences of
more than two RI characters should be separated by other characters,
such as U+200B ZWSP.

(Thanks to @rintaro for the link).

A Swift Character represents an extended grapheme cluster, so it is (according
to this reference) correct that any sequence of regional indicator symbols
is counted as a single character.
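
To make the pairing of regional indicator symbols concrete, here is a minimal sketch that builds a flag from a two-letter ISO country code. The function name flagEmoji(forCountryCode:) is hypothetical and not part of any library; the only fact it relies on is that the regional indicator symbols for A to Z occupy U+1F1E6 to U+1F1FF.

```swift
// Hypothetical helper: maps a two-letter country code such as "DE"
// to the corresponding pair of regional indicator symbols.
func flagEmoji(forCountryCode code: String) -> String? {
    let letters = Array(code.uppercased().unicodeScalars)
    guard letters.count == 2 else { return nil }

    var result = ""
    for letter in letters {
        // Only ASCII "A"..."Z" have a regional indicator counterpart.
        guard letter.value >= 0x41, letter.value <= 0x5A else { return nil }
        // U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A.
        guard let indicator = UnicodeScalar(0x1F1E6 + (letter.value - 0x41)) else {
            return nil
        }
        result.unicodeScalars.append(indicator)
    }
    return result
}

if let german = flagEmoji(forCountryCode: "de") {
    print(german.unicodeScalars.count) // 2
}
```

The sketch does not check whether the code is an assigned ISO country code; any two letters produce a pair of regional indicators.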

You can separate the “flags” by a ZERO WIDTH NON-JOINER:

let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2

or insert a ZERO WIDTH SPACE:

let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3
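
The same idea extends to more than two flags. The sketch below assumes Swift 4 or later (hence .count instead of .characters.count) and uses illustrative names; the expected counts follow from the two results above, because a ZERO WIDTH NON-JOINER attaches to the preceding cluster while a ZERO WIDTH SPACE forms a cluster of its own.

```swift
let deFlag = "\u{1F1E9}\u{1F1EA}"              // regional indicators D + E
let threeFlags = Array(repeating: deFlag, count: 3)

// ZERO WIDTH NON-JOINER (U+200C) between the flags.
let withZWNJ = threeFlags.joined(separator: "\u{200C}")
// ZERO WIDTH SPACE (U+200B) between the flags.
let withZWSP = threeFlags.joined(separator: "\u{200B}")

print(withZWNJ.count) // 3: one Character per flag
print(withZWSP.count) // 5: three flags plus two separators
```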

This also resolves possible ambiguities, e.g. should “🇫​🇷​🇺​🇸”
be “🇫​🇷🇺​🇸” or “🇫🇷​🇺🇸”?

See also “How to know if two emojis will be displayed as one emoji?” for a possible method
to count the number of “composed characters” in a Swift string,
which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".
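
Independently of the linked method, a string that consists only of flags can also be counted by pairing the regional indicator scalars by hand, which does not depend on the grapheme breaking rules of the Swift version in use. This is only a sketch: the function name regionalIndicatorPairs(in:) is hypothetical, it pairs scalars greedily from left to right, and it silently drops an unpaired indicator at the end of a run.

```swift
// Hypothetical helper: splits runs of regional indicator scalars into pairs.
func regionalIndicatorPairs(in text: String) -> [String] {
    var pairs: [String] = []
    var pending: UnicodeScalar? = nil

    for scalar in text.unicodeScalars {
        // Regional indicator symbols occupy U+1F1E6 ... U+1F1FF.
        guard scalar.value >= 0x1F1E6, scalar.value <= 0x1F1FF else {
            pending = nil   // any other scalar ends the current run
            continue
        }
        if let first = pending {
            var pair = ""
            pair.unicodeScalars.append(first)
            pair.unicodeScalars.append(scalar)
            pairs.append(pair)
            pending = nil
        } else {
            pending = scalar
        }
    }
    return pairs
}

let fiveFlags = String(repeating: "\u{1F1E9}\u{1F1EA}", count: 5)
print(regionalIndicatorPairs(in: fiveFlags).count) // 5
```

With this left-to-right pairing, the four indicators F, R, U, S are read as FR followed by US.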
