Valid identifier characters in Scala

Working from the EBNF syntax in the spec:

upper ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
lower ::= ‘a’ | ... | ‘z’ and Unicode category Ll
letter ::= upper | lower and Unicode categories Lo, Lt, Nl
digit ::= ‘0’ | ... | ‘9’
opchar ::= “all other characters in \u0020-007F and Unicode
            categories Sm, So except parentheses ([]) and periods”

But also taking into account the very beginning of the Lexical Syntax chapter, which defines:

Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
Delimiter characters ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’

Here is what I came up with. Working by elimination in the range \u0020-007F, eliminating whitespace, letters (which include $ and _), digits, parentheses and delimiters, we have for opchar… (drumroll):

! # % & * + - / : < = > ? @ \ ^ | ~
plus the Unicode categories Sm and So, except for parentheses and periods.
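To double-check the elimination, here is a minimal Scala sketch that encodes the EBNF above as predicates and filters the printable ASCII range. The object and method names are made up for illustration, the Unicode categories come from java.lang.Character.getType, and I am assuming "\u0020-007F" means the printable characters 0x21 to 0x7E (space is whitespace and 0x7F is a control character).

```scala
object OpcharElimination {
  // paren and delim sets from the start of the Lexical Syntax chapter
  private val parens: Set[Char] = "()[]{}".toSet
  private val delims: Set[Char] = "`'\".;,".toSet

  // upper ::= 'A'..'Z' | '$' | '_' and Unicode Lu (A-Z are themselves Lu)
  def isUpper(c: Char): Boolean =
    c == '$' || c == '_' || Character.getType(c) == Character.UPPERCASE_LETTER

  // lower ::= 'a'..'z' and Unicode Ll (a-z are themselves Ll)
  def isLower(c: Char): Boolean =
    Character.getType(c) == Character.LOWERCASE_LETTER

  // letter ::= upper | lower and Unicode Lo, Lt, Nl
  def isLetter(c: Char): Boolean = {
    val t = Character.getType(c)
    isUpper(c) || isLower(c) || t == Character.OTHER_LETTER ||
      t == Character.TITLECASE_LETTER || t == Character.LETTER_NUMBER
  }

  def isDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // opchar: printable ASCII left over after removing letters, digits, parens
  // and delimiters (assumed range 0x21-0x7E), plus anything in Unicode Sm or So
  def isOpchar(c: Char): Boolean = {
    val asciiLeftover = c > 0x20 && c < 0x7f &&
      !isLetter(c) && !isDigit(c) && !parens(c) && !delims(c)
    val t = Character.getType(c)
    asciiLeftover || t == Character.MATH_SYMBOL || t == Character.OTHER_SYMBOL
  }

  def main(args: Array[String]): Unit = {
    val asciiOpchars = (0x21 to 0x7e).map(_.toChar).filter(isOpchar).mkString
    println(asciiOpchars) // !#%&*+-/:<=>?@\^|~
  }
}
```

Running main prints the same 18 characters listed above.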

(Edit: adding valid examples here.) In summary, here are some valid examples that highlight all cases. Watch out for \ in the REPL; I had to escape it as \\:

val !#%&*+-/:<=>?@\^|~ = 1 // all simple opchars
val simpleName = 1 
val withDigitsAndUnderscores_ab_12_ab12 = 1 
val wordEndingInOpChars_!#%&*+-/:<=>?@\^|~ = 1
val !^©® = 1 // opchars and symbols
val abcαβγ_!^©® = 1 // mixing unicode letters and symbols
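Names like these can be checked against the spec's plainid rule (plainid ::= upper idrest | varid | op, with idrest ::= {letter | digit} ['_' op] and op ::= opchar {opchar}). Below is a minimal sketch of that rule as a predicate; PlainId and isPlainId are hypothetical names, and it only checks characters, so reserved words are not rejected. Because it follows the spec literally, a currency symbol such as £ is not accepted either.

```scala
object PlainId {
  private val letterCategories: Set[Int] = Set(
    Character.UPPERCASE_LETTER, Character.LOWERCASE_LETTER, Character.OTHER_LETTER,
    Character.TITLECASE_LETTER, Character.LETTER_NUMBER
  ).map(_.toInt)
  private val symbolCategories: Set[Int] =
    Set(Character.MATH_SYMBOL, Character.OTHER_SYMBOL).map(_.toInt)
  private val asciiOpchars: Set[Char] = "!#%&*+-/:<=>?@\\^|~".toSet

  // letter: upper (including '$' and '_') | lower | Lo, Lt, Nl
  def isLetter(c: Char): Boolean =
    c == '$' || c == '_' || letterCategories(Character.getType(c))

  def isDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // opchar: the ASCII leftovers plus Unicode Sm and So
  def isOpchar(c: Char): Boolean =
    asciiOpchars(c) || symbolCategories(Character.getType(c))

  // plainid ::= upper idrest | varid | op, with idrest ::= {letter | digit} ['_' op]
  def isPlainId(s: String): Boolean = {
    val i = s.indexWhere(isOpchar)
    if (i < 0) s.nonEmpty && isLetter(s.head) && s.forall(c => isLetter(c) || isDigit(c))
    else if (i == 0) s.forall(isOpchar) // op ::= opchar {opchar}
    else {
      // alphanumeric part, then an op part that must follow an underscore
      val (word, op) = s.splitAt(i)
      isLetter(word.head) && word.forall(c => isLetter(c) || isDigit(c)) &&
        word.last == '_' && op.forall(isOpchar)
    }
  }
}
```

For example, isPlainId("abcαβγ_!^©®") is true, while isPlainId("abc!") is false because the opchars do not follow an underscore.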

Note 1:

I used a Unicode category index to figure out Lu, Ll, Lo, Lt, Nl, Sm and So:

  • Lu (uppercase letters)
  • Ll (lowercase letters)
  • Lo (other letters)
  • Lt (titlecase letters)
  • Nl (letter numbers, like Roman numerals)
  • Sm (math symbols)
  • So (other symbols)

Note 2:

val #^ = 1 // legal   - two opchars
val #  = 1 // illegal - # is reserved, like class, => or @
val +  = 1 // legal   - opchar
val &+ = 1 // legal   - two opchars
val &2 = 1 // illegal - an operator identifier cannot contain digits, so & and 2 are separate tokens
val £2 = 1 // works in practice - £ is in Sc (currency symbol), which the spec does not cover
val ¬  = 1 // legal   - part of Sm
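The &2 case is about tokenization: scanning by longest match, & starts an operator run that stops at the digit, so the lexer sees & and 2 as two tokens. Below is a toy tokenizer sketch that illustrates this. It is a simplification written for illustration (no comments, strings or real numeric literals) and not how scalac's scanner is implemented; ToyLexer and tokenize are hypothetical names.

```scala
object ToyLexer {
  private val asciiOpchars: Set[Char] = "!#%&*+-/:<=>?@\\^|~".toSet
  private val symbolCategories: Set[Int] =
    Set(Character.MATH_SYMBOL, Character.OTHER_SYMBOL).map(_.toInt)
  private val letterCategories: Set[Int] = Set(
    Character.UPPERCASE_LETTER, Character.LOWERCASE_LETTER, Character.OTHER_LETTER,
    Character.TITLECASE_LETTER, Character.LETTER_NUMBER
  ).map(_.toInt)

  def isOpchar(c: Char): Boolean = asciiOpchars(c) || symbolCategories(Character.getType(c))
  def isLetter(c: Char): Boolean = c == '$' || c == '_' || letterCategories(Character.getType(c))
  def isDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // Index of the first char at or after `from` that does not satisfy p
  private def span(s: String, from: Int, p: Char => Boolean): Int = {
    val j = s.indexWhere(ch => !p(ch), from)
    if (j < 0) s.length else j
  }

  // Longest-match split into op runs, digit runs and identifiers; a word that
  // ends in '_' may continue with an op run (idrest ::= ... ['_' op]).
  def tokenize(s: String): List[String] = {
    val out = List.newBuilder[String]
    var i = 0
    while (i < s.length) {
      val c = s(i)
      val end =
        if (c.isWhitespace) i + 1
        else if (isOpchar(c)) span(s, i, isOpchar)
        else if (isDigit(c)) span(s, i, isDigit)
        else if (isLetter(c)) {
          val e = span(s, i, ch => isLetter(ch) || isDigit(ch))
          if (s(e - 1) == '_' && e < s.length && isOpchar(s(e))) span(s, e, isOpchar) else e
        } else i + 1 // anything else (parens, delimiters, ...) is a one-char token
      if (!c.isWhitespace) out += s.substring(i, end)
      i = end
    }
    out.result()
  }
}
```

With this sketch, tokenize("val &2 = 1") gives List(val, &, 2, =, 1), while tokenize("val &+ = 1") keeps &+ as a single token.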

Note 3:

Other operator-looking tokens that are reserved: _ : = => <- <: <% >: # @ and also \u21D2 (⇒) and \u2190 (←).
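If you generate operator names programmatically, the reserved list above can be combined with the opchar test. A minimal sketch, assuming the list in this note is complete for operator-like tokens; ReservedOps and isUsableOpName are hypothetical names.

```scala
object ReservedOps {
  // Reserved operator-like tokens from Note 3 (U+21D2 is ⇒, U+2190 is ←)
  val reserved: Set[String] =
    Set("_", ":", "=", "=>", "<-", "<:", "<%", ">:", "#", "@", "⇒", "←")

  private val asciiOpchars: Set[Char] = "!#%&*+-/:<=>?@\\^|~".toSet
  private val symbolCategories: Set[Int] =
    Set(Character.MATH_SYMBOL, Character.OTHER_SYMBOL).map(_.toInt)

  def isOpchar(c: Char): Boolean = asciiOpchars(c) || symbolCategories(Character.getType(c))

  // An operator-only name: one or more opchars, and not one of the reserved tokens
  def isUsableOpName(s: String): Boolean =
    s.nonEmpty && s.forall(isOpchar) && !reserved(s)
}
```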
