Clang 14 rejects certain Unicode characters in identifiers that are accepted by Clang 13 and the C++ Standard #54732

Closed
@tttapa

Description

Clang 14 rejects some Unicode characters in identifiers, such as ₊ (U+208A) and other subscript characters. These characters fall within the ranges that the [lex.name] section of the C++20 Standard allows in identifiers. Recent versions of GCC and older versions of Clang (including Clang 13) accept them without any errors.

For example:

double foo(double xₖ, double xₖ₊₁) {
  return xₖ₊₁ - xₖ;
}
$ clang++-14 -c unicode.cpp -std=c++20
unicode.cpp:1:36: error: character <U+208A> not allowed in an identifier
double foo(double xₖ, double xₖ₊₁) {
                               ^
unicode.cpp:1:39: error: character <U+2081> not allowed in an identifier
double foo(double xₖ, double xₖ₊₁) {
                                ^
unicode.cpp:2:14: error: character <U+208A> not allowed in an identifier
  return xₖ₊₁ - xₖ;
           ^
unicode.cpp:2:17: error: character <U+2081> not allowed in an identifier
  return xₖ₊₁ - xₖ;
            ^
4 errors generated.
$ clang++-14 --version
Ubuntu clang version 14.0.1-++20220402053234+23d08271a4b2-1~exp1~20220402053315.111
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Is this a deliberate change or a regression bug from Clang 13 to 14?

Labels: c++23, clang:frontend