rustc_lexer's definition of identifiers is more general than the language reference's spec

This is how Rust identifiers' lexical syntax is defined: https://doc.rust-lang.org/reference/identifiers.html
This is how the lexer for Rust identifiers is implemented: https://github.com/rust-lang/rust/blob/ce0d64e03ef9875e0935bb60e989542b7ec29579/compiler/rustc_lexer/src/lib.rs#L264-L297

The specification says an identifier should start with an ASCII alphabetic character and continue with ASCII alphanumeric characters or underscores. But the implementation uses the Unicode default identifier syntax (XID_Start / XID_Continue, see http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax), which is much more general than that, as far as I understand.

I think either the code or the language reference should be updated, but I'm not sure which one.

(I didn't check the lexing of other tokens; it might be useful to compare those with the language reference's definitions too.)

	/// True if `c` is valid as a first character of an identifier.
	/// See [Rust language reference](https://doc.rust-lang.org/reference/identifiers.html) for
	/// a formal definition of valid identifier name.
	pub fn is_id_start(c: char) -> bool {
	    // This is XID_Start OR '_' (which formally is not a XID_Start).
	    // We also add fast-path for ascii idents
	    ('a'..='z').contains(&c)
	        || ('A'..='Z').contains(&c)
	        || c == '_'
	        || (c > '\x7f' && unicode_xid::UnicodeXID::is_xid_start(c))
	}

	/// True if `c` is valid as a non-first character of an identifier.
	/// See [Rust language reference](https://doc.rust-lang.org/reference/identifiers.html) for
	/// a formal definition of valid identifier name.
	pub fn is_id_continue(c: char) -> bool {
	    // This is exactly XID_Continue.
	    // We also add fast-path for ascii idents
	    ('a'..='z').contains(&c)
	        || ('A'..='Z').contains(&c)
	        || ('0'..='9').contains(&c)
	        || c == '_'
	        || (c > '\x7f' && unicode_xid::UnicodeXID::is_xid_continue(c))
	}

	/// The passed string is lexically an identifier.
	pub fn is_ident(string: &str) -> bool {
	    let mut chars = string.chars();
	    if let Some(start) = chars.next() {
	        is_id_start(start) && chars.all(is_id_continue)
	    } else {
	        false
	    }
	}
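
To make the gap concrete, here is a minimal sketch of the ASCII-only rule as I read it from the reference, for comparison with `is_ident` above. The name `is_ascii_spec_ident` is made up for illustration, and the leading-underscore case is my own reading of the reference grammar rather than something quoted from it, so treat both as assumptions.

```rust
/// Hypothetical helper (not part of rustc_lexer): checks a string against
/// the ASCII-only identifier rule described in the language reference.
/// Assumption: an identifier is either
///   - an ASCII letter followed by zero or more ASCII letters, digits, or '_', or
///   - '_' followed by one or more ASCII letters, digits, or '_'.
pub fn is_ascii_spec_ident(string: &str) -> bool {
    // Shared rule for every character after the first one.
    fn is_ascii_continue(c: char) -> bool {
        c.is_ascii_alphanumeric() || c == '_'
    }

    let mut chars = string.chars();
    match chars.next() {
        // Starts with an ASCII letter: the rest may be empty.
        Some(c) if c.is_ascii_alphabetic() => chars.all(is_ascii_continue),
        // Starts with '_': at least one more character is required.
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_ascii_continue)
        }
        // Empty string, or any other first character (including non-ASCII).
        _ => false,
    }
}
```

Under this sketch `foo` and `_bar` pass, while `café` is rejected. `is_ident` from rustc_lexer should accept `café`, because `é` is in XID_Continue, which is exactly the mismatch described above.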
