mirror of https://github.com/rust-lang/rfcs.git
Amend RFC2603 to allow identifiers to start with a digit.
parent dcd2b8f420
commit 5a4d154846
@@ -506,7 +506,7 @@ mod gödel {
 would be mangled as:
 
 ```
-_RNvNtNtC7mycrateu8gdel_Fqa6escher4bach
+_RNvNtNtC7mycrateu8gdel_5qa6escher4bach
                  <-------->
               Unicode component
 ```
@@ -613,10 +613,10 @@ compiler generates mangled names.
 
 The syntax of mangled names is given in extended Backus-Naur form:
 
-- Non-terminals are within angle brackets (as in `<name-prefix>`)
+- Non-terminals are within angle brackets (as in `<path>`)
 - Terminals are within quotes (as in `"_R"`),
-- Optional parts are in brackets (as in `[<decimal>]`),
-- Repetition (zero or more times) is signified by curly braces (as in `{<name-prefix>}`)
+- Optional parts are in brackets (as in `[<disambiguator>]`),
+- Repetition (zero or more times) is signified by curly braces (as in `{<type>}`)
 - Comments are marked with `//`.
 
 Mangled names conform to the following grammar:
@@ -641,11 +641,13 @@ Mangled names conform to the following grammar:
 <impl-path> = [<disambiguator>] <path>
 
 // The <decimal-number> is the length of the identifier in bytes.
-// <bytes> is the identifier itself and must not start with a decimal digit.
+// <bytes> is the identifier itself, and it's optionally preceded by "_",
+// to separate it from its length - this "_" is mandatory if the <bytes>
+// starts with a decimal digit, or "_", in order to keep it unambiguous.
 // If the "u" is present then <bytes> is Punycode-encoded.
 <identifier> = [<disambiguator>] <undisambiguated-identifier>
 <disambiguator> = "s" <base-62-number>
-<undisambiguated-identifier> = ["u"] <decimal-number> <bytes>
+<undisambiguated-identifier> = ["u"] <decimal-number> ["_"] <bytes>
 
 // Namespace of the identifier in a (nested) path.
 // It's an a-zA-Z character, with a-z reserved for implementation-internal
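The amended `<undisambiguated-identifier>` production in the hunk above can be illustrated with a small Rust sketch. The function names `encode_ident` and `decode_ident` are hypothetical and are not taken from the RFC or from rustc. The encoder assumes its input is already ASCII (or already Punycode-encoded) and emits the optional `"_"` only where the grammar makes it mandatory, which is one possible choice rather than something the grammar requires.

```rust
/// Hypothetical encoder for `["u"] <decimal-number> ["_"] <bytes>`.
/// `bytes` is assumed to be ASCII already (plain or Punycode-encoded).
fn encode_ident(bytes: &str, is_punycode: bool) -> String {
    let mut out = String::new();
    if is_punycode {
        out.push('u');
    }
    // <decimal-number> is the length of <bytes>; the separator is not counted.
    out.push_str(&bytes.len().to_string());
    // The "_" is mandatory when <bytes> starts with a decimal digit or "_".
    // This sketch omits it in every other case.
    if bytes.starts_with(|c: char| c.is_ascii_digit() || c == '_') {
        out.push('_');
    }
    out.push_str(bytes);
    out
}

/// Hypothetical decoder: reads one identifier from the front of `input` and
/// returns `(is_punycode, bytes, remaining_input)`, or `None` if malformed.
fn decode_ident(input: &str) -> Option<(bool, &str, &str)> {
    let (is_punycode, rest) = match input.strip_prefix('u') {
        Some(r) => (true, r),
        None => (false, input),
    };
    let digits = rest.bytes().take_while(u8::is_ascii_digit).count();
    if digits == 0 {
        return None;
    }
    let len: usize = rest[..digits].parse().ok()?;
    let rest = &rest[digits..];
    // Skip the separator if it is present.
    let rest = rest.strip_prefix('_').unwrap_or(rest);
    let bytes = rest.get(..len)?;
    Some((is_punycode, bytes, &rest[len..]))
}
```

With this encoder, `gdel_5qa` (the Punycode component from the `gödel` example) becomes `u8gdel_5qa`, while `2xaedc` needs the separator and becomes `u6_2xaedc`. Decoding stays unambiguous under these assumptions because a `"_"` directly after the length can only be the separator: an identifier that itself starts with `"_"` always gets one.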
@@ -775,29 +777,22 @@ and, for now, only define a mangling for integer values.
 ### Punycode Identifiers
 
 Punycode generates strings of the form `([[:ascii:]]+-)?[[:alnum:]]+`.
-This is problematic for two reasons:
+This is problematic because of the `-` character, which is not in the
+supported character set; Punycode uses it to separate the ASCII part
+(if it exists), from the base-36 encoding of the non-ASCII characters.
 
-- Generated strings can contain a `-` character; which is not in the
-  supported character set.
-- Generated strings can start with a digit; which makes them clash
-  with the byte-count prefix of the `<identifier>` production.
-
-For these reasons, vanilla Punycode string are further encoded during mangling:
-
-- The `-` character is simply replaced by a `_` character.
-- The part of the Punycode string that encodes the non-ASCII characters
-  is a base-36 number, using `[a-z0-9]` as its "digits". We want to get
-  rid of the decimal digits in there, so we simply remap `0-9` to `A-J`.
+For this reasons, we deviate from vanilla Punycode, by replacing
+the `-` character with a `_` character.
 
 Here are some examples:
 
 | Original        | Punycode        | Punycode + Encoding |
 |-----------------|-----------------|---------------------|
-| føø | f-5gaa | f_Fgaa |
-| α_ω | _-ylb7e | __ylbHe |
-| 铁锈 | n84amf | nIEamf |
-| 🤦 | fq9h | fqJh |
-| ρυστ | 2xaedc | Cxaedc |
+| føø | f-5gaa | f_5gaa |
+| α_ω | _-ylb7e | __ylb7e |
+| 铁锈 | n84amf | n84amf |
+| 🤦 | fq9h | fq9h |
+| ρυστ | 2xaedc | 2xaedc |
 
 With this post-processing in place the Punycode strings can be treated
 like regular identifiers and need no further special handling.
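The simplified post-processing in the hunk above (only `-` is replaced by `_`) can be sketched in Rust as follows. Both function names are hypothetical. The reverse direction is not spelled out in this diff: it rests on the assumption that a Rust identifier never contains `-`, so the Punycode output holds at most one `-` (the delimiter), and that the base-36 part is purely alphanumeric, so the last `_` in the mangled form must be that delimiter.

```rust
/// Hypothetical helper: turns a vanilla Punycode string into the form used
/// in mangled names by replacing the `-` delimiter with `_`.
fn punycode_to_mangled(punycode: &str) -> String {
    punycode.replace('-', "_")
}

/// Hypothetical inverse, under the assumptions stated above: the last `_`
/// (if any) is the delimiter between the ASCII part and the base-36 part.
fn mangled_to_punycode(mangled: &str) -> String {
    match mangled.rfind('_') {
        Some(i) => format!("{}-{}", &mangled[..i], &mangled[i + 1..]),
        // No `_` means there was no ASCII part and therefore no delimiter.
        None => mangled.to_string(),
    }
}
```

Applied to the Punycode column of the table, `punycode_to_mangled` yields the new "Punycode + Encoding" column, for example `f-5gaa` becomes `f_5gaa` and `2xaedc` is left unchanged.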
@@ -1154,3 +1149,4 @@ pub static QUUX: u32 = {
 - Resolve question of complex constant data.
 - Add a recommended resolution for open question around Punycode identifiers.
 - Add a recommended resolution for open question around encoding function parameter types.
+- Allow identifiers to start with a digit.