# Summary
Include identifiers immediately after literals in the literal token, to allow future expansion: e.g. `"foo"bar` and `1baz` are considered whole (but semantically invalid) tokens, rather than the two separate tokens `"foo"`, `bar` and `1`, `baz` respectively. This allows future expansion of literal handling without risking breaking (macro) code.
# Motivation
Currently a few kinds of literals (integers and floats) can take a fixed set of suffixes, and the other kinds take none at all. The valid suffixes on numeric literals are:

- `u`, `u8`, `u16`, `u32`, `u64`
- `i`, `i8`, `i16`, `i32`, `i64`
- `f32`, `f64`
Most things not in this list are just ignored and treated as an entirely separate token (prefixes of `128` are errors: e.g. `1u12` gives an "invalid int suffix" error), and similarly any suffixes on other literals are also separate tokens. For example:
```rust
#![feature(macro_rules)]

// makes a tuple
macro_rules! foo( ($($a: expr)*) => { ($($a, )*) } )

fn main() {
    let bar = "suffix";
    let y = "suffix";
    let t: (uint, uint) = foo!(1u256);
    println!("{}", foo!("foo"bar));
    println!("{}", foo!('x'y));
}

/*
output:
(1, 256)
(foo, suffix)
(x, suffix)
*/
```
The compiler is eating the `1u` and then seeing the invalid suffix `256` and so treating that as a separate token, and similarly for the string and character literals. (This problem is only visible in macros, since that is the only place where two literals/identifiers can be placed directly adjacent.)
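
For intuition, here is a rough, runnable model of that splitting, assuming the raw literal has already been identified; `current_split` and `KNOWN_SUFFIXES` are illustrative names, not anything from the real rustc lexer:

```rust
// A rough model of the splitting described above: only a recognised suffix
// stays attached to the literal, and whatever remains begins a new token.
const KNOWN_SUFFIXES: &[&str] = &[
    "u", "u8", "u16", "u32", "u64",
    "i", "i8", "i16", "i32", "i64",
    "f32", "f64",
];

fn current_split<'a>(raw: &str, trailing: &'a str) -> (String, &'a str) {
    // Keep the longest known suffix that `trailing` starts with (e.g. the `u`
    // of `u256`); the rest becomes the start of the next token.
    let keep = KNOWN_SUFFIXES
        .iter()
        .filter(|s| trailing.starts_with(**s))
        .max_by_key(|s| s.len())
        .copied()
        .unwrap_or("");
    (format!("{}{}", raw, keep), &trailing[keep.len()..])
}

fn main() {
    println!("{:?}", current_split("1", "u256"));      // ("1u", "256")
    println!("{:?}", current_split("\"foo\"", "bar")); // ("\"foo\"", "bar")
    println!("{:?}", current_split("'x'", "y"));       // ("'x'", "y")
}
```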
This behaviour means we would be unable to expand the possibilities for literals after freezing the language/macros, which would be unfortunate, since user-defined literals in C++ are reportedly very nice, proposals for "bit data" would like to use types like `u1` and `u5` (e.g. RFC PR 327), and there are "fringe" types like `f16`, `f128` and `u128` that have uses but are not common enough to warrant adding to the language now.
# Detailed design
The tokenizer will have grammar `literal: raw_literal identifier?` where `raw_literal` covers strings, characters and numbers without suffixes (e.g. `"foo"`, `'a'`, `1`, `0x10`).
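
To make the rule concrete, here is a minimal sketch of the new suffix-grabbing step, assuming the raw literal has already been lexed and its length is known; the `Literal` struct and `lex_suffix` helper are inventions for illustration, not rustc internals:

```rust
/// A minimal sketch of `literal: raw_literal identifier?`: after a raw
/// literal, any immediately adjacent identifier is pulled into the same
/// token as its suffix.
#[derive(Debug)]
struct Literal<'a> {
    raw: &'a str,    // e.g. `"foo"`, `'a'`, `1`, `0x10`
    suffix: &'a str, // e.g. `bar`, `u256`, `_y`, or empty
}

fn lex_suffix(input: &str, raw_len: usize) -> Literal<'_> {
    let rest = &input[raw_len..];
    let mut suffix_len = 0;
    for (i, c) in rest.char_indices() {
        // Identifier rules: `_` or a letter to start, then `_`/letters/digits.
        let ok = if i == 0 {
            c == '_' || c.is_alphabetic()
        } else {
            c == '_' || c.is_alphanumeric()
        };
        if !ok {
            break;
        }
        suffix_len = i + c.len_utf8();
    }
    Literal {
        raw: &input[..raw_len],
        suffix: &rest[..suffix_len],
    }
}

fn main() {
    // The raw-literal lengths (5, 1, 3) are supplied by hand here.
    println!("{:?}", lex_suffix("\"foo\"bar", 5)); // raw: `"foo"`, suffix: `bar`
    println!("{:?}", lex_suffix("1u256", 1));      // raw: `1`, suffix: `u256`
    println!("{:?}", lex_suffix("'a'_y", 3));      // raw: `'a'`, suffix: `_y`
}
```

Note that this step only decides token boundaries; whether the resulting suffix is actually supported is a separate check.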
Examples of "valid" literals after this change (that is, entities that will be consumed as a single token):

```
"foo"bar "foo"_baz
'a'x 'a'_y
15u16 17i18 19f20 21.22f23
0b11u25 0x26i27 28.29e30f31
123foo 0.0bar
```
Placing a space between the literal and the letters of the suffix will cause them to be parsed as two separate tokens, just like today. That is, `"foo"bar` is one token, while `"foo" bar` is two tokens.
The example above would then be an error, something like:

```rust
let t: (uint, uint) = foo!(1u256); // error: literal with unsupported size
println!("{}", foo!("foo"bar));    // error: literal with unsupported suffix
println!("{}", foo!('x'y));        // error: literal with unsupported suffix
```
The above demonstrates that numeric suffixes could be special-cased to detect `u<...>` and `i<...>` and give more useful error messages. (The macro example there is definitely an error because it uses the incorrectly-suffixed literals as `expr`s. If it only handled them as tokens, i.e. `tt`, it wouldn't necessarily have to be illegal: e.g. `stringify!(1u256)` doesn't have to be rejected because the `1u256` never occurs at runtime/in the type system.)
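
As a sketch of how that special-casing might look (the `check_suffix` helper and its exact structure are assumptions for illustration, not rustc's real diagnostics, though the messages mirror the hypothetical errors shown above):

```rust
// Illustrative suffix validation: a `u`/`i` followed by digits is reported as
// an unsupported *size*, while any other unknown suffix gets a generic error.
fn check_suffix(suffix: &str) -> Result<(), String> {
    const VALID: &[&str] = &[
        "u", "u8", "u16", "u32", "u64",
        "i", "i8", "i16", "i32", "i64",
        "f32", "f64",
    ];
    if suffix.is_empty() || VALID.contains(&suffix) {
        return Ok(());
    }
    let mut rest = suffix.chars();
    let first = rest.next().unwrap();
    let looks_sized = (first == 'u' || first == 'i')
        && rest.clone().count() > 0
        && rest.all(|c| c.is_ascii_digit());
    if looks_sized {
        Err(format!("literal with unsupported size `{}`", suffix))
    } else {
        Err(format!("literal with unsupported suffix `{}`", suffix))
    }
}

fn main() {
    assert!(check_suffix("u16").is_ok());
    assert_eq!(
        check_suffix("u256").unwrap_err(),
        "literal with unsupported size `u256`"
    );
    assert_eq!(
        check_suffix("bar").unwrap_err(),
        "literal with unsupported suffix `bar`"
    );
}
```

Whether such a check lives in the tokenizer or the parser is exactly the first unresolved question below.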
# Drawbacks
None beyond outlawing placing a literal immediately before a pattern, but the current behaviour can easily be restored with a space: `123u 456`. (If a macro is using this for the purpose of hacky generalised literals, the unresolved question below touches on this.)
# Alternatives
Don't do this, or consider doing it for adjacent suffixes with an alternative syntax, e.g. `10'bar` or `10$bar`.
# Unresolved questions
- Should it be the parser or the tokenizer that rejects invalid suffixes? This is effectively asking whether it is legal for syntax extensions to be passed the raw literals: that is, can a procedural syntax extension `foo` accept and handle an invocation like `foo!(1u2)`?
- Should this apply to all expressions, e.g. `(1 + 2)bar`?