- Start Date: 2014-08-28
- RFC PR: rust-lang/rfcs#218
- Rust Issue: rust-lang/rust#24266
Summary
When a struct type `S` has no fields (a so-called "empty struct"), allow it to be defined via either `struct S;` or `struct S {}`.
When defined via `struct S;`, allow instances of it to be constructed and pattern-matched via either `S` or `S {}`.
When defined via `struct S {}`, require instances to be constructed and pattern-matched solely via `S {}`.
Motivation
Today, when writing code, one must treat an empty struct as a special case, distinct from structs that include fields. That is, one must write code like this:
    struct S2 { x1: int, x2: int }
    struct S0; // kind of different from the above.
    let s2 = S2 { x1: 1, x2: 2 };
    let s0 = S0; // kind of different from the above.
    match (s2, s0) {
        (S2 { x1: y1, x2: y2 },
         S0) // you can see my pattern here
            => { println!("Hello from S2({}, {}) and S0", y1, y2); }
    }
While this yields code that is relatively free of extraneous
curly-braces, this special case handling of empty structs presents
problems for two cases of interest: automatic code generators
(including, but not limited to, Rust macros) and conditionalized code
(i.e. code with `cfg` attributes; see the CFG problem appendix).
The heart of the code-generator argument is: why force all
to-be-written code generators and macros to special-case the empty
struct (in terms of whether or not to include the surrounding braces),
especially since that special case is likely to be forgotten
(yielding a latent bug in the code generator)?
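As a rough illustration of the code-generator argument, the sketch below uses a hypothetical `make_struct!` macro (not part of this RFC) written in present-day Rust syntax. It assumes that braces are accepted on a struct with zero fields, which is what this RFC proposes; under that assumption the macro needs no separate rule for the empty case.

```rust
// Hypothetical generator macro: emits a struct with whatever fields it is given.
// The same rule covers zero fields and many fields, because `struct Name {}`
// is assumed to be legal.
macro_rules! make_struct {
    ($name:ident { $($field:ident : $ty:ty),* }) => {
        #[derive(Debug, Default, PartialEq)]
        struct $name { $($field: $ty),* }
    };
}

make_struct!(Empty {});
make_struct!(Point { x: i32, y: i32 });

fn main() {
    // Both instances are built with the same braced syntax.
    let e = Empty {};
    let p = Point { x: 1, y: 2 };
    println!("{:?} {:?}", e, p);
}
```

Without brace support for empty structs, the macro would need a second rule that emits `struct $name;` and matching special cases at every construction site.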
The special-case handling of empty structs is also a problem for programmers who actively add and remove fields from structs during development; such changes cause a struct to switch between being empty and non-empty, and the associated revisions that remove and add curly braces are aggravating (both in the effort of revising the code and in the extra noise introduced into commit histories).
This RFC proposes an approach similar to the one we used circa February
2013, when both `S0` and `S0 { }` were accepted syntaxes for an empty
struct. The parsing ambiguity that motivated removing support for
`S0 { }` is no longer present (see the Ancient History appendix).
Supporting empty braces in the syntax for empty structs is easy to do
in the language now.
Detailed design
There are two kinds of empty structs: Braced empty structs and flexible empty structs. Flexible empty structs are a slight generalization of the structs that we have today.
Flexible empty structs are defined via the syntax `struct S;` (as today).
Braced empty structs are defined via the syntax `struct S { }` ("new").
Both braced and flexible empty structs can be constructed via the
expression syntax `S { }` ("new"). Flexible empty structs, as today,
can also be constructed via the expression syntax `S`.
Both braced and flexible empty structs can be pattern-matched via the
pattern syntax `S { }` ("new"). Flexible empty structs, as today,
can also be pattern-matched via the pattern syntax `S`.
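The construction and pattern rules above can be sketched as follows. This is a minimal example in present-day Rust syntax, assuming the rules are implemented as proposed; the names `Flexible`, `Braced`, `describe_flexible` and `describe_braced` are made up for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Flexible; // flexible empty struct

#[derive(Debug, PartialEq)]
struct Braced {} // braced empty struct

fn describe_flexible(s: Flexible) -> &'static str {
    match s {
        // The braced pattern is accepted; `Flexible => ...` would be too.
        Flexible {} => "flexible",
    }
}

fn describe_braced(b: Braced) -> &'static str {
    match b {
        // Only the braced pattern is accepted for a braced empty struct.
        Braced {} => "braced",
    }
}

fn main() {
    // A flexible empty struct can be constructed with or without braces.
    let a = Flexible;
    let b = Flexible {};
    assert_eq!(a, b);

    // A braced empty struct must be constructed with braces;
    // `let c = Braced;` would be rejected.
    let c = Braced {};

    println!("{} {}", describe_flexible(a), describe_braced(c));
}
```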
Braced empty struct definitions solely affect the type namespace, just like normal non-empty structs. Flexible empty structs affect both the type and value namespaces.
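One observable consequence of the namespace rule is that a braced empty struct leaves its name free in the value namespace. The sketch below, in present-day Rust syntax, defines a function with the same name as the struct; the function `Braced` is purely illustrative, and doing the same with a flexible empty struct (`struct Braced;`) would be a name conflict.

```rust
#[derive(Debug, PartialEq)]
struct Braced {} // occupies only the type namespace

// Allowed because the value namespace has no entry named `Braced` yet.
#[allow(non_snake_case)]
fn Braced() -> Braced {
    Braced {} // struct literal: resolved through the type namespace
}

fn main() {
    // `Braced()` calls the function; `Braced {}` is the struct literal.
    assert_eq!(Braced(), Braced {});
    println!("{:?}", Braced());
}
```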
As a matter of style, using braceless syntax is preferred for constructing and pattern-matching flexible empty structs. For example, pretty-printer tools are encouraged to emit braceless forms if they know that the corresponding struct is a flexible empty struct. (Note that pretty printers that handle incomplete fragments may not have such information available.)
There is no ambiguity introduced by this change, because we have already introduced a restriction to the Rust grammar to force the use of parentheses to disambiguate struct literals in such contexts. (See Rust RFC 25).
The expectation is that when migrating code from a flexible empty struct to a non-empty struct, it can start by first migrating to a braced empty struct (and then have a tool indicate all of the locations where braces need to be added); after that step has been completed, one can then take the next step of adding the actual field.
Drawbacks
Some people like "There is only one way to do it." But, there is precedent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity; see the Precedent for flexible syntax in Rust appendix. Also, see the Always Require Braces alternative below.
I have attempted to summarize the previous discussion from RFC PR 147 in the Recent History appendix; some of the points there include drawbacks to this approach and to the Always Require Braces alternative.
Alternatives
Always Require Braces
Alternative 1: "Always Require Braces". Specifically, require empty
curly braces on empty structs. People who like the current syntax of
curly-brace free structs can encode them this way: `enum S0 { S0 }`.
This would address all of the same issues outlined above. (Also, the
author (pnkfelix) would be happy to take this tack.)
The main reason not to take this tack is that some people may like writing empty structs without braces, but do not want to switch to the unary enum version described in the previous paragraph. See "I wouldn't want to force noisier syntax ..." in the Recent History appendix.
Status quo
Alternative 2: Status quo. Macros and code-generators in general will need to handle empty structs as a special case. We may continue hitting bugs like the CFG parse bug. Some users will be annoyed but most will probably cope.
Synonymous in all contexts
Alternative 3: An earlier version of this RFC proposed having `struct S;`
be entirely synonymous with `struct S { }`, and the expression
`S { }` be synonymous with `S`.
This was deemed problematic, since it would mean that `S { }` would
put an entry into both the type and value namespaces, while
`S { x: int }` would only put an entry into the type namespace.
Thus the current draft of the RFC proposes the "flexible" versus
"braced" distinction for empty structs.
Never synonymous
Alternative 4: Treat `struct S;` as requiring `S` at the expression
and pattern sites, and `struct S { }` as requiring `S { }` at the
expression and pattern sites.
This in some ways follows a principle of least surprise, but it also
is really hard to justify having both syntaxes available for empty
structs with no flexibility about how they are used. (Note again that
one would have the option of choosing between
`enum S { S }`, `struct S;`, or `struct S { }`, each with their own
idiosyncrasies about whether you have to write `S` or `S { }`.)
I would rather adopt "Always Require Braces" than "Never Synonymous".
Empty Tuple Structs
One might say "why are you including support for curly braces, but not parentheses?" Or in other words, "what about empty tuple structs?"
The code-generation argument could be applied to tuple-structs as
well, to claim that we should allow the syntax `S0()`. I am less
inclined to add a special case for that; I think tuple-structs are
less frequently used (especially with many fields); they are largely
for ad-hoc data such as newtype wrappers, not for code generators.
Note that we should not attempt to generalize this RFC as proposed to
include tuple structs, i.e. so that given `struct T0 {}`, the
expressions `T0`, `T0 {}`, and `T0()` would be synonymous. The reason
is that given a tuple struct `struct T2(int, int)`, the identifier
`T2` is already bound to a constructor function:
    fn main() {
        #[deriving(Show)]
        struct T2(int, int);
        fn foo<S:std::fmt::Show>(f: |int, int| -> S) {
            println!("Hello from {} and {}", f(2,3), f(4,5));
        }
        foo(T2);
    }
So if we were to attempt to generalize the leniency of this RFC to
tuple structs, we would be in the unfortunate situation given `struct T0();`
of trying to treat `T0` simultaneously as an instance of the
struct and as a constructor function. So, the handling of empty
structs proposed by this RFC does not generalize to tuple structs.
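The constructor-function point can also be sketched in present-day Rust syntax (using `i32`, `Fn` bounds and `derive(Debug)` in place of the older forms). The helper `apply` and the empty tuple struct `T0` are illustrative assumptions; `struct T0();` is accepted by current compilers, where the bare name `T0` is a zero-argument constructor function rather than an instance.

```rust
#[derive(Debug, PartialEq)]
struct T2(i32, i32);

#[derive(Debug, PartialEq)]
struct T0();

// Calls the given two-argument function twice and returns both results.
fn apply<S, F: Fn(i32, i32) -> S>(f: F) -> (S, S) {
    (f(2, 3), f(4, 5))
}

fn main() {
    // `T2` is passed as an ordinary function value.
    let pair = apply(T2);
    println!("Hello from {:?} and {:?}", pair.0, pair.1);

    // `T0` names the constructor function; `T0()` builds the instance.
    let make: fn() -> T0 = T0;
    assert_eq!(make(), T0());
}
```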
(Note that if we adopt alternative 1, Always Require Braces, then
the issue of how tuple structs are handled is totally orthogonal -- we
could add support for `struct T0()` as a distinct type from `struct S0 {}`,
if we so wished, or leave it aside.)
Unresolved questions
None
Appendices
The CFG problem
A program like this works today:
    fn main() {
        #[deriving(Show)]
        struct Svaries {
            x: int,
            y: int,
            #[cfg(zed)]
            z: int,
        }
        let s = match () {
            #[cfg(zed)] _ => Svaries { x: 3, y: 4, z: 5 },
            #[cfg(not(zed))] _ => Svaries { x: 3, y: 4 },
        };
        println!("Hello from {}", s)
    }
Observe what happens when one modifies the above just a bit:
    struct Svaries {
        #[cfg(eks)]
        x: int,
        #[cfg(why)]
        y: int,
        #[cfg(zed)]
        z: int,
    }
Now, certain `cfg` settings yield an empty struct, even though it
is surrounded by braces. Today this leads to a CFG parse bug
when one attempts to actually construct such a struct.
If we want to support situations like this properly, we will probably
need to further extend the `cfg` attribute so that it can be placed
before individual fields in a struct constructor, like this:
    // You cannot do this today,
    // but maybe in the future (after a different RFC)
    let s = Svaries {
        #[cfg(eks)] x: 3,
        #[cfg(why)] y: 4,
        #[cfg(zed)] z: 5,
    };
Supporting such a syntax consistently in the future should start today with allowing empty braces as legal code. (Strictly speaking, it is not necessary that we add support for empty braces at the parsing level to support this feature at the semantic level. But supporting empty-braces in the syntax still seems like the most consistent path to me.)
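For comparison, here is a sketch of the same idea in present-day Rust, assuming a compiler that accepts attributes on struct-literal fields (which was not available when this RFC was written). The field types are changed from `int` to `i32`, and the helper `make` is illustrative. The `cfg` names `eks`, `why` and `zed` are assumed to be unset, so every field is compiled out and the literal reduces to `Svaries {}`; a recent compiler may warn about the unknown `cfg` names.

```rust
#[derive(Debug)]
struct Svaries {
    #[cfg(eks)]
    x: i32,
    #[cfg(why)]
    y: i32,
    #[cfg(zed)]
    z: i32,
}

// Builds a value with whichever fields survive conditional compilation.
fn make() -> Svaries {
    Svaries {
        #[cfg(eks)]
        x: 3,
        #[cfg(why)]
        y: 4,
        #[cfg(zed)]
        z: 5,
    }
}

fn main() {
    // With no cfg flags set, this is an empty struct built with empty braces.
    println!("Hello from {:?}", make());
}
```

This only works if the empty-brace form is legal both in the definition and in the constructor expression.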
Ancient History
A parsing ambiguity was the original motivation for disallowing the
syntax `S {}` in favor of `S` for constructing an instance of
an empty struct. The ambiguity and various options for dealing with it
were well documented on the rust-dev thread.
Both syntaxes were simultaneously supported at the time.
In particular, at the time that mailing list thread was created, the
code `match match x {} ...` would be parsed as `match (x {}) ...`, not
as `(match x {}) ...` (see Rust PR 5137); likewise, `if x {}` would
be parsed as an if-expression whose test component is the struct
literal `x {}`. Thus, at the time of Rust PR 5137, if the input to
a `match` or `if` was an identifier expression, one had to put
parentheses around the identifier to force it to be interpreted as
input to the `match`/`if`, and not as a struct constructor.
Of the options for resolving this discussed on the mailing list
thread, the one selected (removing `S {}` construction expressions)
was chosen as the most expedient option.
At that time, the option of "Place a parser restriction on those
contexts where `{` terminates the expression and say that struct
literals cannot appear there unless they are in parentheses." was
explicitly not chosen, in favor of continuing to use the
disambiguation rule in use at the time, namely that the presence of a
label (e.g. `S { a_label: ... }`) was the way to distinguish a
struct constructor from an identifier followed by a control block, and
thus, "there must be one label."
Naturally, if the construction syntax were to be disallowed, it made
sense to also remove the `struct S {}` declaration syntax.
Things have changed since the time of that mailing list thread;
namely, we have now adopted the aforementioned parser restriction,
Rust RFC 25. (The text of RFC 25 does not explicitly address
`match`, but we have effectively expanded it to include a curly-brace
delimited block of match-arms in the definition of "block".) Today,
one uses parentheses around struct literals in some contexts (such as
`for e in (S {x: 3}) { ... }` or `match (S {x: 3}) { ... }`).
Note that there was never an ambiguity for uses of `struct S0 { }` in item
position. The issue was solely about expression position prior to the
adoption of Rust RFC 25.
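The parser restriction can be sketched in present-day Rust syntax as follows. The types `S` and `E` and the functions `classify` and `match_empty` are illustrative; the point is only that a struct literal in the head of a `match` must be parenthesized, and that under this rule an empty-brace literal such as `E {}` is handled the same way as one with fields.

```rust
struct S {
    x: i32,
}

struct E {}

fn classify() -> &'static str {
    // Without the parentheses, `S { ... }` here would not be read as a
    // struct literal, because `{` ends the expression in this context.
    match (S { x: 3 }) {
        S { x: 3 } => "three",
        S { .. } => "other",
    }
}

fn match_empty() -> bool {
    // The same rule covers an empty struct literal, so it is unambiguous.
    match (E {}) {
        E {} => true,
    }
}

fn main() {
    println!("{} {}", classify(), match_empty());
}
```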
Precedent for flexible syntax in Rust
There is precedent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity.
For example, one can often include an optional trailing comma, for
example in: `let x : &[int] = [3, 2, 1, ];`.
One can also include redundant curly braces or parentheses, for example in:
println!("hi: {}", { if { x.len() > 2 } { ("whoa") } else { ("there") } });
One can even mix the two together when delimiting match arms:
    let z: int = match x {
        [3, 2] => { 3 }
        [3, 2, 1] => 2,
        _ => { 1 },
    };
We do have lints for some style violations (though none catch the cases above), but lints are different from fundamental language restrictions.
Recent history
There was a previous RFC PR that was effectively the same in spirit as this one. It was closed because it was not sufficiently well fleshed out for further consideration by the core team. However, to save people the effort of reviewing the comments on that PR (and hopefully stave off potential bikeshedding on this PR), I here summarize the various viewpoints put forward on the comment thread there, and note for each one whether that viewpoint would be addressed by this RFC (accept both syntaxes), by Always Require Braces, or by Status Quo.
Note that this list of comments is just meant to summarize the list of views; it does not attempt to reflect the number of commenters who agreed or disagreed with a particular point. (But since the RFC process is not a democracy, the number of commenters should not matter anyway.)
- "+1" ==> Favors: This RFC (or potentially Always Require Braces; I think the content of RFC PR 147 shifted over time, so it is hard to interpret the "+1" comments now).
- "I find `let s = S0;` jarring, think its an enum initially." ==> Favors: Always Require Braces
- "Frequently start out with an empty struct and add fields as I need them." ==> Favors: This RFC or Always Require Braces
- "Foo{} suggests is constructing something that it's not; all uses of the value `Foo` are indistinguishable from each other" ==> Favors: Status Quo
- "I find it strange anyone would prefer `let x = Foo{};` over `let x = Foo;`" ==> Favors: Status Quo; strongly opposes Always Require Braces.
- "I agree that 'instantiation-should-follow-declaration', that is, structs declared `;, (), {}` should only be instantiated [via] `;, (), { }` respectively" ==> Opposes leniency of this RFC in that it allows expressions to include or omit `{}` on an empty struct, regardless of declaration form, and vice-versa.
- "The code generation argument is reasonable, but I wouldn't want to force noisier syntax on all 'normal' code just to make macros work better." ==> Favors: This RFC