From 49820d63dc83bceb4415451e4aea0d4acc219e88 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Typedefs in C make semantic analysis a bit more complex than it would
be without them. The issue is that we want to capture typedef information
@@ -312,8 +312,11 @@ and represent it in the AST perfectly, but the semantics of operations need to
void func() {
typedef int foo;
foo X, *Y;
+ typedef foo* bar;
+ bar Z;
*X; // error
**Y; // error
+ **Z; // error
}
@@ -321,12 +324,15 @@ void func() {
on the annotated lines. In this example, we expect to get:
-../t.c:4:1: error: indirection requires pointer operand ('foo' invalid)
+test.c:6:1: error: indirection requires pointer operand ('foo' invalid)
*X; // error
^~
-../t.c:5:1: error: indirection requires pointer operand ('foo' invalid)
+test.c:7:1: error: indirection requires pointer operand ('foo' invalid)
**Y; // error
^~~
+test.c:8:1: error: indirection requires pointer operand ('foo' invalid)
+**Z; // error
+^~~
While this example is somewhat silly, it illustrates the point: we want to
@@ -334,37 +340,67 @@ retain typedef information where possible, so that we can emit errors about
"std::string" instead of "std::basic_string<char, std:...". Doing this
requires properly keeping typedef information (for example, the type of "X" is
"foo", not "int"), and requires properly propagating it through the
-various operators (for example, the type of *Y is "foo", not "int").
+various operators (for example, the type of *Y is "foo", not "int"). In order
+to retain this information, the type of these expressions is an instance of the
+TypedefType class, which indicates that the type of these expressions is a
+typedef for foo.
+
+Representing types like this is great for diagnostics, because the
+user-specified type is always immediately available. There are two problems
+with this: first, various semantic checks need to make judgements about the
+actual structure of a type, ignoring typedefs. Second, we need an efficient
+way to query whether two types are structurally identical to each other,
+ignoring typedefs. The solution to both of these problems is the idea of
+canonical types.
+
-/// Type - This is the base class of the type hierarchy. A central concept
-/// with types is that each type always has a canonical type. A canonical type
-/// is the type with any typedef names stripped out of it or the types it
-/// references. For example, consider:
-///
-/// typedef int foo;
-/// typedef foo* bar;
-/// 'int *' 'foo *' 'bar'
-///
-/// There will be a Type object created for 'int'. Since int is canonical, its
-/// canonicaltype pointer points to itself. There is also a Type for 'foo' (a
-/// TypeNameType). Its CanonicalType pointer points to the 'int' Type. Next
-/// there is a PointerType that represents 'int*', which, like 'int', is
-/// canonical. Finally, there is a PointerType type for 'foo*' whose canonical
-/// type is 'int*', and there is a TypeNameType for 'bar', whose canonical type
-/// is also 'int*'.
-///
-/// Non-canonical types are useful for emitting diagnostics, without losing
-/// information about typedefs being used. Canonical types are useful for type
-/// comparisons (they allow by-pointer equality tests) and useful for reasoning
-/// about whether something has a particular form (e.g. is a function type),
-/// because they implicitly, recursively, strip all typedefs out of a type.
-///
-/// Types, once created, are immutable.
-///
+Every instance of the Type class contains a canonical type pointer. For
+simple types with no typedefs involved (e.g. "int", "int*",
+"int**"), the type just points to itself. For types that have a
+typedef somewhere in their structure (e.g. "foo", "foo*",
+"foo**", "bar"), the canonical type pointer points to their
+structurally equivalent type without any typedefs (e.g. "int",
+"int*", "int**", and "int*" respectively).
+This design provides a constant time operation (dereferencing the canonical
+type pointer) that gives us access to the structure of types. For example,
+we can trivially tell that "bar" and "foo*" are the same type by dereferencing
+their canonical type pointers and doing a pointer comparison (they both point
+to the single "int*" type).
+
+Canonical types and typedef types bring up some complexities that must be
+carefully managed. Specifically, the "isa/cast/dyncast" operators generally
+shouldn't be used in code that is inspecting the AST. For example, when type
+checking the indirection operator (unary '*' on a pointer), the type checker
+must verify that the operand has a pointer type. It would not be correct to
+check that with "isa<PointerType>(SubExpr->getType())",
+because this predicate would fail if the subexpression had a typedef type.
+
+The solution to this problem is a set of helper methods on Type, used to
+check its properties. In this case, it would be correct to use
+"SubExpr->getType()->isPointerType()" to do the check. This
+predicate will return true if the canonical type is a pointer, which is
+true any time the type is structurally a pointer type. The only hard part here
+is remembering not to use the isa/cast/dyncast operations.
+
+The second problem we face is how to get access to the pointer type once we
+know it exists. To continue the example, the result type of the indirection
+operator is the pointee type of the subexpression. In order to determine the
+type, we need to get the instance of PointerType that best captures the typedef
+information in the program. If the type of the expression is literally a
+PointerType, we can return that; otherwise we have to dig through the
+typedefs to find the pointer type. For example, if the subexpression had type
+"foo*", we could return that type as the result. If the subexpression
+had type "bar", we want to return "foo*" (note that we do
+not want "int*"). In order to provide all of this, Type has
+a getIfPointerType() method that checks whether the type is structurally a
+PointerType and, if so, returns the best one. If not, it returns a null
+pointer.
+
+This structure is somewhat mystical, but after meditating on it, it will
+make sense to you :).
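
A toy model may make the canonical type scheme above more concrete. The sketch below is not Clang's implementation: the `Type` struct, its fields, the `sameType` helper, and the `Demo` fixture are all hypothetical stand-ins, and it assumes (as the text does) that each canonical type such as "int*" exists as a single object. Only the method names `isPointerType()` and `getIfPointerType()` are taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the Type class described above (not Clang's code).
struct Type {
  enum Kind { Builtin, Pointer, Typedef };
  Kind kind;
  std::string name;       // spelling a diagnostic would print
  const Type *canonical;  // structurally equivalent type with typedefs stripped
  const Type *inner;      // pointee (Pointer), underlying type (Typedef), else null

  // Passing a null canonical pointer means "this type is its own canonical type".
  Type(Kind k, std::string n, const Type *canon, const Type *in)
      : kind(k), name(std::move(n)), canonical(canon ? canon : this), inner(in) {}
  Type(const Type &) = delete;  // a copy would leave 'canonical' pointing at the original

  // Structural check: one pointer dereference, looks through typedefs.
  bool isPointerType() const { return canonical->kind == Pointer; }

  // Returns the pointer type that keeps the most typedef information,
  // or nullptr if the type is not structurally a pointer.
  const Type *getIfPointerType() const {
    if (!isPointerType()) return nullptr;
    const Type *t = this;
    while (t->kind == Typedef) t = t->inner;  // peel one typedef layer at a time
    return t;
  }
};

// Assumed helper: two types are structurally identical exactly when their
// canonical pointers are equal.
inline bool sameType(const Type &a, const Type &b) {
  return a.canonical == b.canonical;
}

// The types from the example: typedef int foo; typedef foo* bar;
struct Demo {
  Type intTy{Type::Builtin, "int", nullptr, nullptr};
  Type intPtr{Type::Pointer, "int*", nullptr, &intTy};
  Type foo{Type::Typedef, "foo", &intTy, &intTy};
  Type fooPtr{Type::Pointer, "foo*", &intPtr, &foo};
  Type bar{Type::Typedef, "bar", &intPtr, &fooPtr};
};
```

In this model, `sameType(d.bar, d.fooPtr)` holds for a `Demo d` because both canonical pointers refer to the one "int*" object, and `d.bar.getIfPointerType()` yields the "foo*" object rather than "int*", so the pointee used for the result of an indirection is still spelled "foo".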