From 05917fa60041518d5dc30eb4301fec8483dadb17 Mon Sep 17 00:00:00 2001 From: Eric Christopher Date: Mon, 8 Dec 2014 18:00:47 +0000 Subject: [PATCH] Add Chapter 8 to the Kaleidoscope tutorial. This chapter adds a description of how to add debug information using DWARF and DIBuilder to the language. Thanks to David Blaikie for his assistance with this tutorial. llvm-svn: 223671 --- llvm/docs/tutorial/LangImpl8.rst | 612 ++++--- llvm/docs/tutorial/LangImpl9.rst | 267 +++ .../Kaleidoscope/Chapter8/CMakeLists.txt | 17 + llvm/examples/Kaleidoscope/Chapter8/Makefile | 16 + llvm/examples/Kaleidoscope/Chapter8/toy.cpp | 1493 +++++++++++++++++ llvm/examples/Kaleidoscope/Makefile | 2 +- 6 files changed, 2179 insertions(+), 228 deletions(-) create mode 100644 llvm/docs/tutorial/LangImpl9.rst create mode 100644 llvm/examples/Kaleidoscope/Chapter8/CMakeLists.txt create mode 100644 llvm/examples/Kaleidoscope/Chapter8/Makefile create mode 100644 llvm/examples/Kaleidoscope/Chapter8/toy.cpp diff --git a/llvm/docs/tutorial/LangImpl8.rst b/llvm/docs/tutorial/LangImpl8.rst index 6f694931ef86..931ff5aa7d4c 100644 --- a/llvm/docs/tutorial/LangImpl8.rst +++ b/llvm/docs/tutorial/LangImpl8.rst @@ -1,267 +1,425 @@ -====================================================== -Kaleidoscope: Conclusion and other useful LLVM tidbits -====================================================== +======================================================= +Kaleidoscope: Extending the Language: Debug Information +======================================================= .. contents:: :local: -Tutorial Conclusion -=================== +Chapter 8 Introduction +====================== -Welcome to the final chapter of the "`Implementing a language with -LLVM `_" tutorial. In the course of this tutorial, we have -grown our little Kaleidoscope language from being a useless toy, to -being a semi-interesting (but probably still useless) toy. :) +Welcome to Chapter 8 of the "`Implementing a language with +LLVM `_" tutorial. In chapters 1 through 7, we've built a +decent little programming language with functions and variables. +What happens if something goes wrong though, how do you debug your +program? -It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by-hand in under 700 lines of -(non-comment/non-blank) code. +Source level debugging uses formatted data that helps a debugger +translate from binary and the state of the machine back to the +source that the programmer wrote. In LLVM we generally use a format +called `DWARF `_. DWARF is a compact encoding +that represents types, source locations, and variable locations. -Our little language supports a couple of interesting features: it -supports user defined binary and unary operators, it uses JIT -compilation for immediate evaluation, and it supports a few control flow -constructs with SSA construction. +The short summary of this chapter is that we'll go through the +various things you have to add to a programming language to +support debug info, and how you translate that into DWARF. -Part of the idea of this tutorial was to show you how easy and fun it -can be to define, build, and play with languages. Building a compiler -need not be a scary or mystical process! Now that you've seen some of -the basics, I strongly encourage you to take the code and hack on it. -For example, try adding: +Caveat: For now we can't debug via the JIT, so we'll need to compile +our program down to something small and standalone. As part of this +we'll make a few modifications to the running of the language and +how programs are compiled. This means that we'll have a source file +with a simple program written in Kaleidoscope rather than the +interactive JIT. It does involve a limitation that we can only +have one "top level" command at a time to reduce the number of +changes necessary. -- **global variables** - While global variables have questional value - in modern software engineering, they are often useful when putting - together quick little hacks like the Kaleidoscope compiler itself. - Fortunately, our current setup makes it very easy to add global - variables: just have value lookup check to see if an unresolved - variable is in the global variable symbol table before rejecting it. - To create a new global variable, make an instance of the LLVM - ``GlobalVariable`` class. -- **typed variables** - Kaleidoscope currently only supports variables - of type double. This gives the language a very nice elegance, because - only supporting one type means that you never have to specify types. - Different languages have different ways of handling this. The easiest - way is to require the user to specify types for every variable - definition, and record the type of the variable in the symbol table - along with its Value\*. -- **arrays, structs, vectors, etc** - Once you add types, you can start - extending the type system in all sorts of interesting ways. Simple - arrays are very easy and are quite useful for many different - applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction - works: it is so nifty/unconventional, it `has its own - FAQ <../GetElementPtr.html>`_! If you add support for recursive types - (e.g. linked lists), make sure to read the `section in the LLVM - Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that - describes how to construct them. -- **standard runtime** - Our current language allows the user to access - arbitrary external functions, and we use it for things like "printd" - and "putchard". As you extend the language to add higher-level - constructs, often these constructs make the most sense if they are - lowered to calls into a language-supplied runtime. For example, if - you add hash tables to the language, it would probably make sense to - add the routines to a runtime, instead of inlining them all the way. -- **memory management** - Currently we can only access the stack in - Kaleidoscope. It would also be useful to be able to allocate heap - memory, either with calls to the standard libc malloc/free interface - or with a garbage collector. If you would like to use garbage - collection, note that LLVM fully supports `Accurate Garbage - Collection <../GarbageCollection.html>`_ including algorithms that - move objects and need to scan/update the stack. -- **debugger support** - LLVM supports generation of `DWARF Debug - info <../SourceLevelDebugging.html>`_ which is understood by common - debuggers like GDB. Adding support for debug info is fairly - straightforward. The best way to understand it is to compile some - C/C++ code with "``clang -g -O0``" and taking a look at what it - produces. -- **exception handling support** - LLVM supports generation of `zero - cost exceptions <../ExceptionHandling.html>`_ which interoperate with - code compiled in other languages. You could also generate code by - implicitly making every function return an error value and checking - it. You could also make explicit use of setjmp/longjmp. There are - many different ways to go here. -- **object orientation, generics, database access, complex numbers, - geometric programming, ...** - Really, there is no end of crazy - features that you can add to the language. -- **unusual domains** - We've been talking about applying LLVM to a - domain that many people are interested in: building a compiler for a - specific language. However, there are many other domains that can use - compiler technology that are not typically considered. For example, - LLVM has been used to implement OpenGL graphics acceleration, - translate C++ code to ActionScript, and many other cute and clever - things. Maybe you will be the first to JIT compile a regular - expression interpreter into native code with LLVM? +Here's the sample program we'll be compiling: -Have fun - try doing something crazy and unusual. Building a language -like everyone else always has, is much less fun than trying something a -little crazy or off the wall and seeing how it turns out. If you get -stuck or want to talk about it, feel free to email the `llvmdev mailing -list `_: it has lots -of people who are interested in languages and are often willing to help -out. +.. code-block:: python -Before we end this tutorial, I want to talk about some "tips and tricks" -for generating LLVM IR. These are some of the more subtle things that -may not be obvious, but are very useful if you want to take advantage of -LLVM's capabilities. + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2); -Properties of the LLVM IR -========================= + fib(10) -We have a couple common questions about code in the LLVM IR form - lets -just get these out of the way right now, shall we? -Target Independence -------------------- +Why is this a hard problem? +=========================== -Kaleidoscope is an example of a "portable language": any program written -in Kaleidoscope will work the same way on any target that it runs on. -Many other languages have this property, e.g. lisp, java, haskell, -javascript, python, etc (note that while these languages are portable, -not all their libraries are). +Debug information is a hard problem for a few different reasons - mostly +centered around optimized code. First, optimization makes keeping source +locations more difficult. In LLVM IR we keep the original source location +for each IR level instruction on the instruction. Optimization passes +should keep the source locations for newly created instructions, but merged +instructions only get to keep a single location - this can cause jumping +around when stepping through optimized programs. Secondly, optimization +can move variables in ways that are either optimized out, shared in memory +with other variables, or difficult to track. For the purposes of this +tutorial we're going to avoid optimization (as you'll see with one of the +next sets of patches). -One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a -Kaleidoscope-compiled program and run it on any target that LLVM -supports, even emitting C code and compiling that on targets that LLVM -doesn't support natively. You can trivially tell that the Kaleidoscope -compiler generates target-independent code because it never queries for -any target-specific information when generating code. +Ahead-of-Time Compilation Mode +============================== -The fact that LLVM provides a compact, target-independent, -representation for code gets a lot of people excited. Unfortunately, -these people are usually thinking about C or a language from the C -family when they are asking questions about language portability. I say -"unfortunately", because there is really no way to make (fully general) -C code portable, other than shipping the source code around (and of -course, C source code is not actually portable in general either - ever -port a really old application from 32- to 64-bits?). +To highlight only the aspects of adding debug information to a source +language without needing to worry about the complexities of JIT debugging +we're going to make a few changes to Kaleidoscope to support compiling +the IR emitted by the front end into a simple standalone program that +you can execute, debug, and see results. -The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the -preprocessor often destructively removes target-independence from the -code when it processes the input text: +First we make our anonymous function that contains our top level +statement be our "main": -.. code-block:: c +.. code-block:: udiff - #ifdef __i386__ - int X = 1; - #else - int X = 42; - #endif +- PrototypeAST *Proto = new PrototypeAST("", std::vector()); ++ PrototypeAST *Proto = new PrototypeAST("main", std::vector()); -While it is possible to engineer more and more complex solutions to -problems like this, it cannot be solved in full generality in a way that -is better than shipping the actual source code. +just with the simple change of giving it a name. -That said, there are interesting subsets of C that can be made portable. -If you are willing to fix primitive types to a fixed size (say int = -32-bits, and long = 64-bits), don't care about ABI compatibility with -existing binaries, and are willing to give up some other minor features, -you can have portable code. This can make sense for specialized domains -such as an in-kernel language. +Then we're going to remove the command line code wherever it exists: -Safety Guarantees ------------------ +.. code-block:: udiff -Many of the languages above are also "safe" languages: it is impossible -for a program written in Java to corrupt its address space and crash the -process (assuming the JVM has no bugs). Safety is an interesting -property that requires a combination of language design, runtime -support, and often operating system support. +@@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { +- fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: + return; +@@ -1184,7 +1183,6 @@ int main() { + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. +- fprintf(stderr, "ready> "); + getNextToken(); + +Lastly we're going to disable all of the optimization passes and the JIT so +that the only thing that happens after we're done parsing and generating +code is that the llvm IR goes to standard error: -It is certainly possible to implement a safe language in LLVM, but LLVM -IR does not itself guarantee safety. The LLVM IR allows unsafe pointer -casts, use after free bugs, buffer over-runs, and a variety of other -problems. Safety needs to be implemented as a layer on top of LLVM and, -conveniently, several groups have investigated this. Ask on the `llvmdev -mailing list `_ if -you are interested in more details. +.. code-block:: udiff -Language-Specific Optimizations -------------------------------- +@@ -1108,17 +1108,8 @@ static void HandleExtern() { + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { +- if (Function *LF = F->Codegen()) { +- // We're just doing this to make sure it executes. +- TheExecutionEngine->finalizeObject(); +- // JIT the function, returning a function pointer. +- void *FPtr = TheExecutionEngine->getPointerToFunction(LF); +- +- // Cast it to the right type (takes no arguments, returns a double) so we +- // can call it as a native function. +- double (*FP)() = (double (*)())(intptr_t)FPtr; +- // Ignore the return value for this. +- (void)FP; ++ if (!F->Codegen()) { ++ fprintf(stderr, "Error generating code for top level expr"); + } + } else { + // Skip token for error recovery. +@@ -1439,11 +1459,11 @@ int main() { + // target lays out data structures. + TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); + OurFPM.add(new DataLayoutPass()); ++#if 0 + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); +@@ -1218,7 +1210,7 @@ int main() { + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); +- ++ #endif + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. -One thing about LLVM that turns off many people is that it does not -solve all the world's problems in one system (sorry 'world hunger', -someone else will have to solve you some other day). One specific -complaint is that people perceive LLVM as being incapable of performing -high-level language-specific optimization: LLVM "loses too much -information". +This relatively small set of changes get us to the point that we can compile +our piece of Kaleidoscope language down to an executable program via this +command line: -Unfortunately, this is really not the place to give you a full and -unified version of "Chris Lattner's theory of compiler design". Instead, -I'll make a few observations: +.. code-block:: bash -First, you're right that LLVM does lose information. For example, as of -this writing, there is no way to distinguish in the LLVM IR whether an -SSA-value came from a C "int" or a C "long" on an ILP32 machine (other -than debug info). Both get compiled down to an 'i32' value and the -information about what it came from is lost. The more general issue -here, is that the LLVM type system uses "structural equivalence" instead -of "name equivalence". Another place this surprises people is if you -have two types in a high-level language that have the same structure -(e.g. two different structs that have a single int field): these types -will compile down into a single LLVM type and it will be impossible to -tell what it came from. +Kaleidoscope-Ch8 < fib.ks | & clang -x ir - -Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition -to adding new features (LLVM did not always support exceptions or debug -info), we also extend the IR to capture important information for -optimization (e.g. whether an argument is sign or zero extended, -information about pointers aliasing, etc). Many of the enhancements are -user-driven: people want LLVM to include some specific feature, so they -go ahead and extend it. +which gives an a.out/a.exe in the current working directory. -Third, it is *possible and easy* to add language-specific optimizations, -and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. In the case of the C -family, there is an optimization pass that "knows" about the standard C -library functions. If you call "exit(0)" in main(), it knows that it is -safe to optimize that into "return 0;" because C specifies what the -'exit' function does. +Compile Unit +============ -In addition to simple library knowledge, it is possible to embed a -variety of other language-specific information into the LLVM IR. If you -have a specific need and run into a wall, please bring the topic up on -the llvmdev list. At the very worst, you can always treat LLVM as if it -were a "dumb code generator" and implement the high-level optimizations -you desire in your front-end, on the language-specific AST. +The top level container for a section of code in DWARF is a compile unit. +This contains the type and function data for an individual translation unit +(read: one file of source code). So the first thing we need to do is +construct one for our fib.ks file. -Tips and Tricks -=============== +DWARF Emission Setup +==================== -There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of -letting everyone rediscover them, this section talks about some of these -issues. +Similar to the ``IRBuilder`` class we have a +```DIBuilder`` `_ class +that helps in constructing debug metadata for an llvm IR file. It +corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names. +Using it does require that you be more familiar with DWARF terminology than +you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you +read through the general documentation on the +```Metadata Format`` `_ it +should be a little more clear. We'll be using this class to construct all +of our IR level descriptions. Construction for it takes a module so we +need to construct it shortly after we construct our module. We've left it +as a global static variable to make it a bit easier to use. -Implementing portable offsetof/sizeof -------------------------------------- +Next we're going to create a small container to cache some of our frequent +data. The first will be our compile unit, but we'll also write a bit of +code for our one type since we won't have to worry about multiple typed +expressions: -One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need -to know the size of some LLVM type or the offset of some field in an -llvm structure. For example, you might need to pass the size of a type -into a function that allocates memory. +.. code-block:: c++ -Unfortunately, this can vary widely across targets: for example the -width of a pointer is trivially target-specific. However, there is a -`clever way to use the getelementptr -instruction `_ -that allows you to compute this in a portable way. + static DIBuilder *DBuilder; -Garbage Collected Stack Frames ------------------------------- + struct DebugInfo { + DICompileUnit TheCU; + DIType DblTy; -Some languages want to explicitly manage their stack frames, often so -that they are garbage collected or to allow easy implementation of -closures. There are often better ways to implement these features than -explicit stack frames, but `LLVM does support -them, `_ -if you want. It requires your front-end to convert the code into -`Continuation Passing -Style `_ and -the use of tail calls (which LLVM also supports). + DIType getDoubleTy(); + } KSDbgInfo; + + DIType DebugInfo::getDoubleTy() { + if (DblTy.isValid()) + return DblTy; + + DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); + return DblTy; + } + +And then later on in ``main`` when we're constructing our module: + +.. code-block:: c++ + + DBuilder = new DIBuilder(*TheModule); + + KSDbgInfo.TheCU = DBuilder->createCompileUnit( + dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); + +There are a couple of things to note here. First, while we're producing a +compile unit for a language called Kaleidoscope we used the language +constant for C. This is because a debugger wouldn't necessarily understand +the calling conventions or default ABI for a language it doesn't recognize +and we follow the C ABI in our llvm code generation so it's the closest +thing to accurate. This ensures we can actually call functions from the +debugger and have them execute. Secondly, you'll see the "fib.ks" in the +call to ``createCompileUnit``. This is a default hard coded value since +we're using shell redirection to put our source into the Kaleidoscope +compiler. In a usual front end you'd have an input file name and it would +go there. + +One last thing as part of emitting debug information via DIBuilder is that +we need to "finalize" the debug information. The reasons are part of the +underlying API for DIBuilder, but make sure you do this near the end of +main: + +.. code-block:: c++ + + DBuilder->finalize(); + +before you dump out the module. + +Functions +========= + +Now that we have our ``Compile Unit`` and our source locations, we can add +function definitions to the debug info. So in ``PrototypeAST::Codegen`` we +add a few lines of code to describe a context for our subprogram, in this +case the "File", and the actual definition of the function itself. + +So the context: + +.. code-block:: c++ + + DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + +giving us a DIFile and asking the ``Compile Unit`` we created above for the +directory and filename where we are currently. Then, for now, we use some +source locations of 0 (since our AST doesn't currently have source location +information) and construct our function definition: + +.. code-block:: c++ + + DIDescriptor FContext(Unit); + unsigned LineNo = 0; + unsigned ScopeLine = 0; + DISubprogram SP = DBuilder->createFunction( + FContext, Name, StringRef(), Unit, LineNo, + CreateFunctionType(Args.size(), Unit), false /* internal linkage */, + true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F); + +and we now have a DISubprogram that contains a reference to all of our metadata +for the function. + +Source Locations +================ + +The most important thing for debug information is accurate source location - +this makes it possible to map your source code back. We have a problem though, +Kaleidoscope really doesn't have any source location information in the lexer +or parser so we'll need to add it. + +.. code-block:: c++ + + struct SourceLocation { + int Line; + int Col; + }; + static SourceLocation CurLoc; + static SourceLocation LexLoc = {1, 0}; + + static int advance() { + int LastChar = getchar(); + + if (LastChar == '\n' || LastChar == '\r') { + LexLoc.Line++; + LexLoc.Col = 0; + } else + LexLoc.Col++; + return LastChar; + } + +In this set of code we've added some functionality on how to keep track of the +line and column of the "source file". As we lex every token we set our current +current "lexical location" to the assorted line and column for the beginning +of the token. We do this by overriding all of the previous calls to +``getchar()`` with our new ``advance()`` that keeps track of the information +and then we have added to all of our AST classes a source location: + +.. code-block:: c++ + + class ExprAST { + SourceLocation Loc; + + public: + int getLine() const { return Loc.Line; } + int getCol() const { return Loc.Col; } + ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + return out << ':' << getLine() << ':' << getCol() << '\n'; + } + +that we pass down through when we create a new expression: + +.. code-block:: c++ + + LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS); + +giving us locations for each of our expressions and variables. + +From this we can make sure to tell ``DIBuilder`` when we're at a new source +location so it can use that when we generate the rest of our code and make +sure that each instruction has source location information. We do this +by constructing another small function: + +.. code-block:: c++ + + void DebugInfo::emitLocation(ExprAST *AST) { + DIScope *Scope; + if (LexicalBlocks.empty()) + Scope = &TheCU; + else + Scope = LexicalBlocks.back(); + Builder.SetCurrentDebugLocation( + DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope))); + } + +that both tells the main ``IRBuilder`` where we are, but also what scope +we're in. Since we've just created a function above we can either be in +the main file scope (like when we created our function), or now we can be +in the function scope we just created. To represent this we create a stack +of scopes: + +.. code-block:: c++ + + std::vector LexicalBlocks; + std::map FnScopeMap; + +and keep a map of each function to the scope that it represents (a DISubprogram +is also a DIScope). + +Then we make sure to: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(this); + +emit the location every time we start to generate code for a new AST, and +also: + +.. code-block:: c++ + + KSDbgInfo.FnScopeMap[this] = SP; + +store the scope (function) when we create it and use it: + + KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]); + +when we start generating the code for each function. + +One interesting thing to note at this point is that various debuggers have +assumptions based on how code and debug information was generated for them +in the past. In this case we need to do a little bit of a hack to avoid +generating line information for the function prologue so that the debugger +knows to skip over those instructions when setting a breakpoint. So in +``FunctionAST::CodeGen`` we add a couple of lines: + +.. code-block:: c++ + + // Unset the location for the prologue emission (leading instructions with no + // location in a function are considered part of the prologue and the debugger + // will run past them when breaking on a function) + KSDbgInfo.emitLocation(nullptr); + +and then emit a new location when we actually start generating code for the +body of the function: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(Body); + +also, don't forget to pop the scope back off of your scope stack at the +end of the code generation for the function: + +.. code-block:: c++ + + // Pop off the lexical block for the function since we added it + // unconditionally. + KSDbgInfo.LexicalBlocks.pop_back(); + + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +debug information. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp + :language: c++ + +`Next: Conclusion and other useful LLVM tidbits `_ diff --git a/llvm/docs/tutorial/LangImpl9.rst b/llvm/docs/tutorial/LangImpl9.rst new file mode 100644 index 000000000000..6f694931ef86 --- /dev/null +++ b/llvm/docs/tutorial/LangImpl9.rst @@ -0,0 +1,267 @@ +====================================================== +Kaleidoscope: Conclusion and other useful LLVM tidbits +====================================================== + +.. contents:: + :local: + +Tutorial Conclusion +=================== + +Welcome to the final chapter of the "`Implementing a language with +LLVM `_" tutorial. In the course of this tutorial, we have +grown our little Kaleidoscope language from being a useless toy, to +being a semi-interesting (but probably still useless) toy. :) + +It is interesting to see how far we've come, and how little code it has +taken. We built the entire lexer, parser, AST, code generator, and an +interactive run-loop (with a JIT!) by-hand in under 700 lines of +(non-comment/non-blank) code. + +Our little language supports a couple of interesting features: it +supports user defined binary and unary operators, it uses JIT +compilation for immediate evaluation, and it supports a few control flow +constructs with SSA construction. + +Part of the idea of this tutorial was to show you how easy and fun it +can be to define, build, and play with languages. Building a compiler +need not be a scary or mystical process! Now that you've seen some of +the basics, I strongly encourage you to take the code and hack on it. +For example, try adding: + +- **global variables** - While global variables have questional value + in modern software engineering, they are often useful when putting + together quick little hacks like the Kaleidoscope compiler itself. + Fortunately, our current setup makes it very easy to add global + variables: just have value lookup check to see if an unresolved + variable is in the global variable symbol table before rejecting it. + To create a new global variable, make an instance of the LLVM + ``GlobalVariable`` class. +- **typed variables** - Kaleidoscope currently only supports variables + of type double. This gives the language a very nice elegance, because + only supporting one type means that you never have to specify types. + Different languages have different ways of handling this. The easiest + way is to require the user to specify types for every variable + definition, and record the type of the variable in the symbol table + along with its Value\*. +- **arrays, structs, vectors, etc** - Once you add types, you can start + extending the type system in all sorts of interesting ways. Simple + arrays are very easy and are quite useful for many different + applications. Adding them is mostly an exercise in learning how the + LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction + works: it is so nifty/unconventional, it `has its own + FAQ <../GetElementPtr.html>`_! If you add support for recursive types + (e.g. linked lists), make sure to read the `section in the LLVM + Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that + describes how to construct them. +- **standard runtime** - Our current language allows the user to access + arbitrary external functions, and we use it for things like "printd" + and "putchard". As you extend the language to add higher-level + constructs, often these constructs make the most sense if they are + lowered to calls into a language-supplied runtime. For example, if + you add hash tables to the language, it would probably make sense to + add the routines to a runtime, instead of inlining them all the way. +- **memory management** - Currently we can only access the stack in + Kaleidoscope. It would also be useful to be able to allocate heap + memory, either with calls to the standard libc malloc/free interface + or with a garbage collector. If you would like to use garbage + collection, note that LLVM fully supports `Accurate Garbage + Collection <../GarbageCollection.html>`_ including algorithms that + move objects and need to scan/update the stack. +- **debugger support** - LLVM supports generation of `DWARF Debug + info <../SourceLevelDebugging.html>`_ which is understood by common + debuggers like GDB. Adding support for debug info is fairly + straightforward. The best way to understand it is to compile some + C/C++ code with "``clang -g -O0``" and taking a look at what it + produces. +- **exception handling support** - LLVM supports generation of `zero + cost exceptions <../ExceptionHandling.html>`_ which interoperate with + code compiled in other languages. You could also generate code by + implicitly making every function return an error value and checking + it. You could also make explicit use of setjmp/longjmp. There are + many different ways to go here. +- **object orientation, generics, database access, complex numbers, + geometric programming, ...** - Really, there is no end of crazy + features that you can add to the language. +- **unusual domains** - We've been talking about applying LLVM to a + domain that many people are interested in: building a compiler for a + specific language. However, there are many other domains that can use + compiler technology that are not typically considered. For example, + LLVM has been used to implement OpenGL graphics acceleration, + translate C++ code to ActionScript, and many other cute and clever + things. Maybe you will be the first to JIT compile a regular + expression interpreter into native code with LLVM? + +Have fun - try doing something crazy and unusual. Building a language +like everyone else always has, is much less fun than trying something a +little crazy or off the wall and seeing how it turns out. If you get +stuck or want to talk about it, feel free to email the `llvmdev mailing +list `_: it has lots +of people who are interested in languages and are often willing to help +out. + +Before we end this tutorial, I want to talk about some "tips and tricks" +for generating LLVM IR. These are some of the more subtle things that +may not be obvious, but are very useful if you want to take advantage of +LLVM's capabilities. + +Properties of the LLVM IR +========================= + +We have a couple common questions about code in the LLVM IR form - lets +just get these out of the way right now, shall we? + +Target Independence +------------------- + +Kaleidoscope is an example of a "portable language": any program written +in Kaleidoscope will work the same way on any target that it runs on. +Many other languages have this property, e.g. lisp, java, haskell, +javascript, python, etc (note that while these languages are portable, +not all their libraries are). + +One nice aspect of LLVM is that it is often capable of preserving target +independence in the IR: you can take the LLVM IR for a +Kaleidoscope-compiled program and run it on any target that LLVM +supports, even emitting C code and compiling that on targets that LLVM +doesn't support natively. You can trivially tell that the Kaleidoscope +compiler generates target-independent code because it never queries for +any target-specific information when generating code. + +The fact that LLVM provides a compact, target-independent, +representation for code gets a lot of people excited. Unfortunately, +these people are usually thinking about C or a language from the C +family when they are asking questions about language portability. I say +"unfortunately", because there is really no way to make (fully general) +C code portable, other than shipping the source code around (and of +course, C source code is not actually portable in general either - ever +port a really old application from 32- to 64-bits?). + +The problem with C (again, in its full generality) is that it is heavily +laden with target specific assumptions. As one simple example, the +preprocessor often destructively removes target-independence from the +code when it processes the input text: + +.. code-block:: c + + #ifdef __i386__ + int X = 1; + #else + int X = 42; + #endif + +While it is possible to engineer more and more complex solutions to +problems like this, it cannot be solved in full generality in a way that +is better than shipping the actual source code. + +That said, there are interesting subsets of C that can be made portable. +If you are willing to fix primitive types to a fixed size (say int = +32-bits, and long = 64-bits), don't care about ABI compatibility with +existing binaries, and are willing to give up some other minor features, +you can have portable code. This can make sense for specialized domains +such as an in-kernel language. + +Safety Guarantees +----------------- + +Many of the languages above are also "safe" languages: it is impossible +for a program written in Java to corrupt its address space and crash the +process (assuming the JVM has no bugs). Safety is an interesting +property that requires a combination of language design, runtime +support, and often operating system support. + +It is certainly possible to implement a safe language in LLVM, but LLVM +IR does not itself guarantee safety. The LLVM IR allows unsafe pointer +casts, use after free bugs, buffer over-runs, and a variety of other +problems. Safety needs to be implemented as a layer on top of LLVM and, +conveniently, several groups have investigated this. Ask on the `llvmdev +mailing list `_ if +you are interested in more details. + +Language-Specific Optimizations +------------------------------- + +One thing about LLVM that turns off many people is that it does not +solve all the world's problems in one system (sorry 'world hunger', +someone else will have to solve you some other day). One specific +complaint is that people perceive LLVM as being incapable of performing +high-level language-specific optimization: LLVM "loses too much +information". + +Unfortunately, this is really not the place to give you a full and +unified version of "Chris Lattner's theory of compiler design". Instead, +I'll make a few observations: + +First, you're right that LLVM does lose information. For example, as of +this writing, there is no way to distinguish in the LLVM IR whether an +SSA-value came from a C "int" or a C "long" on an ILP32 machine (other +than debug info). Both get compiled down to an 'i32' value and the +information about what it came from is lost. The more general issue +here, is that the LLVM type system uses "structural equivalence" instead +of "name equivalence". Another place this surprises people is if you +have two types in a high-level language that have the same structure +(e.g. two different structs that have a single int field): these types +will compile down into a single LLVM type and it will be impossible to +tell what it came from. + +Second, while LLVM does lose information, LLVM is not a fixed target: we +continue to enhance and improve it in many different ways. In addition +to adding new features (LLVM did not always support exceptions or debug +info), we also extend the IR to capture important information for +optimization (e.g. whether an argument is sign or zero extended, +information about pointers aliasing, etc). Many of the enhancements are +user-driven: people want LLVM to include some specific feature, so they +go ahead and extend it. + +Third, it is *possible and easy* to add language-specific optimizations, +and you have a number of choices in how to do it. As one trivial +example, it is easy to add language-specific optimization passes that +"know" things about code compiled for a language. In the case of the C +family, there is an optimization pass that "knows" about the standard C +library functions. If you call "exit(0)" in main(), it knows that it is +safe to optimize that into "return 0;" because C specifies what the +'exit' function does. + +In addition to simple library knowledge, it is possible to embed a +variety of other language-specific information into the LLVM IR. If you +have a specific need and run into a wall, please bring the topic up on +the llvmdev list. At the very worst, you can always treat LLVM as if it +were a "dumb code generator" and implement the high-level optimizations +you desire in your front-end, on the language-specific AST. + +Tips and Tricks +=============== + +There is a variety of useful tips and tricks that you come to know after +working on/with LLVM that aren't obvious at first glance. Instead of +letting everyone rediscover them, this section talks about some of these +issues. + +Implementing portable offsetof/sizeof +------------------------------------- + +One interesting thing that comes up, if you are trying to keep the code +generated by your compiler "target independent", is that you often need +to know the size of some LLVM type or the offset of some field in an +llvm structure. For example, you might need to pass the size of a type +into a function that allocates memory. + +Unfortunately, this can vary widely across targets: for example the +width of a pointer is trivially target-specific. However, there is a +`clever way to use the getelementptr +instruction `_ +that allows you to compute this in a portable way. + +Garbage Collected Stack Frames +------------------------------ + +Some languages want to explicitly manage their stack frames, often so +that they are garbage collected or to allow easy implementation of +closures. There are often better ways to implement these features than +explicit stack frames, but `LLVM does support +them, `_ +if you want. It requires your front-end to convert the code into +`Continuation Passing +Style `_ and +the use of tail calls (which LLVM also supports). + diff --git a/llvm/examples/Kaleidoscope/Chapter8/CMakeLists.txt b/llvm/examples/Kaleidoscope/Chapter8/CMakeLists.txt new file mode 100644 index 000000000000..053916f441d9 --- /dev/null +++ b/llvm/examples/Kaleidoscope/Chapter8/CMakeLists.txt @@ -0,0 +1,17 @@ +set(LLVM_LINK_COMPONENTS + Analysis + Core + ExecutionEngine + InstCombine + MC + ScalarOpts + Support + TransformUtils + nativecodegen + ) + +set(LLVM_REQUIRES_RTTI 1) + +add_llvm_example(Kaleidoscope-Ch8 + toy.cpp + ) diff --git a/llvm/examples/Kaleidoscope/Chapter8/Makefile b/llvm/examples/Kaleidoscope/Chapter8/Makefile new file mode 100644 index 000000000000..8e4d42211786 --- /dev/null +++ b/llvm/examples/Kaleidoscope/Chapter8/Makefile @@ -0,0 +1,16 @@ +##===- examples/Kaleidoscope/Chapter7/Makefile -------------*- Makefile -*-===## +# +# The LLVM Compiler Infrastructure +# +# This file is distributed under the University of Illinois Open Source +# License. See LICENSE.TXT for details. +# +##===----------------------------------------------------------------------===## +LEVEL = ../../.. +TOOLNAME = Kaleidoscope-Ch8 +EXAMPLE_TOOL = 1 +REQUIRES_RTTI := 1 + +LINK_COMPONENTS := core mcjit native + +include $(LEVEL)/Makefile.common diff --git a/llvm/examples/Kaleidoscope/Chapter8/toy.cpp b/llvm/examples/Kaleidoscope/Chapter8/toy.cpp new file mode 100644 index 000000000000..814ca61ccd67 --- /dev/null +++ b/llvm/examples/Kaleidoscope/Chapter8/toy.cpp @@ -0,0 +1,1493 @@ +#include "llvm/ADT/Triple.h" +#include "llvm/Analysis/Passes.h" +#include "llvm/ExecutionEngine/ExecutionEngine.h" +#include "llvm/ExecutionEngine/MCJIT.h" +#include "llvm/ExecutionEngine/SectionMemoryManager.h" +#include "llvm/IR/DataLayout.h" +#include "llvm/IR/DerivedTypes.h" +#include "llvm/IR/DIBuilder.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/LLVMContext.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/Verifier.h" +#include "llvm/PassManager.h" +#include "llvm/Support/Host.h" +#include "llvm/Support/TargetSelect.h" +#include "llvm/Transforms/Scalar.h" +#include +#include +#include +#include +#include +#include +using namespace llvm; + +//===----------------------------------------------------------------------===// +// Lexer +//===----------------------------------------------------------------------===// + +// The lexer returns tokens [0-255] if it is an unknown character, otherwise one +// of these for known things. +enum Token { + tok_eof = -1, + + // commands + tok_def = -2, + tok_extern = -3, + + // primary + tok_identifier = -4, + tok_number = -5, + + // control + tok_if = -6, + tok_then = -7, + tok_else = -8, + tok_for = -9, + tok_in = -10, + + // operators + tok_binary = -11, + tok_unary = -12, + + // var definition + tok_var = -13 +}; + +std::string getTokName(int Tok) { + switch (Tok) { + case tok_eof: + return "eof"; + case tok_def: + return "def"; + case tok_extern: + return "extern"; + case tok_identifier: + return "identifier"; + case tok_number: + return "number"; + case tok_if: + return "if"; + case tok_then: + return "then"; + case tok_else: + return "else"; + case tok_for: + return "for"; + case tok_in: + return "in"; + case tok_binary: + return "binary"; + case tok_unary: + return "unary"; + case tok_var: + return "var"; + } + return std::string(1, (char)Tok); +} + +namespace { +class PrototypeAST; +class ExprAST; +} +static IRBuilder<> Builder(getGlobalContext()); +struct DebugInfo { + DICompileUnit TheCU; + DIType DblTy; + std::vector LexicalBlocks; + std::map FnScopeMap; + + void emitLocation(ExprAST *AST); + DIType getDoubleTy(); +} KSDbgInfo; + +static std::string IdentifierStr; // Filled in if tok_identifier +static double NumVal; // Filled in if tok_number +struct SourceLocation { + int Line; + int Col; +}; +static SourceLocation CurLoc; +static SourceLocation LexLoc = { 1, 0 }; + +static int advance() { + int LastChar = getchar(); + + if (LastChar == '\n' || LastChar == '\r') { + LexLoc.Line++; + LexLoc.Col = 0; + } else + LexLoc.Col++; + return LastChar; +} + +/// gettok - Return the next token from standard input. +static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = advance(); + + CurLoc = LexLoc; + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = advance()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; + if (IdentifierStr == "if") + return tok_if; + if (IdentifierStr == "then") + return tok_then; + if (IdentifierStr == "else") + return tok_else; + if (IdentifierStr == "for") + return tok_for; + if (IdentifierStr == "in") + return tok_in; + if (IdentifierStr == "binary") + return tok_binary; + if (IdentifierStr == "unary") + return tok_unary; + if (IdentifierStr == "var") + return tok_var; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = advance(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do + LastChar = advance(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. + int ThisChar = LastChar; + LastChar = advance(); + return ThisChar; +} + +//===----------------------------------------------------------------------===// +// Abstract Syntax Tree (aka Parse Tree) +//===----------------------------------------------------------------------===// +namespace { + +std::ostream &indent(std::ostream &O, int size) { + return O << std::string(size, ' '); +} + +/// ExprAST - Base class for all expression nodes. +class ExprAST { + SourceLocation Loc; + +public: + int getLine() const { return Loc.Line; } + int getCol() const { return Loc.Col; } + ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + return out << ':' << getLine() << ':' << getCol() << '\n'; + } + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; +}; + +/// NumberExprAST - Expression class for numeric literals like "1.0". +class NumberExprAST : public ExprAST { + double Val; + +public: + NumberExprAST(double val) : Val(val) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + return ExprAST::dump(out << Val, ind); + } + virtual Value *Codegen(); +}; + +/// VariableExprAST - Expression class for referencing a variable, like "a". +class VariableExprAST : public ExprAST { + std::string Name; + +public: + VariableExprAST(SourceLocation Loc, const std::string &name) + : ExprAST(Loc), Name(name) {} + const std::string &getName() const { return Name; } + virtual std::ostream &dump(std::ostream &out, int ind) { + return ExprAST::dump(out << Name, ind); + } + virtual Value *Codegen(); +}; + +/// UnaryExprAST - Expression class for a unary operator. +class UnaryExprAST : public ExprAST { + char Opcode; + ExprAST *Operand; + +public: + UnaryExprAST(char opcode, ExprAST *operand) + : Opcode(opcode), Operand(operand) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "unary" << Opcode, ind); + Operand->dump(out, ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// BinaryExprAST - Expression class for a binary operator. +class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + +public: + BinaryExprAST(SourceLocation Loc, char op, ExprAST *lhs, ExprAST *rhs) + : ExprAST(Loc), Op(op), LHS(lhs), RHS(rhs) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "binary" << Op, ind); + LHS->dump(indent(out, ind) << "LHS:", ind + 1); + RHS->dump(indent(out, ind) << "RHS:", ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// CallExprAST - Expression class for function calls. +class CallExprAST : public ExprAST { + std::string Callee; + std::vector Args; + +public: + CallExprAST(SourceLocation Loc, const std::string &callee, + std::vector &args) + : ExprAST(Loc), Callee(callee), Args(args) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "call " << Callee, ind); + for (ExprAST *Arg : Args) + Arg->dump(indent(out, ind + 1), ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// IfExprAST - Expression class for if/then/else. +class IfExprAST : public ExprAST { + ExprAST *Cond, *Then, *Else; + +public: + IfExprAST(SourceLocation Loc, ExprAST *cond, ExprAST *then, ExprAST *_else) + : ExprAST(Loc), Cond(cond), Then(then), Else(_else) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "if", ind); + Cond->dump(indent(out, ind) << "Cond:", ind + 1); + Then->dump(indent(out, ind) << "Then:", ind + 1); + Else->dump(indent(out, ind) << "Else:", ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// ForExprAST - Expression class for for/in. +class ForExprAST : public ExprAST { + std::string VarName; + ExprAST *Start, *End, *Step, *Body; + +public: + ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, + ExprAST *step, ExprAST *body) + : VarName(varname), Start(start), End(end), Step(step), Body(body) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "for", ind); + Start->dump(indent(out, ind) << "Cond:", ind + 1); + End->dump(indent(out, ind) << "End:", ind + 1); + Step->dump(indent(out, ind) << "Step:", ind + 1); + Body->dump(indent(out, ind) << "Body:", ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// VarExprAST - Expression class for var/in +class VarExprAST : public ExprAST { + std::vector > VarNames; + ExprAST *Body; + +public: + VarExprAST(const std::vector > &varnames, + ExprAST *body) + : VarNames(varnames), Body(body) {} + + virtual std::ostream &dump(std::ostream &out, int ind) { + ExprAST::dump(out << "var", ind); + for (const auto &NamedVar : VarNames) + NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1); + Body->dump(indent(out, ind) << "Body:", ind + 1); + return out; + } + virtual Value *Codegen(); +}; + +/// PrototypeAST - This class represents the "prototype" for a function, +/// which captures its argument names as well as if it is an operator. +class PrototypeAST { + std::string Name; + std::vector Args; + bool isOperator; + unsigned Precedence; // Precedence if a binary op. + int Line; + +public: + PrototypeAST(SourceLocation Loc, const std::string &name, + const std::vector &args, bool isoperator = false, + unsigned prec = 0) + : Name(name), Args(args), isOperator(isoperator), Precedence(prec), + Line(Loc.Line) {} + + bool isUnaryOp() const { return isOperator && Args.size() == 1; } + bool isBinaryOp() const { return isOperator && Args.size() == 2; } + + char getOperatorName() const { + assert(isUnaryOp() || isBinaryOp()); + return Name[Name.size() - 1]; + } + + unsigned getBinaryPrecedence() const { return Precedence; } + + Function *Codegen(); + + void CreateArgumentAllocas(Function *F); + const std::vector &getArgs() const { return Args; } +}; + +/// FunctionAST - This class represents a function definition itself. +class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + +public: + FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {} + + std::ostream &dump(std::ostream &out, int ind) { + indent(out, ind) << "FunctionAST\n"; + ++ind; + indent(out, ind) << "Body:"; + return Body ? Body->dump(out, ind) : out << "null\n"; + } + + Function *Codegen(); +}; +} // end anonymous namespace + +//===----------------------------------------------------------------------===// +// Parser +//===----------------------------------------------------------------------===// + +/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current +/// token the parser is looking at. getNextToken reads another token from the +/// lexer and updates CurTok with its results. +static int CurTok; +static int getNextToken() { return CurTok = gettok(); } + +/// BinopPrecedence - This holds the precedence for each binary operator that is +/// defined. +static std::map BinopPrecedence; + +/// GetTokPrecedence - Get the precedence of the pending binary operator token. +static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) + return -1; + return TokPrec; +} + +/// Error* - These are little helper functions for error handling. +ExprAST *Error(const char *Str) { + fprintf(stderr, "Error: %s\n", Str); + return 0; +} +PrototypeAST *ErrorP(const char *Str) { + Error(Str); + return 0; +} +FunctionAST *ErrorF(const char *Str) { + Error(Str); + return 0; +} + +static ExprAST *ParseExpression(); + +/// identifierexpr +/// ::= identifier +/// ::= identifier '(' expression* ')' +static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + SourceLocation LitLoc = CurLoc; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(LitLoc, IdName); + + // Call. + getNextToken(); // eat ( + std::vector Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) + return 0; + Args.push_back(Arg); + + if (CurTok == ')') + break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(LitLoc, IdName, Args); +} + +/// numberexpr ::= number +static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; +} + +/// parenexpr ::= '(' expression ')' +static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) + return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; +} + +/// ifexpr ::= 'if' expression 'then' expression 'else' expression +static ExprAST *ParseIfExpr() { + SourceLocation IfLoc = CurLoc; + + getNextToken(); // eat the if. + + // condition. + ExprAST *Cond = ParseExpression(); + if (!Cond) + return 0; + + if (CurTok != tok_then) + return Error("expected then"); + getNextToken(); // eat the then + + ExprAST *Then = ParseExpression(); + if (Then == 0) + return 0; + + if (CurTok != tok_else) + return Error("expected else"); + + getNextToken(); + + ExprAST *Else = ParseExpression(); + if (!Else) + return 0; + + return new IfExprAST(IfLoc, Cond, Then, Else); +} + +/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression +static ExprAST *ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return Error("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return Error("expected '=' after for"); + getNextToken(); // eat '='. + + ExprAST *Start = ParseExpression(); + if (Start == 0) + return 0; + if (CurTok != ',') + return Error("expected ',' after for start value"); + getNextToken(); + + ExprAST *End = ParseExpression(); + if (End == 0) + return 0; + + // The step value is optional. + ExprAST *Step = 0; + if (CurTok == ',') { + getNextToken(); + Step = ParseExpression(); + if (Step == 0) + return 0; + } + + if (CurTok != tok_in) + return Error("expected 'in' after for"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) + return 0; + + return new ForExprAST(IdName, Start, End, Step, Body); +} + +/// varexpr ::= 'var' identifier ('=' expression)? +// (',' identifier ('=' expression)?)* 'in' expression +static ExprAST *ParseVarExpr() { + getNextToken(); // eat the var. + + std::vector > VarNames; + + // At least one variable name is required. + if (CurTok != tok_identifier) + return Error("expected identifier after var"); + + while (1) { + std::string Name = IdentifierStr; + getNextToken(); // eat identifier. + + // Read the optional initializer. + ExprAST *Init = 0; + if (CurTok == '=') { + getNextToken(); // eat the '='. + + Init = ParseExpression(); + if (Init == 0) + return 0; + } + + VarNames.push_back(std::make_pair(Name, Init)); + + // End of var list, exit loop. + if (CurTok != ',') + break; + getNextToken(); // eat the ','. + + if (CurTok != tok_identifier) + return Error("expected identifier list after var"); + } + + // At this point, we have to have 'in'. + if (CurTok != tok_in) + return Error("expected 'in' keyword after 'var'"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) + return 0; + + return new VarExprAST(VarNames, Body); +} + +/// primary +/// ::= identifierexpr +/// ::= numberexpr +/// ::= parenexpr +/// ::= ifexpr +/// ::= forexpr +/// ::= varexpr +static ExprAST *ParsePrimary() { + switch (CurTok) { + default: + return Error("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); + case tok_if: + return ParseIfExpr(); + case tok_for: + return ParseForExpr(); + case tok_var: + return ParseVarExpr(); + } +} + +/// unary +/// ::= primary +/// ::= '!' unary +static ExprAST *ParseUnary() { + // If the current token is not an operator, it must be a primary expr. + if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') + return ParsePrimary(); + + // If this is a unary operator, read it. + int Opc = CurTok; + getNextToken(); + if (ExprAST *Operand = ParseUnary()) + return new UnaryExprAST(Opc, Operand); + return 0; +} + +/// binoprhs +/// ::= ('+' unary)* +static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + SourceLocation BinLoc = CurLoc; + getNextToken(); // eat binop + + // Parse the unary expression after the binary operator. + ExprAST *RHS = ParseUnary(); + if (!RHS) + return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec + 1, RHS); + if (RHS == 0) + return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS); + } +} + +/// expression +/// ::= unary binoprhs +/// +static ExprAST *ParseExpression() { + ExprAST *LHS = ParseUnary(); + if (!LHS) + return 0; + + return ParseBinOpRHS(0, LHS); +} + +/// prototype +/// ::= id '(' id* ')' +/// ::= binary LETTER number? (id, id) +/// ::= unary LETTER (id) +static PrototypeAST *ParsePrototype() { + std::string FnName; + + SourceLocation FnLoc = CurLoc; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. + unsigned BinaryPrecedence = 30; + + switch (CurTok) { + default: + return ErrorP("Expected function name in prototype"); + case tok_identifier: + FnName = IdentifierStr; + Kind = 0; + getNextToken(); + break; + case tok_unary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected unary operator"); + FnName = "unary"; + FnName += (char)CurTok; + Kind = 1; + getNextToken(); + break; + case tok_binary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected binary operator"); + FnName = "binary"; + FnName += (char)CurTok; + Kind = 2; + getNextToken(); + + // Read the precedence if present. + if (CurTok == tok_number) { + if (NumVal < 1 || NumVal > 100) + return ErrorP("Invalid precedecnce: must be 1..100"); + BinaryPrecedence = (unsigned)NumVal; + getNextToken(); + } + break; + } + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + // Verify right number of names for operator. + if (Kind && ArgNames.size() != Kind) + return ErrorP("Invalid number of operands for operator"); + + return new PrototypeAST(FnLoc, FnName, ArgNames, Kind != 0, BinaryPrecedence); +} + +/// definition ::= 'def' prototype expression +static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) + return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; +} + +/// toplevelexpr ::= expression +static FunctionAST *ParseTopLevelExpr() { + SourceLocation FnLoc = CurLoc; + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = + new PrototypeAST(FnLoc, "main", std::vector()); + return new FunctionAST(Proto, E); + } + return 0; +} + +/// external ::= 'extern' prototype +static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); +} + +//===----------------------------------------------------------------------===// +// Debug Info Support +//===----------------------------------------------------------------------===// + +static DIBuilder *DBuilder; + +DIType DebugInfo::getDoubleTy() { + if (DblTy.isValid()) + return DblTy; + + DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); + return DblTy; +} + +void DebugInfo::emitLocation(ExprAST *AST) { + if (!AST) + return Builder.SetCurrentDebugLocation(DebugLoc()); + DIScope *Scope; + if (LexicalBlocks.empty()) + Scope = &TheCU; + else + Scope = LexicalBlocks.back(); + Builder.SetCurrentDebugLocation( + DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope))); +} + +static DICompositeType CreateFunctionType(unsigned NumArgs, DIFile Unit) { + SmallVector EltTys; + DIType DblTy = KSDbgInfo.getDoubleTy(); + + // Add the result type. + EltTys.push_back(DblTy); + + for (unsigned i = 0, e = NumArgs; i != e; ++i) + EltTys.push_back(DblTy); + + DITypeArray EltTypeArray = DBuilder->getOrCreateTypeArray(EltTys); + return DBuilder->createSubroutineType(Unit, EltTypeArray); +} + +//===----------------------------------------------------------------------===// +// Code Generation +//===----------------------------------------------------------------------===// + +static Module *TheModule; +static std::map NamedValues; +static FunctionPassManager *TheFPM; + +Value *ErrorV(const char *Str) { + Error(Str); + return 0; +} + +/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of +/// the function. This is used for mutable variables etc. +static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, + const std::string &VarName) { + IRBuilder<> TmpB(&TheFunction->getEntryBlock(), + TheFunction->getEntryBlock().begin()); + return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, + VarName.c_str()); +} + +Value *NumberExprAST::Codegen() { + KSDbgInfo.emitLocation(this); + return ConstantFP::get(getGlobalContext(), APFloat(Val)); +} + +Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + if (V == 0) + return ErrorV("Unknown variable name"); + + KSDbgInfo.emitLocation(this); + // Load the value. + return Builder.CreateLoad(V, Name.c_str()); +} + +Value *UnaryExprAST::Codegen() { + Value *OperandV = Operand->Codegen(); + if (OperandV == 0) + return 0; + + Function *F = TheModule->getFunction(std::string("unary") + Opcode); + if (F == 0) + return ErrorV("Unknown unary operator"); + + KSDbgInfo.emitLocation(this); + return Builder.CreateCall(F, OperandV, "unop"); +} + +Value *BinaryExprAST::Codegen() { + KSDbgInfo.emitLocation(this); + + // Special case '=' because we don't want to emit the LHS as an expression. + if (Op == '=') { + // Assignment requires the LHS to be an identifier. + VariableExprAST *LHSE = dynamic_cast(LHS); + if (!LHSE) + return ErrorV("destination of '=' must be a variable"); + // Codegen the RHS. + Value *Val = RHS->Codegen(); + if (Val == 0) + return 0; + + // Look up the name. + Value *Variable = NamedValues[LHSE->getName()]; + if (Variable == 0) + return ErrorV("Unknown variable name"); + + Builder.CreateStore(Val, Variable); + return Val; + } + + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) + return 0; + + switch (Op) { + case '+': + return Builder.CreateFAdd(L, R, "addtmp"); + case '-': + return Builder.CreateFSub(L, R, "subtmp"); + case '*': + return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: + break; + } + + // If it wasn't a builtin binary operator, it must be a user defined one. Emit + // a call to it. + Function *F = TheModule->getFunction(std::string("binary") + Op); + assert(F && "binary operator not found!"); + + Value *Ops[] = { L, R }; + return Builder.CreateCall(F, Ops, "binop"); +} + +Value *CallExprAST::Codegen() { + KSDbgInfo.emitLocation(this); + + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. + if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) + return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); +} + +Value *IfExprAST::Codegen() { + KSDbgInfo.emitLocation(this); + + Value *CondV = Cond->Codegen(); + if (CondV == 0) + return 0; + + // Convert condition to a bool by comparing equal to 0.0. + CondV = Builder.CreateFCmpONE( + CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond"); + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create blocks for the then and else cases. Insert the 'then' block at the + // end of the function. + BasicBlock *ThenBB = + BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); + BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); + + Builder.CreateCondBr(CondV, ThenBB, ElseBB); + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->Codegen(); + if (ThenV == 0) + return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->Codegen(); + if (ElseV == 0) + return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + + // Emit merge block. + TheFunction->getBasicBlockList().push_back(MergeBB); + Builder.SetInsertPoint(MergeBB); + PHINode *PN = + Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp"); + + PN->addIncoming(ThenV, ThenBB); + PN->addIncoming(ElseV, ElseBB); + return PN; +} + +Value *ForExprAST::Codegen() { + // Output this as: + // var = alloca double + // ... + // start = startexpr + // store start -> var + // goto loop + // loop: + // ... + // bodyexpr + // ... + // loopend: + // step = stepexpr + // endcond = endexpr + // + // curvar = load var + // nextvar = curvar + step + // store nextvar -> var + // br endcond, loop, endloop + // outloop: + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create an alloca for the variable in the entry block. + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + + KSDbgInfo.emitLocation(this); + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) + return 0; + + // Store the value into the alloca. + Builder.CreateStore(StartVal, Alloca); + + // Make the new basic block for the loop header, inserting after current + // block. + BasicBlock *LoopBB = + BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + + // Insert an explicit fall through from the current block to the LoopBB. + Builder.CreateBr(LoopBB); + + // Start insertion in LoopBB. + Builder.SetInsertPoint(LoopBB); + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + AllocaInst *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Alloca; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (Body->Codegen() == 0) + return 0; + + // Emit the step value. + Value *StepVal; + if (Step) { + StepVal = Step->Codegen(); + if (StepVal == 0) + return 0; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); + } + + // Compute the end condition. + Value *EndCond = End->Codegen(); + if (EndCond == 0) + return EndCond; + + // Reload, increment, and restore the alloca. This handles the case where + // the body of the loop mutates the variable. + Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); + Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); + Builder.CreateStore(NextVar, Alloca); + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE( + EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond"); + + // Create the "after loop" block and insert it. + BasicBlock *AfterBB = + BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); +} + +Value *VarExprAST::Codegen() { + std::vector OldBindings; + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Register all variables and emit their initializer. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { + const std::string &VarName = VarNames[i].first; + ExprAST *Init = VarNames[i].second; + + // Emit the initializer before adding the variable to scope, this prevents + // the initializer from referencing the variable itself, and permits stuff + // like this: + // var a = 1 in + // var a = a in ... # refers to outer 'a'. + Value *InitVal; + if (Init) { + InitVal = Init->Codegen(); + if (InitVal == 0) + return 0; + } else { // If not specified, use 0.0. + InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); + } + + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + Builder.CreateStore(InitVal, Alloca); + + // Remember the old variable binding so that we can restore the binding when + // we unrecurse. + OldBindings.push_back(NamedValues[VarName]); + + // Remember this binding. + NamedValues[VarName] = Alloca; + } + + KSDbgInfo.emitLocation(this); + + // Codegen the body, now that all vars are in scope. + Value *BodyVal = Body->Codegen(); + if (BodyVal == 0) + return 0; + + // Pop all our variables from scope. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) + NamedValues[VarNames[i].first] = OldBindings[i]; + + // Return the body computation. + return BodyVal; +} + +Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = + FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); + + Function *F = + Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) + AI->setName(Args[Idx]); + + // Create a subprogram DIE for this function. + DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + DIDescriptor FContext(Unit); + unsigned LineNo = Line; + unsigned ScopeLine = Line; + DISubprogram SP = DBuilder->createFunction( + FContext, Name, StringRef(), Unit, LineNo, + CreateFunctionType(Args.size(), Unit), false /* internal linkage */, + true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F); + + KSDbgInfo.FnScopeMap[this] = SP; + return F; +} + +/// CreateArgumentAllocas - Create an alloca for each argument and register the +/// argument in the symbol table so that references to it will succeed. +void PrototypeAST::CreateArgumentAllocas(Function *F) { + Function::arg_iterator AI = F->arg_begin(); + for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { + // Create an alloca for this variable. + AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); + + // Create a debug descriptor for the variable. + DIScope *Scope = KSDbgInfo.LexicalBlocks.back(); + DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + DIVariable D = DBuilder->createLocalVariable(dwarf::DW_TAG_arg_variable, + *Scope, Args[Idx], Unit, Line, + KSDbgInfo.getDoubleTy(), Idx); + + Instruction *Call = DBuilder->insertDeclare( + Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock()); + Call->setDebugLoc(DebugLoc::get(Line, 0, *Scope)); + + // Store the initial value into the alloca. + Builder.CreateStore(AI, Alloca); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = Alloca; + } +} + +Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // Push the current scope. + KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]); + + // Unset the location for the prologue emission (leading instructions with no + // location in a function are considered part of the prologue and the debugger + // will run past them when breaking on a function) + KSDbgInfo.emitLocation(nullptr); + + // If this is an operator, install it. + if (Proto->isBinaryOp()) + BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + // Add all arguments to the symbol table and create their allocas. + Proto->CreateArgumentAllocas(TheFunction); + + KSDbgInfo.emitLocation(Body); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Pop off the lexical block for the function. + KSDbgInfo.LexicalBlocks.pop_back(); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + + if (Proto->isBinaryOp()) + BinopPrecedence.erase(Proto->getOperatorName()); + + // Pop off the lexical block for the function since we added it + // unconditionally. + KSDbgInfo.LexicalBlocks.pop_back(); + + return 0; +} + +//===----------------------------------------------------------------------===// +// Top-Level parsing and JIT Driver +//===----------------------------------------------------------------------===// + +static ExecutionEngine *TheExecutionEngine; + +static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (!F->Codegen()) { + fprintf(stderr, "Error reading function definition:"); + } + } else { + // Skip token for error recovery. + getNextToken(); + } +} + +static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (!P->Codegen()) { + fprintf(stderr, "Error reading extern"); + } + } else { + // Skip token for error recovery. + getNextToken(); + } +} + +static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (!F->Codegen()) { + fprintf(stderr, "Error generating code for top level expr"); + } + } else { + // Skip token for error recovery. + getNextToken(); + } +} + +/// top ::= definition | external | expression | ';' +static void MainLoop() { + while (1) { + switch (CurTok) { + case tok_eof: + return; + case ';': + getNextToken(); + break; // ignore top-level semicolons. + case tok_def: + HandleDefinition(); + break; + case tok_extern: + HandleExtern(); + break; + default: + HandleTopLevelExpression(); + break; + } + } +} + +//===----------------------------------------------------------------------===// +// "Library" functions that can be "extern'd" from user code. +//===----------------------------------------------------------------------===// + +/// putchard - putchar that takes a double and returns 0. +extern "C" double putchard(double X) { + putchar((char)X); + return 0; +} + +/// printd - printf that takes a double prints it as "%f\n", returning 0. +extern "C" double printd(double X) { + printf("%f\n", X); + return 0; +} + +//===----------------------------------------------------------------------===// +// Main driver code. +//===----------------------------------------------------------------------===// + +int main() { + InitializeNativeTarget(); + InitializeNativeTargetAsmPrinter(); + InitializeNativeTargetAsmParser(); + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['='] = 2; + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + getNextToken(); + + // Make the module, which holds all the code. + std::unique_ptr Owner = make_unique("my cool jit", Context); + TheModule = Owner.get(); + + // Add the current debug info version into the module. + TheModule->addModuleFlag(Module::Warning, "Debug Info Version", + DEBUG_METADATA_VERSION); + + // Darwin only supports dwarf2. + if (Triple(sys::getProcessTriple()).isOSDarwin()) + TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2); + + // Construct the DIBuilder, we do this here because we need the module. + DBuilder = new DIBuilder(*TheModule); + + // Create the compile unit for the module. + // Currently down as "fib.ks" as a filename since we're redirecting stdin + // but we'd like actual source locations. + KSDbgInfo.TheCU = DBuilder->createCompileUnit( + dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); + + // Create the JIT. This takes ownership of the module. + std::string ErrStr; + TheExecutionEngine = EngineBuilder(std::move(Owner)) + .setErrorStr(&ErrStr) + .setMCJITMemoryManager(new SectionMemoryManager()) + .create(); + if (!TheExecutionEngine) { + fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); + exit(1); + } + + FunctionPassManager OurFPM(TheModule); + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); + OurFPM.add(new DataLayoutPass()); +#if 0 + // Provide basic AliasAnalysis support for GVN. + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + // Eliminate Common SubExpressions. + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + #endif + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + TheFPM = &OurFPM; + + // Run the main "interpreter loop" now. + MainLoop(); + + TheFPM = 0; + + // Finalize the debug info. + DBuilder->finalize(); + + // Print out all of the generated code. + TheModule->dump(); + + return 0; +} diff --git a/llvm/examples/Kaleidoscope/Makefile b/llvm/examples/Kaleidoscope/Makefile index bd0c252c2c03..8c3b1e31d9bb 100644 --- a/llvm/examples/Kaleidoscope/Makefile +++ b/llvm/examples/Kaleidoscope/Makefile @@ -10,6 +10,6 @@ LEVEL=../.. include $(LEVEL)/Makefile.config -PARALLEL_DIRS:= Chapter2 Chapter3 Chapter4 Chapter5 Chapter6 Chapter7 +PARALLEL_DIRS:= Chapter2 Chapter3 Chapter4 Chapter5 Chapter6 Chapter7 Chapter8 include $(LEVEL)/Makefile.common