Free Compiler Construction Tools
Lexical analysers (lexers) and parser generators, programming language creation kits
If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility to your application, or even creating a documentation parsing facility, the tools on this page are designed to (hopefully) ease your task. With these compiler construction kits, parser generators, lexical analyzer (lexer) generators and code optimizer generators, you define your language and let the tool generate the source code for your software.
If you want a (printed) book on compiler construction, you might want to check out the famous Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman. The "Dragon book", as it is affectionately called by some, is regarded by many as the standard book on writing compilers.
If you want free programming language grammars for a particular language (eg, C, C++, Ada, COBOL, etc) to ease your task for constructing a compiler for that language, check out the Free Programming Language Grammars for Compiler Design page.
Free Compiler Construction Kits
AdaGOOP, which stands for Ada Generator of Object Oriented Parsers, creates a parser that generates an object oriented parse tree, and a traversal of the tree using the visitor pattern. It relies on the SCATC versions of aflex and ayacc which you can also get from their site. The source code is provided, and there are no restrictions on its use.
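For readers unfamiliar with the visitor pattern mentioned above, here is a minimal sketch of the idea in Python (not Ada, and not AdaGOOP's actual output). The node classes `Num` and `Add` and the visitors `Evaluator` and `Printer` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Num:
    """Leaf node of a (hypothetical) parse tree: an integer literal."""
    value: int

    def accept(self, visitor):
        return visitor.visit_num(self)


@dataclass
class Add:
    """Interior node: the sum of two subtrees."""
    left: object
    right: object

    def accept(self, visitor):
        return visitor.visit_add(self)


class Evaluator:
    """One traversal: compute the value of the tree."""

    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)


class Printer:
    """Another traversal over the same tree: render it as text."""

    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"


tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Evaluator()))
print(tree.accept(Printer()))
```

The point of the pattern is that the tree classes stay unchanged while you add new traversals (evaluation, printing, type checking and so on) as separate visitor classes.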
- JS/CC LALR(1) Parser and Lexical Analyzer Generator
- Quex - A Mode Oriented Directly Coded Lexical Analyzer Generator
Quex, or Queχ (depending on which part of the site you read), produces a directly coded lexical analyzer engine with pre- and post-conditions rather than the table-driven engine created by the Lex/Flex family (see elsewhere on this page). Features include inheritable "lexer modes" that provide transition control and indentation events, a general purpose token class, a token queue that allows tokens to be communicated without returning from the lexical analyzer function, line and column numbering, generation of transition graphs, etc. You will need to install Python before using this lexical analyser generator. It generates C++ code. It is released under the GNU LGPL with additional restrictions; see the documentation for details. Windows, Mac OS X, Solaris and Linux are supported.
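To illustrate the difference between a table-driven and a directly coded scanner, here is a rough sketch in Python. It is not what Quex or Flex actually emit (they generate C or C++). The function names `scan_table` and `scan_direct` and the tiny two-token language (identifiers and numbers) are assumptions made up for this example.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Table-driven style: the automaton is data, and one generic loop interprets it.
TABLE = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}
ACCEPTING = {"ident": "IDENT", "number": "NUMBER"}


def scan_table(text, pos=0):
    state, i = "start", pos
    while i < len(text):
        nxt = TABLE.get((state, char_class(text[i])))
        if nxt is None:
            break
        state, i = nxt, i + 1
    return ACCEPTING.get(state), text[pos:i]


# Directly coded style: the same automaton, but each state is a piece of code.
def scan_direct(text, pos=0):
    i = pos
    if i < len(text) and char_class(text[i]) == "letter":
        i += 1
        while i < len(text) and char_class(text[i]) in ("letter", "digit"):
            i += 1
        return "IDENT", text[pos:i]
    if i < len(text) and char_class(text[i]) == "digit":
        i += 1
        while i < len(text) and char_class(text[i]) == "digit":
            i += 1
        return "NUMBER", text[pos:i]
    return None, ""


print(scan_table("count1 = 42"))
print(scan_direct("count1 = 42"))
```

Both functions recognize the same tokens. The directly coded version avoids the table lookup on every character, which is the usual argument for that style in compiled languages.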
- Gardens Point Scanner Generator
The Gardens Point Scanner Generator, GPLEX, accepts a lex-like input specification to create a C# lexical scanner. The scanner produced is thread-safe and all scanner state is maintained within the scanner instance. Note that the input program does not support the complete POSIX lex specifications. The scanner uses the generic types defined in C# 2.0.
- Gardens Point Parser Generator
The Gardens Point Parser Generator, GPPG, accepts a yacc-like program to produce a thread-safe C# parser. The parser uses the generic types defined in C# 2.0.
- Bison (parser generator)
Bison generates a parser when presented with an LALR(1) context-free grammar that is yacc compatible. The generated parser is in C. It includes extensions to the yacc features that actually make it easier to use if you want multiple parsers in your program. Bison works on Windows, MSDOS, Linux and numerous other operating systems. The link points to the source code which should compile with many compilers (especially GNU's gcc). Although the program itself is under GPL, the generated parser (using the bison.simple skeleton) can be distributed without restriction.
This program, re2c, generates a scanner from regular expressions. It supposedly generates scanners that are faster than a flex scanner (see elsewhere on this page). However, note that the scanner is not a complete scanner: you will need to supply some interface code (to provide the input stream to the scanner), which also means that you have a certain amount of flexibility in determining how the scanner gets its input (it can be from a file, or just some buffer you created on the fly).
Grammatica is a parser generator for C# and Java. It uses LL(k) grammars with an unlimited number of look-ahead tokens. It purportedly creates commented and readable source code, and has automatic error recovery and detailed error messages. The generator can also create the parser at runtime, allowing you to test and debug the parser before you even write your source code. The program is released under the GNU General Public License with an exception to facilitate its use by commercial software.
- MLRISC Retargetable and Optimizing Compiler Back End
MLRISC is a customizable optimizing compiler backend that can be retargeted to multiple architectures. It is written in Standard ML, and requires that your front end be written in ML.
- YaYacc (Generates Parsers)
YaYacc, or Yet Another Yacc, generates C++ parsers using an LALR(1) algorithm. YaYacc itself runs on FreeBSD, but the resulting parser is not tied to any particular platform (it depends on your code, of course).
- Optimix Optimizer Generator
This optimizer generator allows you "to generate program analysis and transformations". It may be used in a CoSy compiler framework, with the Cocktail tool, or with Java.
- Jaccie (Java-based Compiler Compiler) and SIC (Smalltalk-based Interactive Compiler Compiler)
Jaccie includes a scanner generator and a variety of parser generators that can handle LL(1), SLR(1) and LALR(1) grammars. It has a debugging mode where you can operate it non-deterministically. It is based on the earlier SIC, which uses the Smalltalk programming language for evaluation rules.
- GOLD Parser
The GOLD Parser is a parser generator (compiler-compiler) that generates parsers that use a Deterministic Finite Automaton (DFA) for the tokenizer and an LALR(1) state machine for the parser. Unlike other parser generators, GOLD does not require you to embed your grammar into your source code. It saves the parse tables into a separate file which is loaded by the parser engine when run.
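The idea of keeping the tables in a separate file that a generic engine loads at run time can be sketched as follows. This is only an illustration of the approach: the JSON layout below is a hypothetical format invented for this example and is NOT GOLD's actual table file format, and the `Engine` class is likewise made up. Only the tokenizer (DFA) half is shown.

```python
import json

# Hypothetical table file contents (an assumption, not GOLD's real format):
# a start state, transitions keyed by state then character, and the token
# kind produced by each accepting state.
TABLES_JSON = json.dumps({
    "start": "0",
    "transitions": {
        "0": {"a": "1", "b": "2"},
        "1": {"a": "1"},
        "2": {"b": "2"},
    },
    "accepting": {"1": "AS", "2": "BS"},
})


class Engine:
    """Generic DFA tokenizer: it knows nothing about the language until tables are loaded."""

    def __init__(self, tables_text):
        tables = json.loads(tables_text)
        self.start = tables["start"]
        self.transitions = tables["transitions"]
        self.accepting = tables["accepting"]

    def tokenize(self, text):
        tokens, pos = [], 0
        while pos < len(text):
            state, i = self.start, pos
            last_accept = None  # (token kind, end index) of the longest match so far
            while i < len(text) and text[i] in self.transitions.get(state, {}):
                state = self.transitions[state][text[i]]
                i += 1
                if state in self.accepting:
                    last_accept = (self.accepting[state], i)
            if last_accept is None:
                raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
            kind, end = last_accept
            tokens.append((kind, text[pos:end]))
            pos = end
        return tokens


engine = Engine(TABLES_JSON)  # in practice the text would be read from a file
print(engine.tokenize("aabbba"))
```

Because the engine contains no language-specific code, changing the grammar only means regenerating the table file, not recompiling the program that uses it.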
- LEMON Parser Generator
This LALR(1) parser generator claims to generate faster parsers than Yacc or Bison. The generated parsers are also re-entrant and thread-safe. The program is written in C, and only the source code is provided, so you will need a C compiler to compile it before you can use it.
- Accent Compiler Compiler
A compiler-compiler that avoids the problems of LALR parsers (eg, when faced with shift/reduce and reduce/reduce conflicts) and LL parsers (with their restrictions on left-recursive rules). You specify your input grammar in Extended Backus-Naur Form, in which you are allowed to indicate repetition, choices and optional parts. You can insert semantic actions anywhere, and ambiguous grammars are allowed. All these features make Accent grammars easier to write than (eg) Yacc grammars. The website warns however that the generated code requires significantly more system resources than code generated by Yacc. Accent is distributed under the GNU GPL. I'm not sure about the licensing of the generated C code.
- PRECCX (Prettier Compiler-Compiler Extended)
PRECCX, or PREttier Compiler-Compiler eXtended, is "an infinite-lookahead compiler-compiler for context dependent grammars" which generates C code. You specify an input grammar in an extended BNF notation where inherited and synthetic attributes are allowed. The parser is essentially LL(infinity) with optimisations. You can get versions for MSDOS, Linux and other Unices (including Sun, HP, etc). Source code is available and you can apparently compile it on other platforms with an ANSI C compiler if needed.
- Byacc/J (Parser Generator)
This is a version of Berkeley yacc modified so that it can generate Java source code. You simply supply a "-J" option on the command line and it'll produce the Java code instead of the usual C output. You can either get the free source code and compile it yourself, or download any of the precompiled binaries for Solaris, SGI/IRIX, Windows, and Linux. Like the byacc original (see elsewhere on this page), your output is free of any restrictions, and you can freely use it for any purpose you wish.
- COCO/R (Lexer and Parser Generators)
This tool generates recursive descent LL(1) parsers and their associated lexical scanners from attributed grammars. It comes with source code, and there are versions to generate Oberon, Modula-2, Pascal, C, C++, Java. A version for Delphi is (at the time of this writing) "on the way". Platforms supported appear to vary (Unix systems, Apple Macintosh, Atari, MSDOS, Oberon, etc) depending on the language you want generated.
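If you have not seen a recursive descent LL(1) parser before, the following hand-written sketch shows the general shape of the code such a tool produces: one function per grammar rule, each deciding what to do by looking at a single token. It is written in Python purely for illustration (Coco/R itself does not target Python), and the grammar, the `Parser` class and the `evaluate` helper are assumptions made up for this example. The computed value plays the role of an attribute attached to each rule.

```python
import re

# Grammar (EBNF):
#   Expr   = Term { ("+" | "-") Term } .
#   Term   = Factor { "*" Factor } .
#   Factor = number | "(" Expr ")" .
TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """The single lookahead token (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.take()
            value *= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```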
A programming environment that allows you to generate complete language implementations from application-oriented specifications. The user describes the problem that needs to be solved and Eli uses the tools and components required for that problem. It handles structural analysis, analysis of names, types, values, stores translation structures and produces the target text. It generates C code. The program is available in source form and has been tested under Linux, IRIX, HP-UX, OSF, and SunOS. Eli itself is distributed under the GNU GPL but the generated code is your property to do with as you please.
This freeware system, written in Prolog, and requiring SICStus Prolog 3.7, SWI Prolog or Quintus Prolog (no longer maintained?) to run, handles phrase structure parsing, semantic-head-driven generation and constraint logic programming and includes a source level debugger.
- TP Lex/Yacc (Lexical Analyzer and Parser Generators)
This is a version of Lex and Yacc designed for Borland Delphi, Borland Turbo Pascal and the Free Pascal Compiler (you can find legally free versions of all the above listed on the Free Delphi Compilers and Pascal Compilers page). Like its lex and yacc predecessors, this version generates lexers and parsers, although in its case, the generated code is in the Pascal language.
- Gentle Compiler Construction System
This compiler construction tool purports to provide a uniform framework for language recognition, definition of abstract syntax trees, construction of tree walkers based on pattern recognition, smart traversal, simple unparsing for source to source translation and optimal code selection for microprocessors. Note however that if you use it to create an application, the licensing terms require that your applications be licensed under the GNU GPL. This probably restricts your use of it in a commercial program, unless you are prepared to pay for a special license or you plan to make the sources for your program available anyway.
- Bison for Eiffel (Parser generator)
This version of Bison produces Eiffel source code. Like Bison, it is released under the GNU GPL. I am uncertain whether the generated parser can be distributed without restrictions (the current versions of Bison allow this if you do not modify the output).
- ANTLR (Recursive Descent Parser Generator)
ANTLR generates a recursive descent parser in C, C++ or Java from predicated-LL(k>1) grammars. It is able to build ASTs automatically. If you are using C, you may have to get the PCCTS 1.XX series (the precursor to ANTLR), also available at the site. The latest version may be used for C++ and Java.
- Byacc (Berkeley YACC)
Berkeley YACC ("Yet Another Compiler Compiler") is a public domain parser generator that is the precursor of the GNU BISON. The link above points to a directory where you can download the sources to the program (look for a file beginning with "byacc"). Although it appears to be no longer maintained, this is one of the best yacc clones (plus it's in the public domain). The link points to the source code which should compile with many compilers (including GNU's gcc). It generates C code.
- BtYacc (generates parsers)
To quote from the documentation, BtYacc, or BackTracking Yacc, "is a modified version of Berkeley Yacc that supports automatic backtracking and semantic disambiguation to parse ambiguous grammars, as well as syntactic sugar for inherited attributes". The program comes with sources which are in the public domain. Although the author only mentions compilation of the program on Unix and Win32 systems, it is likely that the program can be compiled and run on DOS systems using an MSDOS port of the GNU C compiler like DJGPP, since the GNU compiler was used on the other systems. For more information about DJGPP, see the Free C and C++ compilers page.
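The backtracking idea can be illustrated with a small sketch. Note that this is a simplified top-down illustration in Python, not BtYacc's actual algorithm (BtYacc backtracks inside a yacc-style parser and generates C). The toy grammar and all the function names are assumptions invented for this example. Both alternatives start with an arbitrarily long list of names, so no fixed amount of lookahead can choose between them; the parser simply tries one and, if it fails, rewinds to the saved position and tries the other.

```python
# Toy grammar:
#   statement   = assignment | declaration
#   assignment  = names "=" number
#   declaration = names ";"
#   names       = identifier { identifier }


def parse_names(tokens, pos):
    """Consume one or more identifier tokens; return (names, new_pos) or None."""
    start = pos
    while pos < len(tokens) and tokens[pos].isalpha():
        pos += 1
    return (tokens[start:pos], pos) if pos > start else None


def parse_assignment(tokens, pos):
    result = parse_names(tokens, pos)
    if result is None:
        return None
    names, pos = result
    if pos + 1 < len(tokens) and tokens[pos] == "=" and tokens[pos + 1].isdigit():
        return ("assign", names, int(tokens[pos + 1])), pos + 2
    return None  # trial failed; the caller still holds the original position


def parse_declaration(tokens, pos):
    result = parse_names(tokens, pos)
    if result is None:
        return None
    names, pos = result
    if pos < len(tokens) and tokens[pos] == ";":
        return ("declare", names), pos + 1
    return None


def parse_statement(tokens, pos=0):
    # Try each alternative from the same saved position; a failed trial
    # leaves no side effects, so "backtracking" is just starting over.
    for alternative in (parse_assignment, parse_declaration):
        result = alternative(tokens, pos)
        if result is not None:
            return result
    raise SyntaxError("no alternative matched")


print(parse_statement("a b c ;".split()))
print(parse_statement("x = 7".split()))
```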
- Flex (Lex drop-in replacement)
FLEX generates a lexical analyser in C or C++ given an input program. It is highly compatible with the original Unix lex program, although it has numerous features that make it more useful if you are writing your own scanner. It is designed so that it can be used together with yacc and its clones (like byacc and bison, also listed on this page). The URL given above is for the directory containing the original source code. The file name to look for (at the time I write this) is flex-2.5.4.tar.gz, although you should probably get the latest version (which probably has a bigger number attached to the filename) if there is any. The source code can also be obtained from the GNU ftp site.
- Java Compiler Compiler (JavaCC)
This Java parser generator is written in Java and produces pure Java code. It even comes with grammars for Java 1.0.2, 1.1 as well as HTML. It generates recursive descent parsers (top-down) and allows you to specify both lexical and grammar specifications in your input grammar. In terms of syntactic and semantic lookahead, it generates an LL(1) parser with specific portions LL(k) to resolve things like shift-shift conflicts. The input grammar is in extended BNF notation. It comes with JJTree, a tree building preprocessor; a documentation generator; support for Unicode (and hence internationalization), and many examples. There are numerous other features, including debugging capabilities, error reporting, etc.
- Programming Language Creator
According to the documentation, the Programming Language Creator is designed to enable you "to easily create new programming languages, or create interpreted versions of any compiled language" without the need for you to wrestle with yacc and lex. If you want your application to have a scripting language, you might want to look at this to see if it meets your requirements. The binaries, available free, are for Windows, and the source code is available for a fee.
- SableCC (generates lexers)
This is an object-oriented framework that generates DFA based lexers, LALR(1) parsers, strictly typed syntax trees, and tree walker classes from an extended BNF grammar (in other words, it's a compiler generator). The program was written in Java itself, runs on any Java 1.1 (or later) system and generates Java sources.