Parsing with Yapps

Recommended by Tim Peters!

Yapps (Yet Another Python Parser System) is a parser generator that is written in Python and generates Python code. Although there are several parser generators already available for Python, I had different goals, including learning about recursive descent parsers. Yapps is simple, easy to use, and produces human-readable parsers. It is not fast, powerful, or particularly flexible. Yapps is designed for cases where regular expressions are not enough and other parser systems are too much: situations where you might otherwise write your own recursive descent parser.

On this page you can find both Yapps 1 and Yapps 2. Yapps 1 is more like a functional language (concise grammars of the form "when you see this, return this"), while Yapps 2 is more like an imperative language (more verbose grammars of the form "if/while you see this, do this"). Both are completely free.

For a quick demonstration of how easy it is to write a Yapps grammar, take a look at the 15-line expression evaluation example, which produces code to parse and evaluate expressions like 13-(1+5)*8/7.
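To give a feel for the kind of parser Yapps is meant to save you from writing, here is a hand-written recursive descent evaluator for the same kind of expressions. It is an illustrative sketch, not Yapps output: the rule names expr, term, and factor, the Evaluator class, and the regular-expression tokenizer are all assumptions. Each precedence level gets its own function, and while loops take the place of left recursion, which an LL(1) parser cannot use. It uses Python 3 true division.

```python
import re

def tokenize(text):
    """Split text into integer and single-character tokens, plus an END marker."""
    return re.findall(r"\d+|\S", text) + ["END"]

class Evaluator:
    """Hand-written LL(1) recursive descent evaluator (illustrative sketch).

    Grammar, one method per rule:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, found %r" % (tok, self.peek()))
        self.pos += 1

    def parse(self):
        value = self.expr()
        self.expect("END")  # the whole input must be consumed
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):  # loop replaces left recursion
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.peek()
        if tok.isdigit():
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError("unexpected token %r" % tok)

print(Evaluator("13-(1+5)*8/7").parse())
```

One token of lookahead (peek) is enough to decide every branch, which is what makes this grammar LL(1).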

Please let me know if Yapps is useful for you! Thanks!

Yapps 1

Some unusual features of Yapps that may be of interest are:

  1. Yapps produces human-readable recursive descent parsers. Several heuristics keep the generated code simple.
  2. Yapps produces context-sensitive scanners that choose tokens based on which token types the parser can accept at the current point. In some situations, token matching is ambiguous unless this context is taken into account.
  3. Yapps rules can pass arguments down to subrules, so subrules can use information (such as declarations) that was parsed at higher levels of the parse. Grammars with this feature are sometimes called attribute grammars.
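Features 2 and 3 can be illustrated with a minimal hand-written sketch. It is not Yapps's scanner or generated code: the ContextScanner class, the token names, and the tiny "set NAME value" language are hypothetical. Because the parser tells the scanner which token types it can accept, the word "set" is read as a keyword at the start of a statement but as an ordinary name elsewhere; a scanner that ignored context would have to pick one reading everywhere. The env dictionary passed down into parse_value plays the role of an inherited attribute.

```python
import re

# Hypothetical token set: "set" starts a statement but may also be a variable name.
PATTERNS = {
    "SET": re.compile(r"set\b"),
    "NAME": re.compile(r"[A-Za-z_]\w*"),
    "NUM": re.compile(r"[0-9]+"),
    "END": re.compile(r"$"),
}
WHITESPACE = re.compile(r"\s*")

class ContextScanner:
    """Tries only the token types the parser says it can accept here."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def token(self, *expected):
        self.pos = WHITESPACE.match(self.text, self.pos).end()
        for kind in expected:  # first matching type, in the order given, wins
            m = PATTERNS[kind].match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return kind, m.group(0)
        raise SyntaxError("expected one of %s at position %d" % (expected, self.pos))

# program -> ("set" NAME value)* END
def parse_program(text):
    scanner = ContextScanner(text)
    env = {}  # declarations parsed so far
    while True:
        kind, _ = scanner.token("SET", "END")
        if kind == "END":
            return env
        _, name = scanner.token("NAME")  # only NAME is tried, so "set" is a name here
        env[name] = parse_value(scanner, env)  # pass env down to the subrule

# value(env) -> NUM | NAME, where a NAME is looked up in env
def parse_value(scanner, env):
    kind, text = scanner.token("NUM", "NAME")
    if kind == "NUM":
        return int(text)
    if text not in env:
        raise NameError("undefined variable %r" % text)
    return env[text]

print(parse_program("set x 3 set set x set y set"))  # {'x': 3, 'set': 3, 'y': 3}
```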

Yapps also has several disadvantages compared with other parser systems:

  1. Yapps parsers are LL(1), which is less powerful than LALR or SLR parsing. Some LL(1) inconveniences, such as the lack of operator precedence declarations (each precedence level needs its own rule), lead to more explicit grammars.
  2. The Yapps scanner can only read from strings, not from files, so it may not be suitable for large inputs. However, you can write a custom scanner for your application.
  3. Yapps was not designed with efficiency in mind.
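A custom scanner for large inputs might look like the following sketch, which assumes that no token spans a line break. The LineScanner class, its token(*expected) method, and the token names are hypothetical and not part of Yapps; the scanner reads from any file-like object one line at a time, so only the current line is held in memory.

```python
import io
import re

# Hypothetical token patterns; "END" is handled specially at end of input.
PATTERNS = {
    "NUM": re.compile(r"[0-9]+"),
    "NAME": re.compile(r"[A-Za-z_]\w*"),
}
WHITESPACE = re.compile(r"\s*")

class LineScanner:
    """Reads tokens from a file-like object one line at a time.

    Assumes no token spans a line break.
    """

    def __init__(self, stream):
        self.stream = stream
        self.line = ""
        self.pos = 0

    def _skip_to_next_token(self):
        """Skip whitespace, reading more lines as needed; False at end of input."""
        while True:
            self.pos = WHITESPACE.match(self.line, self.pos).end()
            if self.pos < len(self.line):
                return True
            self.line = self.stream.readline()
            self.pos = 0
            if not self.line:  # readline() returns "" only at end of file
                return False

    def token(self, *expected):
        """Return (kind, text) for the first expected token type that matches."""
        if not self._skip_to_next_token():
            if "END" in expected:
                return "END", ""
            raise SyntaxError("unexpected end of input")
        for kind in expected:
            if kind == "END":
                continue
            m = PATTERNS[kind].match(self.line, self.pos)
            if m:
                self.pos = m.end()
                return kind, m.group(0)
        raise SyntaxError("expected one of %s" % (expected,))

def sum_numbers(stream):
    """Tiny parser: a sequence of numbers, possibly spread over many lines."""
    scanner = LineScanner(stream)
    total = 0
    while True:
        kind, text = scanner.token("NUM", "END")
        if kind == "END":
            return total
        total += int(text)

print(sum_numbers(io.StringIO("1 2\n3\n\n  4\n")))  # prints 10
```

With a real file, pass the object returned by open(path) instead of an io.StringIO.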

My desire for a parser generator began in late 1997 with a school project for which I used my own hand-written parser. At the same time, there was a thread on comp.lang.python about parsing e-mail addresses with nested <>'s. I was reading Stroustrup's "The C++ Programming Language" and came across an expression parser. I was also reading Mark Lutz's "Programming Python" and came across another expression parser. The combination of these four events inspired me to write a parser generator that could produce parsers similar to Lutz's hand-written parser. Many of the features of Yapps are inspired by the ANTLR/PCCTS system written by Terence Parr. ANTLR combines flexibility, speed, and parsing power into a system that produces readable recursive descent LL(k) parsers for C, C++, or Java.

Download

Yapps 1 is a single Python file, approximately 30 KB in length. You will probably also want the documentation, available in these formats:

There are several small examples available:

Example           Grammar       Parser
Calculator        expr.g        expr.py
Lisp Expressions  lisp.g        lisp.py
Yapps Grammar     parsedesc.g   parsedesc.py

I use Yapps mostly for small languages. I have not released any Yapps grammars for large languages like HTML or Java, but someday I hope to work on a grammar and interpreter for a small object-oriented language.

Yapps 2

Yapps 2 is more flexible than Yapps 1, but it requires Python 1.5 and is not backward compatible with Yapps 1. The main changes are:

Download

You can download a ZIP file here containing the Yapps 2.01 parser generator, the Yapps 2.01 run-time library, the Yapps 2 grammar (used to build Yapps 2 itself!), some examples, and documentation in both TeX and PostScript formats. Yapps (all versions) is free for all users.

Future

Yapps 2 meets my needs well. Its main weak point is error handling: Yapps detects errors in the input and complains, and it tries to display the portion of the input that was bad, but its ability to explain the problem is limited. It also makes no attempt at error recovery.
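Displaying the bad portion of the input mostly comes down to turning the scanner's character offset into a line and column. The sketch below shows one way to do that; the describe_error helper is hypothetical and is not Yapps's own error-reporting code.

```python
def describe_error(text, pos, message):
    """Format a parse error with the offending line and a caret under pos.

    `pos` is a 0-based character offset into `text`; the reported
    line and column numbers are 1-based.
    """
    line_start = text.rfind("\n", 0, pos) + 1  # 0 if pos is on the first line
    line_end = text.find("\n", pos)
    if line_end == -1:
        line_end = len(text)
    line_no = text.count("\n", 0, pos) + 1
    column = pos - line_start
    return "line %d, column %d: %s\n%s\n%s^" % (
        line_no, column + 1, message, text[line_start:line_end], " " * column)

# The ")" at offset 15 is where a number was expected.
print(describe_error("x = 1\ny = (2 + )\n", 15, "expected a number"))
```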

I often need parsers in C++, so it would be neat if Yapps could produce a C++ parser instead of a Python one. However, because Yapps grammars embed Python code, a single grammar could not be used to build both a Python parser module and a C++ one.

Last modified 11:10 Sun 07 Oct 2001, Amit Patel