Kiuatan¶
Kiuatan ("horse" or "pony" in Chinook Jargon) is a library for building and running parsers in the Pony programming language.
- Kiuatan uses Parsing Expression Grammar semantics, which means:
- Choices are ordered, i.e. the parser will always try to parse alternatives in the order they are declared.
- Sequences are greedy, i.e. the parser will not backtrack from the end of a sequence.
- You can use positive and negative lookahead that does not advance the match position to constrain greedy sequences.
- Parsers do not backtrack from successful choices.
- Kiuatan parsers are "packrat" parsers; they memoize intermediate results, resulting in linear-time parsing.
- Parsers use Mederios et al's algorithm to handle unlimited left-recursion.
Obtaining Kiuatan¶
Corral¶
The easiest way to incorporate Kiuatan into your Pony project is to use Pony Corral. Once you have it installed, cd
to your project's directory and type:
This will add the library to your project. You can then build your project with something like:
Git¶
You can clone and build Kiuatan directly from GitHub (you must have Corral in your PATH
):
To use Kiuatan in a project without Corral you will need to add kiuatan/kiuatan
to your PONYPATH
environment variable.
Examples¶
The Kiuatan repo contains a canonical calulator example of how to use Kiuatan: calc.
Concepts¶
Kiuatan grammars can match over source sequences of any type that is
The most common source type will be U8
for parsing UTF-8 text (note that you will need to handle converting UTF-8 into normalized Unicode yourself if necessary).
Named Rules¶
A NamedRule
encapsulates and names a grammatical rule that encodes a PEG rule. To create a rule, you provide a name, a body, and an optional action. For example, the following rule will match either one two three
or one deux three
.
let rule =
recover val
let ws = NamedRule[U8]("WhiteSpace", Star[U8](Single[U8](" \t"), 1))
NamedRule[U8]("OneTwoThree",
Conj[U8](
[ Literal[U8]("one")
ws
Disj[U8]([ Literal[U8]("two"); Literal[U8]("deux") ])
ws
Literal[U8]("three")
]))
end
You can build the body of a rule from the following combinator classes:
Single
: matches a single source item. The constructor takes a set of possibilities to match. If you provide an empty list, this rule will match any single item.Literal
: matches a string of items.Conj
: matches a sequence of child rules.Disj
: matches one of a number of alternative child rules, in order. If one of the alternatives matches, but an outer rule fails later, the parser will not backtrack to another alternative.Error
: will trigger an error with the given message.Look
: will attempt to match its child rule, but will not advance the match position.Neg
: will succeed if its child rule does not match, and will not advance the match position.Star
: will match a number of repetitions of its child rule. You can specify a minimum or maximum number of times to match.Bind
: will bind the result of its child rule to an existing variable. See the calc example for an example of how to useBind
.Condition
: will succeed only if its child matches and the given condition returnstrue
.
Recursion¶
In order to allow recursive rules, you can create a rule with no body and set its body later using the set_body()
method:
// Add <- Add Op Num | Num
// Op <- [+-]
// Num <- [0-9]+
let rule: NamedRule[U8] val =
recover val
let add = NamedRule[U8]("Add", None)
let num = NamedRule[U8]("Num", Star[U8](Single[U8]("0123456789"), 1))
let op = NamedRule[U8]("Op", Single[U8]("+-"))
let body = Disj[U8]([Conj[U8]([add; op; num]); num])
add.set_body(body)
add
end
Note that Kiuatan can handle both direct and indirect left-recursion.
Source¶
A Source
is a sequence of sequences of your source type. Internally this is represented as a linked list of sequences, called "segments". The idea behind this is that you can swap out individual segments of text that your Parser
actor knows about, while maintaining the parse memo for the other segments. This allows a text editor, for example, to handle localized changes without re-parsing the whole source file.
Parser¶
A Parser
actor knows about the source you are parsing, and holds a memo of parsing results across parse attempts.
In order to attempt to parse a particular sequence (or sequence of segments) of items, create a Parser actor, giving it an initial source, and then call its parse
behaviour, passing a rule to use and a callback for when the parse either succeeds or fails:
let segment = "one two three"
let parser = Parser[U8]([segment])
parser.parse(rule, {(result: Result[U8]) =>
match result
| let success: Success[U8] =>
Debug.out("succeeded!")
| let failure: Failure[U8] =>
Debug.out("failed")
end
})
The generic parameters for the Parser
actor type (and other types in Kiuatan) are as follows:
S
: this is the "source type"; i.e. yourSource
s will be sequences of values of this type.D
: this is a "data" type. You can pass a value of this type to theParser.parse()
behaviour, and this value will be passed to your semanticAction
s.V
: this is the "result value" type. For each successful parse result, zero or more result values (from child results) will be passed to your semanticAction
, if present. If a rule has no action, child result values will be combined and passed to the next highest action.
Updating Source¶
You can update a parser's source by calling its remove_segment
and insert_segment
behaviours. The next time you initiate a parse, the parser's source will have been updated.
Results¶
If a parse succeeds, the result will be of type Success
, which represents the concrete parse tree. You can get details about the location of the match and results from child rules.
If a parse fails, the result will be of type Failure
, which contains some information about the location in the source where the failure occurred, and possibly an error message.
Actions and Result Values¶
If you wish, you can build a more abstract parse tree using semantic Action
s that you pass to rules. These actions should return a "result value" of your desired type. The result values from child rule successes are passed to the action.
Actions also receive a Bindings
map; you can access bound result values from child parses using this; you must return the map with the result of the action.
Public Types¶
- interface Action
- class Bind
- class Binding
- type Bindings
- class Cond
- class Conj
- class Disj
- class Error
- primitive ErrorMsg
- class Failure
- class Literal
- class Loc
- class Look
- class NamedRule
- class Neg
- interface ParseCallback
- actor Parser
- type Result
- trait RuleNode
- trait RuleNodeWithBody
- trait RuleNodeWithChildren
- type Segment
- class Single
- type Source
- class Star
- class Success
- class Variable