A Rust implementation of the Lark parsing toolkit

lark-rs

One grammar. Three engines. Native Rust.

Use the same Extended Backus–Naur Form grammar with LALR, Earley or CYK. Build automatic parse trees and reuse the core from Rust, Python, WebAssembly or C. Behaviour is checked against Python Lark.

Lark grammars, native Rust — checked, not claimed.

[Figure: a field-guide engraving of a lark perched at the root of a parse tree, the tree's nodes tinted copper, sky-blue and green. Caption: A grammar in, a tree out, drawn as a field-guide plate.]

One grammar, three engines

Pick the parser, keep the grammar

The same .lark grammar runs under three algorithms — you change one option, not your grammar. Each comes with the lexer that suits it.
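To make the "one option" idea concrete, here is a minimal, self-contained sketch. The type names LarkOptions, ParserAlgorithm and LexerType appear in the crate's API list on this page; the variants, fields and the for_parser helper below are assumptions made for illustration, and the lexer pairing follows Python Lark's defaults rather than anything confirmed for lark-rs.

```rust
// Hypothetical stand-ins for illustration only; the real lark-rs
// definitions may use different variants, fields and constructors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParserAlgorithm {
    Lalr,
    Earley,
    Cyk,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LexerType {
    Basic,
    Contextual,
    Dynamic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LarkOptions {
    pub parser: ParserAlgorithm,
    pub lexer: LexerType,
}

impl LarkOptions {
    /// Pair an algorithm with the lexer Python Lark picks for it by default
    /// (assumed here to carry over to lark-rs).
    pub fn for_parser(parser: ParserAlgorithm) -> Self {
        let lexer = match parser {
            ParserAlgorithm::Lalr => LexerType::Contextual,
            ParserAlgorithm::Earley => LexerType::Dynamic,
            ParserAlgorithm::Cyk => LexerType::Basic,
        };
        LarkOptions { parser, lexer }
    }
}
```

In this sketch, switching engines means changing the single argument to for_parser; the grammar text is never touched.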

json.lark — one grammar, used by all three engines below
?start: value

?value: object
      | array
      | ESCAPED_STRING
      | SIGNED_NUMBER  -> number
      | "true"  | "false" | "null"

array  : "[" [value ("," value)*] "]"
object : "{" [pair ("," pair)*] "}"
pair   : ESCAPED_STRING ":" value

%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS
%ignore WS

Pick the parser, keep the grammar — for real.

The live playground loads the lark-rs WebAssembly engine (~1 MB) and parses entirely client-side. Switch its Parser control between lalr, earley and cyk on the same grammar and watch the tree.

Why lark-rs

Why this, and not an adjacent Rust parser

lark-rs is for people who value Lark's grammar model: one EBNF grammar, a choice of parsing algorithm, contextual lexing, automatic trees and explicit ambiguity. It is not trying to be an incremental editor parser — that is a different problem.

Tool | Its focus | How lark-rs differs
tree-sitter | Incremental, error-tolerant parsing for editors | lark-rs targets batch grammar-driven parsing with automatic trees and a choice of algorithm, not incremental editor reparsing.
pest | Accessible PEG grammars | lark-rs uses EBNF/CFG with explicit ambiguity (Earley/CYK), contextual lexing and Lark-compatible semantics.
chumsky | Parser combinators with error recovery | lark-rs is grammar-first: you write a .lark file, not Rust combinators.
LALRPOP | LR(1) parser generation | lark-rs keeps LALR but adds Earley + CYK behind the same grammar, plus contextual lexing.

The wedge: if you already have a .lark grammar, keep it — change the runtime, not the grammar.

Architecture

Four stages, one pipeline

A grammar becomes a parser in four clear stages. The engine never inspects a symbol name — everything is interned to integer ids first.

01 · load

Load

A hand-written lexer and a recursive-descent parser turn .lark text into a surface grammar (rules, terminals, imports).

02 · lower

Lower

Every symbol is interned to a Copy id; tree-shaping flags and augmented start rules are precomputed.

03 · build

Build

The chosen engine builds its tables: dense LALR action/goto tables, the Earley recognizer with its shared packed parse forest (SPPF), or the Chomsky-normal-form (CNF) grammar that CYK requires.

04 · parse

Parse

Tokens drive the engine; automatic tree shaping yields Tree / Token with no user action code.

.lark grammar → load → lower → build → parse → Tree / Token

Full tourist map: ARCHITECTURE.md.
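The lowering stage's "interned to a Copy id" step can be pictured with a small interner. This is a generic sketch of the technique, not the lark-rs implementation: SymbolId, Interner and their methods are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// A cheap, `Copy` handle that stands in for a rule or terminal name.
/// (Hypothetical type; the real crate may represent ids differently.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Maps each distinct name to a dense integer id, and back again
/// for diagnostics and tree printing.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>,
    names: Vec<String>,
}

impl Interner {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the existing id for `name`, or assign the next free one.
    pub fn intern(&mut self, name: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = SymbolId(self.names.len() as u32);
        self.names.push(name.to_owned());
        self.ids.insert(name.to_owned(), id);
        id
    }

    /// Recover the original name, e.g. for an error message.
    pub fn resolve(&self, id: SymbolId) -> &str {
        &self.names[id.0 as usize]
    }
}
```

Once every symbol has passed through something like intern, the later stages can compare and index by integer and only call resolve when a name has to be shown to a person.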

Evidence-gated development

Autonomy ends where verification ends

lark-rs is developed with coding agents, but authority follows evidence — not confidence. A change may proceed autonomously when its result can be checked against Python Lark, a compliance bank, a regression test or a deterministic complexity gate. Decisions without an objective basis remain with the human architect.

01
Oracle · gate · bank

The result can be independently checked.

→ Act autonomously and self-check.
02
Written principle + judgement

Evidence narrows the choice but does not settle it.

→ Decide, explain, and record the reasoning (an ADR).
03
No falsifiable basis

The question is product direction, taste or an ungrounded trade-off.

→ Keep the decision human.

Four concrete manifestations in the repository:

Principle | In lark-rs
Oracle before implementation | Python Lark produces the expected behaviour before a feature is written.
Demonstrate before fixing | Bugs and performance pathologies become failing cases first.
Deterministic evidence | Complexity regressions use work counters, not noisy timing thresholds.
Durable decisions | Important trade-offs and exceptions are recorded in the repository (ADRs).

Every significant claim on this page carries one of four statuses:

Verified: enforced by an oracle, bank or gate
Measured: observed in a named benchmark
Goal: a stated direction, not yet shown
Open: a known limitation

Not every valuable property has an oracle. API ergonomics, product direction and some resource-policy questions still require judgement. When the project cannot make a decision falsifiable, it records the uncertainty rather than pretending otherwise.

What you keep from Lark

The differentiators, preserved

lexing

Contextual lexer

Parser state narrows which terminals the lexer tries — resolving most LALR terminal conflicts with no user intervention.
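A minimal sketch of that narrowing, under simplifying assumptions: two hypothetical terminals, Int and Hex, overlap on digit strings, and the parser state is represented simply as the slice of terminals it can accept. The names and the first-match policy are illustrative and are not taken from lark-rs.

```rust
/// Two terminals whose patterns overlap: every decimal string is also hex.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Terminal {
    Int,
    Hex,
}

impl Terminal {
    fn accepts(self, c: char) -> bool {
        match self {
            Terminal::Int => c.is_ascii_digit(),
            Terminal::Hex => c.is_ascii_hexdigit(),
        }
    }

    /// Length of the longest prefix of `input` this terminal matches.
    /// Only ASCII characters are accepted, so char count equals byte length.
    fn match_len(self, input: &str) -> usize {
        input.chars().take_while(|&c| self.accepts(c)).count()
    }
}

/// Try only the terminals the current parser state can accept, in order,
/// and return the first one that matches a non-empty prefix.
pub fn next_token<'a>(input: &'a str, acceptable: &[Terminal]) -> Option<(Terminal, &'a str)> {
    acceptable.iter().copied().find_map(|t| {
        let len = t.match_len(input);
        if len > 0 {
            Some((t, &input[..len]))
        } else {
            None
        }
    })
}
```

With both terminals on offer, the input "12" lexes as Int; in a state that only accepts Hex, the same text lexes as Hex without any grammar change. A production contextual lexer would also handle priorities and longest-match rules, which this sketch leaves out.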

ambiguity

Explicit ambiguity

SPPF-based Earley handles any CFG and can emit _ambig forests when a grammar is genuinely ambiguous.

grammar

Rich EBNF

Repetition and optional operators (+, *, ?), alternation, character ranges, priorities, aliases and parameterized templates.

trees

Automatic trees

Tree / Token without action code, with ?rule, _rule and !rule shaping modifiers.
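The effect of two of these modifiers can be sketched as a function that decides what a finished rule hands to its parent. It follows Lark's documented behaviour for ?rule (inline when there is exactly one child) and _rule (splice children into the parent). The Node type and the shape function are hypothetical, the !rule token-keeping modifier is not modelled, and a real engine would use precomputed flags rather than inspecting name prefixes.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Token(String),
    Tree { rule: String, children: Vec<Node> },
}

/// Build the node(s) that a completed rule contributes to its parent.
/// The modifier is read from the rule name's prefix for brevity.
pub fn shape(rule: &str, children: Vec<Node>) -> Vec<Node> {
    if rule.starts_with('_') {
        // `_rule`: no node of its own; children are spliced into the parent.
        return children;
    }
    if let Some(name) = rule.strip_prefix('?') {
        // `?rule`: with exactly one child, the child replaces the node.
        if children.len() == 1 {
            return children;
        }
        return vec![Node::Tree { rule: name.to_owned(), children }];
    }
    // Plain rule: always produces its own tree node.
    vec![Node::Tree { rule: rule.to_owned(), children }]
}
```

Applied to the JSON grammar above, this is why a lone string under ?value appears directly in the tree instead of being wrapped in a value node.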

imports

Grammar composition

%import pulls terminals and rules from bundled libraries or sibling files — common terminals can't drift from Lark.

targets

Multiple targets

One core, reachable from Rust, Python (PyO3), WebAssembly and a C API — plus standalone parser generation.

Targets

One core, many runtimes

rust

Rust crate

The native API: Lark, LarkOptions, ParserAlgorithm, LexerType.

python

Python (PyO3)

Native bindings, so a Lark grammar can run on the Rust core from Python.

wasm

WebAssembly

The whole engine in the browser — the live playground is this build.

c-api

C API

A C-callable surface for embedding in native applications.

standalone

Generated parser

Emit a self-contained Rust LALR parser that depends only on regex + std.

Performance

Measured first, honestly

Lead with measured compatibility; treat performance as a documented snapshot, not a slogan.

Measured

~4–5× Python Lark (LALR)

On the reference JSON workloads: ~4.8× small, ~4.7× medium, ~4.4× large, vs in-tree Python Lark's LALR engine.

Goal

10–100× on suitable workloads

The project's stated direction — explicitly a goal, not the present general result.

Verified

Deterministic scaling gates

Earley super-linearity, the CYK cubic envelope and lexer scans are gated on work counters, never wall-clock.

Methodology and the full trend: BENCH.md.
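As an illustration of gating on a work counter instead of wall-clock time, the sketch below runs a toy CYK recognizer and counts the split points it examines. The grammar, the function names and the n³ bound are assumptions chosen for the example; they are not the project's actual counters or thresholds.

```rust
// Toy CNF grammar for a^n b^n (n >= 1), illustrative only:
//   S -> A B | A C,   C -> S B,   A -> 'a',   B -> 'b'
const S: usize = 0;
const A: usize = 1;
const B: usize = 2;
const C: usize = 3;
const N_SYMBOLS: usize = 4;
// Each entry is (lhs, left, right) for a rule lhs -> left right.
const BINARY_RULES: [(usize, usize, usize); 3] = [(S, A, B), (S, A, C), (C, S, B)];

/// CYK recognition that also reports how many split points it examined.
pub fn cyk_recognize(input: &str) -> (bool, u64) {
    let chars: Vec<char> = input.chars().collect();
    let n = chars.len();
    if n == 0 {
        return (false, 0);
    }
    // table[len - 1][start][symbol] is true when `symbol` derives
    // the substring of length `len` beginning at `start`.
    let mut table = vec![vec![[false; N_SYMBOLS]; n]; n];
    for (i, &c) in chars.iter().enumerate() {
        match c {
            'a' => table[0][i][A] = true,
            'b' => table[0][i][B] = true,
            _ => return (false, 0),
        }
    }
    let mut work: u64 = 0;
    for len in 2..=n {
        for start in 0..=(n - len) {
            for split in 1..len {
                work += 1; // one unit of work per (len, start, split) triple
                for &(lhs, left, right) in BINARY_RULES.iter() {
                    if table[split - 1][start][left]
                        && table[len - split - 1][start + split][right]
                    {
                        table[len - 1][start][lhs] = true;
                    }
                }
            }
        }
    }
    (table[n - 1][0][S], work)
}

/// Deterministic gate: the counter must stay inside a cubic envelope.
pub fn within_cubic_envelope(input: &str) -> bool {
    let n = input.chars().count() as u64;
    let (_, work) = cyk_recognize(input);
    work <= n * n * n
}
```

For an input of length n this loop visits (n³ − n) / 6 split points regardless of machine load, so a check like within_cubic_envelope gives the same answer on every run, which is the property a timing threshold cannot offer.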

Known limits

What is still open

lark-rs is pre-user: backward compatibility is still free because there are no real dependants yet. A clearly labelled open gap is more useful than an unsupported "production-ready" badge.

Quick start

Run an example, then read the source

Until packaging is settled, start from source — the JSON example is the canonical entry point.

shell
# clone the fork and run the canonical JSON example
git clone https://github.com/okalldal/lark.git
cd lark/lark-rs
cargo run --release --example json_parser
cargo test