Sumchecks Bloody Sumchecks
I think that's what Bono really meant to say. But it was the 80s, disco was alive, Ozzy Osbourne was biting heads off bats, and boys didn't cry; so we must have lost it in the smoke.
The cryptography team at a16z, however, did not get it mixed up. They've designed a zk-VM, Jolt, that essentially reduces proving correct program execution to proving that a sequence of polynomial constraints is satisfied.
Once these constraints are defined, they go on to use the sum-check algorithm to efficiently prove that the constraints are indeed satisfied. There's a lot of drama about how to do this efficiently and correctly, and it can get quite complicated. Each constraint takes the form
$$ g(X) = f(X) \quad \forall X \in \{0,1\}^n $$
where $g, f \in \mathbb{F}[X_1, \ldots, X_n]$ are multilinear extensions of functions over the program trace. To show that the two polynomials are the same, we sample a random point $r \xleftarrow{\$} \mathbb{F}^n$ and check whether

$$ g(r) = f(r). $$
By the Schwartz–Zippel lemma, if the check passes, the constraints are satisfied with high probability.
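For intuition, here is a toy sketch of that check in Python. Everything here is illustrative: a small stand-in prime modulus (not Jolt's field) and a naive fold-based MLE evaluation.

```python
import random

P = 2**61 - 1   # toy prime modulus -- illustrative, not Jolt's actual field

def mle_eval(table, r):
    """Evaluate the multilinear extension of `table` (its values on {0,1}^n)
    at a point r in F^n, folding one variable per pass."""
    vals = list(table)
    for x in r:
        half = len(vals) // 2
        vals = [(a + x * (b - a)) % P for a, b in zip(vals[:half], vals[half:])]
    return vals[0]

f = [3, 1, 4, 1]                 # values of f on {0,1}^2
g = list(f)                      # g agrees with f everywhere on the cube
r = [random.randrange(P) for _ in range(2)]
assert mle_eval(f, r) == mle_eval(g, r)   # equal MLEs agree at any point
```

If `g` differed from `f` on even one hypercube vertex, the two MLEs would be distinct polynomials, and the random-point check would catch this with probability at least $1 - n/|\mathbb{F}|$.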
Here is the problem: if we asked you to list every constraint (or equivalently, its randomised sum-check representation, as shown above), it would actually be non-trivial to extract this information from the source code.
So we went ahead, read through the code, and rewrote the sum-checks as an abstract syntax tree. This lets us programmatically extract the sum-check equations, print them in any format we want, and also verify that virtual polynomial openings are reduced to real committed polynomial openings¹.
Below is a brief description of what we did. Here is the code.
Caveat: The specs currently omit Jolt's Advice polynomials — auxiliary inputs the prover supplies to shortcut certain computations. They participate in some sumchecks but are not yet modelled in the AST. Consider the specs a faithful-but-incomplete picture of the full pipeline. In the coming weeks, we will integrate this into the actual Rust codebase, so we can auto-generate the sum-checks as a zero-cost abstraction. For now we live with the potential noise of translating Rust into Python and generating specs from it.
¹ Typically the way a sum-check oracle is simulated in practice is by having the prover commit to the polynomial using a polynomial commitment scheme, and then open the commitment at a random point. There are 21 sum-checks in Jolt, involving 50-odd polynomials. Only 7 of those polynomials are committed. In Jolt we reduce the correctness of opening each non-committed polynomial to opening a committed polynomial.
Each sumcheck is an abstract syntax tree
As described above, every sumcheck in Jolt proves a statement of this shape:
$$\sum_{\text{vars}} f(\text{vars}) = \text{claimed sum}$$
We represent each one as a SumcheckSpec with three core fields and four pieces of protocol metadata:
| Field | What it is |
|---|---|
| sum_vars | The variables being summed over (list of Var) |
| integrand | The expression tree $f$ inside the $\sum$ |
| input_claim | What the sum equals — the RHS |
| opening_point | The fresh random point the protocol produces |
| degree | Max degree of the round polynomials |
| rounds | How many rounds (e.g. log2(T)) |
| openings | Which polynomial openings are produced afterward |
The first three are the mathematics. The rest is protocol metadata. And degree is auto-checkable: we can walk the integrand tree and infer it from the MLE structure, since every polynomial in Jolt is multilinear (degree $\leq 1$ in each variable).
sumcheck/spec.py:76
@dataclass
class SumcheckSpec:
    name: str
    sum_vars: list[Var]                    # what we sum over
    integrand: Expr                        # the expression tree inside the Σ
    input_claim: Expr                      # what the sum equals (RHS)
    opening_point: list[Opening]           # fresh random point from the protocol
    rounds: str                            # e.g. "log2(T)"
    degree: int | str                      # max degree of round polynomials
    openings: list[tuple[str, list[Arg]]]  # polynomial openings produced

    @property
    def inferred_degree(self) -> int:
        return degree(self.integrand)
The degree is inferred automatically — every polynomial in Jolt is an MLE (multilinear extension), so each polynomial leaf has degree exactly 1. The rest follows from how leaves combine:
sumcheck/ast.py:332
def degree(expr: Expr) -> int:
    if isinstance(expr, Const): return 0
    if isinstance(expr, (CommittedPoly, VirtualPoly, VerifierPoly)): return 1
    if isinstance(expr, Add): return max(degree(expr.left), degree(expr.right))
    if isinstance(expr, Mul): return degree(expr.left) + degree(expr.right)
    if isinstance(expr, Pow): return expr.exponent * degree(expr.base)
    if isinstance(expr, Neg): return degree(expr.child)
A product of two MLEs has degree 2. A booleanity check $p^2 - p$ has degree 2. The degree directly determines proof size — the prover sends a univariate polynomial of this degree in each round.
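The same walker can be sketched as a self-contained toy, with stand-in node classes (illustrative only — the real definitions live in sumcheck/ast.py, and a single Poly class here stands in for all three leaf kinds):

```python
from dataclasses import dataclass

# Toy stand-ins for the AST node classes -- illustrative, not the real code.
@dataclass
class Const:
    value: object    # field constants and symbolic challenges: degree 0

@dataclass
class Poly:
    name: str        # any MLE leaf (committed/virtual/verifier): degree 1

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Mul:
    left: object
    right: object

@dataclass
class Pow:
    base: object
    exponent: int

def degree(expr) -> int:
    if isinstance(expr, Const): return 0
    if isinstance(expr, Poly):  return 1
    if isinstance(expr, Add):   return max(degree(expr.left), degree(expr.right))
    if isinstance(expr, Mul):   return degree(expr.left) + degree(expr.right)
    if isinstance(expr, Pow):   return expr.exponent * degree(expr.base)
    raise TypeError(expr)

p = Poly("p")
assert degree(Mul(Poly("eq"), Mul(Poly("RdWa"), Poly("Val")))) == 3  # product of 3 MLEs
assert degree(Add(Pow(p, 2), Mul(Const(-1), p))) == 2                # booleanity p² − p
```

The two assertions mirror the examples in the text: a three-way MLE product has degree 3, and the booleanity expression has degree 2.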
Building a sumcheck
Here's a real example — the Stage 4 registers read/write checking sumcheck. It verifies that register reads and writes are consistent across all cycles, batched together with a random challenge $\gamma$:
Each polynomial in Jolt lives in a registry as a PolyDef — it knows its name, its kind (committed, virtual, or verifier-computable), and its domain. Since PolyDef is callable, we import polynomials directly from the registry and call them with arguments to build AST nodes:
sumcheck/examples/stage4/registers_read_write.py:24
from sumcheck.defs import Var, Opening, DIM_K_REG, DIM_CYCLE
from sumcheck.registry import virtual, committed
from sumcheck.ast import Const, Mul, Pow, eq, add
from sumcheck.spec import SumcheckSpec
def stage4_registers_read_write() -> SumcheckSpec:
    # Variables we sum over
    X_k = Var("X_k", DIM_K_REG)    # register index
    X_t = Var("X_t", DIM_CYCLE)    # cycle/timestep

    # Fixed point from Stage 3
    r3_cycle = Opening(3, DIM_CYCLE)
    gamma = Const("γ")

    # Polynomial leaves -- imported from registry, called with args.
    # The registry knows RdWa is virtual, RdInc is committed, etc.
    rd_wa   = virtual.RdWa(X_k, X_t)          # → VirtualPoly("RdWa", [X_k, X_t])
    rs1_ra  = virtual.Rs1Ra(X_k, X_t)
    rs2_ra  = virtual.Rs2Ra(X_k, X_t)
    reg_val = virtual.RegistersVal(X_k, X_t)
    rd_inc  = committed.RdInc(X_t)            # → CommittedPoly("RdInc", [X_t])

    # Build the integrand tree
    integrand = Mul(
        eq(r3_cycle, X_t),                        # verifier-computable
        add(
            Mul(rs1_ra, reg_val),                 # rs1 read
            Mul(gamma, Mul(rs2_ra, reg_val)),     # rs2 read, weighted by γ
            Mul(Pow(gamma, 2), Mul(rd_wa, rd_inc)),  # rd write, weighted by γ²
        ),
    )

    # RHS -- what the sum equals: the γ-batched claims from Stage 3
    rhs = add(
        virtual.Rs1Value(r3_cycle),
        Mul(gamma, virtual.Rs2Value(r3_cycle)),
        Mul(Pow(gamma, 2), virtual.RdWriteValue(r3_cycle)),
    )

    return SumcheckSpec(
        name="RegistersReadWriteChecking",
        sum_vars=[X_k, X_t],
        integrand=integrand,
        input_claim=rhs,
        opening_point=[Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)],
        rounds="log2(K_reg) + log2(T)",
        degree=3,  # eq(1) · RdWa(1) · Val(1) = 3
        openings=[
            ("RegistersVal", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("Rs1Ra", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("Rs2Ra", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("RdWa", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("RdInc", [Opening(4, DIM_CYCLE)]),
        ],
    )

Vars and Openings
Every polynomial in the tree is evaluated at some arguments. There are exactly two kinds:
sumcheck/defs.py:64
@dataclass(frozen=True)
class Var:
    """A free variable being summed over."""
    name: str        # "X_t", "X_k"
    dim: DimDef      # which domain dimension it ranges over

@dataclass(frozen=True)
class Opening:
    """A fixed point from a prior-stage sumcheck."""
    stage: int       # which stage produced it (1-7)
    dim: DimDef      # which domain dimension
    label: str = ""  # override print label; defaults to dim.label

Arg = Union[Var, Opening]
A Var is what the sumcheck sums over -- it's a free variable ranging over a hypercube $\{0,1\}^n$. After the protocol completes, the verifier's random challenges bind it to a concrete point, producing an Opening for the next stage. That's the whole pipeline: today's Var becomes tomorrow's Opening.
In the Stage 4 example above, X_k and X_t are the free variables being summed out. After the sumcheck, the verifier has random points $r_{K_{\text{reg}}}^{(4)}$ and $r_{\text{cycle}}^{(4)}$ — these become Opening(4, DIM_K_REG) and Opening(4, DIM_CYCLE) that later stages reference. Meanwhile, r3_cycle = Opening(3, DIM_CYCLE) is a fixed point received from Stage 3 -- it appears in the eq(r3_cycle, X_t) selector and in the RHS evaluations.
Both DIM_K_REG and DIM_CYCLE are DimDef constants. A DimDef names one factor of a polynomial's hypercube domain — a polynomial over $\{0,1\}^{\log_2 T} \times \{0,1\}^{\log_2 K_{\text{reg}}}$ has two DimDefs, one for each factor:
sumcheck/defs.py:29
@dataclass(frozen=True)
class DimDef:
    """One dimension of a polynomial's hypercube domain."""
    size: str              # symbolic parameter: "T", "K_ram", "N_instr"
    label: str             # short label for openings: "cycle", "K_ram", "addr"
    description: str = ""  # human-readable: "RAM address", "one-hot chunk"
The standard dimensions used across Jolt:
sumcheck/defs.py:48
DIM_CYCLE   = DimDef("T", "cycle", "cycle/timestep")
DIM_K_REG   = DimDef("K_reg", "K_reg", "register index")
DIM_K_RAM   = DimDef("K_ram", "K_ram", "RAM address")
DIM_ADDR    = DimDef("N_instr", "addr", "one-hot chunk")
DIM_K_INSTR = DimDef("K_instr", "K_instr", "instruction address")
DIM_K_BC    = DimDef("K_bc", "K_bc", "bytecode address")
DIM_N_V     = DimDef("N_v", "N_v", "virtual chunk")
DIM_GROUP   = DimDef("2", "group", "constraint group")
The size field is a symbolic parameter name ("T", "K_reg") -- Var uses it to compute log_size (e.g. "log2(T)"), which determines how many rounds that variable contributes to the sumcheck. The label field is what shows up in opening notation: Opening(3, DIM_CYCLE) renders as $r_{\text{cycle}}^{(3)}$.
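That derivation can be sketched in a few lines. This is a hypothetical log_size property written for illustration, not the actual Var implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DimDef:
    size: str              # symbolic parameter, e.g. "T"
    label: str             # short label used in opening notation
    description: str = ""

@dataclass(frozen=True)
class Var:
    name: str
    dim: DimDef

    @property
    def log_size(self) -> str:
        # rounds this variable contributes to the sumcheck
        return f"log2({self.dim.size})"

DIM_CYCLE = DimDef("T", "cycle", "cycle/timestep")
assert Var("X_t", DIM_CYCLE).log_size == "log2(T)"
```

Summing a variable over $\{0,1\}^{\log_2 T}$ takes one round per bit, which is why the rounds string for the Stage 4 example reads "log2(K_reg) + log2(T)".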
This is why an opening point is a list of Openings -- one per dimension of the polynomial's domain. A polynomial like RegistersVal lives on $\{0,1\}^{\log_2 K_{\text{reg}}} \times \{0,1\}^{\log_2 T}$, so its full opening is two points:
opening_point=[Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]
Both happen to come from Stage 4 here, but that's not always the case. A polynomial's dimensions can be pinned by different stages. For example, RamRa lives on $\{0,1\}^{\log_2 K_{\text{ram}}} \times \{0,1\}^{\log_2 T}$ and its opening might be:
[Opening(2, DIM_K_RAM), Opening(4, DIM_CYCLE)]
#        ↑ address part from Stage 2  ↑ cycle part from Stage 4
The address dimension was summed out in Stage 2, the cycle dimension in Stage 4 -- so each component carries the stage that produced it.
Formatting the tree
We have an AST. Now we need to print it. The key idea: each node type gets its own format method. Define a Format base class with one abstract method per node type, then subclass it for each output format (plain text, LaTeX, HTML, ...).
Start with the leaves — arguments and polynomial evaluations:
sumcheck/format.py:36
class Format(ABC):
    # ── Arguments ──
    @abstractmethod
    def fmt_var(self, v: Var) -> str: ...
    @abstractmethod
    def fmt_opening(self, o: Opening) -> str: ...

    # ── Leaves ──
    @abstractmethod
    def fmt_const(self, value: Union[int, float, str]) -> str: ...
    @abstractmethod
    def fmt_committed_poly(self, name: str, args: list[str]) -> str: ...
    @abstractmethod
    def fmt_virtual_poly(self, name: str, args: list[str]) -> str: ...
    @abstractmethod
    def fmt_verifier_poly(self, name: str, args: list[str]) -> str: ...
For plain text, the leaf implementations are straightforward — prefix with the kind and list the arguments:
sumcheck/format.py:180
class TextFormat(Format):
    def fmt_var(self, v):
        return v.name                                # "X_t"

    def fmt_opening(self, o):
        return f"r_{o.print_label}^({o.stage})"      # "r_cycle^(3)"

    def fmt_const(self, value):
        return str(value)                            # "γ", "0", "2^64"

    def fmt_committed_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"cp:{name}({arg_str})"               # "cp:RdInc(X_t)"

    def fmt_virtual_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"vp:{name}({arg_str})"               # "vp:Rs1Ra(X_k, X_t)"

    def fmt_verifier_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"{name}:{name}({arg_str})"           # "eq:eq(r_cycle^(3), X_t)"
For LaTeX, the same leaves become coloured and properly typeset:
sumcheck/format.py:323
class LatexFormat(Format):
    def fmt_var(self, v):
        return v.name                                # X_t -- already valid LaTeX

    def fmt_opening(self, o):
        label = o.print_label.replace("_", r"\_")
        return f"r_{{\\text{{{label}}}}}^{{({o.stage})}}"

    def fmt_committed_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = self.latex_name(name)
        colored = f"\\textcolor{{ForestGreen}}{{{latex_name}}}"
        return f"{colored}({arg_str})"               # green

    def fmt_virtual_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = self.latex_name(name)
        colored = f"\\textcolor{{BurntOrange}}{{{latex_name}}}"
        return f"{colored}({arg_str})"               # orange

    def fmt_verifier_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = self.latex_name(name)
        return f"{latex_name}({arg_str})"            # eq → \widetilde{\text{eq}}
Same tree, different format object, different output. The leaves are the easy part — the interesting bit is how the tree walker handles arithmetic and precedence.
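The precedence idea can be sketched in isolation: give each operator a binding strength and parenthesise a child only when it binds more loosely than its parent. This toy walker over tuple-encoded expressions is illustrative, not the actual Jolt walker:

```python
# Operator precedence: higher binds tighter.
PREC = {"add": 1, "mul": 2, "pow": 3}

def walk(expr, parent_prec: int = 0) -> str:
    kind = expr[0]
    if kind == "leaf":
        return expr[1]
    prec = PREC[kind]
    if kind == "pow":
        s = f"{walk(expr[1], prec)}^{expr[2]}"
    else:
        sep = " + " if kind == "add" else " * "
        s = sep.join(walk(child, prec) for child in expr[1:])
    # Parenthesise only when this node binds more loosely than its parent.
    return f"({s})" if prec < parent_prec else s

assert walk(("mul", ("leaf", "eq"), ("add", ("leaf", "a"), ("leaf", "b")))) == "eq * (a + b)"
assert walk(("add", ("leaf", "a"), ("mul", ("leaf", "b"), ("leaf", "c")))) == "a + b * c"
```

A sum inside a product gets parentheses; a product inside a sum does not. The same test works regardless of which Format subclass renders the leaves.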
Once the AST was in place, Claudio Bravo built the HTML and LaTeX front-ends with relatively little intervention — a clean demonstration that the per-node format architecture pays off in practice.
How resolution works in AST land
Having a nice AST for each sumcheck immediately raises a question: is the whole pipeline consistent? Specifically, Jolt has two kinds of polynomial:
- Committed polynomials (cp:*) are backed by the PCS — the prover sends a commitment, and any claimed evaluation can be verified by the verifier directly.
- Virtual polynomials (vp:*) are not committed. The prover claims evaluations, and correctness is only established by a later sumcheck whose RHS consumes that claim.
So a virtual polynomial opening produced by stage $k$ is only valid once a sumcheck in a later stage $k' > k$ has its RHS reduce to that very evaluation. If a virtual claim is never consumed, the protocol is broken.
The AST gives us everything we need to check this mechanically.
Claims as AST leaves
When a sumcheck runs, the protocol binds the free summation variables to a fresh random point. This produces a list of opening claims — one per polynomial that appeared in the integrand. In the AST, these are exactly the leaf nodes evaluated at Var arguments: after the protocol they become Opening nodes for the next stage.
The openings field of each SumcheckSpec records exactly which claims are produced:
openings=[
    ("RegistersVal", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
    ("Rs1Ra", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
    ("RdInc", [Opening(4, DIM_CYCLE)]),  # committed — goes to PCS
]
Each entry is a (name, args) pair. The name identifies the polynomial; the args are the opening point (all Opening nodes, since the Vars have been bound). The kind (cp vs vp) is inferred by scanning the integrand for a matching polynomial leaf.
The resolution loop
The resolver walks all 21 sumchecks in stage order, maintaining a dict of outstanding claims — those produced but not yet consumed:
outstanding: dict[str, dict] = {} # claim_key → {node_id, kind, name}
The claim key is "PolyName@(r_label^(stage), ...)" — the polynomial name plus its opening point, stringified. For each sumcheck:
- Consume: walk the RHS (`input_claim`) for polynomial leaves evaluated at `Opening` args. For each one, look up its claim key in `outstanding`. If found, mark it resolved and delete it. If not found (it's a public input or constant), note it.
- Produce: iterate `spec.openings`. For each, compute the claim key and insert into `outstanding`.
Committed openings that reach the end of the pipeline without being consumed are fine — they go to the PCS for direct verification. Unclaimed virtual openings are a bug.
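Putting the two passes together, here is a minimal sketch of the loop. The data shapes are simplified and the names illustrative (the real resolver tracks node ids and prints a trace):

```python
def claim_key(name: str, point: tuple[str, ...]) -> str:
    """Stringified claim identity: "PolyName@(r_label^(stage), ...)"."""
    return f"{name}@({', '.join(point)})"

def resolve(specs):
    """specs: per-sumcheck (consumes, produces) pairs in stage order;
    each claim is (kind, name, point) with kind in {"vp", "cp"}.
    Returns the virtual claims left unconsumed (should be empty)."""
    outstanding: dict[str, str] = {}   # claim_key → kind
    for consumes, produces in specs:
        for _kind, name, point in consumes:
            outstanding.pop(claim_key(name, point), None)   # mark resolved
        for kind, name, point in produces:
            outstanding[claim_key(name, point)] = kind
    # Committed leftovers go to the PCS sink; virtual leftovers are bugs.
    return [k for k, kind in outstanding.items() if kind == "vp"]

stage4 = ([], [("vp", "RdWa", ("r_K_reg^(4)", "r_cycle^(4)"))])
stage5 = ([("vp", "RdWa", ("r_K_reg^(4)", "r_cycle^(4)"))],
          [("cp", "RdInc", ("r_cycle^(5)",))])
assert resolve([stage4, stage5]) == []   # every virtual claim consumed
```

If stage5 were dropped, resolve would report the dangling RdWa claim — exactly the kind of pipeline bug the mechanical check exists to catch.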
Parametric matching
Many polynomials are parametric — InstructionRa(i) is a family, one poly per chunk index $i$. The integrand references a single template like vp("InstructionRa(i)", X_k, X_t), while the openings field might produce $N_{ra}$ individual claims InstructionRa(0), InstructionRa(1), ... When the RHS of a later sumcheck references InstructionRa(i), the resolver matches it against all outstanding claims whose base name starts with InstructionRa( at the same opening point:
# `base` is the family name ("InstructionRa"); `point_suffix` is the
# stringified opening point, e.g. "@(r_addr^(2), r_cycle^(4))"
matches = [
    okey for okey in outstanding
    if okey.endswith(point_suffix)
    and okey[:-len(point_suffix)].startswith(base + "(")
]
This bulk-resolves the entire family in one go.
The claim-flow DAG
Running python3 -m sumcheck resolve prints the full resolution trace on the terminal — which sumcheck consumed which claim, and a final verdict on unresolved virtuals.
The HTML site (python3 -m sumcheck html) also generates an interactive resolve.html page. It renders the same data as a directed acyclic graph using Cytoscape.js with a dagre layout: nodes are individual sumchecks coloured by stage, edges are claim flows (orange for virtual, green for committed → PCS). Clicking a node shows what it consumes and produces; clicking an edge lists the polynomials flowing along it.
For Jolt's 21 sumchecks, the graph has 22 nodes (21 sumchecks + 1 PCS sink) and 37 grouped edges. Every virtual claim resolves — the pipeline is consistent.