Sumchecks Bloody Sumchecks

I think that's what Bono really meant to say. But it was the 80s, disco was alive, Ozzy Osbourne was biting heads off bats, and boys didn't cry; so we must have lost it in the smoke.

The cryptography team at A16z, however, did not get it mixed up. They've designed this zk-VM, Jolt, that essentially reduces proving correct program execution to proving that a sequence of polynomial constraints are satisfied.

Once these constraints are defined, they go on to use the sum-check algorithm to efficiently prove that the constraints are indeed satisfied. There's a lot of drama about how to do this efficiently and correctly, and it can get quite complicated. Each constraint takes the form

$$ g(X) = f(X) \quad \forall X \in \bit^n $$

where $g, f \in \mathbb{F}[X_1, \ldots, X_n]$ are multi-linear extensions of functions over the program trace. To show that the two polynomials are the same, we sample a random point $r \xleftarrow{$} \mathbb{F}^n$, and check if

\begin{align*} f(r) &= g(r) \\[10pt] &= \sum_{X \in \bit^n }\eq{r}{X} g(X) \end{align*}

From the guarantees of Schwartz–Zippel lemma, we are guaranteed that the constraints are satisfied with high probability if the check passes.

The problem we have is — if we ask you to list every constraint (or equivalently, its randomised sum-check representation, as shown above) — it is actually non-trivial to extract this information from the source code.

So we went ahead, read through the code, and rewrote the sum-checks as an abstract syntax tree. This lets us programmatically extract the sum-check equations, print them in any format we want, and also verify that virtual polynomial openings are reduced to real committed polynomial openings¹.

Below is a brief description of what we did. Here is the code.

Caveat: The specs currently omit Jolt's Advice polynomials — auxiliary inputs the prover supplies to shortcut certain computations. They participate in some sumchecks but are not yet modelled in the AST. Consider the specs a faithful-but-incomplete picture of the full pipeline. In the coming weeks, we will integrate this into the actual Rust codebase, so we can auto-generate the sum-checks as a zero-cost abstraction. For now we live with the potential noise of translating Rust into Python and generating specs from it.

Typically the way a sum-check oracle is simulated in practice is by having the prover commit to the polynomial using a polynomial commitment scheme, and then open the commitment at a random point. There are 21 sum-checks in Jolt, over 50-odd polynomials. Only 7 of them are committed. In Jolt we reduce the correctness of opening the other non-committed polynomials to opening a committed polynomial.

Each sumcheck is an abstract syntax tree

As described above, every sumcheck in Jolt proves a statement of this shape:

$$\sum_{\text{vars}} f(\text{vars}) = \text{claimed sum}$$

We represent each one as a SumcheckSpec with three core fields and four pieces of protocol metadata:

Field	What it is
`sum_vars`	The variables being summed over (list of `Var`)
`integrand`	The expression tree $f$ inside the $\sum$
`input_claim`	What the sum equals — the RHS
`opening_point`	The fresh random point the protocol produces
`degree`	Max degree of the round polynomials
`rounds`	How many rounds (e.g. `log2(T)`)
`openings`	Which polynomial openings are produced afterward

The first three are the mathematics. The rest is protocol metadata. And degree is auto-checkable: we can walk the integrand tree and infer it from the MLE structure, since every polynomial in Jolt is multilinear (degree $\leq 1$ in each variable).

sumcheck/spec.py:76

@dataclass
class SumcheckSpec:
    name: str
    sum_vars: list[Var]            # what we sum over
    integrand: Expr                # the expression tree inside the Σ
    input_claim: Expr              # what the sum equals (RHS)
    opening_point: list[Opening]   # fresh random point from the protocol
    rounds: str                    # e.g. "log2(T)"
    degree: int | str              # max degree of round polynomials
    openings: list[tuple[str, list[Arg]]]  # polynomial openings produced

    @property
    def inferred_degree(self) -> int:
        return degree(self.integrand)

The degree is inferred automatically — every polynomial in Jolt is an MLE (multilinear extension), so each polynomial leaf has degree exactly 1. The rest follows from how leaves combine:

sumcheck/ast.py:332

def degree(expr: Expr) -> int:
    if isinstance(expr, Const):                                    return 0
    if isinstance(expr, (CommittedPoly, VirtualPoly, VerifierPoly)): return 1
    if isinstance(expr, Add):  return max(degree(expr.left), degree(expr.right))
    if isinstance(expr, Mul):  return degree(expr.left) + degree(expr.right)
    if isinstance(expr, Pow):  return expr.exponent * degree(expr.base)
    if isinstance(expr, Neg):  return degree(expr.expr)

A product of two MLEs has degree 2. A booleanity check $p^2 - p$ has degree 2. The degree directly determines proof size — the prover sends a univariate polynomial of this degree in each round.

Building a sumcheck

Here's a real example — the Stage 4 registers read/write checking sumcheck. It verifies that register reads and writes are consistent across all cycles, batched together with a random challenge $\gamma$:

\begin{aligned} \sum_{X_k, X_t} \text{eq}(r^{(3)}_{\text{cycle}}, X_t) \cdot \Big( &\text{RdWa}(X_k, X_t) \cdot (\text{RdInc}(X_t) + \text{RegistersVal}(X_k, X_t)) \\ + \gamma \cdot &\text{Rs1Ra}(X_k, X_t) \cdot \text{RegistersVal}(X_k, X_t) \\ + \gamma^2 \cdot &\text{Rs2Ra}(X_k, X_t) \cdot \text{RegistersVal}(X_k, X_t) \Big) \\ = \text{RdWriteValue}(r^{(3)}_{\text{cycle}}) &+ \gamma \cdot \text{Rs1Value}(r^{(3)}_{\text{cycle}}) + \gamma^2 \cdot \text{Rs2Value}(r^{(3)}_{\text{cycle}}) \end{aligned}

Each polynomial in Jolt lives in a registry as a PolyDef — it knows its name, its kind (committed, virtual, or verifier-computable), and its domain. Since PolyDef is callable, we import polynomials directly from the registry and call them with arguments to build AST nodes:

sumcheck/examples/stage4/registers_read_write.py:24

from sumcheck.defs import Var, Opening, DIM_K_REG, DIM_CYCLE
from sumcheck.registry import virtual, committed
from sumcheck.ast import Const, Mul, Pow, eq, add
from sumcheck.spec import SumcheckSpec

def stage4_registers_read_write() -> SumcheckSpec:
    # Variables we sum over
    X_k = Var("X_k", DIM_K_REG)     # register index
    X_t = Var("X_t", DIM_CYCLE)     # cycle/timestep

    # Fixed point from Stage 3
    r3_cycle = Opening(3, DIM_CYCLE)
    gamma = Const("γ")

    # Polynomial leaves -- imported from registry, called with args.
    # The registry knows RdWa is virtual, RdInc is committed, etc.
    rd_wa   = virtual.RdWa(X_k, X_t)           # → VirtualPoly("RdWa", [X_k, X_t])
    rs1_ra  = virtual.Rs1Ra(X_k, X_t)
    rs2_ra  = virtual.Rs2Ra(X_k, X_t)
    reg_val = virtual.RegistersVal(X_k, X_t)
    rd_inc  = committed.RdInc(X_t)              # → CommittedPoly("RdInc", [X_t])

    # Build the integrand tree
    integrand = Mul(
        eq(r3_cycle, X_t),                      # verifier-computable
        add(
            Mul(rd_wa, add(rd_inc, reg_val)),
            Mul(gamma, Mul(rs1_ra, reg_val)),
            Mul(Pow(gamma, 2), Mul(rs2_ra, reg_val)),
        ),
    )

    # RHS -- what the sum equals
    rhs = add(
        virtual.RdWriteValue(r3_cycle),
        Mul(gamma, virtual.Rs1Value(r3_cycle)),
        Mul(Pow(gamma, 2), virtual.Rs2Value(r3_cycle)),
    )

    return SumcheckSpec(
        name="RegistersReadWriteChecking",
        sum_vars=[X_k, X_t],
        integrand=integrand,
        input_claim=rhs,
        opening_point=[Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)],
        rounds="log2(K_reg) + log2(T)",
        degree=3,   # eq(1) · RdWa(1) · Val(1) = 3
        openings=[
            ("RegistersVal", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("Rs1Ra",        [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("Rs2Ra",        [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("RdWa",         [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
            ("RdInc",        [Opening(4, DIM_CYCLE)]),
        ],
    )

Vars and Openings

Every polynomial in the tree is evaluated at some arguments. There are exactly two kinds:

sumcheck/defs.py:64

@dataclass(frozen=True)
class Var:
    """A free variable being summed over."""
    name: str       # "X_t", "X_k"
    dim: DimDef     # which domain dimension it ranges over

@dataclass(frozen=True)
class Opening:
    """A fixed point from a prior-stage sumcheck."""
    stage: int      # which stage produced it (1-7)
    dim: DimDef     # which domain dimension
    label: str = "" # override print label; defaults to dim.label

Arg = Union[Var, Opening]

A Var is what the sumcheck sums over -- it's a free variable ranging over a hypercube $\{0,1\}^n$. After the protocol completes, the verifier's random challenges bind it to a concrete point, producing an Opening for the next stage. That's the whole pipeline: today's Var becomes tomorrow's Opening.

In the Stage 4 example above, X_k and X_t are the free variables being summed out. After the sumcheck, the verifier has random points $\opening{4}{K_reg}$ and $\opening{4}{cycle}$ — these become Opening(4, DIM_K_REG) and Opening(4, DIM_CYCLE) that later stages reference. Meanwhile, r3_cycle = Opening(3, DIM_CYCLE) is a fixed point received from Stage 3 -- it appears in the eq(r3_cycle, X_t) selector and in the RHS evaluations.

Both DIM_K_REG and DIM_CYCLE are DimDef constants. A DimDef names one factor of a polynomial's hypercube domain — a polynomial over $\{0,1\}^{\log_2 T} \times {0,1}^{\log_2 K_{\text{reg}}}$ has two DimDefs, one for each factor:

sumcheck/defs.py:29

@dataclass(frozen=True)
class DimDef:
    """One dimension of a polynomial's hypercube domain."""
    size: str          # symbolic parameter: "T", "K_ram", "N_instr"
    label: str         # short label for openings: "cycle", "K_ram", "addr"
    description: str = ""  # human-readable: "RAM address", "one-hot chunk"

The standard dimensions used across Jolt:

sumcheck/defs.py:48

DIM_CYCLE   = DimDef("T",       "cycle",   "cycle/timestep")
DIM_K_REG   = DimDef("K_reg",   "K_reg",   "register index")
DIM_K_RAM   = DimDef("K_ram",   "K_ram",   "RAM address")
DIM_ADDR    = DimDef("N_instr", "addr",    "one-hot chunk")
DIM_K_INSTR = DimDef("K_instr", "K_instr", "instruction address")
DIM_K_BC    = DimDef("K_bc",    "K_bc",    "bytecode address")
DIM_N_V     = DimDef("N_v",     "N_v",     "virtual chunk")
DIM_GROUP   = DimDef("2",       "group",   "constraint group")

The size field is a symbolic parameter name ("T", "K_reg") -- Var uses it to compute log_size (e.g. "log2(T)"), which determines how many rounds that variable contributes to the sumcheck. The label field is what shows up in opening notation: Opening(3, DIM_CYCLE) renders as $\opening{3}{cycle}$.

This is why an opening point is a list of Openings -- one per dimension of the polynomial's domain. A polynomial like RegistersVal lives on $\{0,1\}^{\log_2 K_{\text{reg}}} \times {0,1}^{\log_2 T}$, so its full opening is two points:

opening_point=[Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]

Both happen to come from Stage 4 here, but that's not always the case. A polynomial's dimensions can be pinned by different stages. For example, RamRa lives on $\{0,1\}^{\log_2 K_{\text{ram}}} \times {0,1}^{\log_2 T}$ and its opening might be:

[Opening(2, DIM_K_RAM), Opening(4, DIM_CYCLE)]
#  ↑ address part from Stage 2    ↑ cycle part from Stage 4

The address dimension was summed out in Stage 2, the cycle dimension in Stage 4 -- so each component carries the stage that produced it.

Formatting the tree

We have an AST. Now we need to print it. The key idea: each node type gets its own format method. Define a Format base class with one abstract method per node type, then subclass it for each output format (plain text, LaTeX, HTML, ...).

Start with the leaves — arguments and polynomial evaluations:

sumcheck/format.py:36

class Format(ABC):
    # ── Arguments ──
    @abstractmethod
    def fmt_var(self, v: Var) -> str: ...

    @abstractmethod
    def fmt_opening(self, o: Opening) -> str: ...

    # ── Leaves ──
    @abstractmethod
    def fmt_const(self, value: Union[int, float, str]) -> str: ...

    @abstractmethod
    def fmt_committed_poly(self, name: str, args: list[str]) -> str: ...

    @abstractmethod
    def fmt_virtual_poly(self, name: str, args: list[str]) -> str: ...

    @abstractmethod
    def fmt_verifier_poly(self, name: str, args: list[str]) -> str: ...

For plain text, the leaf implementations are straightforward — prefix with the kind and list the arguments:

sumcheck/format.py:180

class TextFormat(Format):
    def fmt_var(self, v):
        return v.name                          # "X_t"

    def fmt_opening(self, o):
        return f"r_{o.print_label}^({o.stage})"  # "r_cycle^(3)"

    def fmt_const(self, value):
        return str(value)                      # "γ", "0", "2^64"

    def fmt_committed_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"cp:{name}({arg_str})"         # "cp:RdInc(X_t)"

    def fmt_virtual_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"vp:{name}({arg_str})"         # "vp:Rs1Ra(X_k, X_t)"

    def fmt_verifier_poly(self, name, args):
        arg_str = ", ".join(args)
        return f"{name}:{name}({arg_str})"     # "eq:eq(r_cycle^(3), X_t)"

For LaTeX, the same leaves become coloured and properly typeset:

sumcheck/format.py:323

class LatexFormat(Format):
    def fmt_var(self, v):
        return v.name                          # X_t -- already valid LaTeX

    def fmt_opening(self, o):
        label = o.print_label.replace("_", r"\_")
        return f"r_{{\\text{{{label}}}}}^{{({o.stage})}}"

    def fmt_committed_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = _latex_poly_name(name)
        colored = f"\\textcolor{{ForestGreen}}{{{latex_name}}}"
        return f"{colored}({arg_str})"         # green

    def fmt_virtual_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = _latex_poly_name(name)
        colored = f"\\textcolor{{BurntOrange}}{{{latex_name}}}"
        return f"{colored}({arg_str})"         # orange

    def fmt_verifier_poly(self, name, args):
        arg_str = ", ".join(args)
        latex_name = _latex_verifier_name(name)
        return f"{latex_name}({arg_str})"      # eq → \widetilde{\text{eq}}

Same tree, different format object, different output. The leaves are the easy part — the interesting bit is how the tree walker handles arithmetic and precedence.

Once the AST was in place, Claudio Bravo built the HTML and LaTeX front-ends with relatively little intervention — a clean demonstration that the per-node format architecture pays off in practice.

How resolution works in AST land

Having a nice AST for each sumcheck immediately raises a question: is the whole pipeline consistent? Specifically, Jolt has two kinds of polynomial:

Committed polynomials (cp:*) are backed by the PCS — the prover sends a commitment, and any claimed evaluation can be verified by the verifier directly.
Virtual polynomials (vp:*) are not committed. The prover claims evaluations, and correctness is only established by a later sumcheck whose RHS consumes that claim.

So a virtual polynomial opening produced by stage $k$ is only valid once a sumcheck in a later stage $k' > k$ has its RHS reduce to that very evaluation. If a virtual claim is never consumed, the protocol is broken.

The AST gives us everything we need to check this mechanically.

Claims as AST leaves

When a sumcheck runs, the protocol binds the free summation variables to a fresh random point. This produces a list of opening claims — one per polynomial that appeared in the integrand. In the AST, these are exactly the leaf nodes evaluated at Var arguments: after the protocol they become Opening nodes for the next stage.

The openings field of each SumcheckSpec records exactly which claims are produced:

openings=[
    ("RegistersVal", [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
    ("Rs1Ra",        [Opening(4, DIM_K_REG), Opening(4, DIM_CYCLE)]),
    ("RdInc",        [Opening(4, DIM_CYCLE)]),   # committed — goes to PCS
]

Each entry is a (name, args) pair. The name identifies the polynomial; the args are the opening point (all Opening nodes, since the Vars have been bound). The kind (cp vs vp) is inferred by scanning the integrand for a matching polynomial leaf.

The resolution loop

The resolver walks all 21 sumchecks in stage order, maintaining a dict of outstanding claims — those produced but not yet consumed:

outstanding: dict[str, dict] = {}  # claim_key → {node_id, kind, name}

The claim key is "PolyName@(r_label^(stage), ...)" — the polynomial name plus its opening point, stringified. For each sumcheck:

Consume: walk the RHS (input_claim) for polynomial leaves evaluated at Opening args. For each one, look up its claim key in outstanding. If found, mark it resolved and delete it. If not found (it's a public input or constant), note it.
Produce: iterate spec.openings. For each, compute the claim key and insert into outstanding.

Committed openings that reach the end of the pipeline without being consumed are fine — they go to the PCS for direct verification. Unclaimed virtual openings are a bug.

Parametric matching

Many polynomials are parametric — InstructionRa(i) is a family, one poly per chunk index $i$. The integrand references a single template like vp("InstructionRa(i)", X_k, X_t), while the openings field might produce $N_{ra}$ individual claims InstructionRa(0), InstructionRa(1), ... When the RHS of a later sumcheck references InstructionRa(i), the resolver matches it against all outstanding claims whose base name starts with InstructionRa( at the same opening point:

matches = [
    okey for okey in outstanding
    if okey.endswith(point_suffix)
    and okey[:-len(point_suffix)].startswith(base + "(")
]

This bulk-resolves the entire family in one go.

The claim-flow DAG

Running python3 -m sumcheck resolve prints the full resolution trace on the terminal — which sumcheck consumed which claim, and a final verdict on unresolved virtuals.

The HTML site (python3 -m sumcheck html) also generates an interactive resolve.html page. It renders the same data as a directed acyclic graph using Cytoscape.js with a dagre layout: nodes are individual sumchecks coloured by stage, edges are claim flows (orange for virtual, green for committed → PCS). Clicking a node shows what it consumes and produces; clicking an edge lists the polynomials flowing along it.

For Jolt's 21 sumchecks, the graph has 22 nodes (21 sumchecks + 1 PCS sink) and 37 grouped edges. Every virtual claim resolves — the pipeline is consistent.