Streaming Sumchecks In Jolt

jolt streaming

2025-11-19 By Ari

These notes are based on new and yet unpublished work by Quang Dao and Zachary DeStefano. Optimisations specific to the paper, to speed up sum-check are not discussed in this post. We defer the reader to the paper for a complete description of all the changes.

In this post we give a gentle introduction to the streaming sum-check algorithm, by walking through the current implementation in Jolt. It can be read as an onboarding document for converting the generalising the remaining sum-checks in Jolt.

The sum-check we focus on this post is the Stage 1 Spartan Sum-check More details about why Jolt needs this can be found here.

Review Of Sum-check Notation

We briefly review the linear sum-check to recap notation, and help ease into the discussion about streaming. We remain relatively abstract in this re-cap and provide more explicit details while walking through the code in later sections.

Linear Sum-check

We begin by streaming the trace¹ of a program $P$ that allows us to compute evaluations of a multivariate multilinear polynomial $\widetilde{A} \in \mathbb{F}[X_{m-1}, ..., X_0]$ over the hypercube $\{0,1\}^m$.

The goal is for a prover to give a proof of the claim that: \[\sum_{x_{m-1}, ..., x_0 \in \{0, 1\}^m} \widetilde{A}(x_{m-1}, .., x_0) = c\] It is known that any multi-linear polynomial with $m$ variables can be represented $2^m$ unique evaluations. So when we say we compute $\widetilde{A}(X_{m-1}, ..., X_0)$, we will always mean we store $M$ field elements in memory which are evaluations over the hypercube.

Assuming we have enough memory to store $M := 2^m$ field elements. Then this is the last time we need to stream the trace as we have enough memory to store $\widetilde{A}$. Thus, we have all the information we need to compute prover messages for subsequent sum-check rounds. Specifically, in round $0$, we send to the verifier the univariate polynomial \[t_i(\textcolor{pink}{X_0}) = \sum_{(x_{m-1}...x_{1}) \in \{0,1\}^{m-i-1}}A[x_{m-1}, ..., X_1, \textcolor{pink}{X_0}]\]

which can be efficiently computed from $t_i(0)$ and $t_i(1)$ which we have in memory. Then on receiving challenge $r_0 \xleftarrow{\$} S$ where $S$ is the set of challenges, the prover “binds” $\widetilde{A}$ to compute $\widetilde{A}(X_{m-1}, \ldots, X_1, \textcolor{orange}{r_0})$ . This requires only half the space from the previous round as shown below, as

\[ \widetilde{A}(X_{m-1}, \ldots, X_1, \textcolor{orange}{r_0}) = \textcolor{orange}{r_0}\widetilde{A}(X_{m-1}, \ldots, X_1, \textcolor{green}{1}) + (1 - \textcolor{orange}{r_0})\widetilde{A}(X_{m-1}, \ldots, X_1, \textcolor{green}{0}) \]

This process is repeated for $m$ rounds and in round $i$ the prover sends to the verifier the univariate polynomial by computing $t_i(0)$, and $t_i(1)$ which it always has access to as shown in the figure below.

\[t_i(\textcolor{pink}{X_i}) = \sum_{(x_{m-1}...x_{i+1}) \in \{0,1\}^{m-i-1}}A[x_{m-1}, , \textcolor{pink}{X_i}, \textcolor{orange}{r_{i-1}}, .., \textcolor{orange}{r_0}]\]

Streaming Sum-check

Now imagine a world where $2^m$ units of space is too expensive. We have space with which we can store at most $W:= 2^w$ field elements where $w < m$. We can no longer store all $M$ evaluations of $\widetilde{A}$ over the hypercube, and this we cannot store every detail about $\widetilde{A}$. Now the prover must store what it can to answer a few rounds of sum-check messages, and then later stream the trace again to send the remaining messages.

To explain this with more precision and detail, we introduce the idea of a streaming schedule.

Streaming Schedule

The rough idea is as follows. We divide rounds $0,...,m-1$ into 2 phases - A streaming phase and a linear phase. The streaming phase is further divided into streaming windows. The figure below illustrates the idea with $m=16$,

In the figure above rounds are divided into streaming rounds depicted in yellow and linear rounds depicted in purple. The streaming phase always precedes the linear phase. In the example above, the streaming rounds are further divided into windows of constant size 3, but other window sizes are also admissible.

Note that the linear sum-check discussed above also has a schedule with no streaming rounds at all.

Actual Code

The sumcheck_schedule trait defines the properties any implementation of a schedule must satisfy. The LinearSchedule and HalfSplitSchedule implementations are linked here.

The rough idea is as follows:

If a round $i \in \{0, .. m-1\}$ is the start of a window with width $w$ – we need to stream over the entire trace to generate and store a multi-variate polynomial $t'$ with $w$ variables below. This requires space $W:=2^w$ which we can afford.
Once we manifest $t'$ i.e compute enough evaluations to uniquely represent the polynomial, we can compute all prover messages needed for the next $w$ rounds of sum-check
On receiving a verifier challenge $r$, we shrink the size of $t'$ by a factor of $d+1$ (just as in the case of the linear-sum-check) where $d$ is the degree of $t'$. It’s half if $t'$ is multi-linear.
In the last round of the given window the size of $t$ should reduce to 1.
The next round is either the start of a new window or the start the linear phase. If it’s the former, we repeat the process described above. That is, we stream the trace again, and compute $t'$ for the current window.
If the current round $i$ is the start of a linear round, we stream over the trace and store a multi-linear polynomial $\widetilde{A}(X_{m-1}, \ldots, X_i, \textcolor{orange}{r_{i-1}}, \ldots ,\textcolor{orange}{r_{0}})$ which only requires $2^{m-1} < 2^w$ field elements worth of storage.

Next we walk through the actual code in Jolt right now, and describe how it corresponds to the above steps. To be fully precise, we need to recap the details of the Stage 1 Spartan Sumcheck which can be found here.

Code Walk Through

Assume the schedule is the HalfSplitSchedule described in the figure above. The first step is to compute $t'$

Computing $t'(X_{w-1}, ..., X_0)$

In the example above the maximum window width is $w=3$, and round 0 is the start of a window. The prover computes the multi-variate polynomial
\[\begin{align} t'(\textcolor{pink}{X_2, X_1, X_0}) = \sum_{x_{15}..x_{3} \in {0,1}^{13}} &A(x_{15}...x_3, , \textcolor{pink}{X_2, X_1, X_0}) \\ &\cdot B(x_{15}...x_3, ,\textcolor{pink}{X_2, X_1, X_0}) \\[10pt] &\cdot\text{eq}(w_{15}..w_3, x_{15}...x_3) \end{align}\] Which is a degree $d=2$ multi-variate polynomial. Thus, we need $(d+1)^w = 3^w$ evaluations to represent $t$. In Jolt we evaluate $t$ at $\{0,1, \infty\}^{w}$ and the data structure that stores is this evaluations is called t_grid and this function computes the above evaluations.

This computation requires having access to $A(x_{15}...x_0)$ and $B(x_{15}...x_0)$ for all $x_{15}...x_0$, which can be computed from a single pass over the trace.

Note: the important distinction is that we now only accumulate over the trace, and explicitly only store 9 field elements as opposed to $2^{16}$ field elements in the case of the linear sum-check.

The next stage is to compute $t''(0)$ and $t''(\infty)$ where \[ t''(\textcolor{pink}{X_0}) = \sum_{x_2, x_1 \in \{0,1\}^2 } t'(x_2, x_1, \textcolor{pink}{X_0})\,\,\text{eq}(w_2w_1, x_2x_1) \] The above snippet of code is computed by this function The univariate polynomial that the verifier receives is given by which is this function computes \[ s(\textcolor{pink}{X_0}) = t''(\textcolor{pink}{X_0})\,\text{eq}(w_0, \textcolor{pink}{X_0}) \]

A Note About Evaluating At Infinity

For a univariate polynomial $p$, $p(\infty)$ is equal to the coefficient associated with the highest degree variable of the polynomial. For example, if $p$ is linear i.e $p(X) = aX + b$, we have $p(\infty) = a$.

This completes the first round message the prover sends.

Binding in the streaming phase

Note as $t'$ is multi-variate, we are sorted for the next $w$ rounds, till we bind all $w$ of its variables. Binding is exactly the same as the linear version but we do it for the multi-quadratic polynomial and is done by the following function

After binding in round 0 we go from $t'(X_2, X_1, X_0)$ to $t'(X_2, X_1, \textcolor{orange}{r_0})$. Then at then of round 1 we are left with $t'(X_2, \textcolor{orange}{r_1, r_0})$ and after round 2 i.e 3 rounds of sum-check we are left with $t( \textcolor{orange}{r_2, r_1, r_0})$. Additionally, we also build a data structure called r_grid which looks like the following

[
    [1]
    [(1-r0), r0]
    [(1-r0)(1-r1), r0(1-r1), (1-r0)r1, r0r1]
    ...
]

The last line of r_grid helps us compute $\text{eq}(\textcolor{orange}{r_{w-1},..., r_0}, x_{w-1},..., x_{0})$ which is useful when we recompute t_grid in the first round of the next window.

Next Window

Now we are still in the streaming phase, but we need to compute and the code that does this re-computation is given here

\[\begin{align} t'(\textcolor{pink}{X_5, X_4, X_3}) = \sum_{x_{15}..x_{6}, x_{2}, x_{1}, x_{0} \in {0,1}^{13}} &A(x_{15}...x_6, \textcolor{pink}{X_5, X_4, X_3}, x_2, x_1, x_0) \\ &\cdot B(x_{15}...x_6 ,\textcolor{pink}{X_5, X_4, X_3}, x_2, x_1, x_0) \\[10pt] &\cdot\text{eq}(w_{15}..w_6, x_{15}...x_6) \, \text{eq}(\textcolor{orange}{r_2, r_1, r_0}, x_{2}, x_{1}, x_{0}) \end{align}\]

First Round Of Linear Phase

We stream through the trace one time to manifest the bounded polynomials.

We now store for each $X_{15}...X_9 \in \{0,1\}^{7}$

\[\begin{align} A(X_{15}...X_9, \textcolor{orange}{r_8, ..., r_0}) &= \sum_{x_8...x_0 \in \{0,1\}^9} A(X_{15}, ..., X_9, x_8...x_0)\,\text{eq}(\textcolor{orange}{r_8..r0}, x_8...x_0) \end{align} \] \[\begin{align} B(X_{15}...X_9, \textcolor{orange}{r_8, ..., r_0}) &= \sum_{x_8...x_0 \in \{0,1\}^9} B(X_{15}, ..., X_9, x_8...x_0)\,\text{eq}(\textcolor{orange}{r_8..r0}, x_8...x_0) \end{align} \]

and for each polynomial we need $2^7$ field elements of storage, which is acceptable under our space limitations. From here on proving and computing is exactly like how it’s supposed to be during the linear phase.

see these notes to recap what a trace looks like Jolt. Though for the purposes of this document you can treat the trace, as something we iterate over to generate a multi-linear polynomial $\widetilde{A}$.↩︎