Spartan: Outer Sumcheck
Remember the named constraints from Day 3. For each time step we got a witness vector \(z_t\) with 19 locations. Here’s the mental model you should have. TODO picture
Each of these constraints had an \(A\) part and a \(B\) part, and together we wanted \(A \circ B\) to be zero. See this example for a refresher. For every cycle of the trace, we wanted the prover to satisfy all 19 constraints. Now let’s say we split the 19 constraints into two groups \(G_0\) of size 10 and \(G_1\) of size 9.
- Group 0 (b=0): Constraints 1-10 (y ∈ {-4,…,5})
  - Az: booleans
  - Bz: i128 values
- Group 1 (b=1): Constraints 11-19 (9 real + 1 padding)
  - Az: u8 values
  - Bz: S160 values
Algebraically, this is simply saying:
\[\widetilde{A}(t, b, y) \cdot \widetilde{B}(t, b, y) = 0 \quad \forall t \in \{0,1\}^{\log T},\ \forall b \in \{0,1\},\ \forall y \in \{-4, ..., 5\}\]
where \(\widetilde{A}(t, b, y)\) and \(\widetilde{B}(t, b, y)\) give you one specific cell from the table above. Now we do the same thing we have done since the beginning of time: to check that this identity holds at every point of the domain, we compress all of the checks into one random evaluation. We sample \((\tau_t \,\|\, \tau_b \,\|\, \tau_y) \xleftarrow[]{\$}\mathbb{F}^{\log T+2}\) and run a sumcheck to verify that
\[\sum_{t \in \{0,1\}^{\log T}} \sum_{b \in \{0,1\}} \sum_{y \in \{-4,...,5\}} \text{eq}(\tau_t, t) \cdot \text{eq}(\tau_b, b) \cdot \text{eq}(\tau_y, y) \cdot \widetilde{A}(t, b, y) \cdot \widetilde{B}(t, b, y) = 0\]
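To make the claim concrete before any sumcheck machinery, here is a minimal sketch of the naive check the sumcheck compresses. The accessors a_cell and b_cell are hypothetical stand-ins (not the real codebase API) that return one cell of the A and B tables.

```rust
/// Naive check the outer sumcheck replaces: every cell of the A∘B table is zero.
/// `a_cell` / `b_cell` are hypothetical accessors mapping (t, b, y) to a table cell.
fn all_constraints_hold<A, B>(num_steps: usize, a_cell: A, b_cell: B) -> bool
where
    A: Fn(usize, usize, i64) -> i128,
    B: Fn(usize, usize, i64) -> i128,
{
    for t in 0..num_steps {
        for b in 0..2usize {
            for y in -4i64..=5 {
                if a_cell(t, b, y) * b_cell(t, b, y) != 0 {
                    return false;
                }
            }
        }
    }
    true
}
```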
First Round Polynomial
Now in the first round we have to bind \(y\), which works a little differently from the later rounds: the prover sends the verifier the following univariate polynomial
\[g_1(Y) = \sum_{t \in \{0,1\}^{\log T}} \sum_{b \in \{0,1\}} \text{eq}(\tau_t, t) \cdot \text{eq}(\tau_b, b) \cdot \widetilde{A}(t, b, Y) \cdot \widetilde{B}(t, b, Y)\]
after which the verifier samples \(r \xleftarrow[]{\$}\mathbb{F}\) and binds \(y\) to it.
Observe that the degree of \(g_1\) is at most 18. To see why, note that for a fixed \((t, b)\), \(\widetilde{A}(t, b, Y)\) is the unique univariate polynomial through 10 evaluation points, so it has degree at most 9; the same holds for \(\widetilde{B}(t, b, Y)\), so their product has degree at most 18. A degree-18 polynomial is pinned down by 19 evaluations, and we already have 10 of them: if the prover is honest, \(g_1\) evaluates to 0 at every point of the base window \(\{-4, -3, \ldots, 4, 5\}\). To fully describe \(g_1\) we therefore need to evaluate \(\widetilde{A}(t, b, Y)\) and \(\widetilde{B}(t, b, Y)\) at 9 more locations, and the locations chosen in the code are simply points on either side of the base window.
- \(g_1(y) = 0\) for all \(y \in \{-4, ..., 5\}\) (base window) when constraints are satisfied
- We use Lagrange interpolation to extend to arbitrary \(Y \in \mathbb{F}\) (extended domain); see the sketch after this list
- This function computes \(g_1(Y_j)\) for evaluation points outside the base window
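As a minimal sketch of that extension step (exact integer arithmetic, not the optimized routine from the codebase): given the 10 values of a degree-at-most-9 polynomial on the base window, we can evaluate it at any integer point outside the window. For equispaced integer nodes, the Lagrange coefficients at integer points are themselves integers, so the division below is exact.

```rust
/// Evaluate the degree-≤9 polynomial with values `base_vals` on {-4, ..., 5}
/// at an integer point `y`, via Lagrange interpolation in exact integer arithmetic.
fn lagrange_extend(base_vals: &[i128; 10], y: i64) -> i128 {
    let nodes: [i64; 10] = core::array::from_fn(|i| i as i64 - 4); // base window -4..=5
    let mut acc: i128 = 0;
    for i in 0..10 {
        // L_i(y) = prod_{k != i} (y - y_k) / (y_i - y_k); integral for integer y.
        let (mut num, mut den) = (1i128, 1i128);
        for k in 0..10 {
            if k != i {
                num *= (y - nodes[k]) as i128;
                den *= (nodes[i] - nodes[k]) as i128;
            }
        }
        debug_assert_eq!(num % den, 0);
        acc += (num / den) * base_vals[i];
    }
    acc
}

fn main() {
    // Toy check: the values of p(y) = y^2 on the base window extend to p(7) = 49.
    let vals: [i128; 10] = core::array::from_fn(|i| (i as i128 - 4).pow(2));
    assert_eq!(lagrange_extend(&vals, 7), 49);
}
```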
Code Walkthrough
Section 1: Precompute Lagrange Coefficients
let base_left: i64 = -((UNIVARIATE_SKIP_DOMAIN_SIZE as i64 - 1) / 2);
let targets: [i64; UNIVARIATE_SKIP_DEGREE] =
    uniskip_targets::<UNIVARIATE_SKIP_DOMAIN_SIZE, UNIVARIATE_SKIP_DEGREE>();
let target_shifts: [i64; UNIVARIATE_SKIP_DEGREE] =
    core::array::from_fn(|j| targets[j] - base_left);
let coeffs_per_j: [[i32; UNIVARIATE_SKIP_DOMAIN_SIZE]; UNIVARIATE_SKIP_DEGREE] =
    core::array::from_fn(|j| {
        LagrangeHelper::shift_coeffs_i32::<UNIVARIATE_SKIP_DOMAIN_SIZE>(target_shifts[j])
    });
What’s happening:

- base_left = -4: leftmost point of the base window {-4, -3, …, 5}
- targets[j]: the extended-domain evaluation points \(Y_j\) (outside the base window)
- target_shifts[j]: distance from base_left to each target point
- coeffs_per_j[j][i]: Lagrange coefficients \(\mathcal{L}_i(Y_j)\) for interpolating to point \(Y_j\) from base point \(i\)

Purpose: to compute \(Az(Y_j)\) and \(Bz(Y_j)\) at the extended points (see the sketch below) using: \[\widetilde{A}(t, b, Y_j) = \sum_{i=0}^{9} \text{coeffs\_per\_j}[j][i] \cdot \widetilde{A}(t, b, y_i)\]
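As a hedged illustration of that equation (assuming coeffs_per_j[j][i] really is the integer Lagrange basis value \(\mathcal{L}_i(Y_j)\), which is what the formula above implies), extending a vector of base-window values to one target point is just a dot product with the corresponding coefficient row; the real code fuses this with the Az/Bz accumulators shown below.

```rust
/// Apply one row of the Lagrange coefficient table to base-window evaluations:
/// value(Y_j) = sum_i coeffs_row[i] * value(y_i). Illustrative sketch only.
fn extend_to_target(coeffs_row: &[i32; 10], base_vals: &[i128; 10]) -> i128 {
    coeffs_row
        .iter()
        .zip(base_vals.iter())
        .map(|(&c, &v)| c as i128 * v)
        .sum()
}
```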
Section 2: Split Challenge and Compute Eq Polynomials
let m = tau_low.len() / 2;
let (tau_out, tau_in) = tau_low.split_at(m);
let (E_out, E_in) = rayon::join(
    || EqPolynomial::evals_with_scaling(tau_out, Some(F::MONTGOMERY_R_SQUARE)),
    || EqPolynomial::evals(tau_in),
);
What’s happening:

- tau_low has length \(\log T + 1\) (the binary hypercube variables)
- It is split into tau_out (length \(\log T/2\)) and tau_in (length \(\log T/2 + 1\))
- E_out[x_out] = \(\text{eq}(\tau_{out}, x_{out})\) for all \(x_{out} \in \{0,1\}^{\log T/2}\)
- E_in[x_in] = \(\text{eq}(\tau_{in}, x_{in})\) for all \(x_{in} \in \{0,1\}^{\log T/2 + 1}\)
Note: E_out is scaled by Montgomery R² for optimization in later accumulation.
Purpose: Precompute all eq polynomial evaluations to use in the sum over the binary hypercube.
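For intuition, here is a minimal sketch of how such an eq table can be built. It is the standard doubling construction, generic over a field-like type F; the bit ordering (tau[0] as the most significant index bit) and the trait bounds are assumptions of this sketch, not necessarily how EqPolynomial::evals is written.

```rust
/// Build the table eq(tau, x) = prod_i (tau_i * x_i + (1 - tau_i)(1 - x_i))
/// for all x in {0,1}^m, doubling the table one variable at a time.
fn eq_evals<F>(tau: &[F]) -> Vec<F>
where
    F: Copy + core::ops::Mul<Output = F> + core::ops::Sub<Output = F> + From<u64>,
{
    let mut evals = vec![F::from(1u64)];
    for &t in tau {
        let mut next = Vec::with_capacity(evals.len() * 2);
        for &e in &evals {
            let hi = e * t;    // contribution of the x_i = 1 branch
            next.push(e - hi); // x_i = 0 branch: e * (1 - t)
            next.push(hi);     // x_i = 1 branch
        }
        evals = next;
    }
    evals
}
```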
Section 3: Setup Iteration Parameters
let num_x_out_vals = E_out.len();
let num_x_in_vals = E_in.len();
assert!(
    num_x_in_vals >= 2,
    "univariate skip expects at least 2 x_in values (last bit is group index)"
);
let num_x_in_half = num_x_in_vals >> 1;
What’s happening:

- num_x_out_vals = 2^(log T/2): number of x_out values
- num_x_in_vals = 2^(log T/2 + 1): number of x_in values
- The last bit of x_in is the group selector \(b\)
- num_x_in_half = num_x_in_vals / 2: number of \(x_{in}'\) values (ignoring the group bit)

Indexing scheme (illustrated in the snippet below):

- Even x_in indices (last bit = 0) → Group 0
- Odd x_in indices (last bit = 1) → Group 1
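A tiny illustration of that indexing with hypothetical example values:

```rust
fn main() {
    let x_in_prime = 0b101usize; // some x_in' value
    let b = 1usize;              // group selector bit
    let x_in = (x_in_prime << 1) | b; // full index into E_in: the group bit is the low bit
    assert_eq!(x_in & 1, b);
    assert_eq!(x_in >> 1, x_in_prime);
}
```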
Section 4: Parallel Processing Setup
let num_parallel_chunks = core::cmp::min(
    num_x_out_vals,
    rayon::current_num_threads().next_power_of_two() * 8,
);
let x_out_chunk_size = if num_x_out_vals > 0 {
    core::cmp::max(1, num_x_out_vals.div_ceil(num_parallel_chunks))
} else {
    0
};
let iter_num_x_in_vars = num_x_in_vals.log_2();
let iter_num_x_in_prime_vars = iter_num_x_in_vars - 1;
What’s happening:

- num_parallel_chunks: how many parallel tasks to create (up to 8× the thread count, capped at num_x_out_vals)
- x_out_chunk_size: how many x_out indices each parallel chunk will process
- iter_num_x_in_vars = log₂(num_x_in_vals): number of bits in the full x_in index (including the group bit)
- iter_num_x_in_prime_vars = iter_num_x_in_vars - 1: number of bits in x_in’ (excluding the group bit)

Purpose: split the sum over x_out across multiple threads for parallelization.

Example: if num_x_out_vals = 1024 and there are 8 threads, we create 64 chunks of 16 x_out values each.
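The same arithmetic, worked out for that example:

```rust
fn main() {
    let num_x_out_vals = 1024usize;
    let threads = 8usize;
    let num_parallel_chunks =
        core::cmp::min(num_x_out_vals, threads.next_power_of_two() * 8); // 64 chunks
    let x_out_chunk_size = num_x_out_vals.div_ceil(num_parallel_chunks); // 16 x_out values each
    assert_eq!((num_parallel_chunks, x_out_chunk_size), (64, 16));
}
```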
Section 5: Main Parallel Computation Loop
(0..num_parallel_chunks)
    .into_par_iter()
    .map(|chunk_idx| {
        let x_out_start = chunk_idx * x_out_chunk_size;
        let x_out_end = core::cmp::min((chunk_idx + 1) * x_out_chunk_size, num_x_out_vals);
        let mut acc_field: [F; UNIVARIATE_SKIP_DEGREE] = [F::zero(); UNIVARIATE_SKIP_DEGREE];
What’s happening:

- Parallel iteration: each thread processes one chunk
- Each chunk handles the x_out values in the range [x_out_start, x_out_end)
- acc_field[j]: accumulator for \(g_1(Y_j)\) at each extended point \(Y_j\)
- This will accumulate the sum over all (x_out, x_in’, b) in this chunk
Mathematical notation: We’re computing a partial sum: \[\text{acc\_field}[j] = \sum_{x_{out} \in \text{chunk}} \sum_{x_{in}'} \left[\text{eq}(\tau_{out}, x_{out}) \cdot \left(\text{eq}(\tau_{in}, (x_{in}', 0)) \cdot Az_0(Y_j) \cdot Bz_0(Y_j) + \text{eq}(\tau_{in}, (x_{in}', 1)) \cdot Az_1(Y_j) \cdot Bz_1(Y_j)\right)\right]\]
Outer Loop: Iterate Over x_out Values
for x_out_val in x_out_start..x_out_end {
    let mut inner_acc: [Acc8S<F>; UNIVARIATE_SKIP_DEGREE] = [Acc8S::<F>::new(); UNIVARIATE_SKIP_DEGREE];
What’s happening:

- Process each x_out value in this chunk
- inner_acc[j]: bigint accumulator for summing over all (x_in’, b) at this x_out
- Uses an 8-limb signed accumulator to avoid frequent modular reductions
Inner Loop: Iterate Over x_in’ Values
for x_in_prime in 0..num_x_in_half {
    let base_step_idx = (x_out_val << iter_num_x_in_prime_vars) | x_in_prime;
    let row_inputs = R1CSCycleInputs::from_trace::<F>(preprocess, trace, base_step_idx);
What’s happening:

- base_step_idx = (x_out_val << iter_num_x_in_prime_vars) | x_in_prime: constructs the timestep index \(t\)
- Bit layout: [x_out bits | x_in’ bits]
- This encodes the position in the execution trace
- row_inputs: the 36 witness fields for this timestep \(t\)

Key insight: we fetch the trace data once per (x_out, x_in’) and use it for both constraint groups!
Processing Group 0 (b=0)
let x_in_even = x_in_prime << 1;
let e_in_even = E_in[x_in_even];
let az0_bool = eval_az_first_group(&row_inputs);
let bz0_i128 = eval_bz_first_group(&row_inputs);
What’s happening:

- x_in_even = x_in_prime << 1: append bit 0 to get the full x_in index for Group 0
- e_in_even = eq(tau_in, (x_in', 0)): the eq polynomial evaluation for this group
- az0_bool[i]: Az values for Group 0 at all 10 base domain points \(y_i \in \{-4,...,5\}\)
- bz0_i128[i]: Bz values for Group 0 at all 10 base domain points

Invariant check: az0_bool[i] * bz0_i128[i] = 0 for all i (the constraints are satisfied on the base domain); a standalone version is sketched below.
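Written as a standalone check (illustrative only, not code from the prover hot path):

```rust
/// Group 0 invariant on the base window: Az and Bz never share support,
/// i.e. whenever az0[i] is set, bz0[i] must be zero.
fn group0_invariant_holds(az0: &[bool; 10], bz0: &[i128; 10]) -> bool {
    az0.iter().zip(bz0.iter()).all(|(&a, &b)| !a || b == 0)
}
```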
Lagrange Interpolation for Group 0
for j in 0..UNIVARIATE_SKIP_DEGREE {
    let coeffs = &coeffs_per_j[j];
    let mut sum_c_az0_i64: i64 = 0;
    let mut sum_c_bz0_s128 = S128::from(0i128);
    for i in 0..UNIVARIATE_SKIP_DOMAIN_SIZE {
        let c = coeffs[i] as i64;
        if az0_bool[i] {
            sum_c_az0_i64 += c;
        } else {
            let term = S128::from_i128(c as i128 * bz0_i128[i]);
            sum_c_bz0_s128 += term;
        }
    }
    let sum_az0_s64 = S64::from_i64(sum_c_az0_i64);
    let prod_s192 = sum_az0_s64.mul_trunc::<2, 3>(&sum_c_bz0_s128);
    inner_acc[j].fmadd(&e_in_even, &prod_s192);
}
What’s happening: for each extended point \(Y_j\):

1. Compute \(Az_0(Y_j)\) via Lagrange interpolation: \[Az_0(Y_j) = \sum_{i=0}^{9} c_{j,i} \cdot Az_0(y_i)\] Since \(Az_0(y_i)\) is boolean (0 or 1), this simplifies to \[Az_0(Y_j) = \sum_{i: Az_0(y_i)=1} c_{j,i}\] This is sum_c_az0_i64.
2. Compute \(Bz_0(Y_j)\) via Lagrange interpolation: \[Bz_0(Y_j) = \sum_{i=0}^{9} c_{j,i} \cdot Bz_0(y_i)\] This is sum_c_bz0_s128.
3. Key optimization: thanks to the invariant \(Az_0(y_i) \cdot Bz_0(y_i) = 0\):
   - If \(Az_0(y_i) = 1\), then \(Bz_0(y_i) = 0\), so we only add \(c_{j,i}\) to the Az sum
   - If \(Az_0(y_i) = 0\), then we only add \(c_{j,i} \cdot Bz_0(y_i)\) to the Bz sum
   - The two sums have disjoint support!
4. Compute the product \(Az_0(Y_j) \cdot Bz_0(Y_j)\) as prod_s192.
5. Accumulate: inner_acc[j] += e_in_even * prod_s192, which adds \(\text{eq}(\tau_{in}, (x_{in}', 0)) \cdot Az_0(Y_j) \cdot Bz_0(Y_j)\) to the accumulator.
Processing Group 1 (b=1)
let x_in_odd = x_in_even + 1;
let e_in_odd = E_in[x_in_odd];
let az1_bool = eval_az_second_group(&row_inputs);
let bz1 = eval_bz_second_group(&row_inputs);
What’s happening:

- x_in_odd = x_in_even + 1: append bit 1 to get the full x_in index for Group 1
- e_in_odd = eq(tau_in, (x_in', 1)): the eq polynomial evaluation for this group
- Group 1 has 9 real constraints + 1 padding constraint to reach 10
Padding logic:
let g2_len = core::cmp::min(NUM_REMAINING_R1CS_CONSTRAINTS, UNIVARIATE_SKIP_DOMAIN_SIZE);
let mut az1_bool_padded: [bool; UNIVARIATE_SKIP_DOMAIN_SIZE] = [false; UNIVARIATE_SKIP_DOMAIN_SIZE];
let mut bz1_s160_padded: [S160; UNIVARIATE_SKIP_DOMAIN_SIZE] = [S160::from(0i128); UNIVARIATE_SKIP_DOMAIN_SIZE];
az1_bool_padded[..g2_len].copy_from_slice(&az1_bool[..g2_len]);
bz1_s160_padded[..g2_len].copy_from_slice(&bz1[..g2_len]);
Pads Group 1 to 10 elements with zeros (satisfies constraints trivially).
Lagrange interpolation for Group 1: same logic as Group 0, but with S160 bigints for Bz.

- Result: prod_s256
- Accumulate: inner_acc[j] += e_in_odd * prod_s256
Fold with E_out and Reduce
let e_out = E_out[x_out_val];
for j in 0..UNIVARIATE_SKIP_DEGREE {
    let reduced = inner_acc[j].reduce();
    acc_field[j] += e_out * reduced;
}
What’s happening: - After summing over all (x_in’, b), we have computed: \[\text{inner\_acc}[j] = \sum_{x_{in}', b} \text{eq}(\tau_{in}, (x_{in}', b)) \cdot Az_{t,b}(Y_j) \cdot Bz_{t,b}(Y_j)\] where \(t = (x_{out}, x_{in}')\)
Multiply by \(\text{eq}(\tau_{out}, x_{out})\) and add to accumulator: \[\text{acc\_field}[j] \mathrel{+}= \text{eq}(\tau_{out}, x_{out}) \cdot \text{inner\_acc}[j]\]
reduce(): Convert bigint accumulator back to field element (modular reduction)
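A toy illustration of why delayed reduction is sound (using i128 and a small toy prime rather than the real field and the 8-limb accumulator): accumulating exact integer products and reducing once gives the same residue as reducing after every multiply-add.

```rust
fn main() {
    const P: i128 = 2_147_483_647; // toy prime (2^31 - 1), stand-in for the field modulus
    let a = [123_456_789i128, -987_654_321, 555_555_555];
    let b = [42i128, -7, 1_000_003];

    // Lazy: accumulate exact products, reduce once at the end.
    let lazy = a
        .iter()
        .zip(&b)
        .map(|(&x, &y)| x * y)
        .sum::<i128>()
        .rem_euclid(P);

    // Eager: reduce after every term.
    let eager = a
        .iter()
        .zip(&b)
        .fold(0i128, |acc, (&x, &y)| (acc + (x * y).rem_euclid(P)).rem_euclid(P));

    assert_eq!(lazy, eager);
}
```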
Parallel Reduction
.reduce(
    || [F::zero(); UNIVARIATE_SKIP_DEGREE],
    |mut a, b| {
        for j in 0..UNIVARIATE_SKIP_DEGREE {
            a[j] += b[j];
        }
        a
    },
)
What’s happening:

- Combine the results from all parallel chunks
- Sum acc_field[j] across all chunks to get the final \(g_1(Y_j)\) values

Final result: the function returns [F; UNIVARIATE_SKIP_DEGREE] containing \[g_1(Y_j) = \sum_{t \in \{0,1\}^{\log T}} \sum_{b \in \{0,1\}} \text{eq}(\tau_t, t) \cdot \text{eq}(\tau_b, b) \cdot Az_{t,b}(Y_j) \cdot Bz_{t,b}(Y_j)\] for each extended evaluation point \(Y_j\).
The Extended Evaluation Points
Constants:
UNIVARIATE_SKIP_DEGREE = (NUM_R1CS_CONSTRAINTS - 1) / 2 = (19 - 1) / 2 = 9
UNIVARIATE_SKIP_DOMAIN_SIZE = UNIVARIATE_SKIP_DEGREE + 1 = 10
UNIVARIATE_SKIP_EXTENDED_DOMAIN_SIZE = 2 * UNIVARIATE_SKIP_DEGREE + 1 = 19
So:

- Base window: 10 points, \(\{-4, -3, -2, -1, 0, 1, 2, 3, 4, 5\}\)
- Extended domain: 19 points total, \(\{-9, -8, ..., 8, 9\}\)
- Extended points (outside the base window): 9 points, \(\{-9, -8, -7, -6, -5\} \cup \{6, 7, 8, 9\}\)
The function uniskip_targets determines which 9 points to evaluate at.

Algorithm: alternately select points just outside the base window:

1. Start at n = -5 (one left of the base window) and p = 6 (one right of it)
2. Alternate: add n, then p, then decrement n and increment p
3. Continue until 9 points have been collected

Result:

targets = [-5, 6, -6, 7, -7, 8, -8, 9, -9]

The points spiral outward from the base window, alternating left and right!
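A minimal sketch of that spiral selection (presumably what uniskip_targets computes for these constants, written as a plain function rather than the const-generic version in the codebase):

```rust
fn spiral_targets(domain_size: i64, degree: usize) -> Vec<i64> {
    let base_left = -((domain_size - 1) / 2);     // -4 for a window of size 10
    let base_right = base_left + domain_size - 1; // 5
    let (mut n, mut p) = (base_left - 1, base_right + 1); // -5 and 6
    let mut out = Vec::with_capacity(degree);
    while out.len() < degree {
        out.push(n); // point just left of the window
        n -= 1;
        if out.len() < degree {
            out.push(p); // point just right of the window
            p += 1;
        }
    }
    out
}

fn main() {
    assert_eq!(spiral_targets(10, 9), vec![-5, 6, -6, 7, -7, 8, -8, 9, -9]);
}
```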