Advent Of Jolt (Day 2): The Committed Polynomials
Quick Recap
In the last post we took a Rust program and transformed it into a RISC-V binary. Using this binary file as input, we generated two intermediate representations:
1. The Bytecode object
2. The Jolt execution trace

The bytecode is just a vector of Instruction structs, where Instruction is a data structure we defined to represent one assembly instruction1. The execution trace is a vector of Cycle structs, where Cycle is a Jolt-specific data structure which contains an assembly instruction and a description of the machine state before and after the execution of the instruction.
In this post, we go from a Jolt trace (plus some pre-processing) to polynomials and commitments.
For brevity, and mostly for my sanity, I will skip all the commitment-specific details of Jolt. Perhaps later I can convince Markos to write a guest post outlining his Dory Tales. For now, we will gingerly step around commitment code and just assume any Rust vector of finite field elements can be committed to.
From Trace To Polynomials
In this post we describe how we go from Jolt data structures that describe assembly instructions and machine state to polynomials over 256-bit fields that describe the same. That is, we focus on this phase of the architecture.
Our object of study is still this program that does practically nothing.
#[jolt::provable(memory_size = 10240, max_trace_length = 65536)]
fn fib(n: u32) -> u128 {
let a: u128 = 0;
let mut b: u128 = 1;
start_cycle_tracking("fib_loop"); // Use `start_cycle_tracking("{name}")` to start a cycle span
b = a + b;
end_cycle_tracking("fib_loop"); // Use `end_cycle_tracking("{name}")` to end a cycle span
b
}
Our inputs will be:
- The dynamic trace
- The program input n, which we do not use (so you can ignore it for most of this post)
- Some pre-processing, which we detail below
Throughout this series, and in most Jolt papers, when I say polynomials, what I really mean is vectors. The polynomial in question will be the multilinear extension of said vector; equivalently, the vector lists the evaluations of that polynomial over the boolean hypercube.
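To make this vector-as-polynomial view concrete, here is a toy sketch of evaluating the multilinear extension of a vector at an arbitrary point. The tiny prime modulus and the function names are my own illustrative choices, standing in for Jolt's 256-bit field arithmetic:

```rust
// Toy modulus standing in for a 256-bit field; illustrative only.
const P: u64 = 2_147_483_647; // the Mersenne prime 2^31 - 1

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Evaluate the multilinear extension of `v` (length 2^n) at point `r` (length n):
/// MLE(r) = sum over b in {0,1}^n of v[b] * prod_i (r_i*b_i + (1-r_i)*(1-b_i)).
/// Implemented by folding out one variable at a time, O(2^n) field operations.
fn eval_mle(v: &[u64], r: &[u64]) -> u64 {
    assert_eq!(v.len(), 1 << r.len());
    let mut cur = v.to_vec();
    for &ri in r {
        let half = cur.len() / 2;
        let one_minus = (P + 1 - ri % P) % P; // (1 - r_i) mod P
        // Interpolate between the b_i = 0 half and the b_i = 1 half.
        cur = (0..half)
            .map(|j| add(mul(one_minus, cur[j]), mul(ri, cur[half + j])))
            .collect();
    }
    cur[0]
}
```

At boolean points, eval_mle simply selects the vector entry whose binary index is r (most significant bit first), which is exactly the "vector = evaluations over the boolean hypercube" correspondence.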
Pre-Processing
As mentioned earlier, I will dance around commitment-specific details here and only focus on items I need to explain the sum-check proving later in the series. In my opinion, the commitment bit (although an essential, non-trivial component of the proof system) is only interesting if one wants to understand algorithmic decisions like “why did we introduce the parameter \(d\)” and “what should \(d\) be?”. Here, we are just trying to understand how the Jolt prover works, and the commitment scheme is a little bit of a distraction. Note: you do not have to know what \(d\) means. For now it’s just some small integer like 2 or 16.
Our first object of study begins here on Line 261 of
jolt/jolt-core/benches/e2e_profiling.rs
let preprocessing = JoltRV64IMAC::prover_preprocess(
bytecode.clone(), // We have already described this
program_io.memory_layout.clone(), // See below
init_memory_state, // `Vec<(u64, u8)>` of address-value pairs.
trace.len().next_power_of_two(), // This will be 64 for us given we had 63 virtual instructions.
);
Memory layout looks like the following, with most of the keys being self-explanatory:
pub struct MemoryLayout {
/// The total size of the elf's sections, including the .text, .data, .rodata, and .bss sections.
pub program_size: u64,
pub max_trusted_advice_size: u64,
pub trusted_advice_start: u64,
pub trusted_advice_end: u64,
pub max_untrusted_advice_size: u64,
pub untrusted_advice_start: u64,
pub untrusted_advice_end: u64,
pub max_input_size: u64,
pub max_output_size: u64,
pub input_start: u64,
pub input_end: u64,
pub output_start: u64,
pub output_end: u64,
pub stack_size: u64,
/// Stack starts from (RAM_START_ADDRESS + `program_size` + `stack_size`) and grows in descending addresses by `stack_size` bytes.
pub stack_end: u64,
pub memory_size: u64,
/// Heap starts just after the start of the stack and is `memory_size` bytes.
pub memory_end: u64,
pub panic: u64,
pub termination: u64,
/// End of the memory region containing inputs, outputs, the panic bit,
/// and the termination bit
pub io_end: u64,
}
and the initial memory state is given by a Vec<(u64, u8)> of address-value pairs. Diving deeper into pre-processing gets us to:
Line 240 jolt/jolt-core/src/zkvm/mod.rs
#[tracing::instrument(skip_all, name = "Jolt::prover_preprocess")]
fn prover_preprocess(
bytecode: Vec<Instruction>,
memory_layout: MemoryLayout,
memory_init: Vec<(u64, u8)>,
max_trace_length: usize,
) -> JoltProverPreprocessing<F, PCS> {
let shared = Self::shared_preprocess(bytecode, memory_layout, memory_init); // <- we care about this only
let max_T: usize = max_trace_length.next_power_of_two();
// Ignored for this post
let generators = PCS::setup_prover(DTH_ROOT_OF_K.log_2() + max_T.log_2());
JoltProverPreprocessing { generators, shared }
}
We only focus on the shared portion of the pre-processing, so
fn shared_preprocess(
bytecode: Vec<Instruction>,
memory_layout: MemoryLayout,
memory_init: Vec<(u64, u8)>,
) -> JoltSharedPreprocessing {
let bytecode_preprocessing = BytecodePreprocessing::preprocess(bytecode); // LINE 1
let ram_preprocessing = RAMPreprocessing::preprocess(memory_init); // LINE 2
JoltSharedPreprocessing {
memory_layout,
bytecode: bytecode_preprocessing,
ram: ram_preprocessing,
}
}
We already know what MemoryLayout is so we focus on the
remaining two outputs.
Bytecode Pre-processing
This pre-processing stage outputs a struct of the following type:
pub struct BytecodePreprocessing {
pub code_size: usize,
pub bytecode: Vec<Instruction>,
/// Maps the memory address of each instruction in the bytecode to its "virtual" address.
/// See Section 6.1 of the Jolt paper, "Reflecting the program counter". The virtual address
/// is the one used to keep track of the next (potentially virtual) instruction to execute.
pub pc_map: BytecodePCMapper,
pub d: usize,
}
Roughly it does the following:
- Computes the BytecodeRa d parameter used for chunking one-hot encodings (we will get to this). For our program, as it is so small, we have \(d=1\).
- Constructs the pc_map object, which is of type BytecodePCMapper and which we describe below for our program.
- Computes how much memory the bytecode needs.
The interesting output of this pre-processing phase is the following object.
pub struct BytecodePCMapper {
/// Stores the mapping of the PC at the beginning of each inline sequence
/// and the maximum number of the inline sequence
/// Indexed by the address of instruction unmapped divided by 2
indices: Vec<Option<(usize, u16)>>,
}
How to Interpret Each Position of indices
Each index position i represents a specific memory
address in your program:
address = (i - 1) * 2 + 0x80000000
This is the reverse of get_index() which we will explain
shortly.
Examples:
indices[0] → Reserved for NOOP
indices[1] → address (1-1)*2 + 0x80000000 = 0x80000000
indices[2] → address (2-1)*2 + 0x80000000 = 0x80000002
indices[3] → address (3-1)*2 + 0x80000000 = 0x80000004
indices[34] → address (34-1)*2 + 0x80000000 = 0x80000042
indices[76] → address (76-1)*2 + 0x80000000 = 0x80000096
What the value means:
indices[3] = Some((2, 0)) → “At address 0x80000004,
bytecode position 2, no virtual sequence”
indices[34] = Some((36, 7)) → “At address 0x80000042,
bytecode position 36, 8-step virtual sequence”
indices[2] = None → “No instruction at address
0x80000002”
So the array is a direct map: Index → Address → Bytecode Position
In more detail, the input to the construction of this map is the bytecode, which is just a Vec<Instruction> object representing the entire program’s bytecode in ascending order of memory address:
// Line 1
AUIPC { address: 2147483648, virtual_sequence_remaining: None, ... }
// Line 2
ADDI { address: 2147483652, virtual_sequence_remaining: None, ... }
// Line 3
AUIPC { address: 2147483656, virtual_sequence_remaining: None, ... }
// ...
// Lines 36-43: Virtual sequence at same address
ADDI { address: 2147483714, virtual_sequence_remaining: Some(7), ... }
ANDI { address: 2147483714, virtual_sequence_remaining: Some(6), ... }
LD { address: 2147483714, virtual_sequence_remaining: Some(5), ... }
// ... (5 more at same address)
The output is a BytecodePCMapper containing a sparse array that maps instruction addresses to bytecode positions:
BytecodePCMapper {
indices: Vec<Option<(usize, u16)>>
// ^^^^^^^^^^^^^^^^^^^^^^^^
// (bytecode_position, virtual_sequence_remaining)
}
where the address-to-index formula is given by
pub const fn get_index(address: usize) -> usize {
(address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}
where
RAM_START_ADDRESS = 0x80000000 (2147483648)
ALIGNMENT_FACTOR_BYTECODE = 2 (2-byte alignment)
The formula in more detail:
- Subtract RAM_START_ADDRESS to get the offset from the program start
- Divide by 2 because instructions sit on even addresses
- Add 1 because index 0 is reserved for NOOP
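As a quick sanity check of the arithmetic, here is the formula next to its inverse. get_index matches the snippet above; get_address is my own illustrative helper, not a Jolt function:

```rust
const RAM_START_ADDRESS: usize = 0x8000_0000;
const ALIGNMENT_FACTOR_BYTECODE: usize = 2;

/// Instruction address -> index into the mapper's `indices` array.
pub const fn get_index(address: usize) -> usize {
    (address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}

/// Illustrative inverse: index -> instruction address
/// (index 0 is reserved for NOOP, so valid inputs start at 1).
pub const fn get_address(index: usize) -> usize {
    (index - 1) * ALIGNMENT_FACTOR_BYTECODE + RAM_START_ADDRESS
}
```

This reproduces the worked examples below: addresses 0x80000004 and 0x80000042 land at indices 3 and 34 respectively.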
The length of this array is computed by taking the last instruction in the bytecode array, pulling its physical address, and computing its index; this gives the size of the array (as the bytecode stores instructions in ascending order of memory address).
Worked Out Examples
Example 1: Simple Instruction
The second Instruction in Bytecode
ADDI {
address: 2147483652, // 0x80000004
virtual_sequence_remaining: None,
...
}
Calculation:
address = 2147483652
offset = 2147483652 - 2147483648 = 4
index = 4 / 2 + 1 = 3
Result:
indices[3] = Some((2, 0))
// | |
// | └─ No virtual sequence (0)
// └───── Bytecode position 2
Interpretation: The instruction at address 0x80000004 is located at bytecode position 2, with no virtual expansion.
Example 2: None Entry (Address Gap)
Address 0x80000002 (2147483650): Calculation:
address = 2147483650
offset = 2147483650 - 2147483648 = 2
index = 2 / 2 + 1 = 2
Result:
indices[2] = None
Why None? The array is initialised with None values. Then, as we iterate through the bytecode, for each instruction address we update the corresponding index of the mapper. Some indices of the mapper never receive an instruction: in this program, instructions appear at 0x80000000, then 0x80000004, and the addresses between them (like 0x80000002) have no code, so their indices remain None. This creates a sparse array where gaps represent unused memory addresses.
Example 3: Virtual Instruction Sequence
Bytecode Lines 36-43:
// Line 36
ADDI { address: 2147483714, virtual_sequence_remaining: Some(7), ... }
// Line 37
ANDI { address: 2147483714, virtual_sequence_remaining: Some(6), ... }
// Line 38
LD { address: 2147483714, virtual_sequence_remaining: Some(5), ... }
// Line 39
XORI { address: 2147483714, virtual_sequence_remaining: Some(4), ... }
// Line 40
VirtualMULI { address: 2147483714, virtual_sequence_remaining: Some(3), ... }
// Line 41
VirtualPow2 { address: 2147483714, virtual_sequence_remaining: Some(2), ... }
// Line 42
MUL { address: 2147483714, virtual_sequence_remaining: Some(1), ... }
// Line 43
VirtualSRAI { address: 2147483714, virtual_sequence_remaining: Some(0), ... }
All 8 instructions share address 0x80000042
(2147483714). Calculation:
address = 2147483714
offset = 2147483714 - 2147483648 = 66
index = 66 / 2 + 1 = 34
Result:
indices[34] = Some((36, 7))
// | |
// | └─ 7 more virtual steps follow this one (8 total)
// └───── Bytecode position 36 (first instruction)
Interpretation:
- Address 0x80000042 maps to bytecode position 36
- This starts an 8-step virtual sequence
- To execute this instruction, read bytecode positions 36-43 consecutively
- The mapper stores only the first position, not all 8
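Putting the three examples together, the whole construction can be sketched as one pass over the bytecode. This is a simplified stand-in for Jolt's actual code: instructions are reduced to (address, virtual_sequence_remaining) pairs, and bytecode position 0 is assumed reserved for a NOOP:

```rust
const RAM_START_ADDRESS: usize = 0x8000_0000;
const ALIGNMENT_FACTOR_BYTECODE: usize = 2;

const fn get_index(address: usize) -> usize {
    (address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}

/// Stand-in for an Instruction: (address, virtual_sequence_remaining).
type Instr = (usize, Option<u16>);

fn build_pc_mapper(bytecode: &[Instr]) -> Vec<Option<(usize, u16)>> {
    // Size the array from the last (highest-address) instruction, as in the text.
    let last_addr = bytecode.last().expect("non-empty bytecode").0;
    let mut indices: Vec<Option<(usize, u16)>> = vec![None; get_index(last_addr) + 1];
    for (i, &(addr, vsr)) in bytecode.iter().enumerate() {
        let pos = i + 1; // bytecode position 0 is assumed reserved for a NOOP
        let idx = get_index(addr);
        // Record only the first instruction of each (possibly virtual) sequence.
        if indices[idx].is_none() {
            indices[idx] = Some((pos, vsr.unwrap_or(0)));
        }
    }
    indices
}
```

On a toy bytecode with instructions at 0x80000000, 0x80000004, and a virtual sequence at 0x80000042, this reproduces the shape of the examples above: the second instruction lands at indices[3] = Some((2, 0)), the gap at 0x80000002 stays None, and only the first instruction of the virtual sequence is recorded.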
And that’s all the bytecode pre-processing we do. Next we get to RAM pre-processing.
RAM Pre-processing
let ram_preprocessing = RAMPreprocessing::preprocess(memory_init);
This simply returns a struct of type
pub struct RAMPreprocessing {
pub min_bytecode_address: u64,
pub bytecode_words: Vec<u64>,
}
Input:
memory_init: Vec<(u64, u8)> - List of
(address, byte) pairs representing initial RAM state
Output: RAMPreprocessing with:
min_bytecode_address: Starting address of memory region
bytecode_words: Dense array of 64-bit words
It takes the memory, which is a sequence of bytes, and chunks it into a sequence of 64-bit words.
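A minimal sketch of that chunking (my own simplification; Jolt's RAMPreprocessing differs in details such as alignment handling and field names):

```rust
/// Pack sparse (address, byte) pairs into dense little-endian 64-bit words,
/// starting at the lowest word-aligned address that appears.
fn pack_words(memory_init: &[(u64, u8)]) -> (u64, Vec<u64>) {
    let min_addr = memory_init.iter().map(|&(a, _)| a).min().unwrap_or(0) & !7; // align down to 8 bytes
    let max_addr = memory_init.iter().map(|&(a, _)| a).max().unwrap_or(0);
    let num_words = ((max_addr - min_addr) / 8 + 1) as usize;
    let mut words = vec![0u64; num_words];
    for &(addr, byte) in memory_init {
        let offset = (addr - min_addr) as usize;
        // Place each byte into its word, little-endian.
        words[offset / 8] |= (byte as u64) << (8 * (offset % 8));
    }
    (min_addr, words)
}
```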
We are done with pre-processing; Now Polynomials
Now we know exactly what the inputs are, and we are ready to generate polynomials. The right entry point for seeing how polynomials are generated is here:
let (jolt_proof, program_io, _, _) = JoltRV64IMAC::prove(
&preprocessing, // what we discussed above + crypto commitment malarkey
elf_contents, // the binary file in bytes
&serialized_input, // the postcard-serialized input n we called fib with; see part (a) of the Day 1 post
&[], // no trusted or untrusted advice for now
&[],
None,
);
Digging in further: the code is divided into the following logical sections:
- Bookkeeping (not very interesting): it counts the number of actual RISC-V cycles in the trace and how many are virtual cycles, pads the trace appropriately to make lengths nice powers of 2, and truncates zeros on the program’s device outputs. We don’t detail this any further.
- Commitment setup (won’t bother with this)
- Generation of polynomials. Our focus
- Committing of the polynomials (we won’t go here)
- Rest of proving (which we will get to later in the series)
We initialise the prover state with all the data it will ever need:
- The pre-processing needed to commit to things, which we treat like a magic black box for now
- The trace, which we explained in detail
- The program_io (which we will explain shortly)
- The trusted advice, of which we had none, so this is safe to ignore
- Finally, the final memory state after the program has finished running
let state_manager = StateManager::new_prover(
preprocessing, // what we discussed above
trace, // the dynamic trace discussed in detail in part 1
program_io.clone(), // see above
trusted_advice_commitment, // There was no trusted advice in our example
final_memory_state, // The final Vec<(u64)> state of memory
);
// We will spend ALOT of time here
let (proof, debug_info) = JoltDAG::prove(state_manager).ok().unwrap();
where the `program_io` is given by the following struct
pub struct JoltDevice {
pub inputs: Vec<u8>, // the post card bit of our input n
pub trusted_advice: Vec<u8>, // These were []
pub untrusted_advice: Vec<u8>, // These were []
pub outputs: Vec<u8>, // the final answer b
pub panic: bool,
pub memory_layout: MemoryLayout,
}
It’s ok if a lot of this makes no sense off the bat. We’ll clear things up slowly. The main body we focus on is the following:
let _guard = (
AllCommittedPolynomials::initialize(compute_d_parameter(ram_K), bytecode_d), // <--- Focus here
);
let opening_proof_hints = Self::generate_and_commit_polynomials(&mut state_manager)?;
Before we can fill in our polynomials with the right values that
match the trace, we need to allocate space in memory for them. This is
exactly what initialize does. The work is done in
let opening_proof_hints = Self::generate_and_commit_polynomials(&mut state_manager)?;
which generates the following vectors (from which we can get polynomials).
let mut polynomials = vec![
CommittedPolynomial::RdInc,
CommittedPolynomial::RamInc,
CommittedPolynomial::InstructionRa(0),
CommittedPolynomial::InstructionRa(1),
CommittedPolynomial::InstructionRa(2),
CommittedPolynomial::InstructionRa(3),
CommittedPolynomial::InstructionRa(4),
CommittedPolynomial::InstructionRa(5),
CommittedPolynomial::InstructionRa(6),
CommittedPolynomial::InstructionRa(7),
CommittedPolynomial::InstructionRa(8),
CommittedPolynomial::InstructionRa(9),
CommittedPolynomial::InstructionRa(10),
CommittedPolynomial::InstructionRa(11),
CommittedPolynomial::InstructionRa(12),
CommittedPolynomial::InstructionRa(13),
CommittedPolynomial::InstructionRa(14),
CommittedPolynomial::InstructionRa(15),
];
for i in 0..ram_d {
polynomials.push(CommittedPolynomial::RamRa(i));
}
for i in 0..bytecode_d {
polynomials.push(CommittedPolynomial::BytecodeRa(i));
}
So this is the image you should have for what is committed to:
- RdInc and RamInc are Vec<i128> of size \(T\), where \(T\) is the number of cycles in our dynamic trace rounded up to the next power of 2 (so 64 in our case).
- bytecode_ra, instruction_ra and ram_ra all have the same structure: they consist of \(d_1\), 16, and \(d_2\) matrices respectively. Each matrix has \(T\) rows and 256 columns; for every row, exactly one column is set to 1 based on the logic discussed below. \(d_1\) and \(d_2\) are dynamically calculated based on the trace.
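To make the one-hot picture concrete, here is a toy expansion (my own illustration, not Jolt code) of the per-cycle u8 indices into an explicit \(T \times K\) one-hot matrix. Jolt never materialises the dense matrix; it stores just the index per row:

```rust
/// Expand per-cycle chunk indices into an explicit T x K one-hot matrix.
/// `None` (e.g. a cycle with no RAM access) becomes an all-zero row.
fn one_hot_matrix(indices: &[Option<u8>], k: usize) -> Vec<Vec<u8>> {
    indices
        .iter()
        .map(|&idx| {
            let mut row = vec![0u8; k];
            if let Some(i) = idx {
                row[i as usize] = 1; // exactly one 1 per active row
            }
            row
        })
        .collect()
}
```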
Details about how we compute each polynomial
The RdInc Polynomial
For each cycle \(i \in [T]\), if a register was written to, store the difference between the value before and after the write. Simple enough!
batch_ref.rd_inc[i] = post_rd as i128 - pre_rd as i128;
The RamInc Polynomial
Quite simple: for each cycle, if the trace does not touch RAM, store 0; otherwise, if it writes to RAM at some address, store the difference in value before and after the write.
let ram_inc = match cycle.ram_access() {
tracer::instruction::RAMAccess::Write(write) => {
write.post_value as i128 - write.pre_value as i128
}
_ => 0,
};
batch_ref.ram_inc[i] = ram_inc;
Finally, we get to all the one-hot encoded polynomials.
One Hot Encoded Polynomials
Below are detailed worked-out examples of how we get to the one-hot polynomials using the trace and the pc-mapper: BytecodeRa, InstructionRa, and RamRa.
// BytecodeRa indices
if let Some((d, log_K_chunk, K_chunk)) = bytecode_constants {
let pc = preprocessing.shared.bytecode.get_pc(cycle);
for j in 0..bytecode_d {
let index = (pc >> (log_K_chunk * (d - 1 - j))) % K_chunk;
batch_ref.bytecode_ra[j][i] = Some(index as u8);
}
}
// InstructionRa indices
let lookup_index = LookupQuery::<XLEN>::to_lookup_index(cycle);
for j in 0..instruction_lookups::D {
let k = (lookup_index >> instruction_ra_shifts[j])
% instruction_lookups::K_CHUNK as u128;
batch_ref.instruction_ra[j][i] = Some(k as u8);
}
// RamRa indices
if let Some(dth_log) = dth_root_log {
let address_opt = remap_address(
cycle.ram_access().address() as u64,
&preprocessing.shared.memory_layout,
);
for j in 0..ram_d {
let index = address_opt.map(|address| {
((address as usize >> (dth_log * (ram_d - 1 - j))) % DTH_ROOT_OF_K)
as u8
});
batch_ref.ram_ra[j][i] = index;
}
}
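All three snippets share the same shift-and-mod pattern: split a wide index into d base-\(2^{\log K}\) digits, most significant first. A standalone sketch of that pattern (my own function names), with a check that the digits recompose the original index:

```rust
/// Split `index` into `d` chunks of `log_k` bits each, most significant first.
/// Mirrors the `(index >> (log_k * (d - 1 - j))) % K_chunk` pattern above.
fn to_chunks(index: u128, d: usize, log_k: usize) -> Vec<u8> {
    (0..d)
        .map(|j| ((index >> (log_k * (d - 1 - j))) % (1u128 << log_k)) as u8)
        .collect()
}

/// Recompose the chunks (valid when index < 2^(d * log_k)).
fn from_chunks(chunks: &[u8], log_k: usize) -> u128 {
    chunks.iter().fold(0u128, |acc, &c| (acc << log_k) | c as u128)
}
```

Each chunk is exactly the column index that gets set to 1 in the corresponding one-hot matrix row.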
To be precise, we should say 1 virtual assembly instruction. In Jolt, some RISC-V instructions, such as division, are represented by a sequence of virtual instructions. See the memory instruction in the previous example for a concrete example.↩︎