Advent Of Jolt (Day 2): The Committed Polynomials
Quick Recap
In the last post we took a Rust program and transformed it into a RISC-V binary. Using this binary file as input, we generated two intermediate representations:
1. The Bytecode object
2. The Jolt execution trace

The bytecode is just a vector of Instruction structs, where Instruction is a data structure we defined to represent one assembly instruction1. The execution trace is a vector of Cycle structs, where Cycle is a Jolt-specific data structure which contains an assembly instruction and a description of the machine state before and after the execution of the instruction.
In this post, we go from a Jolt trace (plus some pre-processing) to polynomials and commitments.
For brevity, and mostly for my sanity, I will skip all the commitment-specific details of Jolt. Perhaps later I can convince Markos to write a guest post outlining his Dory Tales. For now, we will gingerly step around commitment code and just assume any Rust vector of finite field elements can be committed to.
From Trace To Polynomials
In this post we describe how we go from Jolt data structures that describe assembly instructions and machine state to polynomials over 256-bit fields that describe the same. That is, we focus on this phase of the architecture.
Our object of study is still this program that does practically nothing.
#[jolt::provable(memory_size = 10240, max_trace_length = 65536)]
fn fib(n: u32) -> u128 {
let a: u128 = 0;
let mut b: u128 = 1;
start_cycle_tracking("fib_loop"); // Use `start_cycle_tracking("{name}")` to start a cycle span
b = a + b;
end_cycle_tracking("fib_loop"); // Use `end_cycle_tracking("{name}")` to end a cycle span
b
}
Our inputs will be:
- The dynamic trace
- The program input n, which we do not use (so you can ignore it for most of this post)
- Some pre-processing, which we detail below
Throughout this series, and in most Jolt papers, when I say polynomials, what I really mean is vectors. The polynomial in question will be the multilinear extension of said vector; equivalently, the vector lists the evaluations of that polynomial over the boolean hypercube.
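To make this vector-as-polynomial view concrete, here is a toy sketch of evaluating the multilinear extension of a vector at an arbitrary point. The tiny prime modulus and the function names are my own illustrative choices, standing in for Jolt's 256-bit field arithmetic:

```rust
// Toy modulus standing in for a 256-bit field; illustrative only.
const P: u64 = 2_147_483_647; // the Mersenne prime 2^31 - 1

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Evaluate the multilinear extension of `v` (length 2^n) at point `r` (length n):
/// MLE(r) = sum over b in {0,1}^n of v[b] * prod_i (r_i*b_i + (1-r_i)*(1-b_i)).
/// Implemented by folding out one variable at a time, O(2^n) field operations.
fn eval_mle(v: &[u64], r: &[u64]) -> u64 {
    assert_eq!(v.len(), 1 << r.len());
    let mut cur = v.to_vec();
    for &ri in r {
        let half = cur.len() / 2;
        let one_minus = (P + 1 - ri % P) % P; // (1 - r_i) mod P
        // Interpolate between the b_i = 0 half and the b_i = 1 half.
        cur = (0..half)
            .map(|j| add(mul(one_minus, cur[j]), mul(ri, cur[half + j])))
            .collect();
    }
    cur[0]
}
```

At boolean points, eval_mle simply selects the vector entry whose binary index is r (most significant bit first), which is exactly the "vector = evaluations over the boolean hypercube" correspondence.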
Pre-Processing
As mentioned earlier, I will dance around commitment-specific details here and only focus on items I need to explain the sum-check proving later in the series. In my opinion, the commitment bit (although an essential, non-trivial component of the proof system) is only interesting if one wants to understand algorithmic decisions like “why did we introduce the parameter \(d\)” and “what should \(d\) be?”. Here, we are just trying to understand how the Jolt prover works, and the commitment scheme is a little bit of a distraction. Note: you do not have to know what \(d\) means. For now it’s just some small integer like 2 or 16.
Our first object of study begins here on Line 261 of
jolt/jolt-core/benches/e2e_profiling.rs
let preprocessing = JoltRV64IMAC::prover_preprocess(
bytecode.clone(), // We have already described this
program_io.memory_layout.clone(), // See below
init_memory_state, // `Vec<(u64, u8)>` of address-value pairs.
trace.len().next_power_of_two(), // This will be 64 for us given we had 63 virtual instructions.
);
Memory layout looks like the following, with most of the keys being self-explanatory:
pub struct MemoryLayout {
/// The total size of the elf's sections, including the .text, .data, .rodata, and .bss sections.
pub program_size: u64,
pub max_trusted_advice_size: u64,
pub trusted_advice_start: u64,
pub trusted_advice_end: u64,
pub max_untrusted_advice_size: u64,
pub untrusted_advice_start: u64,
pub untrusted_advice_end: u64,
pub max_input_size: u64,
pub max_output_size: u64,
pub input_start: u64,
pub input_end: u64,
pub output_start: u64,
pub output_end: u64,
pub stack_size: u64,
/// Stack starts from (RAM_START_ADDRESS + `program_size` + `stack_size`) and grows in descending addresses by `stack_size` bytes.
pub stack_end: u64,
pub memory_size: u64,
/// Heap starts just after the start of the stack and is `memory_size` bytes.
pub memory_end: u64,
pub panic: u64,
pub termination: u64,
/// End of the memory region containing inputs, outputs, the panic bit,
/// and the termination bit
pub io_end: u64,
}
and the initial memory state is given by a Vec<(u64, u8)> of address-value pairs. Diving deeper into pre-processing gets us to:
Line 240 jolt/jolt-core/src/zkvm/mod.rs
#[tracing::instrument(skip_all, name = "Jolt::prover_preprocess")]
fn prover_preprocess(
bytecode: Vec<Instruction>,
memory_layout: MemoryLayout,
memory_init: Vec<(u64, u8)>,
max_trace_length: usize,
) -> JoltProverPreprocessing<F, PCS> {
let shared = Self::shared_preprocess(bytecode, memory_layout, memory_init); // <- we care about this only
let max_T: usize = max_trace_length.next_power_of_two();
// Ignored for this post
let generators = PCS::setup_prover(DTH_ROOT_OF_K.log_2() + max_T.log_2());
JoltProverPreprocessing { generators, shared }
}
We only focus on the shared portion of the pre-processing, so
fn shared_preprocess(
bytecode: Vec<Instruction>,
memory_layout: MemoryLayout,
memory_init: Vec<(u64, u8)>,
) -> JoltSharedPreprocessing {
let bytecode_preprocessing = BytecodePreprocessing::preprocess(bytecode); // LINE 1
let ram_preprocessing = RAMPreprocessing::preprocess(memory_init); // LINE 2
JoltSharedPreprocessing {
memory_layout,
bytecode: bytecode_preprocessing,
ram: ram_preprocessing,
}
}
We already know what MemoryLayout is so we focus on the
remaining two outputs.
Bytecode Pre-processing
This pre-processing stage outputs a struct of the following type:
pub struct BytecodePreprocessing {
pub code_size: usize,
pub bytecode: Vec<Instruction>,
/// Maps the memory address of each instruction in the bytecode to its "virtual" address.
/// See Section 6.1 of the Jolt paper, "Reflecting the program counter". The virtual address
/// is the one used to keep track of the next (potentially virtual) instruction to execute.
pub pc_map: BytecodePCMapper,
pub d: usize,
}
Roughly it does the following:
- Computes the BytecodeRa d parameter used for chunking one-hot encodings (we will get to this). For our program, as it is so small, we have \(d=1\).
- Constructs the pc_map object, which is of type BytecodePCMapper and which we describe below for our program.
- Computes how much memory the bytecode needs.
The interesting output of this pre-processing phase is the following object.
pub struct BytecodePCMapper {
/// Stores the mapping of the PC at the beginning of each inline sequence
/// and the maximum number of the inline sequence
/// Indexed by the address of instruction unmapped divided by 2
indices: Vec<Option<(usize, u16)>>,
}
How to Interpret Each Position of indices
Each index position i represents a specific memory
address in your program:
address = (i - 1) * 2 + 0x80000000
This is the reverse of get_index() which we will explain
shortly.
Examples:
indices[0] → Reserved for NOOP
indices[1] → address (1-1)*2 + 0x80000000 = 0x80000000
indices[2] → address (2-1)*2 + 0x80000000 = 0x80000002
indices[3] → address (3-1)*2 + 0x80000000 = 0x80000004
indices[34] → address (34-1)*2 + 0x80000000 = 0x80000042
indices[76] → address (76-1)*2 + 0x80000000 = 0x80000096
What the value means:
indices[3] = Some((2, 0)) → “At address 0x80000004,
bytecode position 2, no virtual sequence”
indices[34] = Some((36, 7)) → “At address 0x80000042,
bytecode position 36, 8-step virtual sequence”
indices[2] = None → “No instruction at address
0x80000002”
So the array is a direct map: Index → Address → Bytecode Position
In more detail, the input to the construction of this map is the bytecode, which is just a Vec<Instruction> object representing the entire program’s bytecode in ascending order of memory address:
// Line 1
AUIPC { address: 2147483648, virtual_sequence_remaining: None, ... }
// Line 2
ADDI { address: 2147483652, virtual_sequence_remaining: None, ... }
// Line 3
AUIPC { address: 2147483656, virtual_sequence_remaining: None, ... }
// ...
// Lines 36-43: Virtual sequence at same address
ADDI { address: 2147483714, virtual_sequence_remaining: Some(7), ... }
ANDI { address: 2147483714, virtual_sequence_remaining: Some(6), ... }
LD { address: 2147483714, virtual_sequence_remaining: Some(5), ... }
// ... (5 more at same address)
The output is a BytecodePCMapper containing a sparse array that maps instruction addresses to bytecode positions:
BytecodePCMapper {
indices: Vec<Option<(usize, u16)>>
// ^^^^^^^^^^^^^^^^^^^^^^^^
// (bytecode_position, virtual_sequence_remaining)
}
where the address-to-index formula is given by
pub const fn get_index(address: usize) -> usize {
(address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}
where
RAM_START_ADDRESS = 0x80000000 (2147483648)
ALIGNMENT_FACTOR_BYTECODE = 2 (2-byte alignment)
The formula in more detail:
- Subtract RAM_START_ADDRESS to get the offset from the program start
- Divide by 2 because instructions sit on even addresses
- Add 1 because index 0 is reserved for NOOP
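As a quick sanity check of the arithmetic, here is the formula next to its inverse. get_index matches the snippet above; get_address is my own illustrative helper, not a Jolt function:

```rust
const RAM_START_ADDRESS: usize = 0x8000_0000;
const ALIGNMENT_FACTOR_BYTECODE: usize = 2;

/// Instruction address -> index into the mapper's `indices` array.
pub const fn get_index(address: usize) -> usize {
    (address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}

/// Illustrative inverse: index -> instruction address
/// (index 0 is reserved for NOOP, so valid inputs start at 1).
pub const fn get_address(index: usize) -> usize {
    (index - 1) * ALIGNMENT_FACTOR_BYTECODE + RAM_START_ADDRESS
}
```

This reproduces the worked examples below: addresses 0x80000004 and 0x80000042 land at indices 3 and 34 respectively.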
The length of this array is computed by taking the last instruction in the bytecode array, pulling its physical address, and computing its index; this gives the size of the array (as the bytecode stores instructions in ascending order of memory address).
Worked Out Examples
Example 1: Simple Instruction
The second Instruction in Bytecode
ADDI {
address: 2147483652, // 0x80000004
virtual_sequence_remaining: None,
...
}
Calculation:
address = 2147483652
offset = 2147483652 - 2147483648 = 4
index = 4 / 2 + 1 = 3
Result:
indices[3] = Some((2, 0))
// | |
// | └─ No virtual sequence (0)
// └───── Bytecode position 2
Interpretation: The instruction at address 0x80000004 is located at bytecode position 2, with no virtual expansion.
Example 2: None Entry (Address Gap)
Address 0x80000002 (2147483650): Calculation:
address = 2147483650
offset = 2147483650 - 2147483648 = 2
index = 2 / 2 + 1 = 2
Result:
indices[2] = None
Why None? The array is initialised with None values. Then, as we iterate through the bytecode, for each instruction address we update the corresponding index of the mapper. Some indices of the mapper never receive an instruction: in this program, instructions appear at 0x80000000, then 0x80000004, and the addresses between them (like 0x80000002) have no code, so their indices remain None. This creates a sparse array where gaps represent unused memory addresses.
Example 3: Virtual Instruction Sequence
Bytecode Lines 36-43:
// Line 36
ADDI { address: 2147483714, virtual_sequence_remaining: Some(7), ... }
// Line 37
ANDI { address: 2147483714, virtual_sequence_remaining: Some(6), ... }
// Line 38
LD { address: 2147483714, virtual_sequence_remaining: Some(5), ... }
// Line 39
XORI { address: 2147483714, virtual_sequence_remaining: Some(4), ... }
// Line 40
VirtualMULI { address: 2147483714, virtual_sequence_remaining: Some(3), ... }
// Line 41
VirtualPow2 { address: 2147483714, virtual_sequence_remaining: Some(2), ... }
// Line 42
MUL { address: 2147483714, virtual_sequence_remaining: Some(1), ... }
// Line 43
VirtualSRAI { address: 2147483714, virtual_sequence_remaining: Some(0), ... }
All 8 instructions share address 0x80000042
(2147483714). Calculation:
address = 2147483714
offset = 2147483714 - 2147483648 = 66
index = 66 / 2 + 1 = 34
Result:
indices[34] = Some((36, 7))
// | |
// | └─ 7 more virtual steps follow this one (8 total)
// └───── Bytecode position 36 (first instruction)
Interpretation:
- Address 0x80000042 maps to bytecode position 36
- This starts an 8-step virtual sequence
- To execute this instruction, read bytecode positions 36-43 consecutively
- The mapper stores only the first position, not all 8
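Putting the three examples together, the whole construction can be sketched as one pass over the bytecode. This is a simplified stand-in for Jolt's actual code: instructions are reduced to (address, virtual_sequence_remaining) pairs, and bytecode position 0 is assumed reserved for a NOOP:

```rust
const RAM_START_ADDRESS: usize = 0x8000_0000;
const ALIGNMENT_FACTOR_BYTECODE: usize = 2;

const fn get_index(address: usize) -> usize {
    (address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE + 1
}

/// Stand-in for an Instruction: (address, virtual_sequence_remaining).
type Instr = (usize, Option<u16>);

fn build_pc_mapper(bytecode: &[Instr]) -> Vec<Option<(usize, u16)>> {
    // Size the array from the last (highest-address) instruction, as in the text.
    let last_addr = bytecode.last().expect("non-empty bytecode").0;
    let mut indices: Vec<Option<(usize, u16)>> = vec![None; get_index(last_addr) + 1];
    for (i, &(addr, vsr)) in bytecode.iter().enumerate() {
        let pos = i + 1; // bytecode position 0 is assumed reserved for a NOOP
        let idx = get_index(addr);
        // Record only the first instruction of each (possibly virtual) sequence.
        if indices[idx].is_none() {
            indices[idx] = Some((pos, vsr.unwrap_or(0)));
        }
    }
    indices
}
```

On a toy bytecode with instructions at 0x80000000, 0x80000004, and a virtual sequence at 0x80000042, this reproduces the shape of the examples above: the second instruction lands at indices[3] = Some((2, 0)), the gap at 0x80000002 stays None, and only the first instruction of the virtual sequence is recorded.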
And that’s all the bytecode pre-processing we do. Next we get to RAM pre-processing.
RAM Pre-processing
let ram_preprocessing = RAMPreprocessing::preprocess(memory_init);
This simply returns a struct of type
pub struct RAMPreprocessing {
pub min_bytecode_address: u64,
pub bytecode_words: Vec<u64>,
}
Input:
memory_init: Vec<(u64, u8)> - List of
(address, byte) pairs representing initial RAM state
Output: RAMPreprocessing with:
min_bytecode_address: Starting address of memory region
bytecode_words: Dense array of 64-bit words
It takes the memory, which is a sequence of bytes, and chunks it into a sequence of 64-bit words.
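A minimal sketch of that chunking (my own simplification; Jolt's RAMPreprocessing differs in details such as alignment handling and field names):

```rust
/// Pack sparse (address, byte) pairs into dense little-endian 64-bit words,
/// starting at the lowest word-aligned address that appears.
fn pack_words(memory_init: &[(u64, u8)]) -> (u64, Vec<u64>) {
    let min_addr = memory_init.iter().map(|&(a, _)| a).min().unwrap_or(0) & !7; // align down to 8 bytes
    let max_addr = memory_init.iter().map(|&(a, _)| a).max().unwrap_or(0);
    let num_words = ((max_addr - min_addr) / 8 + 1) as usize;
    let mut words = vec![0u64; num_words];
    for &(addr, byte) in memory_init {
        let offset = (addr - min_addr) as usize;
        // Place each byte into its word, little-endian.
        words[offset / 8] |= (byte as u64) << (8 * (offset % 8));
    }
    (min_addr, words)
}
```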
We are done with pre-processing; Now Polynomials
Now we know exactly what the inputs are, and we are ready to generate polynomials. The right entry point for seeing how polynomials are generated is here:
let (jolt_proof, program_io, _, _) = JoltRV64IMAC::prove(
&preprocessing, // what we discussed above + crypto commitment malarkey
elf_contents, // the binary file in bytes
&serialized_input, // the postcard-serialized input n we called fib with; see part (a) of the Day 1 post
&[], // no trusted or untrusted advice for now
&[],
None,
);
Digging in further: the code is divided into the following logical sections:
- Bookkeeping (not very interesting): it counts the number of actual RISC-V cycles in the trace and how many are virtual cycles, pads the trace appropriately to make lengths nice powers of 2, and truncates zeros on the program’s device outputs. We don’t detail this any further.
- Commitment setup (won’t bother with this)
- Generation of polynomials. Our focus
- Committing of the polynomials (we won’t go here)
- Rest of proving (which we will get to later in the series)
We initialise the prover state with all the data it will ever need:
- The pre-processing needed to commit to things, which we treat like a magic black box for now
- The trace, which we explained in detail
- The program_io (which we will explain shortly)
- The trusted advice, of which we had none, so this is safe to ignore
- Finally, the final memory state after the program has finished running
let state_manager = StateManager::new_prover(
preprocessing, // what we discussed above
trace, // the dynamic trace discussed in detail in part 1
program_io.clone(), // see above
trusted_advice_commitment, // There was no trusted advice in our example
final_memory_state, // The final Vec<(u64)> state of memory
);
// We will spend ALOT of time here
let (proof, debug_info) = JoltDAG::prove(state_manager).ok().unwrap();
where the `program_io` is given by the following struct
pub struct JoltDevice {
pub inputs: Vec<u8>, // the post card bit of our input n
pub trusted_advice: Vec<u8>, // These were []
pub untrusted_advice: Vec<u8>, // These were []
pub outputs: Vec<u8>, // the final answer b
pub panic: bool,
pub memory_layout: MemoryLayout,
}
It’s ok if a lot of this makes no sense off the bat. We’ll clear things up slowly. The main body we focus on is the following:
let _guard = (
AllCommittedPolynomials::initialize(compute_d_parameter(ram_K), bytecode_d), // <--- Focus here
);
let opening_proof_hints = Self::generate_and_commit_polynomials(&mut state_manager)?;
Before we can fill in our polynomials with the right values that
match the trace, we need to allocate space in memory for them. This is
exactly what initialize does. The work is done in
let opening_proof_hints = Self::generate_and_commit_polynomials(&mut state_manager)?;
which generates the following vectors (from which we can get polynomials).
let mut polynomials = vec![
CommittedPolynomial::RdInc,
CommittedPolynomial::RamInc,
CommittedPolynomial::InstructionRa(0),
CommittedPolynomial::InstructionRa(1),
CommittedPolynomial::InstructionRa(2),
CommittedPolynomial::InstructionRa(3),
CommittedPolynomial::InstructionRa(4),
CommittedPolynomial::InstructionRa(5),
CommittedPolynomial::InstructionRa(6),
CommittedPolynomial::InstructionRa(7),
CommittedPolynomial::InstructionRa(8),
CommittedPolynomial::InstructionRa(9),
CommittedPolynomial::InstructionRa(10),
CommittedPolynomial::InstructionRa(11),
CommittedPolynomial::InstructionRa(12),
CommittedPolynomial::InstructionRa(13),
CommittedPolynomial::InstructionRa(14),
CommittedPolynomial::InstructionRa(15),
];
for i in 0..ram_d {
polynomials.push(CommittedPolynomial::RamRa(i));
}
for i in 0..bytecode_d {
polynomials.push(CommittedPolynomial::BytecodeRa(i));
}
So this is the image you should have for what is committed to:
- RdInc and RamInc are Vec<i128> of size \(T\), where \(T\) is the number of cycles in our dynamic trace rounded up to the next power of 2 (so 64 in our case).
- bytecode_ra, instruction_ra and ram_ra all have the same structure: they consist of \(d_1\), 16, and \(d_2\) matrices respectively. Each matrix has \(T\) rows and 256 columns; for every row, exactly one column is set to 1 based on the logic discussed below. \(d_1\) and \(d_2\) are dynamically calculated based on the trace.
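To make the one-hot picture concrete, here is a toy expansion (my own illustration, not Jolt code) of the per-cycle u8 indices into an explicit \(T \times K\) one-hot matrix. Jolt never materialises the dense matrix; it stores just the index per row:

```rust
/// Expand per-cycle chunk indices into an explicit T x K one-hot matrix.
/// `None` (e.g. a cycle with no RAM access) becomes an all-zero row.
fn one_hot_matrix(indices: &[Option<u8>], k: usize) -> Vec<Vec<u8>> {
    indices
        .iter()
        .map(|&idx| {
            let mut row = vec![0u8; k];
            if let Some(i) = idx {
                row[i as usize] = 1; // exactly one 1 per active row
            }
            row
        })
        .collect()
}
```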
Details about how we compute each polynomial
The RdInc Polynomial
For each cycle \(i \in [T]\), if a register was written to, store the difference between the value before and after the write. Simple enough!
batch_ref.rd_inc[i] = post_rd as i128 - pre_rd as i128;
The RamInc Polynomial
Quite simple: for each cycle, if the trace does not touch RAM, store 0; otherwise, if it writes to RAM at some address, store the difference in value before and after the write.
let ram_inc = match cycle.ram_access() {
tracer::instruction::RAMAccess::Write(write) => {
write.post_value as i128 - write.pre_value as i128
}
_ => 0,
};
batch_ref.ram_inc[i] = ram_inc;
Finally, we get to all the one-hot encoded polynomials.
One Hot Encoded Polynomials
Below are detailed worked-out examples of how we get to the one-hot polynomials using the trace and the pc-mapper: BytecodeRa, InstructionRa, and RamRa.
// BytecodeRa indices
if let Some((d, log_K_chunk, K_chunk)) = bytecode_constants {
let pc = preprocessing.shared.bytecode.get_pc(cycle);
for j in 0..bytecode_d {
let index = (pc >> (log_K_chunk * (d - 1 - j))) % K_chunk;
batch_ref.bytecode_ra[j][i] = Some(index as u8);
}
}
// InstructionRa indices
let lookup_index = LookupQuery::<XLEN>::to_lookup_index(cycle);
for j in 0..instruction_lookups::D {
let k = (lookup_index >> instruction_ra_shifts[j])
% instruction_lookups::K_CHUNK as u128;
batch_ref.instruction_ra[j][i] = Some(k as u8);
}
// RamRa indices
if let Some(dth_log) = dth_root_log {
let address_opt = remap_address(
cycle.ram_access().address() as u64,
&preprocessing.shared.memory_layout,
);
for j in 0..ram_d {
let index = address_opt.map(|address| {
((address as usize >> (dth_log * (ram_d - 1 - j))) % DTH_ROOT_OF_K)
as u8
});
batch_ref.ram_ra[j][i] = index;
}
}
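All three snippets share the same shift-and-mod pattern: split a wide index into d base-\(2^{\log K}\) digits, most significant first. A standalone sketch of that pattern (my own function names), with a check that the digits recompose the original index:

```rust
/// Split `index` into `d` chunks of `log_k` bits each, most significant first.
/// Mirrors the `(index >> (log_k * (d - 1 - j))) % K_chunk` pattern above.
fn to_chunks(index: u128, d: usize, log_k: usize) -> Vec<u8> {
    (0..d)
        .map(|j| ((index >> (log_k * (d - 1 - j))) % (1u128 << log_k)) as u8)
        .collect()
}

/// Recompose the chunks (valid when index < 2^(d * log_k)).
fn from_chunks(chunks: &[u8], log_k: usize) -> u128 {
    chunks.iter().fold(0u128, |acc, &c| (acc << log_k) | c as u128)
}
```

Each chunk is exactly the column index that gets set to 1 in the corresponding one-hot matrix row.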
To be precise, we should say 1 virtual assembly instruction. In Jolt, some RISC-V instructions, such as division, are represented by a sequence of virtual instructions. See the memory instruction in the previous example for a concrete example.↩︎