Advent Of Jolt (Day 1): The Dynamic Jolt Trace

jolt

2025-10-25 By Ari

A Rust Program \(\rightarrow\) Binary \(\rightarrow\) Jolt Trace

The starting point of proving is to describe the program which we claim to have execute correctly.

We take this program written in rust and compile it a binary encoding RISC-V instructions. Then we take this binary and generate two Jolt specific data structures – bytecode and the dynamic trace The final artefacts of this blog are the following:

The file in disassembled RISC-V instruction format; and a annotated assembly which we provide to make sense of what’s going on. Note that the annotated file is not generated by the Jolt codebase (it is listed for pedagogical reasons).
The Jolt Bytecode: which is just a Vec<Instruction> object representing the entire program’s bytecode in ascending order by memory address. We will talk about the type Instruction in greater detail later.
The Dynamic Trace: This is just the step by step execution step. At each timestep/cycle, it stores what instruction to run, and how the machine state changes due to the instruction. We will juxtapose the actual assembly file and this trace, and walk through them later in the post.

TODO: fix this figure

For the sake of simplicity, we pick a program that essentially does nothing - it takes as input a 32 bit integer n, proceeds to ignore it completely and then add to variables set to 0 and 1 respectiely. This ensures that the assembly generated is relatively short, which ensures that the bytecode and trace files are short – allowing us to hand verify each step of the process.

#[jolt::provable(memory_size = 10240, max_trace_length = 65536)]
fn fib(n: u32) -> u128 {
    let a: u128 = 0;
    let mut b: u128 = 1;

    start_cycle_tracking("fib_loop"); // Use `start_cycle_tracking("{name}")` to start a cycle span
    b = a + b;
    end_cycle_tracking("fib_loop"); // Use `end_cycle_tracking("{name}")` to end a cycle span
    b
}

To prove Jolt executed this program correctly, the documentation asks to run the following command into the terminal.

cargo run --release -p jolt-core profile --name fibonacci

We created a branch where we overwrote the original fibonacci program with the program above.

Before diving into how we generated the bytecode and trace, we stare at the trace and actual dissambled assembly and compare the two. We have annotated the files with comments to make it easier to see what is going on.

Sanity Check – Trace Vs Assembly Instructions

Address Conversion Reference

The following table will make it easier to translate between raw assembly and the jolt trace output. All addresses in the trace are DECIMAL, while assembly uses HEX.

Decimal	Hex	Description
2147483648	0x80000000	Boot start
2147483652	0x80000004	Boot + 4
2147483656	0x80000008	Boot + 8
2147483660	0x8000000c	Boot + 12
2147483664	0x80000010	Infinite loop
2147483702	0x80000036	Main function
2147483706	0x8000003a	Main + 4
2147483708	0x8000003c	Main + 8
2147483710	0x8000003e	Main + 12 (loop)
2147483714	0x80000042	Load byte (lb)
2147483788	0x8000008c	Store output
2147483792	0x80000090	Return
2147459072	0x7fffa000	Input buffer
2147467272	0x7fffb000	Output buffer

Formula: Hex Address = Decimal Address Example: 2147483648₁₀ = 0x80000000₁₆

Register Names

The name of registers in the RISC-V assembly, and the jolt trace are also slightly different. Here is the map.

Number	ABI Name	Purpose
x0	zero	Always 0
x1	ra	Return address
x2	sp	Stack pointer
x10	a0	Arg 0 / Return value 0
x11	a1	Arg 1 / Return value 1
x12	a2	Arg 2
x13	a3	Arg 3
x14	a4	Arg 4

Memory Map

This table shows where in memory our lies.

Address	Purpose
0x80000000	Code (.text)
0x7fffa000	Input buffer
0x7fffb000	Output buffer
0x7fffc000	Control/status
0x80001120	Stack top

Boot Sequence (Cycles 0-3)

Cycle 0: Stack Pointer Initialization

Assembly (0x80000000):

00001117    auipc  sp,0x1
# sp = PC + (0x1 << 12) = 0x80000000 + 0x1000 = 0x80001000

Trace:

Cycle 0: AUIPC(
    address: 2147483648,  // = 0x80000000 ✓
    operands: FormatU {
        rd: 2,            // x2 = sp (stack pointer)
        imm: 4096,        // = 0x1000 ✓
    },
    register_state: {
        rd: (0, 2147487744)  // sp: 0 → 0x80001000 ✓
    }
)

Cycle 1: Stack Pointer Adjustment

The PC at 0x80000004 adds 288 to the stack pointer. Assembly (0x80000004):

12010113    addi   sp,sp,288
# sp = 0x80001000 + 288 = 0x80001120

What the Jolt trace struct displays is the instruction, and corresponding change in state to registers. There is no memory access here, so there is nothing to store about the memory. See Cycle 8 for an example of memory access.

Trace:


Cycle 1: ADDI(
    RISCVCycle {
        instruction: ADDI {
            address: 2147483652,  // = 0x80000004 ✓ (Program counter)
            operands: FormatI {
                rd: 2,            // sp (based on mapping above)
                rs1: 2,           // sp
                imm: 288,         // = 0x120 ✓
            },
            virtual_sequence_remaining: None,
            is_first_in_sequence: false,
            is_compressed: false,
        },
        register_state: RegisterStateFormatI {
            rd: (
                2147487744, // new value in rd 
                2147488032, // old value in rd
            ),
            rs1: 2147487744, // value in rs1
        },
        ram_access: (), // no ram access
    },
)

Cycle 2: Return Address Setup

Assembly (0x80000008):

00000097    auipc  ra,0x0
# ra = PC + 0 = 0x80000008

Trace:

Cycle 2: AUIPC(
    address: 2147483656,  // = 0x80000008 ✓
    operands: FormatU {
        rd: 1,            // x1 = ra (return address)
        imm: 0,
    },
    register_state: {
        rd: (0, 2147483656)  // ra: 0 → 0x80000008 ✓
    }
)

Cycle 3: Jump to Main

Assembly (0x8000000c):

02e080e7    jalr   46(ra)
# Jump to: ra + 46 = 0x80000008 + 46 = 0x80000036
# Save return address: ra = PC + 4 = 0x80000010

Trace:

Cycle 3: JALR(
    address: 2147483660,  // = 0x8000000c ✓
    operands: FormatI {
        rd: 1,            // ra
        rs1: 1,           // ra
        imm: 46,          // offset ✓
    },
    register_state: {
        rd: (2147483656, 2147483664)  // ra: 0x80000008 → 0x80000010 ✓
        rs1: 2147483656   // Jump base = 0x80000008
    }
)

Main Function Entry (Cycles 4-6)

Cycle 4: Load Input Buffer Address

Assembly (0x80000036):

7fffa5b7    lui    a1,0x7fffa
# a1 = 0x7fffa000 (input buffer address)

Trace:

Cycle 4: LUI(
    address: 2147483702,  // = 0x80000036 ✓ (MAIN START!)
    operands: FormatU {
        rd: 11,           // x11 = a1
        imm: 2147459072,  // = 0x7fffa000 ✓
    },
    register_state: {
        rd: (0, 2147459072)  // a1: 0 → 0x7fffa000 ✓
    }
)

Cycle 5: Initialize Byte Counter

Assembly (0x8000003a):

557d        li     a0,-1
# a0 = -1 (byte counter, starts at -1)

Trace:

Cycle 5: ADDI(
    address: 2147483706,  // = 0x8000003a ✓
    operands: FormatI {
        rd: 10,           // x10 = a0
        rs1: 0,           // x0 (zero register)
        imm: 18446744073709551615,  // = -1 (as unsigned 64-bit) ✓
    },
    is_compressed: true,  // Note: compressed instruction (16-bit)
    register_state: {
        rd: (0, 18446744073709551615)  // a0: 0 → -1 ✓
        rs1: 0
    }
)

Cycle 6: Set Maximum Bytes

Assembly (0x8000003c):

4611        li     a2,4
# a2 = 4 (maximum bytes to read)

Trace:

Cycle 6: ADDI(
    address: 2147483708,  // = 0x8000003c ✓
    operands: FormatI {
        rd: 12,           // x12 = a2
        rs1: 0,           // x0
        imm: 4,
    },
    is_compressed: true,
    register_state: {
        rd: (0, 4)        // a2: 0 → 4 ✓
        rs1: 0
    }
)

Input Validation Loop (Cycles 7+)

Cycle 7: Check Byte Count

Assembly (0x8000003e):

04c50a63    beq    a0,a2,0x80000092
# if (a0 == a2) goto ERROR at 0x80000092
# Check: have we read too many bytes?

Trace:


Cycle 7: BEQ(
    RISCVCycle {
        instruction: BEQ {
            address: 2147483710,  // = 0x8000003e ✓
            operands: FormatB {
                rs1: 10,          
                rs2: 12,          // a2 = 4
                imm: 84,          // Branch offset
            },
            virtual_sequence_remaining: None,
            is_first_in_sequence: false,
            is_compressed: false,
        },
        register_state: RegisterStateFormatB {
            rs1: 18446744073709551615, // a0 = -1
            rs2: 4, // a2 = 4
        },
        ram_access: (),
    },
)

Branch Target Calculation: - If taken, would jump to: 0x8000003e + 84 = 0x80000092 (error handler) - Since not taken, continues to 0x80000042

Cycles 8-15: Load Byte Instruction (VIRTUAL SEQUENCE)

Assembly (0x80000042):

00058683    lb     a3,0(a1)
# a3 = sign_extend(byte at address a1)
# Read one byte from input buffer

This ONE assembly instruction expands into 8 trace cycles!

Understanding the Virtual Sequence

In Jolt’s execution model, the lb (load byte) instruction is decomposed into simpler operations for proof generation. All 8 cycles have the same address (0x80000042).

Cycle 8: Address Copy (Virtual Step 1/8)

Trace:

Cycle 8: ADDI(
    address: 2147483714,  // = 0x80000042 ✓
    operands: FormatI {
        rd: 32,           // Temp register
        rs1: 11,          // a1 (input pointer)
        imm: 0,
    },
    virtual_sequence_remaining: Some(7),  // 7 more steps after this
    is_first_in_sequence: true,           // Start of virtual sequence
    register_state: {
        rd: (0, 2147459072)  // x32 = 0x7fffa000 (copy of a1)
        rs1: 2147459072      // a1 = 0x7fffa000
    }
)

Purpose: Copy input pointer to temporary register x32

Cycle 9: Address Alignment (Virtual Step 2/8)

Trace:

Cycle 9: ANDI(
    address: 2147483714,  // Same address! ✓
    operands: FormatI {
        rd: 33,
        rs1: 32,
        imm: 18446744073709551608,  // = -8 = 0xFFFFFFFFFFFFFFF8
    },
    virtual_sequence_remaining: Some(6),
    register_state: {
        rd: (0, 2147459072)  // x33 = 0x7fffa000 (8-byte aligned)
        rs1: 2147459072
    }
)

Purpose: Align address to 8-byte boundary (clear lower 3 bits) - Input: 0x7fffa000 - Mask: 0xFFFFFFFFFFFFFFF8 (clear bits 0-2) - Result: 0x7fffa000 (already aligned)

Cycle 10: Load 8-byte Word (Virtual Step 3/8)

Trace:

Cycle 10: LD(
    address: 2147483714,  // Same address! ✓
    operands: FormatLoad {
        rd: 34,
        rs1: 33,
        imm: 0,
    },
    virtual_sequence_remaining: Some(5),
    register_state: {
        rd: (0, 2)        // x34 = 2 (8 bytes loaded from memory)
        rs1: 2147459072
    },
    ram_access: RAMRead {
        address: 2147459072,  // Read from 0x7fffa000
        value: 2,             // Memory contains: 0x02
    }
)

Purpose: Load entire 8-byte word from aligned address - ✅ Memory read occurs here! - Address: 0x7fffa000 (input buffer) - Value: 2 (this is the postcard-encoded input byte)

Cycle 11: Calculate Byte Offset (Virtual Step 4/8)

Trace:

Cycle 11: XORI(
    address: 2147483714,  // Same address! ✓
    operands: FormatI {
        rd: 35,
        rs1: 32,
        imm: 7,
    },
    virtual_sequence_remaining: Some(4),
    register_state: {
        rd: (0, 2147459079)  // x35 = 0x7fffa000 ^ 7
        rs1: 2147459072
    }
)

Purpose: Calculate which byte within the 8-byte word - XOR with 7 reverses byte order for little-endian systems

Cycle 12: Multiply by 8 (Virtual Step 5/8)

Trace:

Cycle 12: VirtualMULI(
    address: 2147483714,  // Same address! ✓
    operands: FormatI {
        rd: 35,
        rs1: 35,
        imm: 8,
    },
    virtual_sequence_remaining: Some(3),
    register_state: {
        rd: (2147459079, 17179672632)  // Multiply by 8 for bit shift
        rs1: 2147459079
    }
)

Purpose: Convert byte offset to bit shift amount - Byte offset → Bit offset (multiply by 8)

Cycle 13: Create Bit Mask (Virtual Step 6/8)

Trace:

Cycle 13: VirtualPow2(
    address: 2147483714,  // Same address! ✓
    operands: FormatI {
        rd: 36,
        rs1: 35,
        imm: 0,
    },
    virtual_sequence_remaining: Some(2),
    register_state: {
        rd: (0, 72057594037927936)  // 2^56 = mask for byte extraction
        rs1: 17179672632
    }
)

Purpose: Create power-of-2 mask for isolating the correct byte

Cycle 14: Extract Byte (Virtual Step 7/8)

Trace:

Cycle 14: MUL(
    address: 2147483714,  // Same address! ✓
    operands: FormatR {
        rd: 13,           // x13 = a3 (destination!)
        rs1: 34,          // Loaded data
        rs2: 36,          // Mask
    },
    virtual_sequence_remaining: Some(1),
    register_state: {
        rd: (0, 144115188075855872)  // Extracted byte (shifted)
        rs1: 2,
        rs2: 72057594037927936
    }
)

Purpose: Multiply to extract and position the specific byte

Cycle 15: Sign Extend (Virtual Step 8/8)

Trace:

Cycle 15: SRAI(
    address: 2147483714,  // Same address! ✓
    operands: FormatI {
        rd: 13,           // a3
        rs1: 13,
        imm: 56,          // Shift right arithmetic by 56 bits
    },
    virtual_sequence_remaining: Some(0),  // Last step!
    register_state: {
        rd: (144115188075855872, 2)  // Final value: 2
        rs1: 144115188075855872
    }
)

Purpose: Sign-extend the byte to 64-bit value - Shift right arithmetic by 56 bits - Final result: a3 = 2

The Codepath:

Here’s a rough guide to how these artefacts were generated. Our starting point was the command :

cargo run --release -p jolt-core profile --name fibonacci

After some command line parsing and instrumentation setup Line 143 in jolt/jolt-core/src/bin/jolt_core.rs calls the bench() function which eventually leads us to

fn fibonacci() -> Vec<(tracing::Span, Box<dyn FnOnce()>)> {
    prove_example("fibonacci-guest", postcard::to_stdvec(&400000u32).unwrap())
}

This prove_example function will be the main object of study in our lecture series. Today, we focus on the following code snippet.

    let mut tasks = Vec::new();
    let mut program = host::Program::new(example_name);
    let (bytecode, init_memory_state, _) = program.decode();

All Program::new does is initialise a Program struct with a guest name, and some defaults.

    Self {
            guest: guest.to_string(),
            func: None,
            memory_size: DEFAULT_MEMORY_SIZE,
            stack_size: DEFAULT_STACK_SIZE,
            max_input_size: DEFAULT_MAX_INPUT_SIZE,
            max_untrusted_advice_size: DEFAULT_MAX_UNTRUSTED_ADVICE_SIZE,
            max_trusted_advice_size: DEFAULT_MAX_TRUSTED_ADVICE_SIZE,
            max_output_size: DEFAULT_MAX_OUTPUT_SIZE,
            std: false,
            elf: None,
    }

The interesting work happens in the decode function

pub fn decode(&mut self) -> (Vec<Instruction>, Vec<(u64, u8)>, u64) {
        self.build(DEFAULT_TARGET_DIR); // Line 1 : THIS IS WHERE WE COMPILE THE GUEST RUST PROGRAM
        let elf = self.elf.as_ref().unwrap();
        let mut elf_file =
            File::open(elf).unwrap_or_else(|_| panic!("could not open elf file: {elf:?}"));
        let mut elf_contents = Vec::new();
        elf_file.read_to_end(&mut elf_contents).unwrap(); // LINE 2
        guest::program::decode(&elf_contents) // LINE 3
}

LINE 1: Calls the following function – which essentially just calls builds and runs the guest program and dumps an elf file in tmp/jolt-guest-targets/sha3-guest-/riscv64imac-unknown-none-elf/release/fibonacci-guest. This file is the RISC V Binary in the figure above (not yet disassembled).

The Emulation - Bytecode

LINE 2: Does data formatting to structure the elf file to make it suitable for reading, and hands the elf file contents over to guest::program::decode

LINE 3 decodes binary, which again is the most interesting function of this section

pub fn decode(elf: &[u8]) -> (Vec<Instruction>, Vec<(u64, u8)>, u64) {
    let (mut instructions, raw_bytes, program_end, xlen) = tracer::decode(elf); // <- Heavy Lifiting Done here
    let program_size = program_end - RAM_START_ADDRESS;
    let allocator = VirtualRegisterAllocator::default();

    // Expand virtual sequences
    instructions = instructions
        .into_iter()
        .flat_map(|instr| instr.inline_sequence(&allocator, xlen))
        .collect();

    (instructions, raw_bytes, program_size)
}

Roughly here is it what it does : Input:

elf: &[u8] - Raw bytes of a RISC-V ELF (Executable and Linkable Format) binary

Output:

Vec<Instruction> - Decoded program instructions. The Instruction struct is built using macros. The basic template is linked above. In Jolt, Instruction is an enum

pub enum Instruction {
            /// No-operation instruction (address)
            NoOp,
            UNIMPL,
            $(
                $instr($instr),
            )*
            /// Inline instruction from external crates
            INLINE(INLINE),
        }

The remaining things that it returns are the following :

Vec<(u64, u8)> - Memory initialization data (address, byte pairs)
u64 - Program end address (highest address used)
Xlen - Architecture width (32-bit or 64-bit)

Additionally, some instructions like division are broken down into many simpler instructions. These instructions are called virtual instructions, and they will read/write to virtual registers.

2. The Emulation - Trace

Now we focus on the following lines of prove_example

let (trace, _, program_io) = program.trace(&serialized_input, &[], &[]);

This gives us trace or what is often denoted with the variable \(z\) in Twist/Shout or the Jolt paper.

The trace function does the following:

Builds the binary if it does not exist.
Once again reads the binary file with the right structure and formatting.
Computes the program size, by decoding the instructions again (NOTE: CODE DUPLICATION here and the Decode function above).
We then initialise Jolts representation of RAM via the MemoryConfig struct.
Finally we invoke another auxiliary tracer function, which proceeds to call the actual tracer.

   #[tracing::instrument(skip_all, name = "Program::trace")]
    pub fn trace(
        &mut self,
        inputs: &[u8],
        untrusted_advice: &[u8],
        trusted_advice: &[u8],
    ) -> (Vec<Cycle>, Memory, JoltDevice) {
        self.build(DEFAULT_TARGET_DIR);
        let elf = self.elf.as_ref().unwrap();
        let mut elf_file =
            File::open(elf).unwrap_or_else(|_| panic!("could not open elf file: {elf:?}"));
        let mut elf_contents = Vec::new();
        elf_file.read_to_end(&mut elf_contents).unwrap();
        let (_, _, program_end, _) = tracer::decode(&elf_contents);
        let program_size = program_end - RAM_START_ADDRESS;

        let memory_config = MemoryConfig {
            memory_size: self.memory_size,
            stack_size: self.stack_size,
            max_input_size: self.max_input_size,
            max_untrusted_advice_size: self.max_untrusted_advice_size,
            max_trusted_advice_size: self.max_trusted_advice_size,
            max_output_size: self.max_output_size,
            program_size: Some(program_size),
        };

        guest::program::trace(
            &elf_contents,
            self.elf.as_ref(),
            inputs,
            untrusted_advice,
            trusted_advice,
            &memory_config,
        )
    }

The guest tracer calls another helper which gives us our trace and final memory state

let (trace, memory, io_device) = tracer::trace(
        elf_contents,
        elf_path,
        inputs,
        untrusted_advice,
        trusted_advice,
        memory_config,
    );
    (trace, memory, io_device)

What’s Next

Great! We have the instance and the witness, but if you’ve ever read any of Justin Thaler’s papers, he bangs on about sum-checks. Sumchecks have polynomials. All we have so far is an array of assembly instructions formatted in a somewhat complicated Jolt specific data structures. On Day 2 we get to polynomials.