Advent Of Jolt (Day 1): The Dynamic Jolt Trace
A Rust Program \(\rightarrow\) Binary \(\rightarrow\) Jolt Trace
The starting point of proving is to describe the program which we claim to have execute correctly.
We take this program written in rust and compile it a binary encoding RISC-V instructions. Then we take this binary and generate two Jolt specific data structures – bytecode and the dynamic trace The final artefacts of this blog are the following:
The file in disassembled RISC-V instruction format; and a annotated assembly which we provide to make sense of what’s going on. Note that the annotated file is not generated by the Jolt codebase (it is listed for pedagogical reasons).
The Jolt Bytecode: which is just a
Vec<Instruction>object representing the entire program’s bytecode in ascending order by memory address. We will talk about the typeInstructionin greater detail later.The Dynamic Trace: This is just the step by step execution step. At each timestep/cycle, it stores what instruction to run, and how the machine state changes due to the instruction. We will juxtapose the actual assembly file and this trace, and walk through them later in the post.
TODO: fix this figure
For the sake of simplicity, we pick a program that essentially does
nothing - it takes as input a 32 bit integer n, proceeds to
ignore it completely and then add to variables set to 0 and 1
respectiely. This ensures that the assembly generated is relatively
short, which ensures that the bytecode and trace files are short –
allowing us to hand verify each step of the process.
#[jolt::provable(memory_size = 10240, max_trace_length = 65536)]
fn fib(n: u32) -> u128 {
let a: u128 = 0;
let mut b: u128 = 1;
start_cycle_tracking("fib_loop"); // Use `start_cycle_tracking("{name}")` to start a cycle span
b = a + b;
end_cycle_tracking("fib_loop"); // Use `end_cycle_tracking("{name}")` to end a cycle span
b
}
To prove Jolt executed this program correctly, the documentation asks to run the following command into the terminal.
cargo run --release -p jolt-core profile --name fibonacci
We created a branch where we overwrote the original fibonacci program with the program above.
Before diving into how we generated the bytecode and trace, we stare at the trace and actual dissambled assembly and compare the two. We have annotated the files with comments to make it easier to see what is going on.
Sanity Check – Trace Vs Assembly Instructions
Address Conversion Reference
The following table will make it easier to translate between raw assembly and the jolt trace output. All addresses in the trace are DECIMAL, while assembly uses HEX.
| Decimal | Hex | Description |
|---|---|---|
| 2147483648 | 0x80000000 | Boot start |
| 2147483652 | 0x80000004 | Boot + 4 |
| 2147483656 | 0x80000008 | Boot + 8 |
| 2147483660 | 0x8000000c | Boot + 12 |
| 2147483664 | 0x80000010 | Infinite loop |
| 2147483702 | 0x80000036 | Main function |
| 2147483706 | 0x8000003a | Main + 4 |
| 2147483708 | 0x8000003c | Main + 8 |
| 2147483710 | 0x8000003e | Main + 12 (loop) |
| 2147483714 | 0x80000042 | Load byte (lb) |
| 2147483788 | 0x8000008c | Store output |
| 2147483792 | 0x80000090 | Return |
| 2147459072 | 0x7fffa000 | Input buffer |
| 2147467272 | 0x7fffb000 | Output buffer |
Formula: Hex Address = Decimal Address
Example: 2147483648₁₀ = 0x80000000₁₆
Register Names
The name of registers in the RISC-V assembly, and the jolt trace are also slightly different. Here is the map.
| Number | ABI Name | Purpose |
|---|---|---|
| x0 | zero | Always 0 |
| x1 | ra | Return address |
| x2 | sp | Stack pointer |
| x10 | a0 | Arg 0 / Return value 0 |
| x11 | a1 | Arg 1 / Return value 1 |
| x12 | a2 | Arg 2 |
| x13 | a3 | Arg 3 |
| x14 | a4 | Arg 4 |
Memory Map
This table shows where in memory our lies.
| Address | Purpose |
|---|---|
| 0x80000000 | Code (.text) |
| 0x7fffa000 | Input buffer |
| 0x7fffb000 | Output buffer |
| 0x7fffc000 | Control/status |
| 0x80001120 | Stack top |
Boot Sequence (Cycles 0-3)
Cycle 0: Stack Pointer Initialization
Assembly (0x80000000):
00001117 auipc sp,0x1
# sp = PC + (0x1 << 12) = 0x80000000 + 0x1000 = 0x80001000
Trace:
Cycle 0: AUIPC(
address: 2147483648, // = 0x80000000 ✓
operands: FormatU {
rd: 2, // x2 = sp (stack pointer)
imm: 4096, // = 0x1000 ✓
},
register_state: {
rd: (0, 2147487744) // sp: 0 → 0x80001000 ✓
}
)
Cycle 1: Stack Pointer Adjustment
The PC at 0x80000004 adds 288 to the stack
pointer. Assembly (0x80000004):
12010113 addi sp,sp,288
# sp = 0x80001000 + 288 = 0x80001120
What the Jolt trace struct displays is the instruction, and corresponding change in state to registers. There is no memory access here, so there is nothing to store about the memory. See Cycle 8 for an example of memory access.
Trace:
Cycle 1: ADDI(
RISCVCycle {
instruction: ADDI {
address: 2147483652, // = 0x80000004 ✓ (Program counter)
operands: FormatI {
rd: 2, // sp (based on mapping above)
rs1: 2, // sp
imm: 288, // = 0x120 ✓
},
virtual_sequence_remaining: None,
is_first_in_sequence: false,
is_compressed: false,
},
register_state: RegisterStateFormatI {
rd: (
2147487744, // new value in rd
2147488032, // old value in rd
),
rs1: 2147487744, // value in rs1
},
ram_access: (), // no ram access
},
)
Cycle 2: Return Address Setup
Assembly (0x80000008):
00000097 auipc ra,0x0
# ra = PC + 0 = 0x80000008
Trace:
Cycle 2: AUIPC(
address: 2147483656, // = 0x80000008 ✓
operands: FormatU {
rd: 1, // x1 = ra (return address)
imm: 0,
},
register_state: {
rd: (0, 2147483656) // ra: 0 → 0x80000008 ✓
}
)
Cycle 3: Jump to Main
Assembly (0x8000000c):
02e080e7 jalr 46(ra)
# Jump to: ra + 46 = 0x80000008 + 46 = 0x80000036
# Save return address: ra = PC + 4 = 0x80000010
Trace:
Cycle 3: JALR(
address: 2147483660, // = 0x8000000c ✓
operands: FormatI {
rd: 1, // ra
rs1: 1, // ra
imm: 46, // offset ✓
},
register_state: {
rd: (2147483656, 2147483664) // ra: 0x80000008 → 0x80000010 ✓
rs1: 2147483656 // Jump base = 0x80000008
}
)
Main Function Entry (Cycles 4-6)
Cycle 4: Load Input Buffer Address
Assembly (0x80000036):
7fffa5b7 lui a1,0x7fffa
# a1 = 0x7fffa000 (input buffer address)
Trace:
Cycle 4: LUI(
address: 2147483702, // = 0x80000036 ✓ (MAIN START!)
operands: FormatU {
rd: 11, // x11 = a1
imm: 2147459072, // = 0x7fffa000 ✓
},
register_state: {
rd: (0, 2147459072) // a1: 0 → 0x7fffa000 ✓
}
)
Cycle 5: Initialize Byte Counter
Assembly (0x8000003a):
557d li a0,-1
# a0 = -1 (byte counter, starts at -1)
Trace:
Cycle 5: ADDI(
address: 2147483706, // = 0x8000003a ✓
operands: FormatI {
rd: 10, // x10 = a0
rs1: 0, // x0 (zero register)
imm: 18446744073709551615, // = -1 (as unsigned 64-bit) ✓
},
is_compressed: true, // Note: compressed instruction (16-bit)
register_state: {
rd: (0, 18446744073709551615) // a0: 0 → -1 ✓
rs1: 0
}
)
Cycle 6: Set Maximum Bytes
Assembly (0x8000003c):
4611 li a2,4
# a2 = 4 (maximum bytes to read)
Trace:
Cycle 6: ADDI(
address: 2147483708, // = 0x8000003c ✓
operands: FormatI {
rd: 12, // x12 = a2
rs1: 0, // x0
imm: 4,
},
is_compressed: true,
register_state: {
rd: (0, 4) // a2: 0 → 4 ✓
rs1: 0
}
)
Input Validation Loop (Cycles 7+)
Cycle 7: Check Byte Count
Assembly (0x8000003e):
04c50a63 beq a0,a2,0x80000092
# if (a0 == a2) goto ERROR at 0x80000092
# Check: have we read too many bytes?
Trace:
Cycle 7: BEQ(
RISCVCycle {
instruction: BEQ {
address: 2147483710, // = 0x8000003e ✓
operands: FormatB {
rs1: 10,
rs2: 12, // a2 = 4
imm: 84, // Branch offset
},
virtual_sequence_remaining: None,
is_first_in_sequence: false,
is_compressed: false,
},
register_state: RegisterStateFormatB {
rs1: 18446744073709551615, // a0 = -1
rs2: 4, // a2 = 4
},
ram_access: (),
},
)
Branch Target Calculation: - If taken, would jump to: 0x8000003e + 84 = 0x80000092 (error handler) - Since not taken, continues to 0x80000042
Cycles 8-15: Load Byte Instruction (VIRTUAL SEQUENCE)
Assembly (0x80000042):
00058683 lb a3,0(a1)
# a3 = sign_extend(byte at address a1)
# Read one byte from input buffer
This ONE assembly instruction expands into 8 trace cycles!
Understanding the Virtual Sequence
In Jolt’s execution model, the lb (load byte)
instruction is decomposed into simpler operations for
proof generation. All 8 cycles have the same address
(0x80000042).
Cycle 8: Address Copy (Virtual Step 1/8)
Trace:
Cycle 8: ADDI(
address: 2147483714, // = 0x80000042 ✓
operands: FormatI {
rd: 32, // Temp register
rs1: 11, // a1 (input pointer)
imm: 0,
},
virtual_sequence_remaining: Some(7), // 7 more steps after this
is_first_in_sequence: true, // Start of virtual sequence
register_state: {
rd: (0, 2147459072) // x32 = 0x7fffa000 (copy of a1)
rs1: 2147459072 // a1 = 0x7fffa000
}
)
Purpose: Copy input pointer to temporary register x32
Cycle 9: Address Alignment (Virtual Step 2/8)
Trace:
Cycle 9: ANDI(
address: 2147483714, // Same address! ✓
operands: FormatI {
rd: 33,
rs1: 32,
imm: 18446744073709551608, // = -8 = 0xFFFFFFFFFFFFFFF8
},
virtual_sequence_remaining: Some(6),
register_state: {
rd: (0, 2147459072) // x33 = 0x7fffa000 (8-byte aligned)
rs1: 2147459072
}
)
Purpose: Align address to 8-byte boundary (clear lower 3 bits) - Input: 0x7fffa000 - Mask: 0xFFFFFFFFFFFFFFF8 (clear bits 0-2) - Result: 0x7fffa000 (already aligned)
Cycle 10: Load 8-byte Word (Virtual Step 3/8)
Trace:
Cycle 10: LD(
address: 2147483714, // Same address! ✓
operands: FormatLoad {
rd: 34,
rs1: 33,
imm: 0,
},
virtual_sequence_remaining: Some(5),
register_state: {
rd: (0, 2) // x34 = 2 (8 bytes loaded from memory)
rs1: 2147459072
},
ram_access: RAMRead {
address: 2147459072, // Read from 0x7fffa000
value: 2, // Memory contains: 0x02
}
)
Purpose: Load entire 8-byte word from aligned address - ✅ Memory read occurs here! - Address: 0x7fffa000 (input buffer) - Value: 2 (this is the postcard-encoded input byte)
Cycle 11: Calculate Byte Offset (Virtual Step 4/8)
Trace:
Cycle 11: XORI(
address: 2147483714, // Same address! ✓
operands: FormatI {
rd: 35,
rs1: 32,
imm: 7,
},
virtual_sequence_remaining: Some(4),
register_state: {
rd: (0, 2147459079) // x35 = 0x7fffa000 ^ 7
rs1: 2147459072
}
)
Purpose: Calculate which byte within the 8-byte word - XOR with 7 reverses byte order for little-endian systems
Cycle 12: Multiply by 8 (Virtual Step 5/8)
Trace:
Cycle 12: VirtualMULI(
address: 2147483714, // Same address! ✓
operands: FormatI {
rd: 35,
rs1: 35,
imm: 8,
},
virtual_sequence_remaining: Some(3),
register_state: {
rd: (2147459079, 17179672632) // Multiply by 8 for bit shift
rs1: 2147459079
}
)
Purpose: Convert byte offset to bit shift amount - Byte offset → Bit offset (multiply by 8)
Cycle 13: Create Bit Mask (Virtual Step 6/8)
Trace:
Cycle 13: VirtualPow2(
address: 2147483714, // Same address! ✓
operands: FormatI {
rd: 36,
rs1: 35,
imm: 0,
},
virtual_sequence_remaining: Some(2),
register_state: {
rd: (0, 72057594037927936) // 2^56 = mask for byte extraction
rs1: 17179672632
}
)
Purpose: Create power-of-2 mask for isolating the correct byte
Cycle 14: Extract Byte (Virtual Step 7/8)
Trace:
Cycle 14: MUL(
address: 2147483714, // Same address! ✓
operands: FormatR {
rd: 13, // x13 = a3 (destination!)
rs1: 34, // Loaded data
rs2: 36, // Mask
},
virtual_sequence_remaining: Some(1),
register_state: {
rd: (0, 144115188075855872) // Extracted byte (shifted)
rs1: 2,
rs2: 72057594037927936
}
)
Purpose: Multiply to extract and position the specific byte
Cycle 15: Sign Extend (Virtual Step 8/8)
Trace:
Cycle 15: SRAI(
address: 2147483714, // Same address! ✓
operands: FormatI {
rd: 13, // a3
rs1: 13,
imm: 56, // Shift right arithmetic by 56 bits
},
virtual_sequence_remaining: Some(0), // Last step!
register_state: {
rd: (144115188075855872, 2) // Final value: 2
rs1: 144115188075855872
}
)
Purpose: Sign-extend the byte to 64-bit value - Shift right arithmetic by 56 bits - Final result: a3 = 2
The Codepath:
Here’s a rough guide to how these artefacts were generated. Our starting point was the command :
cargo run --release -p jolt-core profile --name fibonacci
After some command line parsing and instrumentation setup Line 143 in
jolt/jolt-core/src/bin/jolt_core.rs calls the
bench() function which eventually leads us to
fn fibonacci() -> Vec<(tracing::Span, Box<dyn FnOnce()>)> {
prove_example("fibonacci-guest", postcard::to_stdvec(&400000u32).unwrap())
}
This prove_example function
will be the main object of study in our lecture series. Today, we focus
on the following code snippet.
let mut tasks = Vec::new();
let mut program = host::Program::new(example_name);
let (bytecode, init_memory_state, _) = program.decode();
All Program::new does is initialise a
Program struct with a guest name, and some defaults.
Self {
guest: guest.to_string(),
func: None,
memory_size: DEFAULT_MEMORY_SIZE,
stack_size: DEFAULT_STACK_SIZE,
max_input_size: DEFAULT_MAX_INPUT_SIZE,
max_untrusted_advice_size: DEFAULT_MAX_UNTRUSTED_ADVICE_SIZE,
max_trusted_advice_size: DEFAULT_MAX_TRUSTED_ADVICE_SIZE,
max_output_size: DEFAULT_MAX_OUTPUT_SIZE,
std: false,
elf: None,
}
The interesting work happens in the decode function
pub fn decode(&mut self) -> (Vec<Instruction>, Vec<(u64, u8)>, u64) {
self.build(DEFAULT_TARGET_DIR); // Line 1 : THIS IS WHERE WE COMPILE THE GUEST RUST PROGRAM
let elf = self.elf.as_ref().unwrap();
let mut elf_file =
File::open(elf).unwrap_or_else(|_| panic!("could not open elf file: {elf:?}"));
let mut elf_contents = Vec::new();
elf_file.read_to_end(&mut elf_contents).unwrap(); // LINE 2
guest::program::decode(&elf_contents) // LINE 3
}
LINE 1: Calls the following
function – which essentially just calls builds and runs the guest
program and dumps an elf file in
tmp/jolt-guest-targets/sha3-guest-/riscv64imac-unknown-none-elf/release/fibonacci-guest.
This file is the RISC V Binary in the figure above (not yet
disassembled).
The Emulation - Bytecode
LINE 2: Does data formatting to structure the elf file
to make it suitable for reading, and hands the elf file contents over to
guest::program::decode
LINE 3 decodes binary, which again is the most
interesting function of this section
pub fn decode(elf: &[u8]) -> (Vec<Instruction>, Vec<(u64, u8)>, u64) {
let (mut instructions, raw_bytes, program_end, xlen) = tracer::decode(elf); // <- Heavy Lifiting Done here
let program_size = program_end - RAM_START_ADDRESS;
let allocator = VirtualRegisterAllocator::default();
// Expand virtual sequences
instructions = instructions
.into_iter()
.flat_map(|instr| instr.inline_sequence(&allocator, xlen))
.collect();
(instructions, raw_bytes, program_size)
}
Roughly here is it what it does : Input:
elf: &[u8]- Raw bytes of a RISC-V ELF (Executable and Linkable Format) binary
Output:
Vec<Instruction>- Decoded program instructions. The Instruction struct is built using macros. The basic template is linked above. In Jolt,Instructionis an enum
pub enum Instruction {
/// No-operation instruction (address)
NoOp,
UNIMPL,
$(
$instr($instr),
)*
/// Inline instruction from external crates
INLINE(INLINE),
}
The remaining things that it returns are the following :
Vec<(u64, u8)>- Memory initialization data (address, byte pairs)u64- Program end address (highest address used)Xlen- Architecture width (32-bit or 64-bit)
Additionally, some instructions like division are broken down into many simpler instructions. These instructions are called virtual instructions, and they will read/write to virtual registers.
2. The Emulation - Trace
Now we focus on the following lines of prove_example
let (trace, _, program_io) = program.trace(&serialized_input, &[], &[]);
This gives us trace or what is often denoted with the variable \(z\) in Twist/Shout or the Jolt paper.
The trace function does the following:
- Builds the binary if it does not exist.
- Once again reads the binary file with the right structure and formatting.
- Computes the program size, by decoding the instructions again (NOTE: CODE DUPLICATION here and the Decode function above).
- We then initialise Jolts representation of RAM via the
MemoryConfigstruct. - Finally we invoke another auxiliary tracer function, which proceeds to call the actual tracer.
#[tracing::instrument(skip_all, name = "Program::trace")]
pub fn trace(
&mut self,
inputs: &[u8],
untrusted_advice: &[u8],
trusted_advice: &[u8],
) -> (Vec<Cycle>, Memory, JoltDevice) {
self.build(DEFAULT_TARGET_DIR);
let elf = self.elf.as_ref().unwrap();
let mut elf_file =
File::open(elf).unwrap_or_else(|_| panic!("could not open elf file: {elf:?}"));
let mut elf_contents = Vec::new();
elf_file.read_to_end(&mut elf_contents).unwrap();
let (_, _, program_end, _) = tracer::decode(&elf_contents);
let program_size = program_end - RAM_START_ADDRESS;
let memory_config = MemoryConfig {
memory_size: self.memory_size,
stack_size: self.stack_size,
max_input_size: self.max_input_size,
max_untrusted_advice_size: self.max_untrusted_advice_size,
max_trusted_advice_size: self.max_trusted_advice_size,
max_output_size: self.max_output_size,
program_size: Some(program_size),
};
guest::program::trace(
&elf_contents,
self.elf.as_ref(),
inputs,
untrusted_advice,
trusted_advice,
&memory_config,
)
}
The guest tracer calls another helper which gives us our
trace and final memory state
let (trace, memory, io_device) = tracer::trace(
elf_contents,
elf_path,
inputs,
untrusted_advice,
trusted_advice,
memory_config,
);
(trace, memory, io_device)
What’s Next
Great! We have the instance and the witness, but if you’ve ever read any of Justin Thaler’s papers, he bangs on about sum-checks. Sumchecks have polynomials. All we have so far is an array of assembly instructions formatted in a somewhat complicated Jolt specific data structures. On Day 2 we get to polynomials.