Formally Verifying Jolt Bytecode Expansions

Current status: 15 out of 22 FormatR and FormatI bytecode equivalences are fully proved using a unified template.

The multiply variants MULW, MULH, and MULHSU (3 instructions) follow the same template but have open sorry's in their bridge lemmas. System instructions (ECALL, MRET, CSRRS, CSRRW) also remain and should follow the same template. (we have just not gotten around to these).

The DIV/REM family (8 instructions) use advice from an oracle, which requires a different proof template (coming soon).

Goal

The end goal is to formally verify that the Jolt inline sequences faithfully replace certain RISCV instructions. Faithfully replacing instructions amounts to showing that running the sequence of virtual and real instructions in the Jolt-ISA results in the same RISCV CPU state as running the original instruction directly.

This series has several code snippets with file references and line numbers. These are relative to commit e9301c2 on branch modularisation of jolt-qed.

The Crux Of The Problem

When writing a proof on pen and paper, typically the main technical hurdle is to overcome the complexity of the mathematical proof. Here, the actual mathematical claims are relatively simple (for most cases). For example, the mathematical statement at the heart of showing that Jolt's ADDW is equivalent to the RISCV ADDW is just showing the following:

(Bit Vector Equivalence)

RISCV ADDW truncates each operand to 32 bits, adds, and sign-extends the result:

\operatorname{sext}_{64}\bigl(\texttt{rs1}[31:0] +_{32} \texttt{rs2}[31:0]\bigr)

Jolt decomposes this into two steps.

First, a 64-bit ADD writes the full sum to rd.
Then a virtual sign-extend extracts the lower 32 bits and sign-extends back to 64. The final value in rd:

\operatorname{sext}_{64}\bigl((\texttt{rs1} +_{64} \texttt{rs2})\,[31:0]\bigr)

On paper, the following is a perfectly acceptable argument. These are equal because truncation distributes over addition¹:

(a +_{64} b)[31:0] = a[31:0] +_{32} b[31:0]

That's all we'd write in a real submission to a peer reviewed conference.

Even in Lean, proving the above is rather simple (3 lines).

237theorem extractLsb_add (a b : BitVec 64) :
238    Sail.BitVec.extractLsb (a + b) 31 0 =
239    Sail.BitVec.extractLsb a 31 0 + Sail.BitVec.extractLsb b 31 0 := by
240  simp only [Sail.BitVec.extractLsb, BitVec.extractLsb]
241  apply BitVec.eq_of_toNat_eq
242  simp [BitVec.toNat_add, Nat.add_mod]
243

See Did We Need This Modularisation for a glimpse of how bad things can get in Lean without the right abstractions.

However, the final theorem statement is not that these two expressions compute the same 64 bit value. The statement says, if we step a RISC CPU once with ADDW, and if we step the Jolt CPU twice with ADD and VirtualSignExtend, then the net effect on the RISC-CPU state is identical². To formally state this we first need to define the machine state - memory, register files, program counter etc. For the RISCV side, this automatic transpilation from the SAIL description of the RISCV CPU already does this work for us. Then we need machinery to define instructions and compose them. Then we need machinery to show two states are equal. There are 50 or so instructions - so whatever work we do for creating this machinery, must be somewhat general. In what follows we do exactly this.

In the process we demonstrate how a "trivial" bit-vector lemma balloons into hundreds of lines of Lean code — a maze of monadic plumbing forced by the fully specified RISC-V CPU model. Then we give a strategy to make dealing with monads manageable. With this strategy we close out nearly all FormatR and FormatI bytecode equivalences.

A General Strategy

NOTE: Throughout this series, when we refer to "Sail" code, we mean the Lean 4 code generated by the Galois and Cambridge transpiler from the official RISC-V specification written in the Sail language.