Understanding the Yellow Paper's EVM Specifications

evm

Intermediate

qbzzt

15 май 2022 г.

19 минута четене

The Yellow Paperopens in a new tab is the formal specification for Ethereum. Except where amended by the EIP process, it contains the exact description of how everything works. It is written as a mathematical paper, which includes terminology programmers may not find familiar. In this paper you learn how to read it, and by extension other related mathematical papers.

Which Yellow Paper?

Like almost everything else in Ethereum, the Yellow Paper evolves over time. To be able to refer to a specific version, I uploaded the current version at writing. The section, page, and equation numbers I use will refer to that version. It is a good idea to have it open in a different window while reading this document.

Why the EVM?

The original yellow paper was written right at the start of Ethereum's development. It describes the original proof-of-work based consensus mechanism that was originally used to secure the network. However, Ethereum switched off proof-of-work and started using proof-of-stake based consensus in September 2022. This tutorial will focus on the parts of the yellow paper defining the Ethereum Virtual Machine. The EVM was unchanged by the transition to proof-of-stake (except for the return value of the DIFFICULTY opcode).

9 Execution model

This section (p. 12-14) includes most of the definition of the EVM.

The term system state includes everything you need to know about the system to run it. In a typical computer, this means the memory, content of registers, etc.

A Turing machineopens in a new tab is a computational model. Essentially, it is a simplified version of a computer, which is proved to have the same ability to run computations that a normal computer can (everything that a computer can calculate a Turing machine can calculate and vice versa). This model makes it easier to prove various theorems about what is and what isn't computable.

The term Turing-completeopens in a new tab means a computer that can run the same calculations as a Turing machine. Turing machines can get into infinite loops, and the EVM cannot because it would run out of gas, so it's only quasi-Turing-complete.

9.1 Basics

This section gives the basics of the EVM and how it compares with other computational models.

A stack machineopens in a new tab is a computer that stores intermediate data not in registers, but in a stackopens in a new tab. This is the preferred architecture for virtual machines because it is easy to implement meaning that bugs, and security vulnerabilities, are a lot less likely. The memory in the stack is divided into 256-bit words. This was chosen because it is convenient for Ethereum's core cryptographic operations such as Keccak-256 hashing and elliptic curve computations. The maximum size of the stack is 1024 items (1024 x 256 bits). When opcodes are executed they are usually getting their parameters from the stack. There are opcodes specifically for reorganizing elements in the stack such as POP (removes item from top of stack), DUP_N (duplicated N'th item in stack), etc.

The EVM also has a volatile space called memory which is used to store data during execution. This memory is organized into 32-byte words. All memory locations are initialized to zero. If you execute this Yulopens in a new tab code to add a word to memory, it will fill 32 bytes of memory by padding the empty space in the word with zeros, i.e. it creates one word - with zeros in locations 0-29, 0x60 to 30, and 0xA7 to 31.

1mstore(0, 0x60A7)

mstore is one of three opcodes the EVM provides for interacting with memory - it loads a word into memory. The other two are mstore8 which loads a single byte into memory, and mload which moves a word from memory to stack.

The EVM also has a separate non-volatile storage model that is maintained as part of the system state - this memory is organized into word arrays (as opposed to word-addressable byte arrays in the stack). This storage is where contracts keep persistent data - a contract can only interact with its own storage. Storage is organized in key-value mappings.

Although it is not mentioned in this section of the Yellow Paper, it is also useful to know there is a fourth type of memory. Calldata is byte-addressable read-only memory used to store the value passed with the data parameter of a transaction. The EVM has specific opcodes for managing calldata. calldatasize returns the size of the data. calldataload loads the data into the stack. calldatacopy copies the data into memory.

The standard Von Neumann architectureopens in a new tab stores code and data in the same memory. The EVM does not follow this standard for security reasons - sharing volatile memory makes it possible to change program code. Instead, code is saved to storage.

There are only two cases in which code is executed from memory:

When a contract creates another contract (using CREATEopens in a new tab or CREATE2opens in a new tab), the code for the contract constructor comes from memory.
During the creation of any contract, the constructor code runs and then returns with the code of the actual contract, also from memory.

The term exceptional execution means an exception that causes the execution of the current contract to halt.

9.2 Fees overview

This section explains how the gas fees are calculated. There are three costs:

Opcode cost

The inherent cost of the specific opcode. To get this value, find the cost group of the opcode in Appendix H (p. 28, under equation (327)), and find the cost group in equation (324). This gives you a cost function, which in most cases uses parameters from Appendix G (p. 27).

For example, the opcode CALLDATACOPYopens in a new tab is a member of group W_copy. The opcode cost for that group is G_verylow+G_copy×⌈μ_s[2]÷32⌉. Looking at Appendix G, we see that both constants are 3, which gives us 3+3×⌈μ_s[2]÷32⌉.

We still need to decipher the expression ⌈μ_s[2]÷32⌉. The outmost part, ⌈ <value> ⌉ is the ceiling function, a function that given a value returns the smallest integer that is still not smaller than the value. For example, ⌈2.5⌉ = ⌈3⌉ = 3. The inner part is μ_s[2]÷32. Looking at section 3 (Conventions) on p. 3, μ is the machine state. The machine state is defined in section 9.4.1 on p. 13. According to that section, one of the machine state parameters is s for the stack. Putting it all together, it seems that μ_s[2] is location #2 in the stack. Looking at the opcodeopens in a new tab, location #2 in the stack is the size of the data in bytes. Looking at the other opcodes in group W_copy, CODECOPYopens in a new tab and RETURNDATACOPYopens in a new tab, they also have a size of data in the same location. So ⌈μ_s[2]÷32⌉ is the number of 32 byte words required to store the data being copied. Putting everything together, the inherent cost of CALLDATACOPYopens in a new tab is 3 gas plus 3 per word of data being copied.

Running cost

The cost of running the code we're calling.

In the case of CREATEopens in a new tab and CREATE2opens in a new tab, the constructor for the new contract.
In the case of CALLopens in a new tab, CALLCODEopens in a new tab, STATICCALLopens in a new tab, or DELEGATECALLopens in a new tab, the contract we call.

Expanding memory cost

The cost of expanding memory (if necessary).

In equation 324, this value is written as C_mem(μ_i')-C_mem(μ_i). Looking at section 9.4.1 again, we see that μ_i is the number of words in memory. So μ_i is the number of words in memory before the opcode and μ_i' is the number of words in memory after the opcode.

The function C_mem is defined in equation 326: C_mem(a) = G_memory × a + ⌊a² ÷ 512⌋. ⌊x⌋ is the floor function, a function that given a value returns the largest integer that is still not larger than the value. For example, ⌊2.5⌋ = ⌊2⌋ = 2. When a < √512, a² < 512, and the result of the floor function is zero. So for the first 22 words (704 bytes), the cost rises linearly with the number of memory words required. Beyond that point ⌊a² ÷ 512⌋ is positive. When the memory required is high enough the gas cost is proportional to the square of the amount of memory.

Note that these factors only influence the inherent gas cost - it does not take into account the fee market or tips to validators that determine how much an end user is required to pay - this is just the raw cost of running a particular operation on the EVM.

9.3 Execution environment

The execution environment is a tuple, I, that includes information that isn't part of the blockchain state or the EVM.

Parameter	Opcode to access the data	Solidity code to access the data
I_a	`ADDRESS`opens in a new tab	`address(this)`
I_o	`ORIGIN`opens in a new tab	`tx.origin`
I_p	`GASPRICE`opens in a new tab	`tx.gasprice`
I_d	`CALLDATALOAD`opens in a new tab, etc.	`msg.data`
I_s	`CALLER`opens in a new tab	`msg.sender`
I_v	`CALLVALUE`opens in a new tab	`msg.value`
I_b	`CODECOPY`opens in a new tab	`address(this).code`
I_H	Block header fields, such as `NUMBER`opens in a new tab and `DIFFICULTY`opens in a new tab	`block.number`, `block.difficulty`, etc.
I_e	Depth of the call stack for calls between contracts (including contract creation)
I_w	Is the EVM allowed to change state, or is it running statically

A few other parameters are necessary to understand the rest of section 9:

Parameter	Defined in section	Meaning
σ	2 (p. 2, equation 1)	The state of the blockchain
g	9.3 (p. 13)	Remaining gas
A	6.1 (p. 8)	Accrued substate (changes scheduled for when the transaction ends)
o	9.3 (p. 13)	Output - the returned result in the case of internal transaction (when one contract calls another) and calls to view functions (when you are just asking for information, so there is no need to wait for a transaction)

9.4 Execution overview

Now that have all the preliminaries, we can finally start working on how the EVM works.

Equations 137-142 give us the initial conditions for running the EVM:

Symbol	Initial value	Meaning
μ_g	g	Gas remaining
μ_pc	0	Program counter, the address of the next instruction to execute
μ_m	(0, 0, ...)	Memory, initialized to all zeros
μ_i	0	Highest memory location used
μ_s	()	The stack, initially empty
μ_o	∅	The output, empty set until and unless we stop either with return data (`RETURN`opens in a new tab or `REVERT`opens in a new tab) or without it (`STOP`opens in a new tab or `SELFDESTRUCT`opens in a new tab).

Equation 143 tells us there are four possible conditions at each point in time during execution, and what to do with them:

Z(σ,μ,A,I). Z represents a function that tests whether an operation creates an invalid state transition (see exceptional halting). If it evaluates to True, the new state is identical to the old one (except gas gets burned) because the changes have not been implemented.
If the opcode being executed is REVERTopens in a new tab, the new state is the same as the old state, some gas is lost.
If the sequence of operations is finished, as signified by a RETURNopens in a new tab), the state is updated to the new state.
If we aren't at one of the end conditions 1-3, continue running.

9.4.1 Machine State

This section explains the machine state in greater detail. It specifies that w is the current opcode. If μ_pc is less than ||I_b||, the length of the code, then that byte (I_b[μ_pc]) is the opcode. Otherwise, the opcode is defined as STOPopens in a new tab.

As this is a stack machineopens in a new tab, we need to keep track of the number of items popped out (δ) and pushed in (α) by each opcode.

9.4.2 Exceptional Halting

This section defines the Z function, which specifies when we have an abnormal termination. This is a Booleanopens in a new tab function, so it uses ∨ for a logical oropens in a new tab and ∧ for a logical andopens in a new tab.

We have an exceptional halt if any of these conditions is true:

μ_g < C(σ,μ,A,I) As we saw in section 9.2, C is the function that specifies the gas cost. There isn't enough gas left to cover the next opcode.
δ_w=∅ If the number of items popped for an opcode is undefined, then the opcode itself is undefined.
|| μ_s || < δ_w Stack underflow, not enough items in the stack for the current opcode.
w = JUMP ∧ μ_s[0]∉D(I_b) The opcode is JUMPopens in a new tab and the address is not a JUMPDESTopens in a new tab. Jumps are only valid when the destination is a JUMPDESTopens in a new tab.
w = JUMPI ∧ μ_s[1]≠0 ∧ μ_s[0] ∉ D(I_b) The opcode is JUMPIopens in a new tab, the condition is true (non zero) so the jump should happen, and the address is not a JUMPDESTopens in a new tab. Jumps are only valid when the destination is a JUMPDESTopens in a new tab.
w = RETURNDATACOPY ∧ μ_s[1]+μ_s[2]>|| μ_o || The opcode is RETURNDATACOPYopens in a new tab. In this opcode stack element μ_s[1] is the offset to read from in the return data buffer, and stack element μ_s[2] is the length of data. This condition occurs when you try to read beyond the end of the return data buffer. Note that there isn't a similar condition for the calldata or for the code itself. When you try to read beyond the end of those buffers you just get zeros.
|| μ_s || - δ_w + α_w > 1024

Stack overflow. If running the opcode will result in a stack of over 1024 items, abort.
¬I_w ∧ W(w,μ) Are we running statically (¬ is negationopens in a new tab and I_w is true when we are allowed to change the blockchain state)? If so, and we're trying a state changing operation, it can't happen.

The function W(w,μ) is defined later in equation 150. W(w,μ) is true if one of these conditions is true:
- w ∈ {CREATE, CREATE2, SSTORE, SELFDESTRUCT} These opcodes change the state, either by creating a new contract, storing a value, or destroying the current contract.
- LOG0≤w ∧ w≤LOG4 If we are called statically we cannot emit log entries. The log opcodes are all in the range between LOG0 (A0)opens in a new tab and LOG4 (A4)opens in a new tab. The number after the log opcode specifies how many topics the log entry contains.
- w=CALL ∧ μ_s[2]≠0 You can call another contract when you're static, but if you do you cannot transfer ETH to it.
w = SSTORE ∧ μ_g ≤ G_callstipend You cannot run SSTOREopens in a new tab unless you have more than G_callstipend (defined as 2300 in Appendix G) gas.

9.4.3 Jump Destination Validity

Here we formally define what are the JUMPDESTopens in a new tab opcodes. We cannot just look for byte value 0x5B, because it might be inside a PUSH (and therefore data and not an opcode).

In equation (153) we define a function, N(i,w). The first parameter, i, is the opcode's location. The second, w, is the opcode itself. If w∈[PUSH1, PUSH32] that means the opcode is a PUSH (square brackets define a range that includes the endpoints). If that case the next opcode is at i+2+(w−PUSH1). For PUSH1opens in a new tab we need to advance by two bytes (the PUSH itself and the one byte value), for PUSH2opens in a new tab we need to advance by three bytes because it's a two byte value, etc. All other EVM opcodes are just one byte long, so in all other cases N(i,w)=i+1.

This function is used in equation (152) to define D_J(c,i), which is the setopens in a new tab of all valid jump destinations in code c, starting with opcode location i. This function is defined recursively. If i≥||c||, that means that we're at or after the end of the code. We are not going to find any more jump destinations, so just return the empty set.

In all other cases we look at the rest of the code by going to the next opcode and getting the set starting from it. c[i] is the current opcode, so N(i,c[i]) is the location of the next opcode. D_J(c,N(i,c[i])) is therefore the set of valid jump destinations that starts at the next opcode. If the current opcode isn't a JUMPDEST, just return that set. If it is JUMPDEST, include it in the result set and return that.

9.4.4 Normal halting

The halting function H, can return three types of values.

If we aren't in a halt opcode, return ∅, the empty set. By convention, this value is interpreted as Boolean false.
If we have a halt opcode that doesn't produce output (either STOPopens in a new tab or SELFDESTRUCTopens in a new tab), return a sequence of size zero bytes as the return value. Note that this is very different from the empty set. This value means that the EVM really did halt, just there's no return data to read.
If we have a halt opcode that does produce output (either RETURNopens in a new tab or REVERTopens in a new tab), return the sequence of bytes specified by that opcode. This sequence is taken from memory, the value at the top of the stack (μ_s[0]) is the first byte, and the value after it (μ_s[1]) is the length.

H.2 Instruction set

Before we go to the final subsection of the EVM, 9.5, let's look at the instructions themselves. They are defined in Appendix H.2 which starts on p. 29. Anything that is not specified as changing with that specific opcode is expected to stay the same. Variables that do change are specified with as <something>′.

For example, let's look at the ADDopens in a new tab opcode.

Value	Mnemonic	δ	α	Description
0x01	ADD	2	1	Addition operation.
				μ′_s[0] ≡ μ_s[0] + μ_s[1]

δ is the number of values we pop from the stack. In this case two, because we are adding the top two values.

α is the number of values we push back. In this case one, the sum.

So the new stack top (μ′_s[0]) is the sum of the old stack top (μ_s[0]) and the old value below it (μ_s[1]).

Instead of going over all the opcodes with an "eyes glaze over list", This article explains only those opcodes that introduce something new.

Value	Mnemonic	δ	α	Description
0x20	KECCAK256	2	1	Compute Keccak-256 hash.
				μ′_s[0] ≡ KEC(μ_m[μ_s[0] . . . (μ_s[0] + μ_s[1] − 1)])
				μ′_i ≡ M(μ_i,μ_s[0],μ_s[1])

This is the first opcode that accesses memory (in this case, read only). However, it might expand beyond the current limits of the memory, so we need to update μ_i. We do this using the M function defined in equation 328 on p. 29.

Value	Mnemonic	δ	α	Description
0x31	BALANCE	1	1	Get balance of the given account.
				...

The address whose balance we need to find is μ_s[0] mod 2¹⁶⁰. The top of the stack is the address, but because addresses are only 160 bits, we calculate the value moduloopens in a new tab 2¹⁶⁰.

If σ[μ_s[0] mod 2¹⁶⁰] ≠ ∅, it means that there is information about this address. In that case, σ[μ_s[0] mod 2¹⁶⁰]_b is the balance for that address. If σ[μ_s[0] mod 2¹⁶⁰] = ∅, it means that this address is uninitialized and the balance is zero. You can see the list of account information fields in section 4.1 on p. 4.

The second equation, A'_a ≡ A_a ∪ {μ_s[0] mod 2¹⁶⁰}, is related to the difference in cost between access to warm storage (storage that has recently been accessed and is likely to be cached) and cold storage (storage that hasn't been accessed and is likely to be in slower storage that is more expensive to retrieve). A_a is the list of addresses previously accessed by the transaction, which should therefore be cheaper to access, as defined in section 6.1 on p. 8. You can read more about this subject in EIP-2929opens in a new tab.

Value	Mnemonic	δ	α	Description
0x8F	DUP16	16	17	Duplicate 16th stack item.
				μ′_s[0] ≡ μ_s[15]

Note that to use any stack item, we need to pop it, which means we also need to pop all the stack items on top of it. In the case of DUP<n>opens in a new tab and SWAP<n>opens in a new tab, this means having to pop and then push up to sixteen values.

9.5 The execution cycle

Now that we have all the parts, we can finally understand how the execution cycle of the EVM is documented.

Equation (155) says that given the state:

σ (global blockchain state)
μ (EVM state)
A (substate, changes to happen when the transaction ends)
I (execution environment)

The new state is (σ', μ', A', I').

Equations (156)-(158) define the stack and the change in it due to an opcode (μ_s). Equation (159) is the change in gas (μ_g). Equation (160) is the change in the program counter (μ_pc). Finally, equations (161)-(164) specify that the other parameters stay the same, unless explicitly changed by the opcode.

With this the EVM is fully defined.

Conclusion

Mathematical notation is precise and has allowed the Yellow Paper to specify every detail of Ethereum. However, it does have some drawbacks:

It can only be understood by humans, which means that compliance testsopens in a new tab must be written manually.
Programmers understand computer code. They may or may not understand mathematical notation.

Maybe for these reasons, the newer consensus layer specsopens in a new tab are written in Python. There are execution layer specs in Pythonopens in a new tab, but they are not complete. Until and unless the entire Yellow Paper is also translated to Python or a similar language, the Yellow Paper will continue in service, and it is helpful to be able to read it.

Page last update: 25 февруари 2025 г.