Learn Foundational Ethereum Topics with SQL
Many Ethereum tutorials target developers, but there’s a lack of educational resources for data analyst or for people who wish to see on-chain data without running a client or node.
This tutorial helps readers understand fundamental Ethereum concepts including transactions, blocks and gas by querying on-chain data with structured query language (SQL) through an interface provided by Dune Analytics.
On-chain data can help us understand Ethereum, the network, and as an economy for computing power and should serve as a base for understanding challenges facing Ethereum today (i.e., rising gas prices) and, more importantly, discussions around scaling solutions.
A user’s journey on Ethereum starts with initializing a user-controlled account or an entity with an ETH balance. There are two account types - user-controlled or a smart contract (see ethereum.org).
Any account can be viewed on a block explorer like Etherscan. Block explorers are a portal to Ethereum’s data. They display, in real-time, data on blocks, transactions, miners, accounts and other on-chain activity (see here).
However, a user may wish to query the data directly to reconcile the information provided by external block explorers. Dune Analytics provides this capability to anyone with some knowledge of SQL.
For reference, the smart contract account for the Ethereum Foundation (EF) can be viewed on Etherscan.
One thing to note is that all accounts, including the EF’s, has a public address that can be used to send and receive transactions.
The account balance on Etherscan comprises regular transactions and internal transactions. Internal transactions, despite the name, are not actual transactions that change the state of the chain. They are value transfers initiated by executing a contract (source). Since internal transactions have no signature, they are not included on the blockchain and cannot be queried with Dune Analytics.
Therefore, this tutorial will focus on regular transactions. This can be queried as such:
1WITH temp_table AS (2SELECT3 hash,4 block_number,5 block_time,6 "from",7 "to",8 value / 1e18 AS ether,9 gas_used,10 gas_price / 1e9 AS gas_price_gwei11FROM ethereum."transactions"12WHERE "to" = '\xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe'13ORDER BY block_time DESC14)15SELECT16 hash,17 block_number,18 block_time,19 "from",20 "to",21 ether,22 (gas_used * gas_price_gwei) / 1e9 AS txn_fee23FROM temp_table24Rodyti visus
This will yield the same information as provided on Etherscan's transaction page. For comparison, here are the two sources:
EF's contract page on Etherscan.
You can find dashboard here. Click on the table to see the query (also see above).
Breaking Down Transactions
A submitted transaction includes several pieces of information including (source):
- Recipient: The receiving address (queried as "to")
- Signature: While a sender's private keys signs a transaction, what we can query with SQL is a sender's public address ("from").
- Value: This is the amount of ETH transferred (see
- Data: This is arbitrary data that's been hashed (see
- gasLimit – the maximum amount of gas units that can be consumed by the transaction. Units of gas represent computational steps
- maxPriorityFeePerGas - the maximum amount of gas to be included as a tip to the miner
- maxFeePerGas - the maximum amount of gas willing to be paid for the transaction (inclusive of baseFeePerGas and maxPriorityFeePerGas)
We can query these specific pieces of information for transactions to the Ethereum Foundation public address:
1SELECT2 "to",3 "from",4 value / 1e18 AS ether,5 data,6 gas_limit,7 gas_price / 1e9 AS gas_price_gwei,8 gas_used,9 ROUND(((gas_used / gas_limit) * 100),2) AS gas_used_pct10FROM ethereum."transactions"11WHERE "to" = '\xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe'12ORDER BY block_time DESC13Rodyti visus
Each transaction will change the state of the Ethereum virtual machine (EVM) (source). Transactions are broadcasted to the network to be verified and included in a block. Each transaction is associated with a block number. To see the data, we could query a specific block number: 12396854 (the most recent block among Ethereum Foundation transactions as of this writing, 11/5/21).
Moreover, when we query the next two blocks, we can see that each block contains the hash of the previous block (i.e., parent hash), illustrating how the blockchain is formed.
Each block contains a reference to it parent block. This is shown below between the
parent_hash columns (source):
Here is the query on Dune Analytics:
1SELECT2 time,3 number,4 hash,5 parent_hash,6 nonce7FROM ethereum."blocks"8WHERE "number" = 12396854 OR "number" = 12396855 OR "number" = 123968569LIMIT 1010Rodyti visus
We can examine a block by querying time, block number, difficulty, hash, parent hash, and nonce.
The only thing this query does not cover is list of transaction which requires a separate query below and state root. A full or archival node will store all transactions and state transitions, allowing for clients to query the state of the chain at any time. Because this requires large storage space, we can separate chain data from state data:
- Chain data (list of blocks, transactions)
- State data (result of each transaction’s state transition)
State root falls in the latter and is implicit data (not stored on-chain), while chain data is explicit and stored on the chain itself (source).
For this tutorial, we'll be focusing on on-chain data that can be queried with SQL via Dune Analytics.
As stated above, each block contains a list of transactions, we can query this by filtering for a specific block. We'll try the most recent block, 12396854:
1SELECT * FROM ethereum."transactions"2WHERE block_number = 123968543ORDER BY block_time DESC`4
Here's the SQL output on Dune:
This single block being added to the chain changes the state of the Ethereum virtual machine (EVM). Dozens sometimes, hundreds of transactions are verified at once. In this specific case, 222 transactions were included.
To see how many were actually successful, we would add another filter to count successful transactions:
1WITH temp_table AS (2 SELECT * FROM ethereum."transactions"3 WHERE block_number = 12396854 AND success = true4 ORDER BY block_time DESC5)6SELECT7 COUNT(success) AS num_successful_txn8FROM temp_table9
For block 12396854, out of 222 total transactions, 204 were successfully verified:
Transactions requests occur dozens of times per second, but blocks are committed approximately once every 15 seconds (source).
To see that there is one block produced approximately every 15 seconds, we could take the number of seconds in a day (86400) divided by 15 to get an estimated average number of blocks per day (~ 5760).
The chart for Ethereum blocks produced per day (2016 - present) is:
The average number of blocks produced daily over this time period is ~5,874:
The queries are:
1# query to visualize number of blocks produced daily since 201623SELECT4 DATE_TRUNC('day', time) AS dt,5 COUNT(*) AS block_count6FROM ethereum."blocks"7GROUP BY dt8OFFSET 1910# average number of blocks produced per day1112WITH temp_table AS (13SELECT14 DATE_TRUNC('day', time) AS dt,15 COUNT(*) AS block_count16FROM ethereum."blocks"17GROUP BY dt18OFFSET 119)20SELECT21 AVG(block_count) AS avg_block_count22FROM temp_table23Rodyti visus
The average number of blocks produced per day since 2016 is slightly above that number at 5,874. Alternatively, dividing 86400 seconds by 5874 average blocks comes out to 14.7 seconds or approximately one block every 15 seconds.
Blocks are bounded in size. The maximum block size is dynamic and varies according to network demand between 12,500,000 and 25,000,000 units. Limits are required to prevent arbitrarily large block sizes putting strain on full nodes in terms of disk space and speed requirements (source).
One way to conceptualize block gas limit is to think of it as the supply of available block space in which to batch transactions. The block gas limit can be queried and visualized from 2016 to present day:
1SELECT2 DATE_TRUNC('day', time) AS dt,3 AVG(gas_limit) AS avg_block_gas_limit4FROM ethereum."blocks"5GROUP BY dt6OFFSET 17
Then there is the actual gas used daily to pay for computing done on the Ethereum chain (i.e., sending transaction, calling a smart contract, minting an NFT). This is the demand for available Ethereum block space:
1SELECT2 DATE_TRUNC('day', time) AS dt,3 AVG(gas_used) AS avg_block_gas_used4FROM ethereum."blocks"5GROUP BY dt6OFFSET 17
We can also juxtapose these two charts together to see how demand and supply line up:
Therefore we can understand gas prices as a function of demand for Ethereum block space, given available supply.
Finally, we may want to query average daily gas prices for the Ethereum chain, however, doing so will result in an especially long query time, so we’ll filter our query to the average amount of gas paid per transaction by the Ethereum Foundation.
We can see gas prices paid for all transactions made to the Ethereum Foundation address over the years. Here is the query:
1SELECT2 block_time,3 gas_price / 1e9 AS gas_price_gwei,4 value / 1e18 AS eth_sent5FROM ethereum."transactions"6WHERE "to" = '\xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe'7ORDER BY block_time DESC8
With this tutorial, we understand foundational Ethereum concepts and how the Ethereum blockchain works by querying and getting a feel for on-chain data.
The dashboard that holds all code used in this tutorial can be found here.
For more use of data to explore web3 find me on Twitter.