Vlad Zamfir’s Take On Sharding
Ethereum researcher Vlad Zamfir published an article about the scaling solution he says he likes the most: sharding.
For the uninitiated, sharding is a scaling solution currently in development that may allow the number of transactions per second on the Ethereum blockchain to increase by maximizing the number of parallel transactions the network can process.
Sharding works by splitting up the system state into partitions called shards, which have their own transaction history in their simplest form. According to Zamfir, transaction orders of magnitude per second can be achieved, "if everyone isn't validating everything."
Zamfir acknowledges the obstacles standing in the way of sharding the blockchain.
"We need to assign miners to shards so that miners from one shard don't mine from another to produce invalid blocks. So we need to sample the mining power, which presents a problem. We need to split the state space into shards, we need to process transactions within shards, and then deal with attack vectors where not everyone is checking everything."
To sample mining power, Zamfir recommends changing the block header – a string of information which is hashed to generate Proof-of-Work. Zamfir said non-outsourcable Proof-of-Work would allow for the accurate estimation of mining power, which can then be accordingly assigned to shards.
Zamfir suggests estimation be done for hash power using a "treechain approach where anyone can mine parts of the tree, less hashpower mines bottom of the tree, so everyone can get their transactions in." The hashing power anyone uses will be bound by the estimate to avoid overwhelming a shard through false reporting. He said, "First you need to estimate the hashrate of all these miners, sample them, then assign them to shards. We need to securely sample the hashrate, otherwise the hashrate might be overwhelming to a certain shard."
Zamfir explains that a majority of honest shards means that more can be created with a high probability of correct nodes in any given shard. "Just from fault tolerance analysis, if we have enough slices of the pie we can make more shards. So more slices of the pie, we can make more shards, we have a very high probability of correct nodes being on every shard," he said stressing the importance of acknowledging that if a majority of byzantine nodes are on a shard it might make an invalid block other miners would be unaware of. More shards deceases that possibility.
Zamfir goes into greater detail. "We can do a cumulative binomial distribution to see the probability of any one shard having more than half byzantine, and we can make sure we have a very low probability of that when we place samples of mining power into that." Then the unspent transaction output (UTXO) must be sharded. "If they spend the same UTXO, only one of them can get in; you can process concurrent transactions if they spend different UTXOs. There will be sharding of the UTXO set into mutually exclusive sets. Transactions that spend inputs from shard-i are mined by the miners we place in shard-i so basically there are going to be transactions that they process in their shard from other shards, and they only need to check that the spending is valid." A final check is done to ensure outputs are valid, and Proof-of-Work provides assurance that transaction outputs were spent correctly.
This would essentially generate two levels of blocks; a top-level block dealing with the estimated mining power of a given individual, and a low-level block which creates shards to match. Zamfir said, "We split up the UTXOs, then we validate the work on the low-level blocks, then we validate work and format of low-level block-headers. Top-level block is just SPV [Simple Payment Verification] of all low-level blocks. The top-level block- in the transaction location we instead have block headers for different shards, these have groups of transactions."
With a top-level chain managing more overhead, more shards can be added and according to Zamfir, the number of shards is linear to the overhead of the top-level chain. Thus, a linear increase in computational power will result in an increase in the block size, and by virtue, an increase in the number of shards, and so an increase in transactions which can be processed among those shards.
Some of the security concerns that Zamfir relates are that of corralling, and the creation of invalid blocks. In the case of corralling, every shard has an independent market for transaction fees, and corralling many outputs into one shard is expensive. A miner may also generate an invalid block, the outputs of which, if gone unnoticed, may be processed by other miners and be sent to another shard. He said, "This is especially bad because you could convince rational nodes in the shard to go without it, in fault tolerance you have to have them faulty ahead of time before the sampling." To fix it, he recommends challenge-response protocols:
"If you mine an invalid block and someone challenges you, they can prove it, then if people don't agree, then there's big incentive for people to download that block and verify it because they will gain from this market where people are betting whether it is invalid or not by betting on the right side."
On dealing with unavailable blocks, Zamfir said people wouldn't build blocks on top of them due to a risk of an invalid block negating a block reward.
The community still needs to find workarounds for scalability protocols. Zamfir's insights will help generate solutions for the remaining obstacles.