Preface
I originally wrote this piece after Ethereum lost finality twice in May 2023, when both the Prysm and Teku minority clients encountered bugs. Around then, Vitalik also discussed the possibility of, and concerns about, staking bailouts in his "Don't overload Ethereum's consensus" article, should a catastrophic bug ever happen.
I'm updating and reposting this in light of 2 recent events:
- Vitalik's Keynote speech at EthCC 7 where he warns that Ethereum protocol design needs to be careful of other vulnerabilities besides just the typical 33/50/67% consensus-level attacks. It's a great, humble lecture from Vitalik, and I highly recommend watching it if you haven't already done so.
- Geth client developer Marius van der Wijden making it very clear that he wasn't ready for including EOF in the Pectra update
This is a reminder that there is a reason Ethereum updates are slow and methodical and use multiple testnets.
It only takes one unlucky bug to cause catastrophic damage to the blockchain and cause a mass-slashing event where the majority of stakers will lose their Ether. We got lucky back in 2023 because the bugs were in minority clients and it only halted finality. A bug affecting the majority of clients might not happen now or even in the next decade, but there may be one day where another catastrophic event as damaging as the 2016 DAO hack causes the chain to split again.
Summary
Historically, successful PoW attacks have been numerous, but successful PoS attacks are virtually non-existent.
History has proven that PoS consensus is a more secure alternative to PoW consensus against Sybil attacks like the 51% attack. However, this is at the cost of PoS being less resilient than PoW for disaster recovery. This is because PoW by design allows for miners to re-attack/reorg the blockchain to revert mistakes.
While client bugs are exceptionally rare, they do occur, and most PoS blockchains have no on-chain method to revert past finality. It's important to avoid reorgs in the first place because any transactions that finalize off-chain through DEXs, bridges, and CEXs are often irreversible even after the blockchain is reverted.
- Security is the ability to protect against malicious attackers
- Resilience is the ability to restore the chain after an attack or catastrophic bug
Similar to the Blockchain Trilemma, where there are tradeoffs between Security, Decentralization, and Scalability, Resilience is also a tradeoff of Security.
Even the 2 biggest blockchains, Bitcoin and Ethereum (when it was still using PoW), have encountered 51% attacks. Bitcoin (in 2010 and 2013) and PoW Ethereum (twice in 2016) were each successfully 51% attacked twice in order to fix catastrophic bugs and issues. It would be extremely difficult, if not impossible, to accomplish this in a reasonable time under PoS Ethereum and most other decentralized PoS blockchains today.
Past finality, it usually requires a DAO-hack like chain split or bailout to undo a catastrophe: i.e. through Layer 0 community consensus and off-chain governance.
Securing a blockchain from Sybil attacks
There are only 2 main categories of exploitable consensus-level blockchain attacks: censorship and reorganizations (which include forks and double-spends). These relate to liveness and safety respectively.
- Liveness threshold: the percent of malicious actors above which censorship can occur
- Safety threshold: the percent of malicious actors above which reorgs can occur
If the Safety threshold is N%, then the Liveness threshold is (100-N)%. For PoW, these are both 50%. For traditional BFT, safety is 67%, and liveness is 33%. For PoS, safety is at least 67%. The stronger a network is against safety attacks, the weaker it is against liveness attacks. But there are other bigger factors that can increase security overall, like increasing centralization.
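To make the relationship between the two thresholds concrete, here is a minimal sketch. The function names are my own illustration rather than anything from a real client, and the percentages are the rounded ones used in this article (33/50/67).

```python
def liveness_threshold(safety_threshold_pct: float) -> float:
    """Liveness threshold (in percent) implied by a given safety threshold."""
    if not 0 <= safety_threshold_pct <= 100:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return 100 - safety_threshold_pct


def possible_attacks(attacker_pct: float, safety_threshold_pct: float) -> list:
    """List the attack categories open to an attacker controlling attacker_pct."""
    attacks = []
    # Censorship becomes possible above the liveness threshold.
    if attacker_pct > liveness_threshold(safety_threshold_pct):
        attacks.append("censorship")
    # Reorgs become possible above the safety threshold.
    if attacker_pct > safety_threshold_pct:
        attacks.append("reorg")
    return attacks


# PoW: both thresholds sit at 50%, so 51% unlocks both attack categories.
pow_attacks = possible_attacks(51, safety_threshold_pct=50)
# BFT-style PoS: 34% can censor, but cannot reorg.
pos_attacks = possible_attacks(34, safety_threshold_pct=67)
```

Raising the safety threshold in this model always lowers the liveness threshold by the same amount, which is the tradeoff described above.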
Nearly all crypto networks are alike in that they do not allow for bad transactions with invalid signatures.
This is true for all consensus protocols (PoW, PoS, PoA, etc). Even if the network is reorged, 51%-attacked, 33%/67% attacked, or censored, an attacker still can't add invalid transactions. The bad transaction/block would be ignored and skipped by the rest of the network because no honest node (e.g. validator, node, wallet, CEX, RPC, etc.) would ever accept those transactions.
There have been numerous successful PoW attacks
If you think PoW is safe, you're on the wrong side of history. You can Google "successful 51% attacks" and find many dozens of examples. AFAIK, there have been no successful PoS consensus-level attacks, but please correct me if you know of one.
By now, PoS has been thoroughly battle-tested and has proven to be a safer alternative.
Proof of Work (PoW) Blockchain vulnerabilities
PoW's heaviest weight and longest chain protocols are fundamentally vulnerable to 51% attacks by design.
The security budget of PoW miners is usually orders of magnitude lower than its native token's market cap, so it doesn't cost anywhere near as much to attack a network as the amount of damage done. Also, miners can often jump from chain to chain as long as their hashing protocol is similar. Many successful 51% attacks occurred when large mining operations switched from a larger chain to a smaller one in a form of bullying to disrupt the smaller chain.
Even Bitcoin would be more secure in the long-run if it dropped PoW, switched to PoS, and added tail emissions.
There are ways to reduce the effectiveness of block-withholding attacks, which are by far the most common type of 51% attack. One method is to use finality checkpoints for which blocks past a certain time in history are considered final. But this method uses arbitrary factors and only prevents long-range attacks, not short-to-mid range attacks. In fact, it makes short-range attacks much more dangerous and reduces resilience. If an attacker pulled off a successful short-range attack, it would be impossible to revert the chain after the finality checkpoint. Thus checkpoints do not meaningfully increase security under PoW other than for preventing long-range attacks.
The reason PoW has high resilience to attacks is because the method to revert a chain is fundamentally built into PoW. All you have to do is beat the attacker at producing the longest or heaviest chain. Thus PoW blockchains are less secure, but they can undo the changes easier. However, most PoW blockchains that get successfully attacked often lose their reputation even after the chain is restored.
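The heaviest-chain rule can be sketched in a few lines. This is a toy model under my own assumptions: `Block`, `total_work`, and `heaviest_chain` are hypothetical names, and `work` is an abstract number standing in for proof-of-work difficulty, not how any real node stores it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Block:
    block_id: str
    work: int  # hypothetical unit of accumulated proof-of-work for this block


def total_work(chain: Sequence[Block]) -> int:
    """Sum of the work of every block in the chain."""
    return sum(block.work for block in chain)


def heaviest_chain(candidates: Sequence[Sequence[Block]]) -> Sequence[Block]:
    """Pick the candidate chain with the most total work (first one wins ties)."""
    return max(candidates, key=total_work)


# The attacker's chain is currently canonical...
attacker_chain = [Block("a1", 10), Block("a2", 10), Block("a3", 10)]
# ...until honest miners out-work it with a competing chain.
honest_chain = [Block("h1", 12), Block("h2", 12), Block("h3", 12)]

winner = heaviest_chain([attacker_chain, honest_chain])
```

Nothing in this rule distinguishes an attack from a repair. Whoever produces more work wins, which is exactly why the same mechanism serves both attackers and the honest miners reverting them.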
Proof of Stake (PoS) Blockchain vulnerabilities
PoS attacks are very difficult because acquiring a majority of the stake is often orders of magnitude more expensive than acquiring a majority of the hash power in a mining network. And even if 51% of the staked amount were obtained, it's very unlikely that the holder would want to attack a network in which it has so much at stake. The only realistic vectors of attack for PoS networks are to exploit staking pools and client bugs.
There are numerous types of PoS networks, and many of them work very differently for security. Some can be taken over and reorged at 67% of stake. Others, like Avalanche's Snowman and Algorand, require higher percentages above 80-90% and are extremely hard to attack. PoS has one weak point: it has a lower liveness threshold. If an attacker can reorg a network at 67%, it can censor it at 33%. When censored, depending on the network, it will either stop adding or stop finalizing blocks. For example, when attackers obtain 33% of the stake, Ethereum still produces blocks but stops finalizing them, and it begins an inactivity leak after 4 epochs without finality.
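Here is a rough sketch of the Ethereum behavior described above. It is a deliberately simplified model with hypothetical names (`simulate_finality`, `LEAK_AFTER_EPOCHS`): it treats an epoch as finalizable when at least 2/3 of stake is participating, and it flags the inactivity leak once 4 consecutive epochs have gone by without finality. The real protocol's accounting of justification, finalization, and finality delay is more involved than this.

```python
LEAK_AFTER_EPOCHS = 4  # from the text above; simplified trigger condition


def simulate_finality(online_stake_per_epoch, total_stake):
    """Track finality and the inactivity leak across a series of epochs.

    online_stake_per_epoch: stake (same unit as total_stake) attesting honestly
    in each epoch. Returns one status dict per epoch.
    """
    epochs_without_finality = 0
    history = []
    for epoch, online in enumerate(online_stake_per_epoch):
        # Integer arithmetic avoids float rounding around the 2/3 boundary.
        can_finalize = 3 * online >= 2 * total_stake
        if can_finalize:
            epochs_without_finality = 0
        else:
            epochs_without_finality += 1
        history.append({
            "epoch": epoch,
            "blocks_produced": online > 0,  # the chain keeps growing regardless
            "finalized": can_finalize,
            "inactivity_leak": epochs_without_finality >= LEAK_AFTER_EPOCHS,
        })
    return history


# 34% of stake goes offline (or censors) from epoch 1 to 5, then returns.
history = simulate_finality([100, 66, 66, 66, 66, 66, 70], total_stake=100)
```

In this run, blocks are produced in every epoch, finality stops as soon as participation drops below 2/3, and the leak flag switches on at the fourth consecutive non-final epoch.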
Using 51% attacks to revert from bugs
Bitcoin was reorged once in 2010 and once in 2013 via 51% attacks. Ethereum was reorged twice in 2016 using the same method. Unlike the malicious attacks, which are common throughout PoW blockchains, these 4 times were to fix bugs.
Under PoW, it was really easy to gather the top miners (fewer than 5) and convince them to attack and reorg the network. It only took hours to fix the chain with PoW, not days or weeks.
This short turnaround time would be virtually impossible under a decentralized PoS blockchain. Most PoS blockchains have deterministic finality after a fixed (sometimes arbitrary) number of seconds or blocks. By protocol, they cannot reorg past finality, so the community basically would have to collectively agree to split the chain, or bail out the network.
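The difference comes down to one extra check in the fork-choice logic. The sketch below is my own illustration, not any client's actual code: `accepts_reorg` is a hypothetical function, and blocks are represented by plain string IDs.

```python
from typing import Optional, Sequence


def accepts_reorg(candidate_chain: Sequence[str],
                  finalized_block: Optional[str]) -> bool:
    """Return True if a node would even consider switching to candidate_chain.

    finalized_block is None for a chain without finality (e.g. plain PoW).
    """
    if finalized_block is None:
        return True  # no finality: any competing chain is eligible
    # With finality, a chain that drops the finalized block is rejected outright.
    return finalized_block in candidate_chain


# A buggy block "b_bad" made it into the canonical chain; the fix replaces it.
fixed_chain = ["genesis", "a", "b_fixed", "c"]

pow_node_accepts = accepts_reorg(fixed_chain, finalized_block=None)
pos_node_accepts = accepts_reorg(fixed_chain, finalized_block="b_bad")
```

Under this model, the PoW node will follow the fixed chain if it gathers enough work, while the PoS node refuses it no matter how much stake backs it. Getting around that refusal means changing the client software itself, which is the off-chain coordination problem described above.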
How to revert from a disaster
PoW
- Disaster happens. There is no finality.
- Get honest miners to re-attack the network and build the new heaviest chain
- Either the honest miners win or the chain is f*cked
- Even if the honest miners win, the chain's reputation is likely moderately damaged, but probably not completely catastrophic.
- Any transactions finalized off-chain are f*cked.
PoS
- Disaster happens. Finality occurs.
- Can't revert back past finality.
- In the case of blockchains with checkpoints that allow for reversion (e.g. Solana): there is an outage and recovery.
- In the case of Ethereum, the majority of validators get slashed. Possible bailout event involving a hard fork with a bug fix.
- In the case of most other PoS blockchains, there just is no procedure to revert. They have to get the whole community to agree to move to a new chain starting at a previous block.
- Any transactions finalized off-chain are f*cked.
The biggest problem with reorgs and reversion is that any change that is finalized through cross-chain bridges, on CEXs, or DEXs is f*cked.
Let's go over this in more detail
Slashing on Ethereum
If the current version of PoS Ethereum were to hit a bug today and erroneously finalize a block past an epoch, it would be catastrophic. There would be no way to revert that block without completely splitting the chain, or slashing the majority of PoS stakeholders. Those validators would lose everything.
This is because Ethereum is one of the few blockchains with strict slashing rules. In order to revert the chain after finality, the majority of validators would be slashed. In order to split the chain, all validator and node client developers would need to release an update, and the whole community and all centralized exchanges would need to agree to support the new chain. Instead of only taking a few hours to revert the chain like under PoW, it would likely take weeks. Ethereum has at least 10 different client developer teams, each making their own clients. Ethereum updates often take quarters and require testing through multiple testnets.
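To see why a mass slashing wipes validators out rather than just denting them, here is a simplified sketch of Ethereum's correlation penalty. I'm assuming the proportional multiplier of 3 from the consensus specs, and I'm ignoring the smaller initial slashing penalty, increment rounding, and the exact time window, so treat it as an approximation of the mechanism rather than the spec itself. The function name is my own.

```python
PROPORTIONAL_SLASHING_MULTIPLIER = 3  # assumption: value from the consensus specs


def correlation_penalty(effective_balance_gwei: int,
                        slashed_stake_gwei: int,
                        total_stake_gwei: int) -> int:
    """Approximate extra penalty for a validator slashed alongside others.

    The penalty scales with the share of total stake slashed in the same
    window, and is capped at the validator's whole effective balance.
    """
    adjusted_slashed = min(
        PROPORTIONAL_SLASHING_MULTIPLIER * slashed_stake_gwei,
        total_stake_gwei,
    )
    return effective_balance_gwei * adjusted_slashed // total_stake_gwei


BALANCE = 32_000_000_000  # 32 ETH in gwei

# An isolated incident: 1% of stake slashed -> roughly 3% of the balance.
isolated = correlation_penalty(BALANCE, slashed_stake_gwei=10, total_stake_gwei=1000)
# A majority-client bug: 50% of stake slashed -> the entire balance.
majority = correlation_penalty(BALANCE, slashed_stake_gwei=500, total_stake_gwei=1000)
```

Because of the multiplier, the cap is reached once about a third of all stake is slashed together, so a bug in a majority client lands well inside "lose everything" territory.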
Given that Ethereum has 10 different clients and multiple testnets, it's extremely unlikely that the majority of clients would commit the same error on mainnet. But it isn't impossible, and it only takes one mistake to result in a mass slashing event. Ethereum lost finality twice in May 2023 due to client bugs, and there have been catastrophic bugs that were fortunately discovered on testnets. I wouldn't expect it to happen on mainnet within a decade, but there is a decent chance of such a catastrophic bug happening within a human lifetime.
Here are some ways to fix this:
- The easiest way to fix this vulnerability is to reduce the slashing penalty for self-slashing. The tradeoff is marginally-less security. We've had many years of battle-testing PoS blockchains without any successful Sybil attacks, and Ethereum can probably afford to loosen up its high security. A staking/slashing expert can probably come up with a more elegant solution than mine.
- Introduce a protocol for all nodes to revert to a previous checkpoint/epoch. The problem with this is that any transfers that happened off-chain (e.g. bridged assets, CEX transfers) after the reorged block cannot be reverted, and this causes a mess.
- Introduce a protocol to retroactively cancel a set of troublesome transactions past finality without incurring any slashing. This is my preferred method.
Other PoS blockchains without slashing
Other PoS blockchains without slashing have it easier because they aren't pressured to revert minor mistakes in a short amount of time. Reorging would be embarrassing, but it would be easier for the community to take their time to recover through a hard fork update when there is no pressure of slashing. Nevertheless, reverting past finality is not easy because the community would still have to get the supermajority of stakers and nearly all node client developers (validators, wallets, nodes, RPCs, CEXs) to agree to apply updates to those clients and revert to a previous block.
It would be messy and require much coordination.
Other PoS blockchains with checkpoints
There are some exceptions where PoS blockchains are also resilient. Blockchains like Solana and BSC can be halted and restored to a previous checkpoint. Thus they are resilient to reorgs and bugs because they are centralized in this aspect. It's a tradeoff.
Most PoA blockchains are also similar in that they can freeze and revert, giving them high security and resilience with the tradeoff of having low decentralization.
Conclusion
PoS is more secure than PoW, but at the cost of being more difficult to recover from a rare catastrophic disaster. But there are still ways to mitigate and recover from such disasters. It will just be a bit messier than under PoW, which is used to dealing with reorgs and needing more block confirmations.