troubles.md Writing words and reading dwords

Never Fork Again

It’s no secret in the blockchain world that upgrading an existing blockchain sucks, even more than everything else in blockchain sucks. Blockchains are essentially copy-on-write: there’s no way to actually “upgrade” a blockchain in the traditional sense, only create a new blockchain that retains the old one’s history. Like if you replace your dead dog with another of the same breed and you call it the same name for the sake of the kids. Except in this case the original dog isn’t dead, and also I guess the new dog has all the memories of the original dog surgically implanted into its brain somehow. The metaphor starts to break down if you look at it too hard but anyway, you get the point. No upgrades, only forks. This has been the paradigm for as long as there have been blockchains. There’s probably some old computer in someone’s cupboard still running a Bitcoin miner from 2009, and so Bitcoin hasn’t been upgraded in the traditional sense. Assuming you could find that cupboard and a compatibly-old client you could still connect to Bitcoin 2009 (presumably mostly made up of transactions from people buying and selling pirated copies of X-Men Origins: Wolverine).

§ Upgrading blockchains is hard §

This is a problem with distributed systems in general, but it’s really a problem with decentralised systems like Bitcoin where no one entity controls all the computers on the network. You can’t call up every Bitcoin miner and tell them that they have to upgrade. You can call up every Bitcoin miner that actually matters, but that’s a bug and not a feature of the Bitcoin network. The way many blockchains deal with this nowadays is to upgrade clients way ahead of time, with the upgraded client supporting both versions of the network and an if statement that switches to the new code at a certain block number. Ethereum manages relatively regular updates using this method. This works just fine for now, but it doesn’t really fix the problem of forking every time you want to change anything, it just buys you time. Besides, it’s uncivilised. When London votes for a mayor, it doesn’t split along political lines into London and London Classic; it chooses the mayor that makes the most people happy (or more accurately, that makes the most people the least unhappy). This kind of formalised governance system is what is needed to reduce uncertainty in the blockchain ecosystem. Right now, it’s impossible to have a long-running light client such as you’d find in an IoT device without trusting a third party to supply you with regular, trusted updates. Let’s be real though, most IoT manufacturers will not update at all if they can get away with it. It’s a difficult problem to update an IoT device safely, but more to the point most manufacturers just don’t care. Mirai would not have happened if IoT manufacturers could be trusted to update their software with the reliability that is required for running a blockchain node.
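To make the "if statement at a certain block number" approach concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `FORK_BLOCK` height, the `Block` type and both rule sets are assumptions, not taken from any real client.

```python
from dataclasses import dataclass

# Hypothetical activation height, agreed on well before the fork happens.
FORK_BLOCK = 1_000


@dataclass(frozen=True)
class Block:
    number: int
    transfers: tuple  # (sender, recipient, amount) triples


def apply_transfer_v1(balances, sender, recipient, amount):
    """Old rules: move the funds if the sender can afford it."""
    if balances.get(sender, 0) < amount:
        return balances
    updated = dict(balances)
    updated[sender] = updated.get(sender, 0) - amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


def apply_transfer_v2(balances, sender, recipient, amount):
    """New rules (made up for this example): also burn a flat fee of 1."""
    fee = 1
    if balances.get(sender, 0) < amount + fee:
        return balances
    updated = apply_transfer_v1(balances, sender, recipient, amount)
    updated[sender] -= fee
    return updated


def apply_block(balances, block):
    # The if statement from the text: the client ships both rule sets
    # and picks one based on the block number.
    if block.number >= FORK_BLOCK:
        rule = apply_transfer_v2
    else:
        rule = apply_transfer_v1
    for sender, recipient, amount in block.transfers:
        balances = rule(balances, sender, recipient, amount)
    return balances
```

Any client that doesn't ship `apply_transfer_v2` will compute different balances from block `FORK_BLOCK` onwards, and that disagreement is the fork.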

Another solution to this, of course, is to never even try to update the blockchain. That’s Bitcoin’s solution for the most part, although arguably the community cares less about long-running light clients and governance than they do about invalidating the ASIC investment of the 10 miners that get any of the rewards. We’ll invalidate this solution for the sake of this article, though, since I think most people would agree that updating your software is a good thing and usually a good heuristic for what to think about a subject is whatever’s the opposite of what Bitcoin people believe.

§ On-chain upgrades and other horrors §

One way you could get clients to update is to deploy the entire client on-chain. You could have some method of on-chain governance choose whether or not that client is accepted, and then if it is then all the clients across the world can install the new version and (Linux) fork instead of (blockchain) fork. Of course, you can’t just deploy binaries on the blockchain - for a start, that basically grants the government the ability to run arbitrary code on anyone’s computer. This doesn’t just mean they can do anything they like to the blockchain, but they can turn your computer into a botnet node, they can fill your pictures folder with oil paintings of Danny DeVito (seriously, that’s a shockingly well-stocked genre of art) or even - God forbid - mine Bitcoin. Even if you trust the government completely, it’s a big logistical problem to deploy a binary for every OS and every architecture available.

So let’s say that we run the whole client in a virtual machine. We write it in Java or Scala and we deploy Java bytecode on the blockchain, or we do something similar to Apple and write our code in an LLVM-backed compile-to-native language and deploy LLVM bitcode. Well, this is better, but no matter the VM format you use you’re heavily limiting the languages that you can build a client in. Plus you’re still deploying a gigantic blob on the blockchain, where space is at a premium (or should be at a premium, but realistically storage rent isn’t widely implemented). This is ignoring the fact that when you build an LLVM language you generate different bitcode depending on the OS - Apple can only get away with it because the OSes on the different devices that they deploy to have compatible APIs.

Finally, any strategy that deploys the whole client on-chain is eventually going to hit the problem that handling rollbacks turns the whole thing into a comical farce, because the rollback strategy is defined by the very thing that you’re rolling back. There’s no good answer to defining a fork-choice rule or rollback strategy in the face of on-chain updates.

Let’s say that we separate out the code that defines the blockchain’s logic from the rest of the code and compile just that to a VM bytecode. By “the blockchain’s logic” I mean the code that defines what the blockchain does. Ethereum does accounts, balances and smart contracts. ZCash does zero-knowledge proofs of transactions. Bitcoin does extremely inefficient basic accounting. In the general case, it’s the code that tells us how to change the state in response to a transaction. We can still have competition on networking, speed of implementation etc, and although that means we can’t update the networking code using this method, keeping multiple different versions of nodes in a distributed network is a much better-solved problem than having those nodes achieve consensus on a constantly updating mutable state. We can use an extensible networking negotiation protocol like libp2p to allow newer nodes to use newer communication protocols while still allowing older nodes to participate. To avoid the rollback problem that we mentioned before, we have to separate the logic out below the level of consensus. This means we can change only the code that maps transactions to state changes. This does mean that we can’t change the fork-choice rule, rollback strategy or consensus algorithm, but we have to make an abstraction trade-off somewhere. The consensus algorithm can’t be changed because it could retroactively make blocks invalid, and we can only allow on-chain changes that affect the state going forward.
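As a rough sketch of that split, the logic can be thought of as a plain function from a state and a transaction to a new state, with the node around it knowing nothing about what the state means. The names here (`transfer_runtime`, `Node`, `import_block`) are hypothetical, and a real node would run the logic inside a VM rather than call a Python function.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Transaction = Tuple[str, str, int]  # (sender, recipient, amount)
Runtime = Callable[[State, Transaction], State]


def transfer_runtime(state: State, tx: Transaction) -> State:
    """The chain's logic: how the state changes in response to a transaction."""
    sender, recipient, amount = tx
    if amount < 0 or state.get(sender, 0) < amount:
        return state  # invalid transactions leave the state untouched
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


class Node:
    """Hypothetical host: owns networking and consensus, not the rules."""

    def __init__(self, runtime: Runtime, genesis: State) -> None:
        self.runtime = runtime
        self.state = dict(genesis)
        self.height = 0

    def import_block(self, transactions: List[Transaction]) -> None:
        # By this point consensus has already settled on the block.
        # The node only feeds its transactions through the logic, in order.
        for tx in transactions:
            self.state = self.runtime(self.state, tx)
        self.height += 1
```

Two nodes with completely different networking code will still agree on the state, as long as they run the same logic over the same blocks.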

§ Designing our interface §

For the sake of argument and because you’ve probably worked out that I’m leading up to something here, let’s call this logic the blockchain’s “runtime”. If we’re building a new chain from scratch and we can choose our consensus algorithm and suchlike, we want to make our code as generic as possible. So, we want to allow the runtime to make as many choices as possible. Let’s say we want to support both permissioned, private chains and permissionless, public chains. For example, if we choose a PoA consensus algorithm like Tendermint or PBFT then we can have both, by allowing the runtime to choose the authorities. This means that if we want to change the way that we get consensus between authorities then we still have to hard-fork, but in practice this system still ends up incredibly flexible. Using a PoA system with authorities chosen on-chain allows us to create both permissioned chains (meaning that only certain parties can participate in the network) and permissionless chains (meaning that anyone can participate in the network). We can use any number of strategies for either kind, but for example we could create a proof-of-stake chain by choosing the authority set by a random choice of the stakers weighted by their stake. This solves the problem of upgrade strategies being undecidable - an upgrade can’t retroactively change a previous block’s authority set, only change how future authority sets are chosen.
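Here is one way the stake-weighted choice could look, as a sketch only. The hash-based lottery, the `pick_authorities` function and the `Chain` class are all assumptions made for this example; a real chain needs a randomness source that stakers can't manipulate, which a bare hash of a seed is not.

```python
import hashlib


def pick_authorities(stakes, seed, count):
    """Deterministic stake-weighted choice without replacement.

    Every node that sees the same stakes and the same seed (bytes)
    computes the same authority set.
    """
    remaining = {who: stake for who, stake in stakes.items() if stake > 0}
    chosen = []
    for round_index in range(min(count, len(remaining))):
        total = sum(remaining.values())
        digest = hashlib.sha256(seed + round_index.to_bytes(4, "big")).digest()
        # A ticket in [0, total); the modulo bias is ignored in this sketch.
        ticket = int.from_bytes(digest, "big") % total
        for staker in sorted(remaining):
            ticket -= remaining[staker]
            if ticket < 0:
                chosen.append(staker)
                del remaining[staker]
                break
    return chosen


class Chain:
    """Hypothetical record of which authorities sealed which block."""

    def __init__(self, select):
        self.select = select      # selection rule, owned by the runtime
        self.authority_sets = []  # one entry per block, never rewritten

    def seal_block(self, stakes, seed):
        self.authority_sets.append(self.select(stakes, seed))

    def upgrade(self, new_select):
        # Only later calls to seal_block see the new rule.
        self.select = new_select
```

Calling `upgrade` swaps the rule used for the next block, while the entries already in `authority_sets` stay exactly as they were.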

The runtime should also be able to make its own decisions on when and how to upgrade, since that allows us to use on-chain governance to do these upgrades. Since the governance is also defined in the runtime, that means that you could program the runtime to restrict what the government is allowed to change. Maybe it’s not allowed to change the governance structure itself, but everything else is fair game. Maybe it’s only allowed to tweak small things but the basic rules of the system must remain immutable. Maybe you can change anything but different tiers of power require different levels of majority. It’s entirely up to the developer of the runtime.
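As an illustration of the "different tiers of power require different levels of majority" option, a runtime could carry a table like the one below. The tier names and thresholds are invented for this sketch, and none of it reflects how any particular chain actually votes.

```python
from fractions import Fraction

# Hypothetical tiers. None means "no vote can ever change this".
THRESHOLDS = {
    "parameters": Fraction(1, 2),  # small tweaks: simple majority
    "logic": Fraction(2, 3),       # core rules: supermajority
    "governance": None,            # the voting rules themselves: immutable
}


def proposal_passes(tier, ayes, nays):
    """Return True if the share of ayes strictly exceeds the tier's threshold."""
    threshold = THRESHOLDS[tier]
    total = ayes + nays
    if threshold is None or total == 0:
        return False
    return Fraction(ayes, total) > threshold


def apply_upgrade(state, tier, new_value, ayes, nays):
    """Apply a proposed change to the state only if its vote clears the bar."""
    if not proposal_passes(tier, ayes, nays):
        return state
    updated = dict(state)
    updated[tier] = new_value
    return updated
```

Because the table lives in the runtime, whoever writes the runtime decides whether the table itself is up for a vote. Here it isn't: no margin of victory moves the `governance` tier.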

This is the approach we’ve taken in Parity Substrate, the blockchain development toolkit that we’ve built at (no prizes for guessing) Parity. The consensus and networking are handled by Substrate and the chain’s logic, authority choice and automatic upgrading are handled by the runtime. Yes, as you’ve probably gathered by now, the “runtime” terminology is copped from Substrate. This also means that you separate the code that affects consensus out, which allows you to have different backends - for example, in Polkadot we can share the Polkadot-specific code between “validators”, “nominators”, “collators” and “fishermen”. Even though the nodes perform different roles they can all agree on what the state of the network should be and act accordingly. As far as the network is concerned they differ only in the kinds of transactions they submit to the network. Anything that takes data from the outside world - whether it’s a report of a node’s misbehaviour, a measurement of the weather or a human’s desire to send money to another human - is modelled as a transaction and the state is updated as a response to that. Our use of Substrate in Polkadot allows us to make changes to the network in a democratic way that doesn’t leave any clients behind, and it’s already been tested: we used the exact method explained in this article to upgrade from Polkadot PoC1 to PoC2. We hope that this model can usher in a new era of experimentation in the blockchain space, but that’s for you to decide.