Wasm on the Blockchain: The Lesser Evil
In the beginning, there was Bitcoin.
Released in 2009 or thereabouts, Bitcoin started the blockchain craze which has now overtaken the technology industry. It wasn’t the first blockchain, it wasn’t the first use of cryptography to secure a monetary system, it wasn’t even the first use of a hashing proof of work algorithm, but it was a usable and practical implementation of all of these things that could be used by real people. Bitcoin is considered the beginning of what is known in the industry as “first generation” cryptocurrencies, cryptocurrencies that have no on-chain scripting ability. This isn’t technically the case since even the very first version of Bitcoin had on-chain scripting abilities, but the Bitcoin scripting platform is and has always been total crap so the founding of “second generation” cryptocurrencies is generally credited to Ethereum.
Again, Ethereum wasn’t the first cryptocurrency with a smart contract platform, but it was the first practical implementation that could be used by real people. Even though second-generation blockchains are considered strictly superior to first-generation blockchains, most modern blockchains are still first-generation, containing no or minimal scripting capability. Some of the largest cryptocurrencies behind Bitcoin and Ethereum - Monero, ZCash, Dash for example - support no scripting whatsoever, even less than Bitcoin supports. This is because writing a smart contract platform is, to put it mildly, fucking hard.
Let’s look at some requirements that a blockchain virtual machine (VM) must satisfy:
- Secure - it must be impossible for anyone to violate their permissions by using the smart contract platform. After all, anyone can deploy a smart contract.
- Deterministic - it must be impossible for smart contracts to produce different results on different nodes, even across differing architectures, operating systems, timezones, phases of the moon et cetera.
- Generic - it should be possible to write essentially anything that you want using the primitives supplied by the VM. Some people would consider this a misfeature, I’ll explain my reasoning later.
- Efficient - the more CPU-time a given contract takes to execute, the more expensive running that contract will be. This might not seem too big an issue since no-one is running raytracing or other CPU-bound operations on the blockchain, but it leads to a cascade of other problems as we’ll see later.
Some of these are similar to non-blockchain VMs, but other concerns are more esoteric. “Normal” VMs don’t need to be deterministic in general and many of them would consider determinism a misfeature (making it impossible to generate random numbers, for example). Many VMs don’t even need to be secure, since they’re generally considered to be running trusted code. They only need to enforce that user input can’t allow the user to violate permissions, not that the code can’t violate its permissions - that’s enforced by the operating system. The exception to this second point is the web platform, which I’ll get to soon.
So let’s look at the Ethereum VM, by far the most-used blockchain VM currently. It’s secure, in that it’s impossible to violate your permissions by writing code1. It’s deterministic, in that it’s not possible for you to write code that runs differently on different nodes. It isn’t, however, particularly generic or efficient. It’s generic in the most literal sense, in that it’s Turing complete2, but as anyone who has heard of the Turing tarpit will know that that is neither necessary or sufficient. The truth is that its low efficiency causes a de facto failure in genericism. It’s simply impractical to write many programs since the amount that a user would have to pay in order to use the software makes it economically infeasible.
To help with this, extra inbuilt functions have been added to the virtual machine over the years that implement efficient versions of commonly-needed functions. These are all cryptography-related functions for now, just because those are the most egregious of performance killers in the EVM. This is partly because the EVM’s word size is 256 bits, large enough that it must be implemented in software. Although at Parity we’ve put a lot of work into optimising our 256-bit integer implementation, it’s still the case that this software implementation is orders of magnitude slower than (for example) hardware 32-bit arithmetic.
Another difficult problem with efficiently executing EVM code is that it must be interpreted. A common way to massively increase the speed of virtual machines is to “just-in-time compile” them. This means compiling them to native code as you run them. This isn’t so simple on the blockchain. The overarching problem with blockchain VMs vs regular VMs is that blockchain VMs must consider the code they are running to be adversarial at all times, because since it’s a financial system, any problems can probably be used for financial gain by someone. With JIT compilation, since you’re compiling to native code essentially all safety rails are removed and you must be extremely careful to avoid major bugs, from consensus issues due to slightly different behaviour on different platforms up to and including arbitrary code execution.
Let’s say that you wrote a perfect JIT compiler, though. One with no bugs. Even then you could have issues, and for why we need to ask what makes most JITs so fast. Sure there’s a speed boost just from using native code, but the real power of a JIT is in optimisation. You can take something that’s in a portable bytecode and convert it to something optimised for the specific platform. The EVM was built to be the backend for Solidity, not to be close to the way that a machine works, and so in order to efficiently be compiled optimisation is somewhat necessary. For why this is a problem, we need to talk about complexity.
Complexity is the study of how much time code takes to run relative to input size. Say you have an algorithm that works on a list. If it takes the same length of time no matter the length of the list, it’s
O(1) - constant time. If it takes twice the length of time for twice the length of list, it’s
O(n) - linear time. If it takes four times the length of time for twice the length of list, it’s
O(n^2) - polynomial time. For the blockchain VM, every fully-validating node will have to do the compilation work and so if you want to deploy a contract you must pay for this compilation work - otherwise an attack vector would be to deploy many contracts that are expensive to compile. With a linear time (or better) algorithm we can price by number of bytes in the contract. We know that the algorithm will take an amount of time proportional to the bytes, so we can use number of bytes as a guess for amount of time. For nonlinear algorithms this is impossible.
Of course, you can price by bytes-squared for an
O(n^2) algorithm, but the truth is that we often don’t really know the complexity of sufficiently complex algorithms, which optimisers almost always are. We must keep it simple enough that we can be sure of the complexity, and enforcing a linear algorithm keeps the contract deployment cheap while also allowing us to easily price deployment. Plus, this bound on complexity essentially throws JIT compilers out the window - we can think of a JIT compiler as the same as an ahead-of-time compiler (AOT) except that it’s lazy instead of strict, and lazy algorithms are notoriously hard to work out complexity for. It’s easier to compile the contract once, when you first encounter it. Optimisations are OK as long as they are linear time.
At Parity Technologies we’re working on designing and building Polkadot, a “blockchain of blockchains”. We consider this and projects like it to be the third generation of blockchains - focussed on interoperability. Instead of smart contracts, we have parachains. Entire blockchains running as applications. For this, we need a better virtual machine than the EVM, and especially we need something that is more efficient. We chose WebAssembly (Wasm). This is a virtual machine specification that is intended to match the semantics of physical machines in the real world today, making it efficiently executable on modern hardware. It’s also built to be easily verifiable, not necessarily for logical correctness but for memory safety. For example, there are no random
gotos, you have to specify an actual function to call or block to break out of. It’s also intended to have no undefined behaviour. This means you can naïvely compile the VM code to native code without having to insert expensive runtime checks.
We often get asked about this, since it is considered by some that using a general-purpose virtual machine is considered a bad move. This was actually brought up to us by Mozilla co-founder Brendan Eich in this tweet. To summarise his argument in what I believe is the most charitable way, he believes that when building a smart contract platform you should enforce that all contracts use a Turing incomplete language that is amenable to formal verification, in order to check for correctness. Turing completeness is an interesting topic, but for the sake of this argument you can think of a Turing incomplete language as one that doesn’t allow you to infinitely recur or infinitely loop. You can always know if the program will finish or not for a given input - it always will. This allows you to make certain guarantees about the execution of the program more easily. You can always have an upper bound on the amount of time it will take to execute a given contract.
I think that his logic is correct on some level - formal verification is important and should be encouraged in the blockchain space - but his is an extremist viewpoint that doesn’t allow for pragmatism. Sure, you can enforce the use of a certain programming language by having the contracts deployed in that language directly to the blockchain (instead of in a virtual machine bytecode), or a compressed form like a serialised AST. This would allow you to do formal verification on existing contracts, but then you either do an extremely expensive AST-walking interpreter or you compile to bytecode in the nodes, which means you hit the problem mentioned above - compilation of a complex language is nonlinear and so then you have to price deployment using gas, the pricing mechanism used for smart contract execution in Ethereum where you add up the cost of execution as you execute it. This is error-prone and expensive. I don’t know for sure if Eich was suggesting this because his argument has been run through the mangle of Twitter’s character count, but at the very least he’s suggesting a bespoke Turing-incomplete VM bytecode format with a bespoke functional programming language on top; let’s run through why we haven’t chosen this.
So let’s say that Turing incompleteness is a good thing, and that you should write smart contracts in a Turing incomplete, easily-verifiable language. Why create your own? Why not use, say Idris. You can already compile Idris to Wasm today using its C backend. The Turing completeness of the virtual machine specification does not preclude you from using a Turing incomplete language on top. You’d have to create an idiomatic library for creating smart contracts, but that’s far easier than creating a new virtual machine and language. The only thing you’d lose is the ability to formally verify already-deployed contracts, except that verifying a contract doesn’t mean that it’s correct. It only means that it satisfies the constraints it puts on itself. These constraints make it significantly easier to write correct code, but not significantly easier to validate existing code without still requiring human intervention.
The fact that Idris already compiles to Wasm hints at a different reason to use Wasm - pretty much everything compiles to it. We already write production-grade high-reliability systems like OS kernels and cryptocurrencies in languages like C, C++ and Rust, all of which have excellent tooling and even some amount of automated verification infrastructure, while also making it easy to write extremely fast code. All of which, too, compile to Wasm. That’s not to mention truly high-integrity, high-performance languages like SPARK - used in aeroplane flight software - which will also compile to Wasm in the future using GCC’s Wasm backend (Ada, of which SPARK is a subset, can be compiled with GCC).
A benefit of Turing incompleteness, as I said before, is that you can have an upper bound on program execution time. This, in theory, means you don’t have to have Ethereum’s “gas pricing” mechanism, where each instruction has a price and the cost of the whole execution is tallied up as you go. Except if you price by this upper bound then the vast majority of contracts will cost far more than the actual time taken to run the contract. As a result, you’ll probably want to implement some kind of gas pricing to make it cheaper for users - so why kneecap yourself with Turing incompleteness if you have to gas it like a Turing complete virtual machine anyway.
I’ve saved the coolest part about WebAssembly in Substrate for last: it allows us to deploy a Wasm implementation of the blockchain logic on the blockchain itself. By using a virtual machine specification that C++ and Rust can efficiently compile to, we can write chain logic once and have it compile to both native code for the client executable and Wasm to deploy on-chain. This means that even if a client doesn’t get updated it won’t start rejecting new blocks generated by clients that have been updated - all of them upgrade the runtime in lock-step. If you never get the client update you’ll use the slower Wasm version, and you’ll get a nice speed boost when you do update (and you’ll receive updates to the parts of the client that don’t affect consensus). This avoids Ethereum’s problem of having to hard-fork every time they want to upgrade and Bitcoin’s problem of avoiding updates entirely except in dire circumstances for fear of any hard forks. I do think this is very cool, but it’s not enough of a reason on its own since you could have this Wasm runtime be seperate from the smart contract VM if you wanted to.
So now that I’ve thoroughly defended WebAssembly, let’s talk about implementation. A question we hear a lot is that we should use V8 or SpiderMonkey, an off-the-shelf Wasm engine written by a third party. You’d think that with our love of standing on the shoulders of giants that we’d be fully in favour of this, but unfortunately it’s not possible to use these. Let’s see those requirements for a blockchain VM again:
V8 and SpiderMonkey are certainly secure, generic and efficient, but there’s an issue when it comes to determinism. While it’s likely that they aim for determinism, I have no doubt that they would give up determinism in obscure corner cases for the sake of efficiency as long as the Wasm spec allows it. For blockchain we can’t allow that. Even the most obscure of obscure corner cases must be precisely emulated across architectures, since if they are not then an attacker can deliberately use that to force a fork or a shutdown of the network. Although even if V8 and SpiderMonkey are deterministic, they’re still optimising compilers, and JIT compilers at that. As I mentioned before, optimisers are nonlinear and JITs are difficult to even work out the complexity of, so there is no easy way to price deployment of a contract. V8 and SpiderMonkey are not just theoretically nonlinear: real-world “compiler bombs” (pieces of code that cause the compiler to take an exponentially long amount of time) have been found and there is no reason to believe that even if they are fixed that more will not be found in the future. The determinism issue is surmountable I think, but this issue of complexity is thornier. If we do find a solution to it then it’s possible that a future version of Substrate will be able to use V8 or SpiderMonkey as its Wasm runtime, but the threat of bad actors using issues in the runtime to exploit the system means that we must tread lightly.
So for now if we are to create a Wasm-to-native compiler, we will have to do it ourselves, with strong guarantees about runtime behaviour and determinism. This isn’t easy, but in the meantime we have an efficient Wasm interpreter called wasmi. This a correctness-oriented interpreter for Wasm on the blockchain, not performance-oriented, but it’s already fast and we’re constantly working to improve its performance.
In summary, the main points are:
- Wasm allows us to reuse existing tooling;
- Wasm is easily verifiable for correctness;
- Wasm allows us to effectively balance price of deployment against price of execution;
- Wasm allows us to have isomorphic blockchain logic code for fork-free upgrades;
- The benefits of using the formal-verification-oriented option dissolve when you try to apply it to a VM format - a Turing-incomplete language is more suitable as a frontend used by humans instead of a backend interpreted by the blockchain client;
- We can’t easily use existing Wasm runtimes since our requirements are much stricter than theirs.
I hope I’ve answered most of the questions that people have asked about our choice to use Wasm and our subsequent choice not to use existing Wasm engines at the core of our Parity Substrate product (and, subsequently, in Polkadot). If you have more questions, you can send them to my Twitter account @meatwithdreams.
- A funny exploit in the early days of Bitcoin - before it had any real value - was that you could steal anyone’s coins by having a script containing only
return true;(or the Bitcoin Script equivalent). This was because they concatenated scripts together, treating them all as one long script. [return]
- It technically isn’t Turing complete because executing contracts requires gas and there is only a finite amount of gas in the system, but it’s enough to satisfy the colloquial definition of Turing completeness. [return]