WebAssembly Troubles part 4: Microwasm
This is the final part of a 4-part miniseries on issues with WebAssembly and proposals to fix them. Part 1 here, part 2 here, part 3 here.
This article assumes some familiarity with virtual machines, compilers and WebAssembly, but I’ll try to link to relevant information where necessary so that you can follow along even if you’re not familiar with them.
Also, this series is going to come off as if I dislike WebAssembly. I love WebAssembly! I wrote a whole article about how great it is! In fact, I love it so much that I want it to be the best that it can be, and this series is me working through my complaints with the design in the hope that some or all of these issues can be addressed soon, while the ink is still somewhat wet on the specification.
Wasm is mostly a great spec, but it has some serious problems. I’ve detailed some of these issues in the previous three articles in this series, but it might seem like actually fixing these problems is untenable. Are you really going to deprecate locals? You’re going to deprecate
Well, turns out we don’t need to - we can get many of the desired benefits without dropping support for WebAssembly as it exists now.
Microwasm (working title) is a Wasm-compatible format that can be efficiently consumed by runtimes and efficiently produced by compilers like LLVM. It’s currently implemented in the Microwasm branch of Lightbeam. The main goals are as follows:
- It should be relatively easy to implement each of the following three steps:
  - Compiler IR->Microwasm;
  - Wasm->Microwasm;
  - Microwasm->native code.
- It shouldn’t sacrifice any of WebAssembly’s guarantees on safety or determinism.
- We should maximise the amount of useful information transferred from the compiler producing the Microwasm to the runtime consuming the Microwasm.
- We should optimise for performance when consuming a stream of Microwasm, unless it conflicts with the performance goals of optimising compilers.
- Converting Wasm to Microwasm must be possible to do in a streaming way; you shouldn’t need to block on loading a whole Wasm function before you’re able to produce Microwasm.
- Wasm to Microwasm and then that Microwasm to native code should be precisely as performant as compiling the Wasm to native directly.
The last two points are the most important in my opinion. In practice, they mean that in the backends of Wasmtime we can just wrap the incoming Wasm stream in a streaming converter to Microwasm and consume that instead. Our backends get the benefit of consuming a simpler language without producing worse code. Wasm keeps the performance that it already has, while a compiler that wants to make use of Microwasm’s ability to allow improved performance can do so. Writing a Microwasm backend for most compilers would be much, much less costly than writing a Wasm backend, so it’s not like we have to convince compiler developers to maintain two equally huge backend codebases.
So how does it compare to WebAssembly? Well here’s a simple function from the Wasm specification tests:
    (module
      (func (param i32) (param i32) (result i32)
        get_local 1
        (block (result i32)
          get_local 0
          get_local 0
          br_if 0
          unreachable
        )
        i32.add
      )
    )
Here’s that Wasm compiled to Microwasm:
    .fn_0:
      pick 0
      pick 2
      pick 3
      br_if .L1_end, .L2
    .L2:
      unreachable
    .L1_end:
      i32.add
      br .return
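To make the listing concrete, here is a toy interpreter for it. This is only a sketch under assumptions that are not spelled out in this article: that `pick n` copies the value `n` slots below the top of the stack, that `br_if` pops its condition and jumps to its first label when the condition is non-zero, and that `br .return` returns the value on top of the stack. The `PROGRAM` encoding, `run` and `Trap` are hypothetical names invented for the illustration, not part of Lightbeam.

```python
class Trap(Exception):
    """Raised when `unreachable` executes."""


# The Microwasm listing above as label -> instructions.
# Storing a function as a dict of blocks is an assumption of this sketch.
PROGRAM = {
    ".fn_0": [("pick", 0), ("pick", 2), ("pick", 3), ("br_if", ".L1_end", ".L2")],
    ".L2": [("unreachable",)],
    ".L1_end": [("i32.add",), ("br", ".return")],
}


def run(program, entry, args):
    stack = list(args)  # arguments arrive on the stack, last argument on top
    label = entry
    while label != ".return":
        for op, *operands in program[label]:
            if op == "pick":
                # Assumed semantics: copy the value `n` slots below the top.
                stack.append(stack[-1 - operands[0]])
            elif op == "i32.add":
                rhs, lhs = stack.pop(), stack.pop()
                stack.append((lhs + rhs) & 0xFFFFFFFF)  # wrap to 32 bits
            elif op == "unreachable":
                raise Trap("unreachable executed")
            elif op == "br":
                label = operands[0]
                break
            elif op == "br_if":
                taken, not_taken = operands
                label = taken if stack.pop() != 0 else not_taken
                break
            else:
                raise ValueError(f"unknown instruction {op!r}")
        else:
            # Every block must end by branching somewhere.
            raise ValueError(f"block {label} ended without a branch")
    return stack.pop()
```

Under those assumptions, `run(PROGRAM, ".fn_0", [3, 4])` adds the two arguments, while a first argument of zero reaches `unreachable` and raises `Trap`, which matches the behaviour of the original Wasm function.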
The immediate differences in the format as it exists now:
- No locals - arguments are passed on the stack when entering a function and locals are emulated by adding `pick` instructions. This essentially means that instructions like `tee_local` are no-ops at runtime; they only affect the virtual stack;
- Only CFG control flow, no hierarchical blocks like Wasm - this was modelled on the Funclets proposal for Wasm;
- No block returns - only calling new blocks. Returning from a function is `br .return`. This isn’t proper continuation-passing style, but it’s close enough that we get many of the simplicity benefits.
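The differences listed above can be sketched as a small streaming converter. This is a hedged illustration rather than Lightbeam’s actual code: `wasm_to_microwasm`, the tuple encoding in `EXAMPLE`, and the label-numbering scheme are all assumptions chosen so that the output lines up with the listing earlier in the article. It only handles the instructions used in the example, treats parameters as the only locals, and ignores details such as dropping extra stack values on a branch.

```python
from itertools import count

# The example function body as (opcode, *operands) tuples; `block` carries its
# result count. This encoding is an assumption made for the sketch.
EXAMPLE = [
    ("get_local", 1),
    ("block", 1),
    ("get_local", 0),
    ("get_local", 0),
    ("br_if", 0),
    ("unreachable",),
    ("end",),
    ("i32.add",),
]


def wasm_to_microwasm(instrs, num_params):
    """Lazily lower a stream of Wasm instructions to Microwasm text lines."""
    depth = num_params   # height of the virtual stack; arguments are already on it
    labels = count(1)    # source of fresh label ids
    blocks = []          # open blocks as (end_label, stack height after the block)
    reachable = True     # False after `unreachable` until the next label
    yield ".fn_0:"
    for op, *operands in instrs:
        if op == "get_local":
            # Local `i` sits `depth - 1 - i` slots below the top of the stack.
            yield f"  pick {depth - 1 - operands[0]}"
            depth += 1
        elif op == "block":
            # Nothing is emitted; we only remember where `end` will land.
            blocks.append((f".L{next(labels)}_end", depth + operands[0]))
        elif op == "br_if":
            depth -= 1  # the condition is popped
            target = blocks[-1 - operands[0]][0]
            fallthrough = f".L{next(labels)}"
            yield f"  br_if {target}, {fallthrough}"
            yield f"{fallthrough}:"
        elif op == "unreachable":
            yield "  unreachable"
            reachable = False
        elif op == "end":
            end_label, depth = blocks.pop()
            if reachable:
                yield f"  br {end_label}"  # explicit jump replaces fallthrough
            yield f"{end_label}:"
            reachable = True
        elif op == "i32.add":
            depth -= 1  # two operands popped, one result pushed
            yield "  i32.add"
        else:
            raise NotImplementedError(op)
    if reachable:
        yield "  br .return"
```

Because the converter is a generator, each output line is produced as soon as the instruction that causes it has been read, without waiting for the rest of the function. Joining the lines of `wasm_to_microwasm(iter(EXAMPLE), num_params=2)` gives the same instruction sequence as the Microwasm listing shown earlier.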
There’s another change that I’m considering where instructions that need data from the environment (for example, instructions that access the linear memory or the “table section”) have the environment passed in as an explicit argument. This reduces the special-casing in much of the translation code, but more importantly it allows us to free the register that this environment pointer would be stored in when we’re executing blocks that don’t need it. This would be a complex change to implement in the Wasm->Microwasm step though, so we’d want to work out for sure how it affects complexity and performance before making a firm decision either way.
The difference in quality of the generated code is immediately visible. Here’s Lightbeam’s assembly output for the function above before the implementation of Microwasm. I should say that the implementation of the backend that produced this is significantly more complex than the implementation using Microwasm:
    push rbp
    mov rbp, rsp
    sub rsp, 0x18
    mov rax, rsi
    test eax, eax
    je .L0
    mov rax, rsi
    jmp .L1
    .L0:
    jmp .L2
    .L1:
    add eax, edx
    mov rsp, rbp
    pop rbp
    ret
    .L2:
    ud2
Now here’s the output after Microwasm:
    mov rax, rsi
    mov rcx, rdx
    mov r8, rsi
    test esi, esi
    jne .L0
    ud2
    .L0:
    add ecx, eax
    mov rax, rcx
    ret
You can see that the control flow is much less noisy and the register usage is much better. The main problem you can see is that some registers are unnecessarily duplicated. In this case this can’t be avoided. We don’t know if the `block` will be broken out of again by a later instruction when we’re translating the `br_if` - remember, this is a streaming compiler and so we’re translating instruction by instruction - so we must assume that all arguments to the `end` label are disjoint even if currently we’ve only encountered jumps to it that give arguments including duplicates. The precise limitations of a streaming compiler in comparison to a traditional compiler deserve an article of their own, but for now the only important thing to say is that an optimising compiler producing Microwasm directly would be able to avoid this issue.
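The reasoning about duplicated registers can be illustrated with a small model. This is a simplified sketch, not how Lightbeam allocates registers: `streaming_locations` and `whole_function_locations` are hypothetical helpers, locations are just integers, and the value names (`"a"`, `"b"`, and so on) are made up. The point is only that a compiler that has seen a single jump to a label cannot safely merge argument slots, while one that sees every jump can.

```python
def streaming_locations(first_jump_args):
    """Locations chosen when only the first jump to a label has been seen.

    Every argument slot gets its own location, because a later jump may pass
    different values in slots that currently hold duplicates.
    """
    return list(range(len(first_jump_args)))


def whole_function_locations(all_jumps):
    """Locations chosen when every jump to the label is known up front.

    Two slots may share a location if every jump passes the same value in both.
    """
    # columns[i] holds the value each jump passes in argument slot i.
    columns = list(zip(*all_jumps))
    seen = {}
    return [seen.setdefault(column, len(seen)) for column in columns]


# One jump whose arguments contain duplicates (hypothetical value names).
first_jump = ("a", "b", "b", "a")

# Streaming: four distinct locations, so the duplicates must be copied.
print(streaming_locations(first_jump))

# All jumps known and only this one exists: two locations suffice.
print(whole_function_locations([first_jump]))

# A second jump passing different values shows why merging early is unsafe.
print(whole_function_locations([first_jump, ("a", "b", "c", "d")]))
```

In this model the streaming choice costs extra moves for the duplicated values, which corresponds to the redundant `mov` instructions in the output above, while a producer that already knows every jump to the label can hand the runtime a block signature without the duplicates.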
For comparison, here’s the assembly produced by Firefox’s optimising WebAssembly compiler. You can see that it’s much the same as what our streaming compiler can produce:
    sub rsp, 8
    mov ecx, esi
    test edi, edi
    jne .L0
    ; .fail defined elsewhere
    jmp .fail
    .L0:
    mov eax, ecx
    add eax, edi
    nop
    add rsp, 8
    ret
Why not just a normal MIR?
The idea is that we can make changes that improve the simplicity of our codegen backend, keeping the format internal so we can see where the positives and pitfalls might lie. At this stage we’re compiling directly from Wasm so there’s an upper limit to how high-quality the code we generate can be; we’re still working with the same source information. It’s more like a streaming compiler-compatible MIR, except that it keeps all the security and sandboxing guarantees that WebAssembly has.
Once we’ve got a good idea of how we can get the most out of this format, we can allow frontends like LLVM to start generating it directly, which should give us an increase in performance with no extra work on our end. An LLVM Microwasm backend would be relatively simple to implement - where it differs from WebAssembly it’s simpler and where it’s similar to WebAssembly we can just reuse code from that backend.
Why not just implement these changes in WebAssembly?
Well, that would be ideal. Maintaining a format, even a mostly-compatible one, is not an easy task. Besides, although you could have users supply code in this format in standalone environments, it will never be reasonable for multiple browsers to support both WebAssembly and a similar-but-different fork of the format. For code on the web to get the same benefits, these changes would have to be rolled into WebAssembly. So why won’t they be?
So instead we can circumvent this issue by implementing our own compatible format. Maybe the improved freedom that we have to change this format will allow us to better prototype ideas that we’d like to include in WebAssembly proper. We can only hope.
This is the final part, so I don’t have a “join us next time” sequel tease. If you want more, read some of the other articles that I’ve posted or go watch YouTube or something. Either way, thanks for reading.