WebAssembly Troubles part 2: Why Do We Need the Relooper Algorithm, Again?

Preamble

This is part 2 of a 4-part miniseries on issues with WebAssembly and proposals to fix them. Part 1 here, part 3 here, part 4 here. This article assumes some familiarity with virtual machines, compilers and WebAssembly, but I’ll try to link to relevant information where necessary so even if you’re not you can follow along. Also, this series is going to come off as if I dislike WebAssembly. I love WebAssembly! I wrote a whole article about how great it is! In fact, I love it so much that I want it to be the best that it can be, and this series is me working through my complaints with the design in the hope that some or all of these issues can be addressed soon, while the ink is still somewhat wet on the specification.

So if there’s one thing people know about WebAssembly control flow, it’s that it doesn’t allow goto. goto, we’re told, is considered harmful, and so WebAssembly only implements simple control flow constructs:

block, which you can break to the end of;
loop, which you can jump to the header of;
if, which allows you to choose either the then or else branch depending on a condition;
br, which unconditionally jumps to a block’s “target” (so the end for a block or if..else and the beginning for a loop);
br_table, which can jump to the target of one of many blocks based on a runtime value;
and br_if, which either jumps to the block’s target or continues depending on a condition.

Redundancy

The eagle-eyed among you may have noticed that there is already some redundancy in those constructs. if..else can be implemented easily using block and br_if. It looks something like so:

; Version using `if`
(if (result i32) $cond
  (then
    ; ..then branch..
  )
  (else
    ; ..else branch..
  ))
; Version using `block` and `br_if`
(block (param $cond i32) (result i32)
  (block (param $cond i32) (result i32)
    (br_if 0 $cond)
    ; ..else branch..
    (br 1))
  ; ..then branch..
)

Except whoops! That doesn’t work! In the current version of WebAssembly, blocks can’t take arguments and so the only way to implement if with block/br_if is with locals, which without extra optimisations are significantly less efficient. I touch on this in my previous article but there’s a proposal to fix this. Let’s say this gets fixed though, and blocks gain the ability to take arguments. In this case, not only are these two pieces of code identical, but even a baseline non-optimising compiler will produce identical assembly for both of them. The second one takes more bytes to represent but it’s a simple recurring pattern, which makes it easy to write domain-specific compression for it¹.

Deeper down the rabbit hole

It’s not news to most programmers that really all of these constructs are redundant if WebAssembly had goto. If you had goto and some goto_if variant that conditionally jumped to a label you could implement all of these constructs. So why doesn’t WebAssembly support goto?

The classic argument against goto is that it leads to spaghetti code. WebAssembly isn’t designed as a front-end language for general programming though, so what’s the problem? Well, WebAssembly does have some constraints, specifically it must produce code that is valid. WebAssembly is a stack machine, and you can’t jump to just any label since that label might point to code that pops too many values off the stack, or pops the wrong types off the stack, or pushes too many values onto the stack². Essentially raw gotos allow you to do all kinds of evil stuff. What you really want is a list of blocks like WebAssembly has now, each taking some number of arguments and returning some number of values onto the stack, and that you can call like function. Now you can have arbitrary flow between these blocks, like a typed version of goto.

Now I’m not some visionary genius and I didn’t come up with this idea, nor was I the first to apply this to WebAssembly. The concept has been implemented in compilers for a very long time, where it is known as a control flow graph or CFG. A CFG is usually combined with a code representation in SSA form, and if you’ve read my previous article you probably know where this is going by now but let’s work through it together anyway. As for implementing it in WebAssembly, I was introduced to this as it applies to WebAssembly by the author of the WebAssembly Funclets proposal, which implements this basic idea.

Why bother?

CFGs are the backbone of almost every static language compiler out there. Both LLVM and GCC use a CFG to combine the different control flow constructs like for, while, if and even goto into a single representation that can be used to define them all. This causes a problem for compiling to WebAssembly. Since WebAssembly doesn’t allow arbitrary control-flow graphs, compilers have to detect when some set of nodes in the CFG can be represented using WebAssembly’s control flow constructs and emit them, falling back to a complicated algorithm like Relooper or Stackifier (no link for the latter, it’s only mentioned by this name in LLVM internal discussions) to emulate complex control flow. Essentially the fallback algorithm converts the CFG into a the equivalent of a loop combined with a switch statement, which can be extremely slow.

Currently compilers that convert WebAssembly to native cannot generate performant assembly for this “fallback” code, despite the fact that supporting arbitrary CFGs in WebAssembly comes with no downside in terms of runtime performance, security or correctness. WebAssembly control flow is a lossy conversion - we start with a CFG in LLVM which gets converted to WebAssembly control flow, then the native compiler converts that control flow back into a less-efficient CFG which then has optimisations applied and is then compiled to native. For most compilers the only way to get efficient code is to detect the loop-and-switch emitted by Relooper and Stackifier and convert it back into the arbitrary CFG that it started as. You’re basically having to write a slow, error-prone algorithm to convert a CFG to Wasm and then another slow, error-prone algorithm to convert the Wasm back into the CFG that it started as. No wonder GCC developers simply gave up implementing a WebAssembly backend as it was much too difficult to correctly implement a relooper-style algorithm over their CFG representation.

Why is this not already implemented?

Well for a start it requires arguments to blocks and multiple returns to be supported, neither of which are in the MVP. As for why those aren’t supported in the MVP, I don’t really have a good answer. As with my previous article, the question really comes down to WebAssembly’s history as a statically-typed binary encoding of JavaScript and the lack of serious implementations while the specification was being written. Unlike the previous article, I find it utterly baffling that the issues with compiling to WebAssembly’s inexpressive control flow from any modern compiler weren’t so glaringly obvious as to prevent the release of the MVP until they were fixed. The development of emscripten should be enough proof that structured control flow like this is a very poor compilation target. I wasn’t there for the development of the specification though and so I can’t speak for what was going on behind the scenes.

There is one thing that may have been a major factor to the decision not to adopt arbitrary CFGs for WebAssembly control flow. I believe that V8 is an exception to most compilers in that it doesn’t represent code as a CFG at all - it maintains roughly JS-compatible control flow all the way from front-end to codegen. In order to support this V8 would have to be converted to use CFGs, they’d have to implement something like Relooper internally, or they’d have to write a new WebAssembly runtime from scratch. Google was and is a major roadblock to getting this implemented as they have multiple people on the committee in charge of WebAssembly who have veto power.

Join us next time for a peek at the stack. I promise that this next one will be the last negative post.

The simplest compression method would be to implement macros - in the header you define a macro that takes some arguments and produces one or more instructions. I’m not certain if there’s an implementation of this that would provide better results than simply using a general-purpose compression algorithm, however. ↩︎
The JVM is a stack machine that uses goto with untyped blocks for control flow, but to maintain safety it must disable the JIT entirely when hitting code that can’t be converted into control flow that resembles the restricted control flow of current WebAssembly. Instead, it falls back to an interpreter for the bytecode. Unlike the JVM, however, we have the option of building typed blocks into our format from the start. ↩︎