Supercharging Lambdas, Node and web with Rust

I recently decided to learn and try out Rust. Rust has been voted the most loved language in the StackOverflow survey for 5 years in a row, so there must be something going for this language that makes it worth my time to evaluate. For a while I haven't had a low level language in my portfolio, mostly because I haven't had to do low level programming in a long time, but I figured that being able to achieve higher performance than higher level languages are capable of is a skill that may come in handy one day. Having done some C/C++ a long time ago also reminded me of the pain of getting memory management wrong, which made me look elsewhere, and that happens to be Rust's most compelling promise: a high performance language that compiles to native code, without you having to worry about getting memory management wrong. I'll come back to what makes Rust so cool in the latter part of this blog post, but for now let me demonstrate how Rust can be practical in our line of business, since we're not in the business of writing operating systems, kernels, drivers, firmware, boot loaders, hypervisors, high performance applications such as browsers, or any other sort of thing that historically only C/C++ or Assembly have been used for.

In our AutoBlocks runtime, which runs untrusted end-user JavaScript code in the cloud, we have to parse source maps so we can trace error stack traces back to their original positions and show them to the end user. As with all of our services, we exclusively use Lambdas, which so far have worked out great. Currently our Lambdas are configured to use the 512MB memory tier, which seems to be the sweet spot in terms of price vs performance, since increasing the Lambda memory limit also implicitly increases the CPU allocation. But as with most functions, most of the time is spent on IO, waiting on some other service to do its thing and send back a result. Therefore increasing the Lambda tier isn't economically sound, since it won't speed up our functions any further, unless you actually need more memory, or you have CPU intensive tasks to run, in which case you will benefit from increasing Lambda memory.

In the AutoBlocks runtime, however, when we parse source maps we are actually doing CPU intensive work; beyond that we spend most of the time waiting on other services, just like every other service. To parse source maps I'm using the source-map library written by Mozilla, and what I noticed is that the first time around (during cold starts) parsing source maps is fairly slow, but after the initial parsing, speed increases about 10 times. This is due to a combination of V8 having to load the WASM (WebAssembly) module that this library uses, and the JIT (Just in Time) compiler running its course by optimising the JS and WASM code (yes, WASM code gets optimised by the JIT too). The result is that once everything has been optimised, parsing source maps is blazingly fast. But the slowness during cold starts annoyed me enough to look for ways to get around it. Previously I tried to optimise that inefficiency by pre-emptively warming the code, letting the JIT run its course ahead of time, which is possible with provisioned concurrency Lambdas, and it yields pretty good results. However, achieving this required me to monkey patch the AWS Lambda runtime and also add extra code to the function to short-circuit certain code paths that I didn't want optimised. In short, this approach introduces needless bloat into the function code that can break at any time.

This time around, instead of trying to speed up a high level language, I figured: what if I write that particular CPU intensive code in a low level language that has virtually no overhead, does not need to be warmed up and does not suffer from garbage collection pauses? This is where Rust comes to the rescue, so let's look at the benchmark results.

Benchmark of 5 combinations:

  • Node+source_map@0.7.3 - This benchmark uses the version of Mozilla's source-map library that the AutoBlocks runtime also uses. This library is already heavily optimised and compiled to WASM. They also seem to be doing some pre-emptive JIT optimisation magic.
  • Node+source_map@0.5.5 - This is a slightly older version of the same library that does not make use of WASM to boost performance. It is written entirely in JS, so it's a better benchmark for my purposes, since I want to compare Rust to pure JS rather than try to beat a version that is already heavily optimised.
  • Rust-native - This is a native Rust binary, packaged and deployed as a Lambda with a custom runtime. Node isn't used here.
  • Node+Rust+Neon - This is where I wrote the source map interop library in Rust and used Neon bindings to be able to invoke my Rust code from Node. Code can be found here.
  • Node+Rust+WASM - This is the same as above, but instead of calling native Rust code from Node, I compiled the Rust part into WASM using wasm-bindgen and wasm-pack, which then gets called from Node. The purpose of this benchmark is to test against the Node+source_map@0.7.3 benchmark, because now we're targeting the same V8 WASM runtime, and to compare the performance difference between WASM and native Rust code doing exactly the same thing. Code can be found here.

Here are the performance results (vertical axis is time in milliseconds and horizontal axis is number of iterations):

  • Node+source_map@0.7.3 - Cold start was 398ms and the average thereafter was 56ms. As mentioned before, this library greatly suffers from cold starts but afterwards becomes blazingly fast thanks to heavy optimisation.
  • Node+source_map@0.5.5 - Cold start was 617ms and the average thereafter was 389ms. Since this version is written purely in JS, it demonstrates how much WASM can improve performance if used correctly.
  • Rust-native - Cold start was 100ms and the average thereafter 121ms. As expected, native Rust code does not need any JIT optimisations to become fast, it is fast from the start. Interestingly the cold start was faster than the average, but that seems to be within the margin of error. Overall performance is very stable, with no GC (garbage collector) caused hiccups. Interestingly my Rust version was not able to beat the fully warmed up JS+WASM combination that Mozilla has written.
  • Node+Rust+Neon - Cold start was 118ms and the average thereafter 99ms. Performance looks to be the same as the native Rust version, which means that the Neon bindings don't seem to incur any noticeable overhead, which is nice. Interestingly the average is even faster than in the native Rust Lambda benchmark, but again within the margin of error.
  • Node+Rust+WASM - Cold start was a horrible 990ms and the average thereafter 408ms. This is quite disappointing, as on average even the regular JS version managed to beat it. I didn't subject the compiled WASM to any further optimisations, which could be done with tools such as Binaryen. There are also other WASM runtimes out there that could, in theory, achieve better performance than V8's engine, such as Lucet and SSVM, but then again I wouldn't be able to beat Mozilla's version just by switching out the WASM runtime.

Here are results for memory usage:

The native Rust and Rust-compiled WASM versions use significantly less memory. Rust is known to be memory friendly, so this is no surprise.

Benchmark conclusion

  • The source-map package is already heavily optimised and is hard to beat even with native code once the JIT has had a chance to do its magic.
  • The performance difference between warmed up pure JS and the native Rust function is noticeable, almost 4 times, as expected for high level languages compared to C/C++ or Rust. Comparing the warmed up pure JS version against the Mozilla library, the difference is a staggering 7 times (389ms vs 56ms). Hiccups in the JS code are also visible: the garbage collector or something else was doing its thing. Such things don't affect Rust code, unless imposed by the host runtime, for example by pausing a native function call to run a stop-the-world garbage collection.
  • The sourcemap crate does not seem to be particularly well optimised, since the JS+WASM combination beat it in the native test, and when it was compiled to WASM the equivalent library written by Mozilla utterly annihilated it (7.4 times the difference in warmed up average). But still, even unoptimised native code can massively beat well optimised JS, or any other high level language for that matter (most of the time).
  • As expected, native code has very stable performance characteristics and does not suffer from cold starts.
  • Rust based code consumed significantly less memory than the pure JS version. Handy if you want to fit more stuff into a low tier Lambda without having to increase memory. These days you can increase Lambda memory up to 10GB, but single threaded CPU scaling still probably levels off around the 2GB mark (needs verification).
  • Writing performance critical functions fully in Rust, augmenting existing high level function code with native Rust code, or compiling Rust into WASM for environments where only WASM can run (such as browsers) can yield noticeable gains in speed. Worth knowing your options.
  • Right now I'm not going to swap out the existing source map parsing implementation, because it isn't actually hurting us, and if you ignore cold starts it actually offers unparalleled performance that my own version cannot match, other than beating it on cold starts, so it's a double edged sword. But generally it is good to know that we do have alternatives, and how to use them.

What is JIT and why it matters

JIT stands for Just in Time compiler, as opposed to an AoT (Ahead of Time) compiler. In simplified terms, the purpose of a JIT compiler is to observe the running, often intermediate, code and re-compile the resulting machine code into more efficient versions over time, so your high level code can run faster. But in order for the JIT to do its magic, it first has to observe, and that's why cold starts are slower. Low level languages generally don't have a JIT compiler since they have an extremely small runtime (they are often said to have no runtime). Languages like C/C++ and Rust don't have JIT compilers, at least not in the form that high level language runtimes have. Instead their AoT compiler produces machine code directly, often highly optimised, and this is where their speed comes from: there is very little overhead, whereas for a JIT to be effective, additional logic has to be evaluated while your code runs, so the JIT knows when and what to optimise. For example, the initial version of the JVM didn't come with a JIT compiler, and it was awfully slow. When talking about JIT, it is important to know what "happy path" means. The happy path means that the JIT has figured out a way to produce the most optimal machine code for your high level code, but if you're not careful you can blow up your happy path, especially with dynamic languages such as JS. Blowing up the JIT happy path means that the JIT either has to re-compile to less efficient code, or has to fetch a less efficient version from the cache. Either way, your performance will take a hit.

Here’s an example in JS:

function add(a, b) {
    return a + b;
}
 
add(1, 2); // JIT: nice, keep it up
add(1, 2.0); // JIT: why?
add(1, "2"); // JIT: oh come on
add(NaN, Infinity); // JIT: seriously?
add([], {}); // JIT: for the love of god and all that is holy

We have defined a simple function that adds 2 numbers together. V8's JIT recompiles it internally using 31 bit integers, which leads to really fast compiled code (the least significant bit in V8 is reserved for a tag, V8 gubbins, don't worry about it). Then you call that function again, but this time the second argument is a fractional number, which otherwise could have been an integer. The JIT now needs to re-compile that function, either to convert the first argument to a 64 bit floating point number, or to convert the second argument back to an integer. It probably goes with 64 bit floats, because it suspects you may actually start using floating point numbers. And unless you really do, you have just blown your performance for no reason, because generally speaking calculating with 64 bit floating point numbers instead of 32 bit integers (ignoring V8 specifics here) is slower for a CPU to perform. The rest of the function calls are all legitimate JS code, and as you can expect, things get much worse from there. Do you even know what the outcome is in JS when you evaluate 1 + "2", or NaN + Infinity, or [] + {}? To avoid most of these pitfalls in JS, simply use TypeScript, and use it properly (in strict mode). While TS can't protect you against everything, such as the difference between integer and floating point numbers, it can protect you against all the other examples I demonstrated that make the JIT just want to give up.

You can check out this extremely well presented talk about how V8 internals work. Previously TurboFan was also used to optimise WASM code, but these days that task is offloaded to a dedicated compiler called Liftoff. If you previously checked my link to Mozilla's implementation, where they pre-emptively provide information for the JIT by using a hidden class/shape, then this video also helps to understand how V8 internally uses these shapes.

In more classical computing, where your language's virtual machine or runtime is allowed to run for years at a time, this really isn't a noteworthy concern. But in the FaaS (Functions as a Service) world, especially with Lambdas, cold starts and cold code become more noticeable, simply because runtimes in this world are short lived, and every time a new runtime needs to be spun up, the JIT has to start from scratch. In AWS there are 2 distinct parts to cold starts: the gubbins that AWS has to spool up for you, and then your code that needs to be warmed up. While there isn't much you can do to speed up the AWS side of cold starts, other than certain optimisations and using provisioned concurrency, you have full control over your code, and if waiting on the JIT to warm up your code is not an option, you can simply write some parts of the code in a language that does not need to be warmed up, and Rust can be one of those languages.

About Rust

Rust is a relatively new language designed for systems programming by Mozilla that primarily competes in the domain where C and C++ have been used almost exclusively. The language can be considered strongly typed, with a nominal type system and good type inference, and it supports the following paradigms: concurrent, functional, generic, imperative and structured. Mozilla initially built it to enhance the Firefox browser, with an emphasis on memory safety and safe concurrency. The core value proposition of Rust is memory safety, because C and C++, which have been used in this domain for almost half a century, are inherently unsafe languages, and in that industry this is a huge and very expensive problem. Those of us who have the luxury of writing high level languages really don't have to worry about getting memory management wrong and accidentally causing billion dollar security vulnerabilities that are very difficult to track down, or to find in the first place. The runtimes of high level languages check our code and prevent us from making such mistakes. We also don't have to worry about cleaning up our memory, the garbage collector (GC) does it for us. The closest thing we have to worry about when it comes to memory management is leaking memory. Low level languages, where you have direct control over how you work with memory, are prone to a whole class of bugs related to memory safety. These bugs include, but are not limited to: pointer arithmetic errors, dangling pointers, use after free, buffer overflows, out of bounds accesses, etc. Recent studies by Microsoft and Google point out that a whopping 70% of all security vulnerabilities are related to memory safety. Reportedly Mozilla released a similar study, but right now I can't find the source. Over the years the industry has tried to address these issues by introducing safer features into the languages, building good practices and standards to follow, building a myriad of static code analysers, and all sorts of things in between, but that 70% number does not seem to be going down. The main problem is that all the unsafe features are still lingering around, and we're humans after all, and we make mistakes. I recommend reading this excellent 3 part blog post by someone who has been doing C for decades and decided to start using Rust to avoid the pitfalls of C and C++. A TLDR quote from his blog post:

Okay, I get it. But I like C. Can’t we just fix C/C++?
A ton of work has been poured into making warnings smarter, improving linters, documenting best practices, and so on. There are shiny new helpful bits in C++11/14/17: unique_ptr, range-based for loops, and RAII all can help prevent bugs (if you use them). There are standards organizations like MISRA and security organizations like CERT to help you find and fix critical safety and security issues, if everyone on your team follows those recommendations to the letter without fail. But the sharp detritus of K&R C is still littering the floor, and nothing stops you or the guy next to you from ignoring the caution tape and tripping onto it.

K&R C refers to the original dialect of the C language, K&R being the initials of Brian Kernighan and Dennis Ritchie, the authors of the original book on C.

While Java and other languages that came after it solved the memory safety problem by introducing a runtime and a garbage collector, the overhead and the need for a runtime make those languages unsuitable for the domains where low level languages are still used today. For correctness' sake, Java wasn't the first to introduce a virtual machine, there were others that made use of virtual machines before, for example Erlang, but Java really took off with that sort of implementation, probably because Java borrowed the C style syntax and the OOP paradigm from C++, so it was already familiar to C/C++ programmers.

And this is where Rust comes in. Rust aims to solve all of these problems without introducing a runtime or a garbage collector, while still ensuring memory safety and performing automatic memory management, with zero or very little overhead, comparable to C and C++. So you could say that if all the software C/C++ is currently being used for were written in Rust henceforth, we would see a drastic reduction in memory related security vulnerabilities. We'll see, there will probably still be memory related issues, since for certain advanced use cases you still need to use raw pointers, which Rust allows you to do with unsafe Rust, but when things blow up you only have to look into those unsafe parts of the code to find the problem, because everything else is audited by the Rust compiler to be memory safe, free of data races, and using concurrency/multithreading correctly. The way Rust achieves this is by having a really powerful AoT compiler that forces you to write code in a manner the compiler can understand and verify to be memory safe. The Rust compiler also implicitly inserts memory deallocation statements, so you don't have to do it manually like in C/C++, although you can still clean up memory pre-emptively if you want by calling drop.
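
To make that last point a bit more concrete, here is a minimal sketch (not taken from my benchmark code) of what automatic memory management without a GC looks like: the compiler frees an allocation when its owning variable goes out of scope, and drop lets you release it earlier.

fn main() {
    // `buffer` owns its heap allocation.
    let buffer = vec![0u8; 1024];
    println!("buffer holds {} bytes", buffer.len());

    // We can free it early if we want to...
    drop(buffer);
    // ...otherwise the compiler inserts the deallocation at the end
    // of the owning scope. No garbage collector involved.

    // `message` is freed automatically when `main` ends.
    let message = String::from("parsed source map");
    println!("{}", message);
} // <- compiler-inserted cleanup for `message` happens here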

Apart from making low level programming safe again, I can recall the following notable attributes of Rust:

  • As mentioned, Rust guarantees absence of data races.
  • Fearless concurrency (getting multithreading right)
  • Zero cost abstractions, for example generics in Rust are zero cost by default as long as monomorphisation is used, though for certain scenarios dynamic dispatch can be used for flexibility, which incurs a little runtime overhead.
  • Great documentation, if you want to learn the basics of Rust you only have to read The Book.
  • Great tooling, for me personally great VS Code support, but many different editors and IDEs are supported, including IDEA. A default package manager, linter, docs generator, etc., in short, batteries are included.
  • Modern language with modern features.
  • Editions guarantee backwards compatibility, but allow breaking changes in the form of new editions, so the language can evolve without having to carry around legacy cruft. Currently a new edition is released once every 3 years.
  • First class WASM support
  • Non-nullable data types: no variable can be null by default or be assigned null; if you want to mimic a null value you have to explicitly use the Option type, which forces you to always check whether the value is None or Some. This is similar to how TypeScript strict mode forces you to check that a value is not undefined in JS. This is important because null pointers are the billion dollar mistake. Dart 2, which came out recently, also went non-nullable by default.
  • Immutable variables by default.
  • Really powerful enums: enums can carry values and are implemented as algebraic data types, which works really well with pattern matching (see the sketch after this list).
  • No classical object oriented paradigm (good for some, bad for others). There is no inheritance, for example; you're encouraged to use composition over inheritance. (Though people abuse the Deref trait to simulate classical inheritance, which is considered an anti-pattern.) Rust makes use of structs and traits, and on paper this approach looks more flexible than inheritance based OOP. For example, higher level OOP languages such as Java and C# don't let you inherit from multiple parent classes, to prevent the diamond problem; since Rust does not use inheritance you can technically achieve a similar result by other means. Structs contain data, and traits are like interfaces whose methods can be implemented on structs. Methods in Rust (as opposed to free functions) are defined against a specific struct's data, which lets you write fluent code instead of using the pipe operator that is more common in pure functional languages. This is actually really similar to C#'s extension methods, which is a really powerful feature.
  • Healthy influence from functional programming domain.
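
As a rough illustration of the enums, pattern matching and Option points above, here is a small sketch; the Lookup type and its fields are made up purely for this example:

// A made-up result type for a source map lookup, to show enums that
// carry data (algebraic data types) and exhaustive pattern matching.
enum Lookup {
    Found { line: u32, column: u32 },
    Unmapped,
}

fn describe(lookup: Lookup) -> String {
    // `match` forces us to handle every variant.
    match lookup {
        Lookup::Found { line, column } => format!("original position {}:{}", line, column),
        Lookup::Unmapped => String::from("no mapping found"),
    }
}

fn main() {
    // Option<T> is itself just an enum (Some/None); there is no null,
    // so the compiler makes us check before using the value.
    let source_file: Option<&str> = Some("index.ts");
    match source_file {
        Some(name) => println!("source file: {}", name),
        None => println!("no source file"),
    }

    println!("{}", describe(Lookup::Found { line: 10, column: 4 }));
    println!("{}", describe(Lookup::Unmapped));
}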

I recommend watching this video if you really want to know more what Rust is about.

So, overall it looks too good to be true, but where's the catch, there must be one? The catch is that to learn to program in Rust you have to understand Rust's ownership model. Since this is a novel idea, you have to rewire your brain a bit. If you're coming from C/C++ this is a paradigm shift when it comes to memory management. If you're coming from a high level language this is a new paradigm you need to understand, a new domain that was previously taken care of for you without you having to invest any mental capacity into it. But people who have managed to learn Rust claim they don't want to go back, similarly to how a JS dev who learns TS generally doesn't want to go back.
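
To give a taste of what that rewiring means in practice, here is a tiny, contrived sketch of the kind of code the borrow checker rejects; uncommenting either marked line produces a compile error rather than a runtime bug:

fn main() {
    let original = String::from("bundle.js.map");

    // Ownership of the String moves from `original` to `renamed`.
    let renamed = original;
    // println!("{}", original); // error: borrow of moved value `original`

    // Borrowing is fine, but you cannot mutate while an immutable
    // borrow is still in use.
    let mut sources = vec!["a.ts", "b.ts"];
    let first = &sources[0];
    // sources.push("c.ts");     // error: cannot borrow `sources` as mutable

    println!("{} / first source: {}", renamed, first);
}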

Just being able to read Rust can be beneficial too, because new systems will increasingly be built in Rust, for the obvious benefits, and if you have access to the source code you can answer questions that are not publicly documented, for example things that happen under the hood. Deno, which hopefully will become Node 2.0, and Firecracker, which powers AWS Lambda and Fargate, are written in Rust, so you can just go and read their source code if you need to know more about how they work.

For me personally, I have found this journey to be quite pleasant so far, but I've only done so little (the benchmark code that I built), and if I do more, I will certainly need to fight with the borrow checker a bit more until I learn all the quirks/rules of the compiler.

Screenshot of Rust in VS Code:

Rust and WASM (WebAssembly)

One of the compelling features of Rust is its excellent first class support for WASM. Rust and WASM are, after all, both Mozilla projects. asm.js, which is a predecessor to WASM, was also invented by Mozilla.

To get started with WASM in Rust you just have to install wasm-pack, which pulls in wasm-bindgen. I used this approach to build the Rust+WASM benchmark, and it was extremely straightforward. I only needed to generate a starter project and add my code, and then use the wasm-pack compiler to bundle it all up, which generated a WASM file and JS glue code that I could then use directly from Node (it can also generate output for browsers). All the code that I had to write can be seen here. wasm-bindgen also converts (most of the time) between Rust and WASM data types implicitly, so you don't have to do anything extra.
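
My actual benchmark code is linked above; as a rough idea of the ergonomics, an export is little more than an attribute on a plain Rust function (this is a simplified sketch with a made-up function, not my benchmark code):

use wasm_bindgen::prelude::*;

// wasm-pack/wasm-bindgen generate the JS glue for this export and take
// care of converting the returned String into a JS string.
#[wasm_bindgen]
pub fn lookup_original_position(line: u32, column: u32) -> String {
    // Real code would consult a parsed source map here.
    format!("original position {}:{}", line, column)
}

On the Node side the generated package is imported like any other module, and the exported function can be called directly.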

Rust native in Node with Neon bindings

Another technology I used to be able to call native Rust code from Node is Neon bindings. This was also generally a pleasant journey; Neon provides a CLI that can generate a starter template and build the final output. However, compared to wasm-bindgen I needed to perform explicit data type conversions between Rust and JS, for example here I'm building a JS object to be returned from a Rust function. Another thing to keep in mind with Node native modules is that distribution can be a faff. Historically Node's ABI (Application Binary Interface) has had 3 dimensions: the Node major version, the operating system and the CPU architecture, which makes native module distribution a pain. N-API somewhat relaxes these rules, but doesn't remove them entirely. That's why bundling into WASM can be a much better alternative, since the WASM compilation target is universal, but as you can see from the benchmarks, a native module can be much faster. Neon bindings do offer a pre-compilation option if you're willing to integrate with Travis and GitHub. They also seem to want to build a pre-compilation option into the CLI, but that's a work in progress, and has been for a while. I just compiled while installing the module/package via npm or yarn, which requires having a Rust compiler installed, which for me wasn't a problem. For running my benchmark in Lambda, I had to compile it on Amazon Linux 2, which Lambdas internally use as the operating system.
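
For flavour, here is roughly what building and returning a JS object looks like with the classic (pre-N-API) Neon macro API I was using; the lookup function itself is made up for this sketch, and newer Neon versions register modules differently:

use neon::prelude::*;

// Build a plain JS object in Rust and hand it back to Node.
fn lookup(mut cx: FunctionContext) -> JsResult<JsObject> {
    let obj = cx.empty_object();

    // Each JS value has to be created and attached explicitly.
    let line = cx.number(10);
    let column = cx.number(4);
    obj.set(&mut cx, "line", line)?;
    obj.set(&mut cx, "column", column)?;

    Ok(obj)
}

register_module!(mut cx, {
    cx.export_function("lookup", lookup)
});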

Deploying Rust directly to Lambda

While AWS does not offer an official runtime for Rust, you can still package Rust code and run it as a Lambda with a custom runtime. The reason I think they don't offer an official runtime is that Rust technically has no runtime or interpreter; however AWS has a wrapper library that makes building Lambda functions in Rust quite straightforward. I followed these instructions, they worked perfectly, and it was pretty easy to get started. This is potentially the lowest overhead option when it comes to running Rust code in Lambdas, but as the benchmarks show, Neon bindings don't seem to incur any noticeable overhead, so you can just as easily augment your existing Node code with native Rust.
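
For reference, a minimal handler using the lambda_runtime crate from that wrapper project looks roughly like this; the exact API has changed between crate versions, so treat it as a sketch rather than the definitive form:

use lambda_runtime::{handler_fn, Context, Error};
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Error> {
    // The runtime loop polls the Lambda invocation API and calls our handler.
    lambda_runtime::run(handler_fn(handler)).await
}

async fn handler(event: Value, _ctx: Context) -> Result<Value, Error> {
    // Real code would parse a source map here; this just echoes the event.
    Ok(json!({ "received": event }))
}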

Conclusion

Hopefully you learned something new, and now you know more about Rust and when it can be beneficial, especially in the Lambda and/or Node world, since the interop options are quite straightforward and can yield huge gains. And let's not forget WASM in general: in browsers you cannot run native Rust code, but you can still build certain parts of your webpage in Rust and compile your high performance code to WASM for ultimate performance.