Pre-jitting in AWS Lambda functions

Using CrossGen (part of CoreCLR) to reduce the critical cold start latency of a .NET Lambda function by pre-jitting the assemblies and storing them in a Lambda layer.

Compiled languages such as C# tend to be faster than interpreted ones, because translating code at run time adds overhead. Performing the translation before the code executes, in order to end up with just a binary or native code, is what distinguishes 'compiled' from 'interpreted'. The binary or native code can be run over and over again with no (or minimal) re-compilation, which has much less overhead than interpreting.

Everything good in technology comes with a tradeoff. Interpreting code is the art of translating code line by line in memory and executing it on the fly. This means an interpreter does not have to make a translation at all prior to executing our code, saving the time we would've spent on compiling it.

In this article we'll explore the benefits of pre-compiling some of our assemblies and wrapping them in a Lambda layer. I can imagine that if you're new to this concept it all still sounds very abstract. Fear not, we'll dive into this.

Lambda cold starts

In a previous article we uncovered the magic behind Lambda. We now understand how execution environments are provisioned and what the execution path looks like. If you've not read that article, I highly recommend doing so, as you'll get a better understanding of why newly provisioned execution environments take a bit longer to serve an invocation compared to an execution environment that has already served one.

Provisioning a new instance has some additional overhead as it needs to take care of a few things before being able to serve your invocation:

  • Create the execution environment.
  • Provide a runtime, in our case dotnetcore3.1.
  • Download and unpack the source code from S3.
  • And some more.

You can imagine that provisioning a new execution environment that is able to serve your invocation takes longer than re-using one that has already served an invocation. A Lambda execution environment is bound to a function; execution environments are never re-used across different functions, though the same execution environment may be re-used for multiple invocations of the same function. If an execution environment is re-used for an invocation, we save ourselves all the time spent on unpacking the code and providing a runtime, as that work has already been done.

The time we spend on provisioning a new Lambda instance is what adds this additional execution time to your invocation. This additional time is what we refer to as a 'cold start'. For a .NET function, much of that time is spent compiling the code down to native code; you can imagine why so many Lambda functions run an interpreted language such as Python.

The AWS Lambda team has put a tremendous amount of effort into reducing cold starts, or avoiding them altogether: from improved VPC networking to provisioned concurrency to Lambda layers and many more features.

A .NET application recipe

'dotnet publish'. This command does a few very important things. It publishes an executable result set, including its dependencies:

  • It compiles your code into Common Intermediate Language (CIL), with a .dll extension.
  • It generates a .deps.json file that lists all of the dependencies of the project.
  • It generates a .runtimeconfig.json file that specifies the shared runtime the application expects, as well as other configuration options for the runtime (for example, the garbage collection type).
  • It copies the application's dependencies from the NuGet cache into the output folder as CIL.
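To make this a bit more concrete, here's a sketch; the project name and flags are illustrative:

```sh
# Publish a Release build into ./publish and inspect the result.
dotnet publish -c Release -o publish
ls publish
# Sample.Lambda.dll                 <- our code, compiled to CIL
# Sample.Lambda.deps.json           <- the project's dependency graph
# Sample.Lambda.runtimeconfig.json  <- expected shared runtime + options
# <dependency>.dll                  <- NuGet dependencies, copied as CIL
```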

All of this together makes it possible to run a .NET application. Obviously there's some more going on behind the scenes, but we're rather interested in the big picture. We mentioned that the output of dotnet publish is an 'executable result set'. To run this executable result set we use a JIT (Just-In-Time) compiler to translate the CIL into native code, something that can be understood by the CPU. More on JIT and translations will follow in a dedicated section on compiling; this should just give you an idea of what happens before you are able to run your application.

Runtime package stores

The .NET team provided us with a way to store packages in a directory on disk: the so-called 'runtime package store'. Under this directory there's a hierarchy of CPU architectures and target frameworks storing packages pre-compiled into CIL or native code. Let's explore the benefits of this store and how to work with it. Below is a sketch of a runtime package store directory hierarchy.
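Something like this, where the default location, package and version are illustrative:

```text
~/.dotnet/store                      # the runtime package store
└── x64                              # CPU architecture
    └── netcoreapp3.1                # target framework
        └── newtonsoft.json          # package id
            └── 12.0.3               # package version
                └── lib/netstandard2.0/Newtonsoft.Json.dll
```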

The runtime package store was introduced to optimize apps for faster deployments and give them a lower disk-space footprint. Every machine can have a runtime package store. The idea of the runtime package store is that the host machine is responsible for already having the assemblies in its own runtime package store, so that they don't need to be part of your deployment package. We can populate our own runtime package store by creating a package store manifest file, which holds a list of (e.g. NuGet) packages. This file uses a .csproj extension and only contains package references, in our case a few popular NuGet packages.
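A minimal sketch of such a manifest file; the packages and versions here are just examples, and the sample repository's actual references may differ:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Only package references live in a package store manifest. -->
    <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
    <PackageReference Include="AWSSDK.S3" Version="3.5.0" />
  </ItemGroup>
</Project>
```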

I've set up a sample C# Lambda function in a GitHub repository containing such a package store manifest file for this article. The file can be found under the /Dependencies directory, and Sample.Lambda.csproj has a reference to this Dependencies.csproj.

To write our packages against a specific runtime and framework target we use the 'dotnet store' command. I'm targeting our Dependencies.csproj with my Linux distribution version (ubuntu.20.04-x64) and passing netcoreapp3.1 as the value for the '--framework' argument.
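The command looks like this; the manifest path matches the sample repository:

```sh
dotnet store --manifest ./Dependencies/Dependencies.csproj \
  --runtime ubuntu.20.04-x64 \
  --framework netcoreapp3.1
```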

The output of the dotnet store command is either CIL or native code, placed under the respective directory, along with an 'artifact.xml' manifest file that holds references to all the packages, including implicit package dependencies. In our case, an additional implicit dependent package is included in the manifest.

Now that the packages are stored in the runtime package store, we can publish our .NET applications against it. Let's run 'dotnet publish' and pass a '--manifest' argument with the path to the manifest file. Something like below.
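Assuming the store was written to the default location, the command looks roughly like this:

```sh
dotnet publish -c Release \
  --manifest ~/.dotnet/store/x64/netcoreapp3.1/artifact.xml
```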

When running dotnet publish with a '--manifest' argument, the target manifest is used to trim the set of packages published with your app. Publishing a trimmed result set and running it on a host that is missing one of the packages listed in the target manifest will result in an application that fails to start; the host is expected to have the packages in its runtime package store.

Let's take a look at the package size of a sample C# Lambda function, with and without offloading our dependencies to a runtime package store. Running dotnet publish against our sample csproj without using a runtime package store results in a .zip file with a size of 482 kb. This would run out of the box, as all of our dependencies are present in the publish folder.

Time to do the same, but this time we'll offload our dependencies to the runtime package store by passing a '--manifest' argument referring to the artifact we built earlier with our 'dotnet store' command. This will leave all of the dependencies out of the /publish directory.

Our /publish directory is now only 48 kb! That's roughly a tenth of the size of the /publish directory containing the dependencies. Our zipped Lambda function will be very lightweight.

Lambda layers

With Lambda layers you are able to inject additional data into your Lambda function. A Lambda layer is a .zip file containing that data: a custom runtime image, additional code or config files your function relies on. A layer can be shared between multiple Lambda functions. Lambda layers are stored in and retrieved from S3. When deploying your Lambda function you can attach a Lambda layer to the function by referencing the ARN of the layer. Lambda extracts the layer contents into the /opt directory when provisioning the execution environment of your function, and refers to this location through the environment variable 'DOTNET_SHARED_STORE', which is set when deploying a .NET Lambda function along with a .NET-compatible Lambda layer.

Why are we interested in Lambda layers? To answer that question we'll have to take a step back and talk about the runtime package store again. Remember how the runtime package store contains our pre-compiled packages in a hierarchy of CPU architectures and target frameworks? Our Lambda layer is going to become our runtime package store! Think of the Lambda function as the 'host' machine that we publish against. During the deployment of the Lambda function we attach a Lambda layer, which under the hood extracts the pre-compiled dependencies and stores them under the /opt directory. This is good because our published application has a much smaller deployment footprint once we offload the dependencies to a Lambda layer.

Let's create a layer from the Dependencies.csproj that we stored earlier in our local runtime package store. Before publishing a Lambda layer to AWS you'll need an S3 bucket; I called mine 'my-lambda-layer-bucket'. You'll also need to install the Amazon Lambda Tools for .NET by running 'dotnet tool install -g Amazon.Lambda.Tools'. The command 'dotnet lambda publish-layer --layer-type runtime-package-store' is essentially just a wrapper around the 'dotnet store' command.

Let's navigate to the /Dependencies directory in our sample repository; the command will look for a .csproj in the current directory. Let's run it.
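The command, with a layer name of my own choosing and the bucket we created earlier (exact option names may differ slightly between versions of the tooling):

```sh
cd Dependencies
dotnet lambda publish-layer my-dependencies-layer \
  --layer-type runtime-package-store \
  --s3-bucket my-lambda-layer-bucket \
  --framework netcoreapp3.1
```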

As you can see, it's implicitly creating .dll files and adding them to the zip. When the zip is created under the /tmp directory, it's then uploaded to S3.

Let's do a get on the Lambda layer in order to see the result of publishing one.
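Something like the following, where the ARN placeholder is the one returned by the publish-layer output:

```sh
dotnet lambda get-layer-version --arn <layer-version-arn>
```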

As you can see, the output of 'dotnet lambda get-layer-version' by ARN returns details about the runtime package store we've created. The manifest is located in S3 and the package directory is set to '/opt/dotnetcore/store'. We also get an overview of the manifest's contents, which is the same list of dependencies we had in our local runtime package store before. There's also a property called 'Packages Optimized', which is set to False. This indicates that the packages are pre-compiled into CIL; if the property were set to True, it would indicate that the packages are pre-compiled into native code. More on that later!

Time to deploy our lightweight Lambda function along with the Lambda layer. We simply do so by running 'dotnet lambda deploy-function' and passing the Lambda layer by ARN in the '--function-layers' parameter.
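A sketch of the deployment, with an illustrative function name and the ARN placeholder from before:

```sh
dotnet lambda deploy-function SampleLambda \
  --function-layers <layer-version-arn>
```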

Our Lambda function is deployed. Let's validate that the layer is attached to the Lambda function and that our Lambda function's code size is considerably smaller.
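One way to check this is the get-function-config command; the function name is the illustrative one from above:

```sh
dotnet lambda get-function-config SampleLambda
```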

Looks like it is: 18 kb for the size of the function, and our published layer is there. Let's send a sample request to it in order to test the cold start.
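For example, with a dummy payload:

```sh
dotnet lambda invoke-function SampleLambda --payload '{"input": "test"}'
```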

"Duration: 962.54 ms" and "Init Duration: 232.83 ms" for the first invocation (cold start), and "Duration: 0.98 ms" for the second invocation. Adding this layer alone does not improve anything: deploying a Lambda function always does an implicit dotnet publish in order to build the .dll files, and by using the Lambda layer we have merely stored those .dll files in a different directory, '/opt/dotnetcore/store', from which they are still compiled into native code when they are first loaded into the .NET Core process.

A C# compilation story

Let's get a little closer to the 'bare metal' of our machine, but not too close! We'll dive into the compilation process of our C# Lambda function. The compilation process can be split up into 3 states (C#, Common Intermediate Language and native code) and 2 stages (C# to CIL and CIL to native code).

C# to CIL

The code we as engineers write is very human-readable and logical. The reason we translate our C# code into CIL (a CPU-independent set of instructions that can be efficiently converted to native code) is that we need to be platform agnostic. Or maybe it's better to say: we need an intermediate state. We need this intermediate state because we want to be able to run our C# code on multiple platforms (Windows, Linux, etc). Anyone who wants to create a .NET application only needs to know how to translate C# code into CIL. Once we're on a specific platform, we convert the CIL into that platform's own specific native code. If we left this stage out, we would have to do something very inefficient: recompile for every platform we wanted to support.

What is JIT?

JIT stands for Just-In-Time: just in time because we make the translation from CIL to native code on demand, at runtime. JIT compilation takes into account that some code might never be called at runtime. Instead of allocating extra time and memory to convert all the CIL into native code ahead of time, it converts the CIL into native code when a particular piece of code is requested.

JIT to native code

The JIT compiler sits between the generated CIL and the CPU. It takes CIL (platform agnostic) and translates it into platform-specific native code. At runtime, when a method is requested for the first time, the JIT compiler translates its CIL into native code and stores the resulting native code in memory, so that any subsequent call to that method runs without recompilation.

What is pre-jitting?

The difference between 'normal' JIT and pre-jitting is that for the latter we use CrossGen (part of the CoreCLR). We already mentioned that 'normal' JIT compiling is done per method: it only produces the bits we need during runtime, and it then caches the resulting native code. CrossGen does this up-front: it compiles the CIL into native code in a single compilation cycle and installs the resulting native images into the native image cache. The advantage is that we don't pay the initial compilation latency that we do pay when using 'normal' JIT; the runtime can use native images from the native image cache instead of invoking the 'normal' JIT compiler.

Pre-jitting our layer

Fundamentals are everything, is what one of the architects at my first software engineering job kept telling me over and over. I barely understood what 'static void Main(string[] args)' meant and I was already trying to wrap my head around how one could achieve parallel programming. 'Back to basics Bruno', is what I heard every other evening. Since then I've come to appreciate fundamentals so much that I make all my readers brush up on theirs before touching the topic. Thanks for the reminders Jos.

Anyhoo, back to Lambda layers. Remember the property 'Packages Optimized' which was False? Let's understand the difference and try setting it to 'True'. In a write-up on GitHub, @normj states the following.

"A feature of a runtime package store is that .NET assemblies placed into the the store can be 'optimized' for the target runtime by pre-jitting the assemblies. In order to create an optimized runtime package store layer you must run the publish-layer command in an Amazon Linux environment." - @normj

I've been following Norm for quite some time now on GitHub, and as his bio says, he 'makes .NET great on AWS'. Which is very true: the community is happy with his work and always looks forward to more of it. Please check out his GitHub contributions, they're truly amazing and inspirational.

So Norm states that the packages can be optimized by pre-jitting them, but we'll have to do it on an Amazon Linux distro. Let's give it a try. I've launched a t3.micro EC2 instance and cloned our demo repository. Be sure that you are able to access the S3 bucket from your EC2 instance; I configured my profile on the machine by using Ec2InstanceMetadata as a credential_source.
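For reference, a sketch of that profile configuration; the role ARN is a placeholder for the instance's role:

```ini
# ~/.aws/config on the EC2 instance
[default]
role_arn = arn:aws:iam::123456789012:role/my-layer-publisher-role
credential_source = Ec2InstanceMetadata
```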

It took a few minutes to set up the machine and configure the necessary dependencies; obviously you could automate all of this. Let's publish the layer by running 'dotnet lambda publish-layer' and passing 'true' to the '--enable-package-optimization' argument, which indicates that the .NET assemblies should be pre-jitted.
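The same publish-layer command as before, now with package optimization enabled (the layer name is again mine to choose):

```sh
dotnet lambda publish-layer my-dependencies-layer-optimized \
  --layer-type runtime-package-store \
  --s3-bucket my-lambda-layer-bucket \
  --framework netcoreapp3.1 \
  --enable-package-optimization true
```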

Great, the new Lambda layer is published and the ARN is at the bottom of the output. The logs show that the necessary .map files were created. Let's take a look at the contents of the Lambda layer in S3. Compared to our previously published Lambda layer, we would expect to no longer see only .dll files.

More files, .map files to be specific! Everything looks good: the .NET assemblies are pre-jitted, and when running 'dotnet lambda get-layer-version' we can validate that the property 'Packages Optimized' is now set to 'True'.

Let's update our Lambda function. This time we'll pass the new Lambda layer ARN to the 'dotnet lambda deploy-function' command, just as before.

Looks OK. The logs in the output indicate that the function is being updated. I'm very excited to test the cold start latency of the Lambda function now that we've pre-jitted the contents of the Lambda layer (our .NET assemblies).

Let's revisit our cold start

I'm sure that by the time you've reached this point you won't remember the durations from the Lambda layer that was not pre-jitted. Let me refresh your memory.

"Duration: 962.54 ms" and "Init Duration: 232.83 ms" for the first invocation (cold start) and "Duration: 0.98 ms" for the second invocation.

We would expect the "Duration" to be significantly reduced, as we took away an extra step (everything has now been compiled into native code during one compilation cycle). We've updated the Lambda function, so let's invoke it. The duration of the second invocation should be the same as the second invocation of the first function, as all the necessary native code was already produced during the first invocation and is retrieved from the cache.

"Duration: 613.28 ms" and "Init Duration: 218.69 ms" for the first invocation (cold start) and "Duration: 0.98 ms" for the second invocation. Snap! We saved around 350 ms; that's roughly a third of the non-pre-jitted Lambda function's cold start duration. We also confirmed our expectation about the second invocation: its duration remained the same. 'Cool!'

Make it part of your deployment!

Now that we know what pre-jitting is and what the benefits are, I must say it was quite some work to set up, with a lot of manual tasks in particular. There must be a quicker way, right? I did some digging and saw that SAM supports it, but you'll need to reference the layer by ARN, which is a bummer. I want to publish my Lambda layer every time I deploy my serverless.template (I like my layers versioned). This is currently not supported out of the box, unless you publish your Lambda layer prior to the SAM deployment and pass the ARN to a variable in the serverless.template. I've created a GitHub issue that will hopefully get picked up by the team.

I have also experimented with SAM in another way: you can define a LayerVersion in the serverless.template by passing something like a git version in the S3Key. Note, you must push the content (the data in your local runtime package store) to S3 in a step prior to the SAM deployment (which is a hassle).
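A sketch of what that fragment of the serverless.template could look like; the logical name, bucket and key pattern are all yours to choose, and the zipped store contents must already exist at that S3 key:

```yaml
Resources:
  DependenciesLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: my-dependencies-layer
      ContentUri:
        Bucket: my-lambda-layer-bucket
        Key: layers/dependencies-<git-version>.zip
      CompatibleRuntimes:
        - dotnetcore3.1
```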

So, is there something else? Yes. Remember that Norm mentioned we can only publish a pre-jitted Lambda layer from an Amazon Linux platform? AWS published a Docker image of their Amazon Linux 2 distribution on Docker Hub, and after playing with it I managed to set up something that works.

In the Dockerfile I first install the dependencies, declare 2 build arguments that are passed as environment variables when running 'docker build', and then clone the demo repository. I set the working directory of the container to where I stored my 'Dependencies.csproj' and then publish the pre-jitted .NET Lambda layer. Note, you can set the name of the layer to whatever you want, including a git version for example.
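A sketch of such a Dockerfile, assuming the two build arguments are the bucket and layer name; the repository URL is a placeholder and the SDK install path is one of several that work on Amazon Linux 2:

```dockerfile
FROM amazonlinux:2

# Install git and the .NET Core 3.1 SDK via Microsoft's package repo.
RUN yum install -y git \
 && rpm -Uvh https://packages.microsoft.com/config/centos/7/packages-microsoft-prod.rpm \
 && yum install -y dotnet-sdk-3.1

# Install the Amazon Lambda Tools for .NET.
RUN dotnet tool install -g Amazon.Lambda.Tools
ENV PATH="${PATH}:/root/.dotnet/tools"

# The two build arguments: target S3 bucket and layer name.
ARG S3_BUCKET
ARG LAYER_NAME

# Clone the demo repository and move to the manifest's directory.
RUN git clone https://github.com/<owner>/<demo-repo>.git /src
WORKDIR /src/Dependencies

# Publish the pre-jitted layer. Note: this call needs AWS credentials
# and a region at build time (for example via extra build arguments).
RUN dotnet lambda publish-layer ${LAYER_NAME} \
    --layer-type runtime-package-store \
    --s3-bucket ${S3_BUCKET} \
    --framework netcoreapp3.1 \
    --enable-package-optimization true
```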

We then take the ARN from the output and pass it to a variable in the serverless.template, voilà. These were the only things I was able to come up with; if you have a better idea please let me know.

If you're keen on trying this out yourself, you can run the Dockerfile by running 'docker build' and passing in the appropriate values, as per below.
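Something along these lines, with the bucket and layer name swapped in for your own:

```sh
docker build \
  --build-arg S3_BUCKET=my-lambda-layer-bucket \
  --build-arg LAYER_NAME=my-dependencies-layer-optimized \
  -t prejit-layer-publisher .
```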

By the way, I'm not saying that dockerizing should be the preferred way; there's more than one road to Rome. Give it a try in Jenkins and see how far you get. The moment you build and package your .NET application is maybe also the moment you should build and publish the Lambda layer?

Conclusion

Pre-jitting your .NET assemblies is a great way to reduce the cold start latency of your Lambda function. Going down the road of pre-jitting and reducing cold start latency is also a great way to learn a thing or two. I had a blast working on this article and managed to automate it with Jenkins in the end. The managed execution process of .NET is very complex and I encourage you to read about it, as it's full of interesting features. Let's hope that the AWS .NET team will look into the GitHub issue I raised to make this a bit easier.


Footnotes

  • I've set up a demo repository for this article if you would like to give it a try. Feel free to check it out and raise questions as issues, or just ping me.

  • The pre-jitting is done by a tool called CrossGen, which is part of the CoreCLR. Read more about CrossGen here on GitHub.

  • In Code Like a Pro in C#, Jort Rodenburg describes how C# is compiled; it's a great read. Manning published this chapter on their website.

  • The AWS docs have a detailed explanation of what Lambda layers are. I recommend reading up on this, as it explains everything you should know to get started.


Did you enjoy this read? Feel free to buy me a coffee! :)

Contact me? You can do that through blog@bschaatsbergen.com or LinkedIn.

If you're looking for other articles, I recommend taking a look in the library.