Optimising Asset Compilation During Deployment

Captain's log, stardate d302.y38/AB

Although modern solutions like Docker provide most of the tools we need for deploying today's web applications, there's no one-size-fits-all solution for optimising asset compilation during deployment, especially for legacy projects.

In this blog post, I'm going to explain one solution I came up with.

Iron forging - Photo by Maranda Vandergriff on Unsplash

Background: My motivation

We have been working on a long-running Rails + Angular web application that uses tonnes of content in multiple forms: text, images, attached files and so forth. This content is visualised and edited on the platform, which forces us to optimise how this data is sent and received to keep users waiting as little as possible.

When assets change, deployments of this platform take around one hour, of which around 60% corresponds to asset compilation. Not only that, but we often see the deployment process hang because of this long wait while assets are being compiled on a server.

We run this application on Cloud 66, which stops listening to the asset compilation process; even though compilation eventually finishes, the deployment keeps waiting forever.

When this happens, we need to start the whole deployment again. On a few occasions, we have spent more than five hours deploying a single change because we had to restart the deployment multiple times.

So, my main motivation has been to make the deployment process faster and more reliable. However, this change has another great benefit: it lets us fix an ugly bug that generates different file digests on each server.

The new process

The new process consists of several steps. Let's see what it looks like.

The first step is to calculate a digest of the content of all the asset files that are going to be compiled. This is how I do it:

# Get all asset folders/files where Rails will look for assets to compile
paths = Rails.application.assets.paths.uniq

# Add all Webpacker files
paths = paths + [Webpacker.instance.config.source_path.to_s] + Webpacker.instance.config.resolved_paths

# Checking the contents of ALL files in node_modules is painful. Instead, just check yarn.lock
paths = paths - [Rails.root.join('node_modules').to_s] + [Rails.root.join('yarn.lock').to_s]

# From the list of files/folders, get the list of all files and files within the folders.
# Sort the list so the digest is deterministic regardless of filesystem ordering.
list = `find #{paths.uniq.map(&:shellescape).join(' ')} -type f`.split("\n").sort

# Now calculate an MD5 digest from the content of all these files
raw_files_digest = `md5sum #{list.map(&:shellescape).join(' ')} | cut -c1-32 | md5sum | cut -c1-32`.chomp

The digest stays the same as long as the asset files remain unchanged, and changes when any asset changes. This calculation is surprisingly fast: for 1,200 files, it takes about a second on my computer.

The next step is to compare this value to a value stored in Redis.

If the values don't match (or there is no stored value, which is what happens the first time the process runs), we compile the assets normally and upload them to an S3 bucket. The upload is smart enough to only upload the needed files (those referenced in the Sprockets and Webpacker manifests) and to skip files that have already been uploaded before. This means the first upload takes time, but subsequent uploads only transfer a small number of files. Lastly, the calculated digest is stored in Redis.

If the values do match, the assets are already compiled, so we just need to download them from S3. Again, the download is smart enough to only fetch files referenced in the manifests, and it skips files that are already present, except the manifests themselves, which always need to be downloaded.
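The compare-then-compile-or-download decision can be sketched as follows. This is a minimal illustration, not our actual deployment script: a plain Hash stands in for the Redis client so the logic is self-contained, and the returned symbols stand for the real (hypothetical) compile/upload and download steps.

```ruby
# The Redis key under which the content digest is stored (name is illustrative).
DIGEST_KEY = 'assets:content_digest'

# Decide what this server should do, given the digest of the current asset files.
def deployment_action(store, current_digest)
  if store[DIGEST_KEY] != current_digest
    # Assets changed (or first run): compile, upload to S3, record the new digest
    store[DIGEST_KEY] = current_digest
    :compile_and_upload
  else
    # Digest matches: another server already compiled these assets, fetch from S3
    :download
  end
end
```

The first server that runs with a new digest compiles and records it; every server after that sees a matching digest and only downloads.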

In practice, when assets don't change, this process does nothing on any server. Why? Because it will "download" assets, but since they are all already present, it skips every one of them.

When assets change, compilation happens only once, on the first server; the rest only need to download the changed files, which in practice is normally between three and ten files at most.
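The "smart download" selection described above can be sketched like this. The manifest file names and the helper are hypothetical stand-ins for what the real sync step would use: fetch only files named in the manifests, skip files already on disk, and always re-download the manifest files themselves.

```ruby
# File names treated as manifests, which must always be refreshed (illustrative).
MANIFEST_NAMES = ['.sprockets-manifest.json', 'manifest.json']

# Given the files listed in the manifests and the files already on disk,
# return only the files that actually need to be downloaded.
def files_to_download(manifest_files, present_files)
  manifest_files.select do |file|
    MANIFEST_NAMES.include?(File.basename(file)) || !present_files.include?(file)
  end
end
```

With unchanged assets this returns only the manifests, which is why a no-change deployment effectively does nothing.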

Conclusion

With this change, we can improve deployment time and make it more reliable.

Moreover, with this change we will be able to:

  • Remove sticky sessions from the load balancer. Right now, they are needed because each server has a different digest for the same files.
  • Improve error reporting in Sentry. Currently, when a JS error is reported, the JS file links to its source map, but Sentry tries to fetch the source map from a different server and does not find it because the digest is different.

As a side note, container-based deployments don't have these asset problems at all, since assets are part of the built images. But with old-school deployments, this remains an unsolved problem.

That's all, folks! Hope you've found this useful!

Oriol Collell Martín

Chief MartianTapas Officer. Before MarsBased, he was co-founder and CTO at Dineyo, which honed his entrepreneurial skills. Passionate hooligan of Startup Grind & Muns. Every company has got a troll, we've got Oriol.
