Websites like JPEGmini.com have a purpose. They tell a story. JPEGmini.com tells the story of image optimization through an easy-to-use, web-based demo. Users can experience the product hands-on, online, right away. This is the magic of JPEGmini.com.
But our web-based optimization service is not limited to a single photo. JPEGmini also offers a free web service where users can upload large numbers of photos and download optimized versions of them.
From the user’s point of view, optimizing photos is very simple. They upload their photos to JPEGmini.com, and we send back optimized photos whose file size is reduced by up to 80%, with no visible difference in quality.
Behind the scenes, a lot is going on to make this happen. Like any other online service, we run clusters of servers hosted in multiple data centers in order to process millions of photos.
As a long-time fan of AWS, I found it an easy decision to build on Amazon's infrastructure.
Using the AWS building blocks (EC2, ELB, etc.), it is relatively straightforward to set up multiple web servers behind a load balancer, queue tasks to multiple servers, scale the workforce dynamically, manage storage, and so on. We let Amazon handle the infrastructure so we can focus on the user experience.
Almost. We still need to configure, manage and monitor all these services and components.
That got me thinking — how could we simplify the backend?
Static websites hosted on S3 are highly available, fault tolerant and scalable, with virtually zero DevOps required. By combining client-side processing with hosted third-party backend services (e.g. accessing AWS services directly from the browser with the AWS SDK for JavaScript), it is possible to build many dynamic applications. Yet in our case one piece was still missing: the ability to run our custom photo optimization algorithm on the backend.
Well, not anymore — thanks to AWS Lambda functions.
Briefly, AWS Lambda is a service that runs your code in response to events and manages the compute resources automatically. There is no infrastructure to manage: say goodbye to task queues, servers, load balancers and autoscaling, and there are no servers to monitor either. It is essentially “serverless” processing. Very cool.
Contrary to my first impression, Lambda is not limited to JavaScript and Java. Any native code that can run in the Lambda container can also be packaged into a Lambda function (more on this below).
It can also mean lower costs. Lambda bills in increments of 100 milliseconds, as opposed to paying by the hour for EC2 servers, so the pay-per-use model tracks actual usage much more closely.
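To make the 100 ms billing granularity concrete, here is a minimal TypeScript sketch of the duration rounding and the resulting compute charge, which Lambda calculates per GB-second of configured memory. The memory size and price are caller-supplied parameters, not actual AWS rates, and the separate per-request fee is left out.

```typescript
// Lambda, as described above, bills execution time in 100 ms increments.
const BILLING_INCREMENT_MS = 100;

// Round the measured duration up to the next billing increment.
function billedDurationMs(actualMs: number): number {
  return Math.ceil(actualMs / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS;
}

// Compute charge for one invocation: billed seconds x configured memory (GB)
// x price per GB-second. The price is a placeholder supplied by the caller,
// not an AWS rate, and the per-request fee is not included.
function computeCost(actualMs: number, memoryMb: number, pricePerGbSecond: number): number {
  const billedSeconds = billedDurationMs(actualMs) / 1000;
  const memoryGb = memoryMb / 1024;
  return billedSeconds * memoryGb * pricePerGbSecond;
}
```

For example, a 1,234 ms optimization is billed as 1,300 ms, whereas an EC2 server billed by the hour costs the same whether or not any photos arrive.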
A Lambda-based backend means less effort for developers, IT and system engineers.
It is both serverless and simple. If your website requires backend processing that can be broken into small compute tasks, you should consider AWS Lambda.
The technical details
The JPEGmini Lambda function is intended to replace the backend servers performing the actual image optimization. With the Lambda-based architecture, users upload their images directly to S3, which triggers our Lambda function for each new image. The function optimizes the image, and places the resulting image back on S3.
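The S3 trigger hands the function an event listing the newly created objects. The TypeScript sketch below extracts the bucket name and object key from each record; object keys in S3 event notifications are URL-encoded, so they are decoded first. The interfaces declare only the fields used here, and skipping anything that is not a .jpg/.jpeg file is an illustrative assumption, not necessarily how JPEGmini filters uploads.

```typescript
// Only the fields this sketch reads are declared; real S3 events carry many more.
interface S3EventRecord {
  s3: {
    bucket: { name: string };
    object: { key: string };
  };
}

interface S3Event {
  Records: S3EventRecord[];
}

interface ImageLocation {
  bucket: string;
  key: string;
}

// Assumption for illustration: only JPEG files are passed to the optimizer.
function isJpeg(key: string): boolean {
  return /\.jpe?g$/i.test(key);
}

// S3 URL-encodes object keys in event notifications (spaces become '+'),
// so decode them before using the key in later S3 calls.
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Turn an S3 event into the list of images this invocation should optimize.
function imagesFromEvent(event: S3Event): ImageLocation[] {
  return event.Records
    .map((record) => ({
      bucket: record.s3.bucket.name,
      key: decodeS3Key(record.s3.object.key),
    }))
    .filter((image) => isJpeg(image.key));
}
```

For instance, a record whose key is photos/my+photo%281%29.jpg yields the key photos/my photo(1).jpg, while an upload named notes.txt is skipped.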
Out of the box, AWS Lambda supports the Node.js and Java 8 runtimes, and those are the only two options you can choose from when defining the function in the AWS Console. A lesser-known fact is that you can bundle any code (including natively compiled binaries) and execute it from within the JavaScript or Java Lambda function.
When defining the Lambda function, you can either edit the code inline (on the website), which is probably good enough for small hello-world functions, or upload a pre-packaged zip file containing all the code. The latter makes a lot more sense when the code uses external dependencies, grows in size, or is managed with git (or similar). Packaging a zip file also lets you include natively compiled binaries and execute them from within your code.
We used the AWS SDK for JavaScript in Node.js to move files from S3 to the local file system and back. A running Lambda function has permission to write to /tmp. The pre-compiled JPEGmini binary is executed with shelljs, which simplifies waiting for the subprocess to finish and handling errors.
To avoid dynamic dependency issues, we made sure the JPEGmini binary was statically linked against all its dependencies, and verified that it worked on an Amazon Linux EC2 instance before trying to get it running in the Lambda context. During development, console.log proved to be a very useful debugging tool, which helped us figure out how things behaved on the file system.
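Because only /tmp is writable, each image has to be staged there before the native binary can read it. This sketch maps an S3 key to local input and output paths and builds the kind of command string one would hand to shelljs's exec, quoting every argument so file names with spaces or quotes stay intact. The binary location ./bin/jpegmini and its input/output argument order are hypothetical; the real JPEGmini command line is not described in this post.

```typescript
// Lambda functions may only write under /tmp, so images are staged there.
const TMP_DIR = "/tmp";

// Hypothetical: where the statically linked binary sits inside the deployment zip.
// The real JPEGmini command-line interface is not documented in this post.
const BINARY_PATH = "./bin/jpegmini";

// Wrap an argument in single quotes for a POSIX shell, escaping embedded quotes,
// so file names with spaces or quotes cannot break the command.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Map an S3 key to local input/output paths. Only the last key segment is used,
// and the "in-"/"out-" prefixes keep the two files apart.
function localPaths(key: string): { input: string; output: string } {
  const base = key.split("/").pop() || "image.jpg";
  return {
    input: `${TMP_DIR}/in-${base}`,
    output: `${TMP_DIR}/out-${base}`,
  };
}

// Build the command string that would be passed to shelljs's exec().
// Hypothetical argument order: <input> <output>.
function optimizeCommand(key: string): string {
  const { input, output } = localPaths(key);
  return [BINARY_PATH, input, output].map(shellQuote).join(" ");
}
```

After exec returns, a non-zero exit code signals that optimization failed and the upload should be skipped. Since /tmp can persist between invocations of a reused container, a production version would also delete these files afterwards.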
Tying it all together, the resulting function downloads an image from S3 to /tmp, optimizes it using the native JPEGmini binary, and uploads the result back to S3. We configured an S3 event to trigger the Lambda function whenever new images are uploaded to the bucket, and we monitor the process via CloudWatch. That is serverless processing.
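The download, optimize and upload flow can be sketched as a handler that depends on two small interfaces, one standing in for the AWS SDK's S3 calls and one for the shelljs-launched binary, so it can be exercised without AWS. ObjectStore, Optimizer, handleEvent and the uploads/ and optimized/ prefixes are hypothetical names. Writing results under a prefix the trigger does not watch is a common way to keep the function from re-triggering itself on its own output; the post does not say how JPEGmini handles this.

```typescript
// Hypothetical interface standing in for the AWS SDK's S3 download/upload calls.
interface ObjectStore {
  download(bucket: string, key: string, localPath: string): Promise<void>;
  upload(localPath: string, bucket: string, key: string): Promise<void>;
}

// Hypothetical interface standing in for running the native binary via shelljs.
// Resolves with the process exit code (0 means success).
interface Optimizer {
  run(inputPath: string, outputPath: string): Promise<number>;
}

// Only the S3 event fields used here are declared.
interface S3Event {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

// Assumed layout: the trigger watches "uploads/" and results go to "optimized/",
// so the function's own output never re-triggers it.
const INPUT_PREFIX = "uploads/";
const OUTPUT_PREFIX = "optimized/";

function outputKeyFor(key: string): string {
  const rest = key.indexOf(INPUT_PREFIX) === 0 ? key.slice(INPUT_PREFIX.length) : key;
  return OUTPUT_PREFIX + rest;
}

// Download each new image to /tmp, optimize it, and upload the result.
// Returns the keys written; rejects if the optimizer reports a failure.
async function handleEvent(event: S3Event, store: ObjectStore, optimizer: Optimizer): Promise<string[]> {
  const written: string[] = [];
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Event keys are URL-encoded, with '+' for spaces.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const base = key.split("/").pop() || "image.jpg";
    const input = `/tmp/in-${base}`;
    const output = `/tmp/out-${base}`;

    await store.download(bucket, key, input); // S3 -> /tmp
    const exitCode = await optimizer.run(input, output); // native binary
    if (exitCode !== 0) {
      throw new Error(`optimizer exited with code ${exitCode} for ${key}`);
    }
    const outKey = outputKeyFor(key);
    await store.upload(output, bucket, outKey); // /tmp -> S3
    written.push(outKey);
  }
  return written;
}
```

In a deployed function, ObjectStore would wrap the SDK's S3 getObject/putObject calls and Optimizer would wrap the shelljs exec call, and the Lambda handler would simply call handleEvent with those implementations.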