Let’s set the scene. We’re looking for scaling a PHP
application. Googling around take us to find out that
AWS Lambda is the most scalable service out there. It doesn’t
support PHP natively, but we got https://bref.sh. Not only
that, we also have Serverless Visually Explained
which walk us through what we need to know to get PHP up
and running on AWS Lambda. But we have a 8 year old
project that was not designed from the ground up to be
serverless. It’s not legacy. Not really. It works well,
has some decent test coverage, a handful of engineers
working on it and it’s been a success so far. It just has
not been designed for horizontal scaling. What now?
Bref has two primary goals: Feature parity with other
AWS Lambda runtime (the Event-Driven layer) and also
to be a replacement for web hosting (the Web Apps layer).
In a way, with the Web App layer, Bref allows us to lift-and-shift
from our current hosting provider into AWS Lambda and have
PHP-FPM working the same way as we’re all used to love.
So if we just take a large codebase and redeploy it on
AWS Lambda, will everything just work?
Here are some caveats to pay close attention to help with
this journey.
30 seconds API Gateway Timeout
When deploying to AWS Lambda, it’s very common to use API
Gateway as the routing solution. It’s like an nginx for
our PHP-FPM, but completely scalable and managed by AWS.
It does come with a hard limit of timeout at 30 seconds.
If your application never takes longer to process any
HTTP request, then this is not a big deal, but if it does,
even if very rarely, API Gateway will kill the request.
If this is a deal-breaker for your application, a workaround
could be to use Application Load Balancer instead.
PHP Extensions
Not every PHP extension is available easily. Tobias Nyholm
maintains a great community project for PHP Extensions at
https://github.com/brefphp/extra-php-extensions. A lot of
great extensions are available by default
(as documented in https://bref.sh/docs/environment/php.html#extensions),
but if you need an extension that is not available by default
or not provided by Tobias, you’ll either have to build an
AWS Lambda layer yourself or you’ll have to find a way
without that extension.
Incoming Payload limit
When someone sends a HTTP request to your application,
they’re usually sending in some raw data with it, typically
on the body of the request. If the body of the request
goes above AWS limit, your code will never even be executed
and AWS will kill the request upfront. The limits
are currently as follows:
- 1MB for Application Load Balancer
- 10MB for API Gateway
A very big portion of use cases can fit in both of these
offerings. The most common use case that may pose a threat
is file uploads. If the file is bigger than the allowed
size, AWS will not accept the request. A common workaround
is to refactor the application so that the backend returns
an S3 Signed Url for the frontend to do the file upload
directly to S3 and then notify the backend once it’s done.
Without going into complex use-case, S3 can take 100MB
of upload in a plain and simple request, but it also
support multi-part upload with a much bigger limit.
HTTP Response Limit
AWS Lambda has a limit of 1MB for the Http Response. When
I started getting 502 Gateway Timeout due to large response
size, the workaround that worked best for me was to gzip
the Http Response. That brought down the response size
from 1MB to roughly 50~100kb. I never had to work on
a use-case where gzipped responses would be bigger than
1MB, so if you have such a case, let me know on Twitter
because I’m very interested in it!
AWS Lambda execution limit
AWS Lambda is capped at 15 minutes of execution. If you
use API Gateway, that’s capped at 30 seconds anyway, but
if you use Application Load Balancer, then it’s possible to
have Http requests going up to 15 minutes. Although that’s
not very common / likely to be useful, one thing that
does come up is background jobs. Sometimes they can go
above 15 minutes. One use case I worked on was importing
CSV file. The Http layer would return a Signed Url, frontend
would upload to S3 and notify the backend of a new file
ready for processing. A message for SQS would be produced
and a new AWS Lambda would pick up the message as the worker.
Downloading the CSV file from S3 and processing each row
individually could take more than 15 minutes. The way
I handled that was to create a recursive Job. What I did
was:
- Download the file from S3
- Process 1k rows on the file (for loop)
- Each row processed increments a record on the database
- Produce a new SQS message IDENTICAL to the one currently being worked on
- Finish the job successfully
With this approach, the SQS message that gets picked up will
finish before 15 minutes and it will produce a new SQS message
with the exact same work unit. The job can always load
the pointer from the database to know where it left off
(the starting point of the for loop). If we don’t have 1k
records to run through, we can stop producing SQS messages
because we reached the end of the file. If there are
still rows to be processed, the new IDENTICAL SQS message
will make a new Lambda start, which means another 15 minutes
limit.
5 Layers per function
If your project uses too many exotic PHP extensions, you
may end up hitting the limit of 5 layers per function.
Bref itself will consume 2 of 5 layers. Each extension
is a separate layer. If you need, e.g., GMP for verifying
JWT tokens, Imagick for image processing and Redis, you’ve
reached your limit and the next extension you need might
be problematic.
50 MB of zipped Codebase / 250 MB of unzipped codebase
Your project source code cannot have more than 50MB of
zipped content or 250MB of unzipped. The 250MB unzipped also
include the Lambda layers extracted into your Lambda environment.
It sounds like a lot of code, but there are some PHP packages
out there that put binaries on your vendor folder which
may take a lot of space. There are some workaround
documented by Bref which involves zipping your vendor,
uploading it to S3 and deploying your code without
the vendor folder. The vendor then gets downloaded
on-the-fly when your Lambda starts. It adds a little
bit of overhead on your cold-start but overcome AWS
limitation on the codebase size.
read-only file-system
The entire file-system on AWS Lambda is read-only with
the exception of /tmp
. Any process that is writing
files into disk will need tweaking to work with the temporary
folder. Furthermore, two different requests will not be
guarantee to hit the exact same Lambda, so relying
on local disk is not a safe bet. Any data that
may need to be accessed in the future must go to S3
or a durable storage.
500MB on /tmp
Another thing to keep in mind is that /tmp
only has
500MB of disk space. If you fill that up, the Lambda environment
will start to get errors for full disk.
SQS messages cannot have more than 256kb of payload
Maybe on your server you use Redis as the Queue Storage
for your background jobs. If your serialized job class
has more than 256kb, it will not fit into AWS SQS.
A common workaround is to pass to the job a reference
to something big instead of the big thing itself.
For instance, if you have a job class that takes a collection
of 1 million rows of a database, that can be mitigated
by passing the instructions to query the database instead.
An SQL Query would be extremely small and produce a somewhat
similar result.
Every-minute CRON
When you have a server, putting a CRON to run every
minute and doing some clean up or tidy-up tasks sounds
trivial. But since AWS Lambda charges per code execution,
having a CRON running every minute can add up a bill
and/or be an unwanted side-effect. AWS offers
Scheduler for AWS Lambda, which means code can still
run on a regular basis, but it’s important to keep in
mind that everytime Lambda starts, it adds up to your
bill. Things like cleaning up old/stale files might not
even be necessary since every request may hit a clean
AWS Lambda environment.
Conclusion
This may not be an extensive list of gotchas, but I tried
to include everything that I may have faced while working
with AWS Lambda for the past 3 years. A lot of these limitations
have acceptable workarounds and overall hosting a Laravel
application on a single fat AWS Lambda has brought more
benefits than problems, so I would still advise in favour
of it if you’re looking to scale and ditch server maintenance.
As always, hit me up on Twitter with any
questions.
Cheers.
Laravel News Links