Craft CMS meets High Scalability – a journey into the Unknown

We love Craft CMS – that much is clear. It's a wonderful Content Management System with a dedicated core development team that gives us the flexibility to tailor its use exactly to our needs. Be it headless or with the classic Twig templating approach, we have used Craft CMS in many of our projects, and will happily continue to do so. However, there is one area where our CMS of choice doesn't shine right away, and that is scalability. This article lists some of the pitfalls we've come across when scaling Craft CMS, along with solutions for fixing them.

I'll take your brains to another dimension – pay close attention.

The Prodigy

It all starts with "dimension". In our case, the dimensions of three key metrics: multi-sites, entries and assets. For each of these metrics, but especially for the first, the complexity of the whole system grows exponentially as they increase.

But let's circle back for a moment: which project are we actually talking about? Surely we must have a real-world example up our sleeve? Of course we do: the project in question is a website for the Catholic Church of one of Bavaria's eastern regions, accessible at https://www.bistum-passau.de. While the main page doesn't look particularly tricky, the real complexity manifests itself when looking at the myriad of subpages such as https://djk.bistum-passau.de or https://pfarrverband-passau-st-anton.bistum-passau.de. While separate pages in themselves, all of them are powered by the same installation of Craft CMS, bringing the total number of multi-sites to 70 (and counting). Paired with 6,700+ entries and 21,000+ assets, we are presented with a beast of a system.

Not surprisingly, a beast of a system requires a beast of a server. Therefore, we decided on a machine with 8 dedicated CPU cores, 32 GB of RAM and 160 GB of disk space, running solely the Craft CMS installation in question. Requests are served by NGINX in conjunction with PHP 7.2, with MySQL 8.0 powering a database that currently weighs in at a couple of gigabytes. Maybe now you'll get a sense of what we're actually dealing with here.

As you can imagine, simply serving a website of this dimension isn't that big of a deal. The problem arises when it needs to be served to hundreds or thousands of users – often at the same time. In light of recent events, providing content on the Internet has become more important than ever, all the while presenting new challenges for the people who produce it. Since the Holy Mass couldn't be celebrated in person over the past couple of weeks, tech-savvy church outlets such as our client simply had to switch to live streaming.

All fine and dandy – if it weren't for Easter. The number of users skyrocketed not only because of the unique situation at hand, but also because it all happened at a time most holy for many Christians around the world. For us, this meant dealing with record user numbers on top of maintaining an already complex system. Not an easy task, but one we were able to pull off in the end. How did we do that, you ask? After quite some introduction, let's get down to business.

[Screenshot: Craft CMS meets High Scalability 1]

Chapter 1: Managing high user numbers

First of all, we had to deal with the ever-growing number of user accesses that we were presented with in the weeks leading up to Easter. After some research and close monitoring of our systems, we found out that pages cached by Craft are written into the database and queried therefrom upon user access. However, when hundreds of users tried to open the same page simultaneously, the MySQL query process was consuming a lot of the available CPU, with the rest thereof eaten up by lots of PHP child processes (as seen in the screenshot above). As a result, NGINX didn't have enough computing power left to do its thing, which more often than not resulted in a 502 Bad Gateway or a 504 Gateway Timeout error.

Upscaling the server – the first thing we tried – only helped marginally, especially since the MySQL queries were still consuming way too much CPU in too little time. So we had to think outside the box and come up with a better solution.

This is where static HTML caching comes into play. While the idea initially seems counter-intuitive to the flexibility a Content Management System is all about, it makes sense if you expect a large number of users to visit the same page. This was exactly the situation in our case, where the vast majority of users accessed the same couple of pages (mainly the ones related to the livestreams). When the pages in question are served as static HTML files created prior to the rush, no MySQL process has to fetch anything from the database, leaving plenty of CPU for the NGINX web server (which now does most of the work). With this notion implemented using the excellent Blitz plugin, we could easily serve a couple of hundred users at a time without the server ever teetering on the brink of collapse.
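
To give you an idea of what that looks like in practice, here is a minimal sketch of a Blitz project config. The option names follow the plugin's config/blitz.php file, but the URI pattern entries are purely illustrative and their exact format may differ between Blitz versions – double-check against the plugin's own example config.

# Blitz Config Snippet (illustrative)
# config/blitz.php

<?php

return [
  // Turn static HTML caching on for the whole installation
  'cachingEnabled' => true,

  // Only cache the pages that really get hammered – the livestream pages in our case.
  // Both the keys and the regex below are assumptions; check your Blitz version's docs.
  'includedUriPatterns' => [
    [
      'siteId' => '',
      'uriPattern' => 'livestream.*',
    ],
  ],
];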

The only downside to this approach is that once pages are statically cached, changes made in the CMS are no longer shown to users until the cache is explicitly rebuilt. For now, we're only using this approach on the most visited pages, triggering a cache rebuild in each deployment pipeline as well as once per night with a cron job. The exact strategy of when to cache what depends on each system's individual needs, but for us this solution worked wonders when it comes to fast and problem-free serving of frontend templates.
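
For the nightly rebuild, a crontab entry along the following lines does the trick. We're assuming Blitz's console commands here (the exact command name may differ depending on the plugin version you run), and the path is of course specific to our Forge setup:

# Cron Job Snippet (illustrative)
# crontab -e (run as the forge user)

# Rebuild the Blitz cache every night at 03:00
0 3 * * * /usr/bin/php /home/forge/www.bistum-passau.de/current/craft blitz/cache/refresh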

Chapter 2: Let's get tunin'

Not surprisingly, a server upscale isn't the only way to make a system faster. On each server, there are countless nooks and crannies that can be analysed and tuned. For the sake of brevity, we'll focus on a couple of core ideas. And – since we haven't been properly nerdy until now – this chapter will contain mostly code. However, it should be noted that the following snippets are specifically tailored to our use case and merely present excerpts of the respective configurations. Should you want to copy them, please proceed with care.

Start off by splitting the communication between NGINX and PHP into two different sockets – one for the Frontend and one for the Backend. This separation of concerns makes sure that, while the Frontend is under lots of heat, the Backend still keeps its cool. Enough with the words though, let's see it in action:

# NGINX Config Snippet
# /etc/nginx/sites-available/www.bistum-passau.de

set $fpm_socket "unix:/var/run/php/php7.2-fpm-frontend.sock";

if ($uri ~* "^/admin/") {
  set $fpm_socket "unix:/var/run/php/php7.2-fpm-backend.sock";
}
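# Further down in the server block (not part of this excerpt), the chosen socket
# is then consumed along the lines of: fastcgi_pass $fpm_socket;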
      
    
      
# PHP Pool Config Snippet
# /etc/php/7.2/fpm/pool.d/www.conf

[frontend]

user = forge
group = forge

listen = /run/php/php7.2-fpm-frontend.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666

pm = static
pm.max_children = 64
pm.max_requests = 3000

request_terminate_timeout = 3600

[backend]

user = forge
group = forge

listen = /run/php/php7.2-fpm-backend.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666

pm = static
pm.max_children = 64
pm.max_requests = 3000

request_terminate_timeout = 3600

The second idea is hidden in plain sight in the above PHP Pool Config Snippet, but doesn't become obvious right away: why are we using pm = static instead of pm = dynamic or pm = ondemand? The answer is that with the latter two, only a limited number of PHP child processes "linger" around at any given time. Faced with a sudden increase in traffic, spawning new child processes takes time – precious time, which can make or break a system. With a server such as ours, keeping 64 * 2 = 128 PHP children alive at all times doesn't hurt, but helps us a lot in times of need. Note that while the maximum number of children and the number of requests each child can serve vary greatly from system to system, these values seemed to be a good compromise in our case.

# PHP Config Snippet
# /etc/php/7.2/fpm/php.ini

max_execution_time = 300
max_input_time = 300
max_input_vars = 10000
memory_limit = 8G

default_socket_timeout = 300
mysql.connect_timeout = 300
mysql.allow_persistent = 1

# MySQL Config Snippet
# /etc/mysql/mysql.cnf

innodb_buffer_pool_size=8G
innodb_log_files_in_group=2
innodb_log_file_size=1G
innodb_flush_log_at_trx_commit=0
innodb_thread_concurrency=4
innodb_stats_on_metadata=0

max_allowed_packet=512M

wait_timeout=300
net_read_timeout=300
net_write_timeout=300
interactive_timeout=300
connect_timeout=300

Last but not least, let's have a look at the PHP and MySQL Config Snippets above. In the case of PHP, the most important parameters are max_execution_time and memory_limit. You don't want the maximum execution time set too high – if there's an error, letting the system try forever won't solve it anyway. The memory limit, on the other hand, can be set to a higher value, provided you have enough RAM at your disposal. It's probably a good idea to set it in Craft's general config as well, using the parameter phpMaxMemoryLimit. It would be a shame to generously allow PHP to consume lots of RAM, only for the CMS to limit its use thereof.
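
As a minimal sketch, the corresponding setting in config/general.php could look like this – the value is illustrative and should simply mirror whatever you allow in PHP's memory_limit:

# Craft CMS General Config Snippet (illustrative)
# config/general.php

<?php

return [
  // Let Craft use roughly as much memory as PHP itself is allowed to
  'phpMaxMemoryLimit' => '8G',
];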

As for MySQL, the parameters innodb_buffer_pool_size and max_allowed_packet are the most interesting. We've had a good experience with setting the former along the lines of PHP's memory limit. The latter really depends on what kind of packets are sent back and forth. Should you get any errors like "Packet Too Large" – you know what to do.

Chapter 3: A queue is not just for billiards

After making sure our users have a smooth surfing experience and our server is purring like a kitty, we took a look at how to make life easier for our client's content editors. Specifically, we wanted to make sure that everyday editing of entries and assets caused as few problems as possible. Although it works in the background, one thing is as important as anything else in a CMS: the queue.

Whenever an entry or an asset gets created / saved / deleted, a number of queue jobs are triggered. Be it the update of search indexes, the deletion of stale template caches or the generation of sitemaps: queue jobs will happen, and they happen all the time. And with 70 multi-sites and thousands of entries and assets, the multiplication game doesn't make things easier. Before beating around the bush even more, we have to admit: yes, we've had problems with our queue. Lots of problems.

Mostly, these problems arose because queue jobs are treated with the same priority as the other MySQL queries happening around the CMS. But for the content editor, the fact that an entry saves quickly (prepare for a wild ride on that topic in this blog post) is more important than the deletion of template caches. That's why it's always a good idea to make sure queue jobs are not only running in the background, but are also treated as lower priority. In other words: we want to "nice" them so that the more important queries get executed first.

While this can be achieved by way of a plugin (try AsyncQueue if you're interested in that), we actually leveraged Forge – which our servers are provisioned with – and installed a Daemon that listens for incoming queue jobs and executes them in a "nice" manner. This is quite straightforward and a good way to achieve async queue handling in Craft. Just make sure to set the parameter runQueueAutomatically to "false" in your Craft general config (sketched below) so that the CMS doesn't try to run the queue on its own.

# Forge Async Queue Daemon

/usr/bin/nice -n 10 /usr/bin/php /home/forge/www.bistum-passau.de/current/craft queue/listen --verbose
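
For completeness, here is a minimal sketch of the general config change mentioned above:

# Craft CMS General Config Snippet
# config/general.php

<?php

return [
  // Don't run the queue within web requests – the Forge daemon takes care of it
  'runQueueAutomatically' => false,
];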
      
    

But, as you can imagine, we didn't stop there. We wanted to keep the impact of the queue on the MySQL database to a minimum, so that other processes can use the available resources instead. Since each queue job creates a new row in the database's queue table, updates that row periodically (with its progress etc.) while being executed and deletes it once execution is done, a significant number of MySQL queries can be saved by letting the queue be handled somewhere else. Enter: Redis.

At the risk of you reading the exact same thing when you click on the link, Redis is an open source, in-memory data structure store with key-value pairing, used as a database, cache and message broker. As we now know from experience, it's lightning-fast and can take care of all your queue needs without problems. With Redis handling the queue, only a single MySQL query remains per job – the one that writes the "result" of the queue job into the database (for example by updating a search index or deleting a template cache).

With Craft CMS hosted on Forge, it's as easy as pie to include Redis in your setup. The following code snippet, taken from the app.php config file of Craft, gives you an idea of how it's done:

# Craft CMS App Config File Snippet

<?php

return [
  '*' => [
    'components' => [
      // Initialize Redis component
      'redis' => [
        'class' => yii\redis\Connection::class,
        'hostname' => 'localhost',
        'port' => 6379,
        'password' => null,
      ],

      // Use queue with Redis
      'queue' => [
        'class' => yii\queue\redis\Queue::class,
        'redis' => 'redis',
        'channel' => 'queue',
      ],
    ],
  ],
];

There's really only one downside to moving the queue to Redis that we've come across so far: the progress of the queue jobs isn't shown in the Craft Control Panel any more, since there is currently no driver for that purpose. While this might change in the future, be prepared to check on your queue jobs with the Redis Monitor in the console when you are interested in their whereabouts.
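
If you do want to peek at the queue from the command line, something along these lines works; the key name in the second command is an assumption based on the "queue" channel configured above, so adjust it to whatever keys your setup actually uses:

# Redis Console Snippet (illustrative)

# Watch every command hitting Redis in real time (Ctrl+C to stop)
redis-cli monitor

# Peek at the number of waiting jobs
redis-cli llen queue.waiting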

Famous last words

In the previous chapters, a lot of ideas and notions have been presented and talked through. Although they might not apply to each and every system and configuration, they certainly helped us a lot and made our Craft CMS installation fast and highly scalable in the face of complexity. While the real power of a CMS lies in its flexibility, this doesn't necessarily have to be traded against scalability – it just requires some research and thinking outside of the box to take care of issues stemming from a large number of multi-sites, entries and assets.

We certainly hope this article has been of some interest to you, and we would love to hear your input and feedback on our thoughts! Let us know on a social media outlet of your choice – we will write you back, and invite you to our office for an after-work beer. :-)