Proposal: restructuring Drupal's internals

Sending an email, triggering a notification, serving websockets, or kicking off an AI task in the background: these things are trivial in frameworks built on Node.js or Rust, but in Drupal they can involve a lot of effort and external tools. The reason is that Drupal is designed around the web's traditional request-response cycle. By tackling key problems at the heart of Drupal, we can make these jobs possible and expand Drupal's reach.

How this started

Turning Drupal async began with what seemed like a straightforward question: where should Drupal run the event loop? An event loop is the control center of an asynchronous application, coordinating all the different tasks that need to happen. Only one loop can exist at a time – its placement determines how flexible the application can be.

Initially, it looked like we could put this inside the Drupal Kernel. However, that quickly revealed a larger issue: the Kernel assumes that the single request-response cycle is the only thing that's happening. For asynchronous applications, that assumption is a blocker, because handling many tasks at once, including multiple requests, is exactly the point.

Interconnected challenges

This is where things got interesting. The moment I tried to solve this problem, I ran into other long-standing issues in the Drupal architecture:

  • Kernel overload: The DrupalKernel handles too many responsibilities and always expects a request to bootstrap from. It currently handles: configuring the runtime environment; loading application settings; determining the site directory for multisite functionality; setting up the application service container; and handling the request-response flow.

  • Front controllers as bottlenecks: In web apps, the loop might sit after $kernel->handle() and after $kernel->terminate(), which are among the few lines in index.php. But Drush doesn't even have an index.php, and multi-request applications will have a different entry point too.

  • Developer control vs. Drupal control: The places we could insert the loop — like index.php or Drush’s entry points — are considered to be owned by the application/site builder rather than Drupal core. There have, rightly, been previous issues to keep the code in these places small.

Solving the individual problems as a whole

One challenge I ran into is that, from the perspective of Drupal's role in serving webpages, each individual problem seems quite complicated for the value it provides. Taken together, however, the issues below push Drupal forward as a framework and make it much easier to integrate Drupal into other kinds of applications, like long-running workers. I'll go through each issue individually and explain the value that making that change will provide to Drupal. The architectural theme that you'll see is a decoupling of environment-aware logic from the DrupalKernel, to improve maintainability and open up possibilities for developers using Drupal.

Understanding the front-controller and DrupalKernel relationship

In a request-response application, a front-controller is the name for the initial PHP script that takes your request, loads and runs your application, and then sends a response. Within a Drupal application, this would be your index.php or update.php scripts. The same pattern can also be applied to command line applications. For Drush for example, it would be the drush.php file that loads the application and runs the right command.

There are a few important tasks that a front-controller does:

  1. Configure the runtime environment for the application;
  2. Configure last-resort error and exception handling;
  3. Instantiate a Request or Task object for the kernel or application to handle;
  4. Instantiate the Kernel or Application;
  5. Pass the request/task to the kernel/application and return the response.

There are a few things to make note of here.

First, when looking at Drupal's index.php file you'll notice that points 1 and 2 aren't actually happening in this file. That is because index.php is not controlled by Drupal core but is considered a part of the application using Drupal. The amount of code in it must therefore be small, so that upgrading Drupal does not cause constant manual churn in this file. With this in mind a lot of code was removed from this file in the past, and it's purposefully kept small.

These configuration and set-up tasks currently live in the Drupal Kernel. Among these tasks are things like determining the application root, figuring out which multisite, if any, we're in, setting specific PHP session configurations that Drupal relies on, and loading the application configuration.

Secondly, there is coupling between the front-controller and its environment to be aware of. Task 3, to instantiate a request or task instance, will depend on where your application is running. Within the traditional set-up using nginx + PHP-FPM, or Apache + mod_php, the server that runs your PHP script will have populated PHP's superglobals (such as $_SERVER, $_GET, and $_POST) with request information. Creating the Symfony Request instance that the DrupalKernel needs is then as simple as calling Request::createFromGlobals(). However, when using other runtime environments like AMPHP or FrankenPHP, these globals won't exist or might be made available differently, and you might need a translation layer to create a Symfony Request from the runtime's own request object.

Finally, the environment may need to be configured differently depending on whether it's running as a web server or as a command line application. For example, DrupalKernel::bootEnvironment contains a number of settings that are only set outside of the command line, using an if (PHP_SAPI !== 'cli') { check. Moving that responsibility to a front-controller ensures that it can be tailored exactly to where the code is running.
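To make that last point concrete, here is a minimal sketch of environment configuration chosen by the caller instead of by the kernel. The function names and the list of settings are my own assumptions for illustration; this is not a copy of what DrupalKernel::bootEnvironment does.

```php
<?php

declare(strict_types=1);

/**
 * Returns the ini settings to apply for a given PHP SAPI.
 *
 * Hypothetical helper; the setting names are illustrative examples of
 * web-only session configuration, not Drupal's actual list.
 *
 * @return array<string, string>
 */
function environmentSettings(string $sapi): array {
  // Illustrative setting that applies to every kind of application.
  $settings = ['default_charset' => 'UTF-8'];

  // Session cookies only make sense when serving HTTP requests.
  if ($sapi !== 'cli') {
    $settings['session.use_cookies'] = '1';
    $settings['session.use_only_cookies'] = '1';
    $settings['session.cookie_httponly'] = '1';
  }

  return $settings;
}

/**
 * Applies the settings; a front-controller would call this before booting.
 */
function applyEnvironmentSettings(array $settings): void {
  foreach ($settings as $name => $value) {
    ini_set($name, $value);
  }
}

// Usage: compute the settings for the SAPI this script runs under.
$settings = environmentSettings(PHP_SAPI);
```

A web front-controller and a command line entry point could each call such a helper with their own expectations, and the kernel would no longer need to inspect PHP_SAPI itself.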

Making front-controllers manageable with Symfony runtime

Ensuring that responsibility for tasks like environment set-up and initial request creation lives in the front-controller allows these tasks to be adapted to the type of application (request/response vs task runner) as well as its environment (traditional webserver, persistent webserver, or CLI) by the projects that use Drupal. The need for this can be seen by reading some of the discussions in Issue #2218651: [meta] Make Drupal compatible with persistent app servers like ReactPHP, PHP-FPM, PHP FastCGI, FrankenPHP, Swoole.

However, moving all these tasks into the index.php or update.php file would complicate them. That would make updates more difficult if changes need to be made to these components and that's exactly why they were moved into DrupalKernel in the first place.

Thankfully, Symfony ran into this same problem and created the Runtime component. Issue #3313404: Use symfony/runtime for less bespoke bootstrap/compatibility with varied runtime environments is the Drupal.org issue that discusses adopting this component.

The Runtime Component decouples the bootstrapping logic from any global state to make sure the application can run with runtimes like PHP-PM, ReactPHP, Swoole, FrankenPHP etc. without any changes.

Understanding Symfony Runtime

Importantly the Symfony Runtime Component introduces some new concepts and changes the look of our front-controllers. We'll go through these concepts one by one to show how the current index.php will change and how these different concepts work together. The two new classes are the "runner" and the "runtime". These are accompanied by a change in how you build your front-controller. Symfony's generic runtime, on which you can build your own, provides powerful functionality to be able to hand these environment-dependent elements (e.g. the environment name) into your front-controller so you can wire it up to your application.

The runner – this is the piece of code that knows how to map the inputs (e.g. the Request instance) to the application (e.g. the DrupalKernel instance). So it contains most of the code that's currently in index.php. The run method of a DrupalKernelRunner (the only method required by RunnerInterface) could look like:

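    public function run(): int {
      // Handle the request and send the response to the client.
      $response = $this->kernel->handle($this->request);
      $response->send();

      // Give the kernel a chance to clean up once the response has been sent.
      if ($this->kernel instanceof TerminableInterface) {
        $this->kernel->terminate($this->request, $response);
      }

      // Exit status for the process: 0 signals success.
      return 0;
    }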

Importantly this doesn't contain any logic that determines how the Kernel is instantiated or how the request is formed.

The front-controller – this is the index.php or update.php file that we discussed before. It is responsible for choosing the runtime and assembling its application. Drupal's out-of-the-box front-controller may look like this:

<?php

use Drupal\Core\DrupalKernel;
use Drupal\Core\Runtime\DrupalRuntime;

// Use Drupal's runtime unless another one was already selected.
$_ENV['APP_RUNTIME'] ??= DrupalRuntime::class;
require_once 'autoload_runtime.php';

// The runtime resolves $environment and runs the application returned here.
return static function (string $environment) {
  return new DrupalKernel($environment, require 'autoload.php');
};

It would instantiate the DrupalKernel with arguments provided by the runtime and then provide that Kernel to the runtime so that it can be used for request handling.

The runtime – this is what the Symfony Component is actually named after and is the meat of the operation. The Runtime will take the application provided by the front-controller (e.g. a Kernel instance, a Symfony console application/command, or even a simple function) and wire this up to the runner. While doing so it can configure the environment (e.g. through ini_set or by registering error handlers) and it'll choose the correct runner based on the provided application.

A simplified Drupal runtime, using Symfony's GenericRuntime as a base, could look as follows:

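namespace Drupal\Core\Runtime;

use Drupal\Core\DrupalKernelInterface;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\Runtime\GenericRuntime;
use Symfony\Component\Runtime\RunnerInterface;

class DrupalRuntime extends GenericRuntime {

  /**
   * {@inheritdoc}
   */
  public function getRunner(?object $application): RunnerInterface {
    // DrupalKernelRunner is the proposed runner, assumed to share this namespace.
    if ($application instanceof DrupalKernelInterface) {
      return new DrupalKernelRunner($application, Request::createFromGlobals());
    }

    // Let Symfony pick a runner for any other kind of application.
    return parent::getRunner($application);
  }

}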

You should note that the runtime is now the part of the application that actually couples our application to a specific environment. Our front-controller only specifies what our application should do (it should handle requests using the DrupalKernel), whereas it's our runtime that is coupled to our environment and uses the globals to create the Symfony request. If I have an application that provides the request variables in a different manner, then I can now support this in two steps without changing the front-controller: create a custom Symfony Runtime (e.g. extending GenericRuntime), and set the APP_RUNTIME environment variable to my custom runtime class so that Drupal's front-controller picks it up.

Resolver, Resolvable Arguments, and Resolvable Applications – You may be wondering where the argument to our closure (string $environment) in our front-controller comes from. This is some of the magic that the Symfony Runtime component hides for us; the method that GenericRuntime implements for this is RuntimeInterface::getResolver. It takes the closure returned by our front-controller and uses reflection to inspect the arguments it needs. GenericRuntime will then call its own (or an overridden) getArgument method. In our DrupalRuntime we could make the environment available by implementing something along the lines of:

  protected function getArgument(\ReflectionParameter $parameter, ?string $type): mixed {
    if ($type === 'string' && $parameter->name === 'environment') {
      return $_ENV[$this->options['env_var_name']];
    }

    return parent::getArgument($parameter, $type);
  }

$this->options is another flexible feature provided by Symfony's GenericRuntime to move power to the end-user application.

Configuring the runtime environment

In 2015 damiankloip opened Issue #2501901: Allow configuring the environment via an environment variable. The main problem that the issue describes is that prod is hardcoded as the environment value in index.php, making it unavailable to be used for configuring application behavior (e.g. enabling verbose logging). The issue describes various ways of implementing this, but all of them would have to touch the front-controller and adjust to load the value (e.g. add $_ENV['DRUPAL_ENV'] to the DrupalKernel invocation).

As you've hopefully noticed in the explanation of Symfony Runtime, it provides us with a standardised way to get the environment name regardless of what kind of application you're building. For the initial Runtime implementation we may even resolve the $environment variable statically to prod to mimic the current behavior. That would allow work to be done in Issue #2501901 to determine the name of the variable that sets the environment and then provide this as an update to Drupal websites without having to touch their front-controllers again. Alternatively, if site owners are unhappy with how Drupal determines the environment, they can override the runtime to resolve this variable themselves, again without having to touch the front-controller.
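As a rough sketch of what such resolution could look like, the helper below reads the environment name from a variable and falls back to prod. The function is hypothetical, and DRUPAL_ENV is only the example name used above; the actual variable name is what Issue #2501901 would have to decide.

```php
<?php

declare(strict_types=1);

/**
 * Resolves the environment name from a set of environment variables.
 *
 * Hypothetical helper: falls back to the default when the variable is
 * missing, empty, or not a string.
 */
function resolveEnvironment(array $env, string $varName = 'DRUPAL_ENV', string $default = 'prod'): string {
  $value = $env[$varName] ?? '';

  return is_string($value) && $value !== '' ? $value : $default;
}

// Usage: a runtime would typically pass $_ENV here.
$environment = resolveEnvironment($_ENV);
```

Because the fallback matches today's hardcoded value, a runtime built this way would behave like the current index.php until a site actually sets the variable.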

Loading of application settings and Drupal multisite

This encompasses three closely related issues:

The thing that the issues have in common is that there's a discussion about how basic properties (like the application root and site path), as well as the settings for the site should be loaded. The DrupalKernel can only really do this with certain heuristics, looking at whether files exist and possibly guessing at the application root as needed. This is exactly the kind of logic that the front-controller, runtime, or even the server configuration (through either of those) can make a decision on without the Kernel having to guess. 

In Issue #2529170 I provided a proposal of an alternative solution to the one being worked on. My proposal is to introduce a new AppContext class that can be passed into the DrupalKernel and provide it with context (app_root, base_path, etc.) about where it is operating and how to load different settings. This would decouple the bootstrapping of Drupal and the Kernel from a request. For web applications, a front-controller could generate this AppContext from a Request as needed. A Kernel test could generate this based on the PHPUnit configuration. Meanwhile, an application like Drush could generate this information from its own configuration without needing to fake a request.
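To give a feel for the idea, here is a minimal sketch of what an AppContext could look like. The property names, factory methods, and defaults are assumptions of mine for illustration (and require PHP 8.1 for readonly properties); the real shape is still to be worked out in the issue.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical value object describing where Drupal is running.
 */
final class AppContext {

  public function __construct(
    public readonly string $appRoot,
    public readonly string $sitePath,
    public readonly string $basePath = '/',
  ) {}

  /**
   * Builds a context from web server variables such as $_SERVER.
   */
  public static function fromServerVars(string $appRoot, array $server, string $sitePath = 'sites/default'): self {
    // Derive the base path from the script location:
    // "/sub/index.php" becomes "/sub/".
    $dir = str_replace('\\', '/', dirname($server['SCRIPT_NAME'] ?? '/index.php'));

    return new self($appRoot, $sitePath, rtrim($dir, '/') . '/');
  }

  /**
   * Builds a context for a command line tool without faking a request.
   */
  public static function forCli(string $appRoot, string $sitePath = 'sites/default'): self {
    return new self($appRoot, $sitePath);
  }

}

// Usage: a web front-controller and a CLI tool build the same kind of object.
$web = AppContext::fromServerVars('/var/www/html', ['SCRIPT_NAME' => '/sub/index.php']);
$cli = AppContext::forCli('/var/www/html');
```

The kernel would then receive a fully formed context and would not need to know whether it came from a request, a test configuration, or a command line tool.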

By moving the configuration of the app_root, base_url, and site settings path out of the DrupalKernel we can also tie this into the Symfony Runtime component. This would provide multiple ways to determine which site of the Drupal multisite setup to load, or even bypass the functionality entirely. The current implementation could be moved into the default Drupal Runtime, but it'd be trivial for developers to create more streamlined implementations based on $_SERVER variables, or even experiment with different site selection rules by creating or installing an alternative runtime.
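A streamlined site selection rule of the kind mentioned above might look like the sketch below. Both the DRUPAL_SITE server variable and the host map are hypothetical; they are not existing Drupal behavior, only an example of what an alternative runtime could choose to do.

```php
<?php

declare(strict_types=1);

/**
 * Picks a site directory from server variables.
 *
 * Hypothetical helper: DRUPAL_SITE and $hostMap are illustrative assumptions.
 *
 * @param array<string, mixed> $server
 *   Server variables, e.g. $_SERVER.
 * @param array<string, string> $hostMap
 *   Map of lower-case host names to site directory names.
 */
function selectSitePath(array $server, array $hostMap = []): string {
  // 1. An explicit choice made in the server configuration wins.
  $explicit = $server['DRUPAL_SITE'] ?? null;
  if (is_string($explicit) && preg_match('/^[A-Za-z0-9][A-Za-z0-9_.-]*$/', $explicit) === 1) {
    return 'sites/' . $explicit;
  }

  // 2. Otherwise look up the host name, ignoring any port number.
  $host = strtolower(explode(':', (string) ($server['HTTP_HOST'] ?? ''))[0]);
  if (isset($hostMap[$host])) {
    return 'sites/' . $hostMap[$host];
  }

  // 3. Fall back to the default site.
  return 'sites/default';
}
```

A single-site project could skip the lookup entirely by always returning sites/default from its own runtime.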

Building the service container

In 2014 donquixote opened Issue #2354475: [meta] Refactor the installer, (multi)site management, and pre-container bootstrap. The issue rightly points out that there is a challenge in Drupal's flexibility. The functionality and the code that should be loaded for a Drupal application depends on configuration stored in the database. In turn the database connection details are part of the settings. This bootstrapping problem is similar to the start-up sequence for a PC, where a low level bit of code (the BIOS) is needed to read the actual program to be run (your operating system).

I don't currently have a proposal for how we might tackle this issue. However, if we view this issue within the context of the broader componentization of the DrupalKernel then I do expect this issue to become much easier. With the changes described so far, we're moving a large amount of logic out of the Kernel. 

Having these components outside of the Kernel will make their true dependencies clearer and make it easier to change how they're instantiated or to introduce new components altogether. For example, we may be able to move the building of the container out of the Kernel, setting only certain requirements for the parts of the container that the Kernel itself needs but leaving other aspects of the container up to the application.

This is a discussion that fits into the broader dialogue of how we can leverage Drupal for more complex applications. A long-running application that performs background tasks or handles multiple Drupal requests at once, for example, may need its own container to set up the socket server or other services outside of Drupal. By rethinking where the container is built, these applications, and the multiple Drupal requests they handle, may be able to build the container only once*.

* Many Drupal services currently rely on handling only a single global request, for example the current_user service. In order to make the container truly reusable across requests, Drupal would need to introduce a "task" (e.g. "request") concept that parts of the code can use to get the current user.
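The "task" idea from the footnote can be illustrated with a small sketch in which per-task state is passed in explicitly instead of being read from a global. TaskContext and GreetingService are hypothetical names invented for this example; this is not a proposal for how Drupal's current_user service should change.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical per-task state: one instance per request or background task.
 */
final class TaskContext {

  public function __construct(public readonly int $currentUserId) {}

}

/**
 * Hypothetical service that is shared across tasks and holds no user state.
 */
final class GreetingService {

  public function greet(TaskContext $task): string {
    return 'Hello user ' . $task->currentUserId;
  }

}

// The service is built once, as it would be in a container that is reused.
$service = new GreetingService();

// Two tasks handled by the same process each bring their own context.
$first = $service->greet(new TaskContext(1));
$second = $service->greet(new TaskContext(42));
```

Since the service never stores the user, two requests handled concurrently by one process could not leak state into each other.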

Placing the Revolt event loop

The final part of this is actually the puzzle that started it all: Issue #3425210: Ensure asynchronous tasks that run beyond the length of a request have the chance to complete before process exit. This issue's goal is making sure that, within Drupal, the Revolt Event Loop runs at the right time to enable background processes to run without blocking the request and without Drupal having already shut down.

It is likely that the loop actually needs to run in two places, both of which are currently in the index.php file:

  1. Between $response->send(); and $kernel->terminate($request, $response); to allow background tasks to execute that may be required to send the response or that can happen after the response but require the Kernel.
  2. Optionally after $kernel->terminate($request, $response); so that any asynchronous processes that respond to Kernel cleanup can finish before the process is exited.

There may be caveats to this. For example, we may decide that Kernel termination should always be synchronous but non-blocking (i.e. it should clean up only and may not perform database/HTTP requests or other IO operations). Alternatively we may decide that the handling of the request itself should be a task on the event loop and require only a single EventLoop::run call.
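The two placements listed above can be sketched with a tiny stand-in loop. TinyLoop and the closures are hypothetical stand-ins so the example runs on its own; with Revolt the corresponding calls would be EventLoop::queue() and EventLoop::run(), and the closures would be the real kernel and response methods.

```php
<?php

declare(strict_types=1);

/**
 * Minimal stand-in for an event loop: runs queued callbacks until none remain.
 */
final class TinyLoop {

  /** @var list<callable> */
  private array $queue = [];

  public function queue(callable $task): void {
    $this->queue[] = $task;
  }

  public function run(): void {
    while ($task = array_shift($this->queue)) {
      $task();
    }
  }

}

$loop = new TinyLoop();
$log = [];

// Stand-in for $kernel->handle(): schedules work that outlives the response.
$handle = function () use ($loop, &$log): string {
  $log[] = 'handle';
  $loop->queue(function () use (&$log): void {
    $log[] = 'background task';
  });
  return 'response';
};

// Stand-in for $response->send().
$send = function (string $response) use (&$log): void {
  $log[] = 'send';
};

// Stand-in for $kernel->terminate(): cleanup may schedule more work.
$terminate = function () use ($loop, &$log): void {
  $log[] = 'terminate';
  $loop->queue(function () use (&$log): void {
    $log[] = 'cleanup task';
  });
};

$response = $handle();
$send($response);
// Placement 1: after the response is sent, while the kernel is still usable.
$loop->run();
$terminate();
// Placement 2: after termination, so cleanup tasks can finish before exit.
$loop->run();
```

In a Symfony Runtime set-up this sequence would live in the runner's run method, which is why changing the placement would no longer require edits to index.php.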

Regardless of the answer, any change like this currently requires all Drupal installations to update their index.php – which Drupal considers to be part of the project, not part of Drupal core. This also means that it's not possible for contrib to experiment with alternative placements (think about a long-running AMPHP application that may handle multiple requests at the same time).

This realization, that we need to change something Drupal considers separate from Drupal as a framework, is what initially led me to look for a way to bring this into Drupal core, and that search led me to Symfony Runtime. Placing the loop becomes easy once we control the flow via Symfony Runtime.

The impact for application developers leveraging Drupal

Adopting Symfony Runtime provides a way to break up the set-up logic in Drupal's front-controller. This allows us to provide end-users with building blocks, shipped as part of Drupal core or contrib, that allow Drupal to run as it does now while making it easier to swap out bootstrapping logic for different environments, whether that's dev vs prod or webserver vs Drush.

Simultaneously it allows decoupling Drupal's bootstrapping logic from a request where one isn't needed. In places where that logic is still needed it can be easily pulled in from the runtime using argument resolution from Drupal's default implementation.

These changes give Drupal core the ability to easily ship async support using the Revolt Event Loop and provide a giant leap towards allowing Drupal to be used for any sort of application, whether that is the current Content Management System serving requests, a robust and personalized notification system, or an AI model orchestrator.

Next steps

The issues described in this post are all still open at the time of writing. That's for good reason: they're hard problems. 

The next step is to tackle these issues with an eye on the overarching goal of pushing Drupal's boundaries. I plan to work through these issues in the order they're presented here. However, I can use your help in refining the solution, finding related work that might benefit from these changes, and of course reviewing the work to get it across the finish line.

Share your thoughts with me on the Drupal Slack, on BlueSky, or in the issue queue for one of the issues.

Key issues

The following issues have been referenced in this article or are relevant for the proposed work.

Historic issues

The following issues have been referenced in this article and may be useful if you want an understanding of how things came to be, or to evaluate past technical decisions.

AI Disclaimer: LLM tools were used to help me structure, review and shorten this article. The thoughts contained in the article are my own and have formed through reading the linked issues, working on the topic, and a few months of working on this article.