Decoupled Content Store based on Redis

This is the 2nd generation of a Two-Stack CMS package for Neos.

This package is a work in progress and under heavy development. It is not yet ready for general use, but will be soon.

The Content Store package is one part of a Two-Stack CMS solution with Neos. A Two-Stack architecture separates editing and publishing from the delivery of content. This architecture is also suitable for integrating Neos content into various other systems without adding overhead during delivery.

The first iteration was not open source; it was developed jointly by Networkteam and Sandstorm and is in use for several large customers. The second iteration (this project) is developed from scratch as open source, based on the lessons learned from the first iteration. In particular, robustness has been greatly improved.

What does it do?

The Content Store package publishes content from Neos to a Redis database as immutable content releases. These releases can be atomically switched and a current release points to the active release.

The delivery layer in the Two-Stack architecture uses the current release and looks for matching URLs in the content store and delivers the pre-rendered content. A delivery layer is decoupled from the actual Neos CMS and can be implemented in any language or framework. It is also possible to integrate the delivery layer part in another software (e.g. a shop system) as an extension.
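
The lookup described above can be sketched as follows. This is a minimal illustration in Python, not the package's actual key layout: the key names (`contentStore:current`, `renderedDocuments`) and the in-memory `FakeRedis` class are assumptions made so that the example runs without a Redis server.

```python
from typing import Dict, Optional

# Hypothetical key layout, for illustration only. The real layout is
# defined by the configured content release writers.
CURRENT_RELEASE_KEY = "contentStore:current"


def release_key(release_id: str, url: str) -> str:
    return f"contentStore:{release_id}:renderedDocuments:{url}"


class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (GET/SET only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def deliver(store: FakeRedis, url: str) -> Optional[str]:
    """Resolve the active release, then look up the pre-rendered page."""
    release_id = store.get(CURRENT_RELEASE_KEY)
    if release_id is None:
        return None  # nothing has been published yet
    return store.get(release_key(release_id, url))


store = FakeRedis()
# Two immutable releases exist side by side.
store.set(release_key("r1", "/about"), "<h1>About (old)</h1>")
store.set(release_key("r2", "/about"), "<h1>About (new)</h1>")

store.set(CURRENT_RELEASE_KEY, "r1")
page_before = deliver(store, "/about")

# Switching releases is a single write to the pointer key.
store.set(CURRENT_RELEASE_KEY, "r2")
page_after = deliver(store, "/about")
```

Because the releases themselves are never modified, the delivery layer only ever needs to read the pointer and then a single key for the requested URL.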

Features

Publish a full, read-only snapshot of your live content to Redis in a so-called Content Release
Allows for incremental publishing: if a change is made, only the affected pages are re-rendered. This is integrated with the Neos Content Cache, so cache flushes work correctly.
Integration with Neos workspace publishing for automatic incremental publishing to the Content Store
Configurable Content Store format, decoupled from the internal representation in Neos.
Extensibility: Enrich content releases with your custom data.
Allows parallel rendering
Allows copying the content releases to different environments.
Allows rsyncing persistent assets around (should you need it)
Backend module with overview of content releases (current release, switching releases, manual publish)

This project is using the go-package prunner and its Flow Package wrapper as the basis for orchestrating and executing a content release.

Requirements

Redis
Sandstorm.OptimizedCacheBackend recommended
Prunner

Start up prunner via the following command:

prunner/prunner --path Packages --data Data/Persistent/prunner

Copy the pipelines_template.yml file into your project and adjust it as needed (see below and the comments in the file for explanation).

Approach to Rendering

The following flow chart shows the rendering pipeline for creating a content release.

                       ┌─────────────────────┐                                                      
                       │   Node Rendering    │                                                      
     ┌───────────┐     │   ┌─────────────┐   │     ┌───────────┐     ┌───────────┐     ┌───────────┐
     │   Node    │     │   │Orchestrator │   │     │  Release  │     │Transfer to│     │  Atomic   │
     │Enumeration│────▶│   └─────────────┘   │────▶│Validation │────▶│  Target   │────▶│  Switch   │
     └───────────┘     │┌────────┐ ┌────────┐│     └───────────┘     └───────────┘     └───────────┘
                       ││Renderer│ │Renderer││                                                      
                       └┴────────┴─┴────────┴┘

At the beginning of every render, all nodes are enumerated. The Node Enumeration contains all pages which need to be in the final content release.
Then, the rendering takes place. In parallel, the orchestrator checks whether pages are already fully rendered. If not, it creates rendering jobs. If so, the rendered page is added to the in-progress content release.

The renderers simply render the pages as instructed by the orchestrator.

The orchestrator tries to render multiple times: a render pass can end up incomplete because an editor changed pages at the same time, leading to content cache flushes and "holes" in the output.
During validation, checks can run to verify that the content release is complete and can really go online.
During the transfer phase, the finished content release is copied to the production Redis instance if needed. This includes copying of assets if needed.
In the switch phase, the content release goes live.
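
The interplay between orchestrator and renderers described above can be sketched as a small loop. This is a simplified, synchronous model in Python under stated assumptions: the real pipeline renders in parallel, and the function names, the `max_iterations` parameter and the dict-based cache are hypothetical.

```python
def orchestrate(enumerated_urls, content_cache, render, max_iterations=3):
    """Collect all enumerated pages from the cache, rendering missing ones.

    Raises RuntimeError if pages are still missing after max_iterations.
    """
    release = {}
    for _ in range(max_iterations):
        for url in enumerated_urls:
            if url in release:
                continue
            if url in content_cache:
                # Already fully rendered: add it to the in-progress release.
                release[url] = content_cache[url]
            else:
                # Not rendered yet (or flushed by an edit): create a render job.
                render(url)
        if all(url in release for url in enumerated_urls):
            return release
    raise RuntimeError("content release incomplete after all render attempts")


cache = {"/": "<home>"}
render_log = []
flushed = []


def render(url):
    """Hypothetical renderer that writes its output to the content cache."""
    render_log.append(url)
    if url == "/b" and not flushed:
        # Simulate an editor change flushing the page right after the
        # first render, which leaves a "hole" in the output.
        flushed.append(url)
        return
    cache[url] = f"<page {url}>"


release = orchestrate(["/", "/a", "/b"], cache, render)
```

In this run, `/b` is rendered twice because its first result is lost to the simulated cache flush; the release is only returned once every enumerated page is present.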

The above pipeline is implemented with prunner which is orchestrating the different steps.

Infrastructure

Here, we explain the different infrastructure and setup constraints for using the content store.

The Neos Content Cache must use Redis. It can use the OptimizedRedisCacheBackend.
The Content Store needs a separate Redis Database, but it can run on the same server.

It is crucial that Redis is reachable with the lowest possible latency from both Neos AND the Delivery Layer. See the different setup scenarios below for how this can be done.

Minimal Setup

The minimal setup looks as follows:

Neos writes into the Content Store Redis Database, and the Delivery Layer reads from the Content Store Redis Database.
Assets (persistent resources) are written directly to a publicly available Cloud Storage such as S3.

┌──────────────┐   ┌──────────────┐            
│ Neos Content │   │Content Store │            
│Cache Redis DB│   │   Redis DB   │◀───┐       
└──────────────┘   └──────────────┘    │       
        ▲                  ▲           │       
        └────────┬─────────┘           │       
                 │                     │       
             ╔══════╗          ╔══════════════╗
             ║ Neos ║          ║Delivery Layer║
             ╚══════╝          ╚══════════════╝
                 │                             
                 │                             
                 │       ┌──────────────┐      
                 │       │Asset Storage │      
                 └──────▶│   (S3 etc)   │      
                         └──────────────┘

In this case, the transfer phase does not need to do anything, and you need to configure Neos to use the cloud storage (e.g. via Flownative.Google.CloudStorage or Flownative.Aws.S3) for resources.

This is implemented in the default pipelines_template.yml.

This Setup should be used if:

the Delivery Layer and Neos are in the same data center (or host), so both can access Redis via lowest latencies
you want the easiest possible setup.

If you use Cloud Asset Storage, ensure that you never delete assets from there. For Flownative.Aws.S3, you can follow the guide on "Preventing Unpublishing of Resources in the Target".

Manually Sync Assets to the Delivery Layer via RSync

If you cannot use a Cloud Asset Storage, there's a built-in feature to manually sync assets to the delivery layer(s) via RSync.

To enable this, follow these steps:

Configure in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    resourceSync:
      targets:
        -
          host: localhost
          port: ''
          user: ''
          directory: '../nginx/frontend/resources/'

In pipelines.yml, underneath 4) TRANSFER, comment-in the transfer_resources task.

Copy Content Releases to a different Redis instance

This Setup should be used if:

the Delivery Layer and Neos are in different data centers, so that there is a higher latency between one of the instances toward Redis
or you need multiple delivery layers with different content states, e.g. a staging delivery layer and a live delivery layer.

┌──────────────┐   ┌──────────────┐                   ┌──────────────┐
│ Neos Content │   │Content Store │                   │Content Store │
│Cache Redis DB│   │   Redis DB   │  ┌ ─ ─ ─ ─ ─ ─ ─ ▶│   Redis DB   │
└──────────────┘   └──────────────┘    Higher         └──────────────┘
        ▲                  ▲         │ Latency                ▲       
        └────────┬─────────┘                                  │       
                 │                   │                        │       
             ╔══════╗                                 ╔══════════════╗
             ║ Neos ║─ ─ ─ ─ ─ ─ ─ ─ ┘                ║Delivery Layer║
             ╚══════╝                                 ╚══════════════╝
                 │                                                    
                 │                                                    
                 │       ┌──────────────┐                             
                 │       │Asset Storage │                             
                 └──────▶│   (S3 etc)   │                             
                         └──────────────┘

In this case, the content store Redis DB is explicitly synced by Neos to the Redis instance of another delivery layer.

To enable this feature, do the following:

Configure the additional Content Stores in Settings.yaml underneath Flowpack.DecoupledContentStore.redisContentStores. The key is the internal identifier of the content store:

Flowpack:
  DecoupledContentStore:
    redisContentStores:
      live:
        label: 'Live Site'
        hostname: my-redis-hostname
        port: 6379
        database: 11
      staging:
        label: 'Staging Site'
        hostname: my-staging-redis-hostname
        port: 6379
        database: 11

In pipelines.yml, underneath 4) TRANSFER, comment-in and adjust the transfer_content task.
In pipelines.yml, underneath step 5) (the switch step), comment-in the additional contentReleaseSwitch:switchActiveContentRelease commands.
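
The ordering of these two steps matters: a release is copied completely to every target before any target is switched. The following Python sketch illustrates this with plain dictionaries standing in for the Redis databases. The key names and both helper functions are hypothetical and do not reflect the package's real commands or key layout.

```python
from typing import Dict

# Hypothetical key name for the pointer to the active release.
CURRENT_KEY = "current"


def transfer_release(source: Dict[str, str], target: Dict[str, str], release_id: str) -> int:
    """Copy every key belonging to release_id; return the number copied."""
    prefix = f"{release_id}:"
    copied = 0
    for key, value in source.items():
        if key.startswith(prefix):
            target[key] = value
            copied += 1
    return copied


def switch_active_release(store: Dict[str, str], release_id: str) -> None:
    """Make release_id the active release of one content store."""
    store[CURRENT_KEY] = release_id


primary = {
    "r7:/": "<home>",
    "r7:/about": "<about>",
    "r6:/": "<old home>",
    CURRENT_KEY: "r6",
}
# Identifiers as configured under redisContentStores.
targets = {"live": {CURRENT_KEY: "r6"}, "staging": {CURRENT_KEY: "r6"}}

# Transfer phase: copy the finished release to every additional store.
copied = {name: transfer_release(primary, store, "r7") for name, store in targets.items()}

# Switch phase: only now does the new release become active everywhere.
for store in (primary, *targets.values()):
    switch_active_release(store, "r7")
```

Until the switch phase runs, each delivery layer keeps serving the previous release, so a partially transferred release is never visible.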

Alternative: Redis Replication

Instead of the explicit synchronization described here, you can also use Redis Replication to synchronize the primary Redis to the other instances.

Using Redis replication is transparent to Neos or the Delivery Layer.

To be able to use Redis replication, the Redis secondary (i.e. the delivery-layer's instance) needs to connect to the primary Redis instance.

For the explicit synchronization described here, the Redis instances do not need to communicate directly with each other; but Neos needs to be able to reach all instances.

Incremental Rendering

As a big improvement for stability (compared to v1), the rendering pipeline does not distinguish between a full and an incremental render. To trigger a full render, the content cache is flushed before the rendering is started.

What happens if edits happen during a rendering?

If a change by an editor happens during a rendering, the content cache is flushed (by tag) as a result of this content modification. Now, there are two possible cases:

the document (which was modified) has not been rendered yet inside the current rendering. In this case, the rendered document will contain the recent changes.
the document was already rendered and added to the content release. In this case, the rendered document will not contain the recent changes.

The 2nd case is a bit dangerous, in the sense that we need a re-render to happen soon; otherwise we would not converge to a consistent state.

For use cases like scheduling re-renders, prunner supports a concurrency limit (i.e. how many jobs can run in parallel). If this limit is reached, it supports an additional queue whose length can also be limited.

So the following lines from pipelines.yml are crucial:

pipelines:
  do_content_release:
    concurrency: 1
    queue_limit: 1
    queue_strategy: replace

So, if a content release is currently running, and we try to start a new content release, then this task is added to the queue (but not yet executed). In case there is already a rendering task queued, this gets replaced by the newer rendering task.

This ensures that we have at most one content release running at any given time; and at most one content-release in the wait-list waiting to be rendered. Additionally, we can be sure that scheduled content releases will be eventually executed, because that's prunner's job.
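
The effect of these three settings can be modelled with a toy scheduler. The Python sketch below only mirrors the behaviour described above for `concurrency: 1`, `queue_limit: 1` and `queue_strategy: replace`; it is not how prunner is implemented, and the class and method names are made up for illustration.

```python
from typing import List, Optional


class PipelineScheduler:
    """Toy model: one running job, one queued job, newest queued job wins."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.queued: Optional[str] = None
        self.discarded: List[str] = []

    def schedule(self, job: str) -> None:
        if self.running is None:
            # Concurrency limit not reached: start immediately.
            self.running = job
            return
        if self.queued is not None:
            # Queue is full: the waiting job is replaced by the newer one.
            self.discarded.append(self.queued)
        self.queued = job

    def finish_running(self) -> Optional[str]:
        """Finish the running job and start the queued one, if any."""
        finished = self.running
        self.running, self.queued = self.queued, None
        return finished


scheduler = PipelineScheduler()
scheduler.schedule("release-1")  # starts running
scheduler.schedule("release-2")  # waits in the queue
scheduler.schedule("release-3")  # replaces release-2 in the queue

finished = scheduler.finish_running()
```

After `release-1` finishes, `release-3` runs and `release-2` is never rendered. This is safe because the later release is rendered from the newer content state, so it already includes whatever triggered `release-2`.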

Extensibility

Custom `pipelines.yml`

Crafting a custom pipelines.yml is the main extension point for doing additional work (e.g. additional enumeration or rendering).

Custom Document Metadata, integrated with the Content Cache

Sometimes, you need to build additional data structures for every individual document. Ideally, you'll want this structure to be integrated with the content cache; i.e. only refresh it if the page has changed.

Performance-wise, it is clever to do this at the same time as the rendering itself, as the content nodes (which you'll usually need) are already loaded in memory. You can register a Flowpack\DecoupledContentStore\NodeRendering\Extensibility\DocumentMetadataGeneratorInterface in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      documentMetadataGenerators:
        'yourMetadataGenerator':
          className: 'Your\Extra\MetadataGenerator'

When you implement this class, you can add additional Metadata which is serialized to the Neos content cache for every rendered document.

Often, you'll also want to add another contentReleaseWriter which reads the newly added metadata and adds it to the final content release. See the next section for how this works.

Custom Content Release Writer

You can completely define how a content release is laid out in Redis for consumption by your delivery layer.

By implementing a custom ContentReleaseWriter, you can specify how the rendered content is stored in Redis.

Again, this is registered in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      contentReleaseWriters:
        'yourMetadataReleaseWriter':
          className: 'Your\Extra\MetadataWriter'

Writing Custom Data to the Content Release

In case you write custom data to the content release (using $contentReleaseIdentifier->redisKey('foo')), you need to register the custom key also in the settings:

Flowpack:
  DecoupledContentStore:
    redisKeyPostfixesForEachRelease:
      foo: true

This is needed so that the system knows which keys should be synchronized between the different content stores, and what data to delete if a release is removed.
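
To illustrate why the registration matters, here is a small Python sketch of release removal that is driven purely by the registered postfixes. The `<release>:<postfix>` key format, the `data` postfix and both functions are assumptions for this example; the actual key format produced by `redisKey()` may differ.

```python
from typing import Dict, List


def release_keys(release_id: str, postfixes: Dict[str, bool]) -> List[str]:
    """Return the keys of one release for all enabled postfixes."""
    return [
        f"{release_id}:{postfix}"
        for postfix, enabled in postfixes.items()
        if enabled
    ]


def remove_release(store: Dict[str, str], release_id: str, postfixes: Dict[str, bool]) -> None:
    """Delete all registered keys of a release; unknown keys are left alone."""
    for key in release_keys(release_id, postfixes):
        store.pop(key, None)


# Mirrors the shape of redisKeyPostfixesForEachRelease in Settings.yaml.
registered = {"data": True, "foo": True}

store = {
    "r1:data": "...",
    "r1:foo": "custom data",
    "r1:bar": "custom data written without registration",
    "r2:data": "...",
}

remove_release(store, "r1", registered)
remaining = sorted(store)
```

The unregistered `r1:bar` key survives the removal as an orphan, and by the same logic it would be skipped when synchronizing the release to another content store.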

Development

You need pnpm installed as package manager: curl -f https://get.pnpm.io/v6.js | node - add --global pnpm
Run pnpm install in this folder
Then run pnpm watch for development and pnpm build for prod build.

We use esbuild combined with tailwind.css for building.

Rendering Deep Dive

Testing the Rendering

TODO

clean up of old content releases
error handling
better tests
force-switch

Missing Features from old

data-url-next-page (or so) not supported

License

GPL v3
