Decoupled Content Store based on Redis
This is the 2nd generation of a Two-Stack CMS package for Neos.
This package is currently a work in progress and under heavy development. It is not yet ready for general use, but will be soon.
The Content Store package is one part of a Two-Stack CMS solution with Neos. A Two-Stack architecture separates editing and publishing from the delivery of content. This architecture is also suitable for integrating Neos content into various other systems without adding overhead during delivery.
The first iteration was not open source; it was developed jointly by Networkteam and Sandstorm and is in use for several large customers. The second iteration (this project) is developed from scratch as open source, based on the lessons learned from the first iteration. In particular, robustness has been greatly improved.
What does it do?
The Content Store package publishes content from Neos to a Redis database as immutable content releases. Releases can be switched atomically, and a "current release" pointer identifies the active release.
The delivery layer in the Two-Stack architecture uses the current release and looks for matching URLs in the content store and delivers the pre-rendered content. A delivery layer is decoupled from the actual Neos CMS and can be implemented in any language or framework. It is also possible to integrate the delivery layer part in another software (e.g. a shop system) as an extension.
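The release-and-pointer idea can be illustrated with a minimal sketch. It models Redis as a plain Python dict; the key names (`current`, `release:<id>:url:<url>`) and all function names are hypothetical and do not reflect the package's actual storage format, which is configurable.

```python
# Minimal in-memory model of immutable content releases with an atomic switch.
# All key names and functions here are hypothetical illustrations.
from typing import Dict, Optional

store: Dict[str, str] = {}  # stands in for the Content Store Redis database


def publish_release(release_id: str, pages: Dict[str, str]) -> None:
    """Write every pre-rendered page under keys namespaced by the release."""
    for url, html in pages.items():
        store[f"release:{release_id}:url:{url}"] = html


def switch_release(release_id: str) -> None:
    """Go live by overwriting a single pointer key (one write, hence atomic)."""
    store["current"] = release_id


def deliver(url: str) -> Optional[str]:
    """What a delivery layer does: resolve the pointer, then look up the URL."""
    release_id = store.get("current")
    if release_id is None:
        return None
    return store.get(f"release:{release_id}:url:{url}")


publish_release("r1", {"/": "<h1>Home v1</h1>"})
switch_release("r1")
publish_release("r2", {"/": "<h1>Home v2</h1>"})  # written, but not visible yet
before = deliver("/")
switch_release("r2")
after = deliver("/")
```

Because a release is never modified after it is written, switching back to an older release is just another pointer update.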
Features
- Publish a full, read-only snapshot of your live content to Redis in a so-called Content Release
- Allows for incremental publishing: if a change is made, only the affected pages are re-rendered. This is integrated with the Neos Content Cache, so cache flushes work correctly.
- Integration with Neos workspace publishing for automatic incremental publishing to the Content Store
- Configurable Content Store format, decoupled from the internal representation in Neos.
- Extensibility: Enrich content releases with your custom data.
- Allows parallel rendering
- Allows copying the content releases to different environments.
- Allows rsyncing persistent assets around (should you need it)
- Backend module with overview of content releases (current release, switching releases, manual publish)
This project uses the Go package prunner and its Flow package wrapper as the basis for orchestrating and executing a content release.
Requirements
- Redis
- Sandstorm.OptimizedCacheBackend recommended
- Prunner
Start up prunner via the following command:
prunner/prunner --path Packages --data Data/Persistent/prunner
Copy the pipelines_template.yml file into your project and adjust it as needed (see below and the comments in the file for explanation).
Approach to Rendering
The following flow chart shows the rendering pipeline for creating a content release.
                  ┌─────────────────────┐
                  │   Node Rendering    │
┌───────────┐     │   ┌─────────────┐   │     ┌───────────┐     ┌───────────┐     ┌───────────┐
│   Node    │     │   │Orchestrator │   │     │  Release  │     │Transfer to│     │  Atomic   │
│Enumeration│────▶│   └─────────────┘   │────▶│Validation │────▶│  Target   │────▶│  Switch   │
└───────────┘     │┌────────┐ ┌────────┐│     └───────────┘     └───────────┘     └───────────┘
                  ││Renderer│ │Renderer││
                  └┴────────┴─┴────────┴┘
- At the beginning of every render, all nodes are enumerated. The Node Enumeration contains all pages which need to be in the final content release.
- Then, the rendering takes place. In parallel to the renderers, the orchestrator checks whether pages are already fully rendered. If not, it creates rendering jobs. If so, the rendered page is added to the in-progress content release.
  The renderers simply render the pages as instructed by the orchestrator.
  The orchestrator tries to render multiple times: a render pass can end up incomplete because an editor changed pages at the same time, leading to content cache flushes and "holes" in the output.
- During validation, checks determine whether the content release is complete and can really go online.
- During the transfer phase, the finished content release is copied to the production Redis instance if needed. This includes copying assets if needed.
- In the switch phase, the content release goes live.
The above pipeline is implemented with prunner, which orchestrates the individual steps.
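The orchestrator's retry behavior can be sketched as a simple loop. This is a simplified, single-process illustration under assumed names (`build_release`, `render`, a dict standing in for the content cache); in the real package the work is split across prunner jobs and parallel renderers.

```python
# Sketch of the orchestrator loop: render missing pages, retry on "holes".
# Hypothetical names throughout; not the package's actual implementation.
from typing import Callable, Dict, List


def build_release(
    enumeration: List[str],
    cache: Dict[str, str],
    render: Callable[[str], str],
    max_iterations: int = 3,
) -> Dict[str, str]:
    """Return url -> rendered page once every enumerated page is in the cache."""
    for _ in range(max_iterations):
        missing = [url for url in enumeration if url not in cache]
        if not missing:
            break
        for url in missing:  # the real renderers process these jobs in parallel
            cache[url] = render(url)
    # Final completeness check, comparable to the validation step.
    missing = [url for url in enumeration if url not in cache]
    if missing:
        raise RuntimeError(f"content release incomplete: {missing}")
    return {url: cache[url] for url in enumeration}


cache = {"/about": "<p>About</p>"}  # left over from an earlier render
flushed_once = False


def render(url: str) -> str:
    """Fake renderer; the first call also simulates an editor's cache flush."""
    global flushed_once
    if not flushed_once:
        cache.pop("/about", None)  # concurrent edit punches a "hole"
        flushed_once = True
    return f"<p>{url}</p>"


release = build_release(["/", "/about"], cache, render)
```

In this run the first pass renders `/` and loses `/about` to the simulated flush; the second pass fills the hole, so the release is complete.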
Infrastructure
Here, we explain the different infrastructure and setup constraints for using the content store.
- The Neos Content Cache must use Redis. It can use the OptimizedRedisCacheBackend.
- The Content Store needs a separate Redis Database, but it can run on the same server.
It is crucial that Redis is reachable with the lowest possible latency from both Neos and the Delivery Layer. See the setup scenarios below for how this can be done.
Minimal Setup
The minimal setup looks as follows:
- Neos writes into the Content Store Redis Database, and the Delivery Layer reads from the Content Store Redis Database.
- Assets (persistent resources) are written directly to a publicly available Cloud Storage such as S3.
┌──────────────┐   ┌──────────────┐
│ Neos Content │   │Content Store │
│Cache Redis DB│   │   Redis DB   │◀───┐
└──────────────┘   └──────────────┘    │
        ▲                  ▲           │
        └────────┬─────────┘           │
                 │                     │
              ╔══════╗          ╔══════════════╗
              ║ Neos ║          ║Delivery Layer║
              ╚══════╝          ╚══════════════╝
                 │
                 │
                 │       ┌──────────────┐
                 │       │Asset Storage │
                 └──────▶│   (S3 etc)   │
                         └──────────────┘
In this case, the transfer phase does not need to do anything, and you need to configure Neos to use the cloud storage (e.g. via Flownative.Google.CloudStorage or Flownative.Aws.S3) for resources.
This is implemented in the default pipelines_template.yml.
This Setup should be used if:
- the Delivery Layer and Neos are in the same data center (or on the same host), so both can access Redis with low latency
- you want the easiest possible setup.
If you use Cloud Asset Storage, ensure that you never delete assets from there. For Flownative.Aws.S3, you can follow the guide on "Preventing Unpublishing of Resources in the Target".
Manually Sync Assets to the Delivery Layer via RSync
If you cannot use a Cloud Asset Storage, there's a built-in feature to manually sync assets to the delivery layer(s) via RSync.
To enable this, follow these steps:
- Configure in Settings.yaml:

  Flowpack:
    DecoupledContentStore:
      resourceSync:
        targets:
          - host: localhost
            port: ''
            user: ''
            directory: '../nginx/frontend/resources/'

- In pipelines.yml, underneath 4) TRANSFER, uncomment the transfer_resources task.
Copy Content Releases to a different Redis instance
This Setup should be used if:
- the Delivery Layer and Neos are in different data centers, so that one of them would have a higher latency to Redis
- or you need multiple delivery layers with different content states, e.g. a staging delivery layer and a live delivery layer.
┌──────────────┐   ┌──────────────┐                    ┌──────────────┐
│ Neos Content │   │Content Store │                    │Content Store │
│Cache Redis DB│   │   Redis DB   │   ┌ ─ ─ ─ ─ ─ ─ ─ ▶│   Redis DB   │
└──────────────┘   └──────────────┘       Higher       └──────────────┘
        ▲                  ▲          │  Latency              ▲
        └────────┬─────────┘                                  │
                 │                    │                       │
              ╔══════╗                                 ╔══════════════╗
              ║ Neos ║─ ─ ─ ─ ─ ─ ─ ─ ┘                ║Delivery Layer║
              ╚══════╝                                 ╚══════════════╝
                 │
                 │
                 │       ┌──────────────┐
                 │       │Asset Storage │
                 └──────▶│   (S3 etc)   │
                         └──────────────┘
In this case, Neos explicitly syncs the Content Store Redis DB to the Redis instance used by the other Delivery Layer.
To enable this feature, do the following:
- Configure the additional Content Stores in Settings.yaml underneath Flowpack.DecoupledContentStore.redisContentStores. The key is the internal identifier of the content store:

  Flowpack:
    DecoupledContentStore:
      redisContentStores:
        live:
          label: 'Live Site'
          hostname: my-redis-hostname
          port: 6379
          database: 11
        staging:
          label: 'Staging Site'
          hostname: my-staging-redis-hostname
          port: 6379
          database: 11

- In pipelines.yml, underneath 4) TRANSFER, uncomment and adjust the transfer_content task.
- In pipelines.yml, underneath section 5) (the switch phase), uncomment the additional contentReleaseSwitch:switchActiveContentRelease commands.
Alternative: Redis Replication
Instead of the explicit synchronization described here, you can also use Redis Replication to synchronize the primary Redis to the other instances.
Using Redis replication is transparent to Neos or the Delivery Layer.
To be able to use Redis replication, the Redis secondary (i.e. the delivery-layer's instance) needs to connect to the primary Redis instance.
For the explicit synchronization described here, the Redis instances do not need to communicate directly with each other; but Neos needs to be able to reach all instances.
Incremental Rendering
As a big improvement for stability (compared to v1), the rendering pipeline does not distinguish between a full and an incremental render. To trigger a full render, the content cache is flushed before rendering starts.
What happens if edits happen during a rendering?
If a change by an editor happens during a rendering, the content cache is flushed (by tag) as a result of this content modification. Now, there are two possible cases:
- the document (which was modified) has not been rendered yet inside the current rendering. In this case, the rendered document would contain the recent changes.
- the document was already rendered and added to the content release. In this case, the rendered document would not contain the recent changes.
The 2nd case is a bit dangerous, in the sense that we need a re-render to happen soon; otherwise we would not converge to a consistent state.
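The two cases can be told apart by comparing when a document was rendered with when it was last edited. The following sketch uses made-up timestamps and a hypothetical helper `stale_documents`; it only illustrates the reasoning and is not part of the package.

```python
# Which documents in a release miss an edit made while rendering was running?
# Timestamps and names are hypothetical.
from typing import Dict, List


def stale_documents(
    rendered_at: Dict[str, float], edited_at: Dict[str, float]
) -> List[str]:
    """Documents rendered before their latest edit need another render."""
    return sorted(
        doc
        for doc, edit_time in edited_at.items()
        if doc in rendered_at and rendered_at[doc] < edit_time
    )


rendered_at = {"/a": 1.0, "/b": 5.0}  # when each document entered the release
edited_at = {"/a": 3.0, "/b": 3.0}    # an editor changed both at t = 3.0
needs_rerender = stale_documents(rendered_at, edited_at)
```

Here `/b` was rendered after the edit and already contains it (first case), while `/a` was rendered before the edit and is stale (second case).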
For use cases like scheduling re-renders, prunner supports a concurrency limit (i.e. how many jobs can run in parallel). If this limit is reached, it supports an additional queue, which can also be limited.
So the following lines from pipelines.yml are crucial:

pipelines:
  do_content_release:
    concurrency: 1
    queue_limit: 1
    queue_strategy: replace
So, if a content release is currently running, and we try to start a new content release, then this task is added to the queue (but not yet executed). In case there is already a rendering task queued, this gets replaced by the newer rendering task.
This ensures that we have at most one content release running at any given time; and at most one content-release in the wait-list waiting to be rendered. Additionally, we can be sure that scheduled content releases will be eventually executed, because that's prunner's job.
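The effect of these three settings can be modeled with a toy scheduler. This is not prunner's code; the class and method names are invented purely to illustrate the behavior described above.

```python
# Toy model of "concurrency: 1, queue_limit: 1, queue_strategy: replace".
# NOT prunner's implementation; only an illustration of the described behavior.
from typing import List, Optional


class ReleaseScheduler:
    def __init__(self) -> None:
        self.running: Optional[str] = None  # at most one job runs (concurrency: 1)
        self.queued: Optional[str] = None   # at most one job waits (queue_limit: 1)
        self.finished: List[str] = []

    def schedule(self, job: str) -> None:
        if self.running is None:
            self.running = job
        else:
            # queue_strategy: replace -> a newer request supersedes the waiting one
            self.queued = job

    def finish_running(self) -> None:
        if self.running is not None:
            self.finished.append(self.running)
        # Promote the waiting job (if any) and empty the queue.
        self.running, self.queued = self.queued, None


scheduler = ReleaseScheduler()
scheduler.schedule("release-1")  # starts immediately
scheduler.schedule("release-2")  # waits in the queue
scheduler.schedule("release-3")  # replaces release-2 in the queue
scheduler.finish_running()       # release-1 done, release-3 starts
scheduler.finish_running()       # release-3 done
```

Dropping `release-2` is safe because a later release always renders the newest content, which includes the edits that triggered the earlier request.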
Extensibility
Custom pipelines.yml
Crafting a custom pipelines.yml is the main extension point for doing additional work (e.g. additional enumeration or rendering).
Custom Document Metadata, integrated with the Content Cache
Sometimes, you need to build additional data structures for every individual document. Ideally, you'll want this structure to be integrated with the content cache; i.e. only refresh it if the page has changed.
Performance-wise, it is clever to do this at the same time as the rendering itself, as the content nodes (which you'll usually need) are already loaded in memory. You can register a Flowpack\DecoupledContentStore\NodeRendering\Extensibility\DocumentMetadataGeneratorInterface in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      documentMetadataGenerators:
        'yourMetadataGenerator':
          className: 'Your\Extra\MetadataGenerator'
When you implement this class, you can add additional Metadata which is serialized to the Neos content cache for every rendered document.
Often, you'll also want to add another contentReleaseWriter which reads the newly added metadata and adds it to the final content release. See the next section for how this works.
Custom Content Release Writer
You can completely define how a content release is laid out in Redis for consumption by your delivery layer.
By implementing a custom ContentReleaseWriter, you can specify how the rendered content is stored in Redis.
Again, this is registered in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      contentReleaseWriters:
        'yourMetadataReleaseWriter':
          className: 'Your\Extra\MetadataWriter'
Writing Custom Data to the Content Release
In case you write custom data to the content release (using $contentReleaseIdentifier->redisKey('foo')), you need to register the custom key also in the settings:

Flowpack:
  DecoupledContentStore:
    redisKeyPostfixesForEachRelease:
      foo: true
This is needed so that the system knows which keys should be synchronized between the different content stores, and what data to delete if a release is removed.
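Why registration matters can be shown with a small sketch that models two content stores as dicts. The key layout `contentRelease:<id>:<postfix>`, the `data` postfix, and the helper functions are assumptions made for this example; the real key format produced by `redisKey()` is not shown in this document and may differ.

```python
# Sketch: why custom keys must be registered for sync and cleanup.
# ASSUMPTION: the "contentRelease:<id>:<postfix>" layout is invented for this
# example; the real format produced by redisKey() may differ.
from typing import Dict, List


def release_keys(release_id: str, postfixes: Dict[str, bool]) -> List[str]:
    """Keys the system knows about: one per enabled, registered postfix."""
    return [
        f"contentRelease:{release_id}:{postfix}"
        for postfix, enabled in postfixes.items()
        if enabled
    ]


def copy_release(source: Dict[str, str], target: Dict[str, str],
                 release_id: str, postfixes: Dict[str, bool]) -> None:
    """Transfer only the registered keys to another content store."""
    for key in release_keys(release_id, postfixes):
        if key in source:
            target[key] = source[key]


def remove_release(store: Dict[str, str], release_id: str,
                   postfixes: Dict[str, bool]) -> None:
    """Delete only the registered keys; unregistered ones are left behind."""
    for key in release_keys(release_id, postfixes):
        store.pop(key, None)


registered = {"data": True, "foo": True}  # "data" is a made-up built-in postfix
primary = {
    "contentRelease:r1:data": "<html>...</html>",
    "contentRelease:r1:foo": "custom payload",
    "contentRelease:r1:bar": "written but never registered",
}
replica: Dict[str, str] = {}

copy_release(primary, replica, "r1", registered)
remove_release(primary, "r1", registered)
```

In this model the unregistered `bar` key is neither copied to the replica nor removed from the primary, which is exactly the kind of inconsistency the setting prevents.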
Development
- You need pnpm installed as package manager:
  curl -f https://get.pnpm.io/v6.js | node - add --global pnpm
- Run pnpm install in this folder.
- Then run pnpm watch for development and pnpm build for a production build.
We use esbuild combined with tailwind.css for building.
Rendering Deep Dive
Testing the Rendering
TODO
- clean up of old content releases
- error handling
- better tests
- force-switch
Missing Features Compared to v1
- data-url-next-page (or so) is not supported
License
GPL v3