CDC (Change Data Capture) is a well-defined software design pattern for a system that monitors and captures data changes so that other software can respond to those events.
CDC has many advantages compared to the traditional polling approach:
- All changes are captured: Intermediate changes between two polls are tracked and can be acted upon
- Real-time and low overhead: Reactions to CDC events happen in real time and only when changes occur, avoiding the CPU overhead of frequent polling
- Loose coupling: CDC sends captured changes to message brokers, so consumers can be added or removed on demand
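The first point can be illustrated with a minimal sketch. It assumes a toy in-memory store rather than Neo4j, and the names `Store`, `write` and `poll` are hypothetical, introduced only for this illustration. Three writes happen between two polls: the poller sees only the final state, while the change log keeps every intermediate one.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy key-value store (hypothetical) that records every write in a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # append-only list of change events

    def write(self, key, value):
        before = self.data.get(key)
        self.data[key] = value
        # CDC-style capture: one event per change, recorded at write time.
        self.log.append({"key": key, "before": before, "after": value})


def poll(store, key, last_seen):
    """Polling: compare the current value with the last one observed."""
    current = store.data.get(key)
    return [] if current == last_seen else [current]


store = Store()
last_seen = store.data.get("status")  # first poll, nothing written yet

# Three writes happen between two polls.
store.write("status", "created")
store.write("status", "shipped")
store.write("status", "delivered")

polled = poll(store, "status", last_seen)
captured = [event["after"] for event in store.log]

print(polled)    # the poller only sees the latest state
print(captured)  # the change log holds every intermediate state
```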
Applications of Neo4j Change Data Capture
Over our years of working with Neo4j, we have identified common real-world applications where CDC adds value. Here is a non-exhaustive list:
- Replication to external systems that can leverage a network representation of an entity (for example, replication to Elasticsearch)
- Audit: combined with our Neo4j SSO plugin, all changes to the graph can be monitored and traced back to individual users
- Alerting: CDC can trigger path-finding queries between entities of interest and produce alerts when new paths are formed in the network
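As a rough sketch of the alerting use case, the consumer below reacts to relationship creation events and raises an alert the first time two watched entities become connected. Everything here is an assumption made for illustration: the event shape (`type`, `start`, `end`) is hypothetical and not the plugin's actual message format, and an in-memory breadth-first search stands in for what would normally be a path-finding query run against Neo4j.

```python
from collections import defaultdict, deque


def has_path(adjacency, start, goal):
    """Breadth-first search over an undirected adjacency map."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


def handle_event(event, adjacency, watched_pairs, connected):
    """React to one CDC event; return alerts for newly connected watched pairs."""
    # Hypothetical event shape: {"type": ..., "start": ..., "end": ...}
    if event["type"] != "relationship_created":
        return []  # ignore everything except relationship creation
    adjacency[event["start"]].add(event["end"])
    adjacency[event["end"]].add(event["start"])
    alerts = []
    for pair in watched_pairs:
        if pair not in connected and has_path(adjacency, *pair):
            connected.add(pair)  # alert only once per pair
            alerts.append(f"new path between {pair[0]} and {pair[1]}")
    return alerts


adjacency = defaultdict(set)
connected = set()
watched_pairs = [("alice", "carol")]
events = [
    {"type": "relationship_created", "start": "alice", "end": "bob"},
    {"type": "node_updated", "id": "bob"},
    {"type": "relationship_created", "start": "bob", "end": "carol"},
]

alerts = [
    alert
    for event in events
    for alert in handle_event(event, adjacency, watched_pairs, connected)
]
print(alerts)
```

The path check runs only when a relationship is created, so nothing is computed while the graph is idle.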
Not only with Kafka
While the Neo4j Streams project already offers a CDC capability by integrating with Kafka, our enterprise customers have constraints of their own: limited hardware resources, federal government lists of authorized software, or the need for seamless testing in developer environments. Supporting them required more connectors.
That’s why our first release comes with the connectors most commonly used by our customers, or by GraphAware itself in Kubernetes deployments.
The following connectors are supported:
- RabbitMQ
- AWS SQS
- Azure Service Bus
- CloudEvents with Knative
More connectors can easily be developed depending on the demand.
Why CloudEvents?
CloudEvents is probably the least known of the supported protocols, so it merits a short dedicated section.
Our interest in CloudEvents stems from Knative in Kubernetes deployments and from our goal of providing cloud-native CDC for Neo4j. (For newcomers, cloud native means highly distributed and resilient to infrastructure changes.)
Knative provides two main advantages for CDC in a Kubernetes deployment:
- Abstraction of the messaging layer: We can switch between Kafka, RabbitMQ or InMemory without changing anything in the CDC plugin configuration
- Serverless consumers: Knative enables serverless applications with autoscaling of pods based on the eventing metrics and also allows scale-down-to-zero of consumers
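To make the format concrete, here is a minimal sketch of a CloudEvents 1.0 envelope in structured JSON mode, built with the Python standard library only. The attribute names `specversion`, `id`, `source` and `type` are the required context attributes of the CloudEvents specification. The values used below (`/neo4j/cdc`, `com.example.neo4j.relationship.created`) and the shape of the `data` payload are made up for illustration and do not describe the plugin's actual output.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(change, source, event_type):
    """Wrap a change payload in a CloudEvents 1.0 structured-mode envelope."""
    return {
        # Required context attributes in CloudEvents 1.0
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional context attributes
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Domain payload (hypothetical shape)
        "data": change,
    }


change = {"operation": "created", "entity": "relationship", "relType": "KNOWS"}
event = to_cloudevent(
    change,
    source="/neo4j/cdc",  # hypothetical source identifier
    event_type="com.example.neo4j.relationship.created",  # hypothetical type
)
body = json.dumps(event)
print(body)
```

In structured mode, such a body is sent over HTTP with the `application/cloudevents+json` content type. Because the envelope is the same whichever broker carries it, consumers do not need to know which messaging layer sits underneath.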
Neo4j CDC And Hume Orchestra
For our Hume customers, listening to CDC events and acting on them is a matter of a few clicks rather than weeks of work.
The following screenshots show how to listen to CDC events produced by Neo4j, how to filter for relationship creation events only, and the content of the event message produced.
Summary
CDC is an efficient and cloud-ready architectural pattern that lets third-party applications react to database changes. With GraphAware Hume, enterprises are ready to bring their solutions forward to drive business value.
If you’re interested in learning more about our offerings, feel free to reach out to info@graphaware.com or meet us online at one of the many Neo4j-related events.