Neo4j Change Data Capture with GraphAware Hume

March 29, 2021 · 3 min read

Change Data Capture (CDC) is a well-defined software design pattern in which a system monitors and captures data changes so that other software can respond to them.

CDC has several advantages over the traditional polling approach:

  • All changes are captured: intermediate changes between two polls are tracked and can be acted upon
  • Real-time and low overhead: consumers react to CDC events in real time and only when changes happen, avoiding the CPU overhead of frequent polling
  • Loose coupling: CDC sends captured changes to a message broker, so consumers can be added or removed on demand
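
The first point can be illustrated with a small, self-contained sketch. It is not tied to Neo4j or Hume: the `Store` class and its `log` list are hypothetical stand-ins for a database and its change feed, used only to show why a poller can miss a value that a change log retains.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy key-value store (hypothetical) that records every write in a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        # The log plays the role of the change feed a CDC consumer would read.
        self.log.append((key, value))


store = Store()
seen_by_poller = []

store.write("status", "pending")
store.write("status", "shipped")             # overwrites "pending" before the next poll
seen_by_poller.append(store.data["status"])  # poll 1 only sees the latest state
store.write("status", "delivered")
seen_by_poller.append(store.data["status"])  # poll 2

# A CDC consumer reads the log, so it observes every intermediate value.
seen_by_cdc = [value for _, value in store.log]

print(seen_by_poller)
print(seen_by_cdc)
```

The poller never observes the "pending" state because it was overwritten between polls, while the change log keeps all three writes.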

Applications of Neo4j Change Data Capture

Over years of working with Neo4j, we have identified common real-world applications where CDC adds value. Here is a non-exhaustive list:

  • Replication to external systems that can leverage a network representation of an entity (e.g. replication to Elasticsearch)
  • Audit: combined with our Neo4j SSO plugin, all changes to the graph can be monitored and traced back to individual users
  • Alerting: CDC can trigger path-finding queries between entities of interest and produce alerts when new paths form in the network

Not only with Kafka

The Neo4j Streams project already offers a CDC capability by integrating with Kafka. We needed more connectors, however, to support our enterprise customers and their constraints, whether those are hardware resources, federal government lists of authorized software, or seamless testing in developer environments.

That’s why our first release comes with the connectors most commonly used by our customers or by GraphAware in Kubernetes deployments.

The following connectors are supported:

  • RabbitMQ
  • AWS SQS
  • Azure Service Bus
  • CloudEvents with Knative

More connectors can be developed as demand arises.

Why CloudEvents?

CloudEvents is probably the least known of the supported protocols, so it merits a short dedicated section.

Our interest in CloudEvents comes from Knative in Kubernetes deployments and how it lets us provide cloud-native CDC for Neo4j (for newcomers, cloud native means highly distributed and resilient to infrastructure changes).

Knative provides two main advantages for CDC in a Kubernetes deployment:

  • Abstraction of the messaging layer: we can switch between Kafka, RabbitMQ or InMemory without changing anything in the CDC plugin configuration
  • Serverless consumers: Knative enables serverless applications with autoscaling of pods based on eventing metrics, and also allows consumers to scale down to zero
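
To make the idea of a broker-independent envelope more concrete, here is a minimal sketch of wrapping a change in a CloudEvents 1.0 structured-mode JSON envelope. The required attributes (`specversion`, `id`, `source`, `type`) come from the CloudEvents specification; the `source` and `type` values and the shape of the `data` payload are assumptions made for illustration and are not the format emitted by the CDC plugin.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(change: dict) -> dict:
    """Wrap a change payload in a CloudEvents 1.0 envelope (illustrative values)."""
    return {
        "specversion": "1.0",                       # required by the spec
        "id": str(uuid.uuid4()),                    # required, unique per source
        "source": "/neo4j/cdc",                     # required; hypothetical value
        "type": "com.example.neo4j.cdc.change",     # required; hypothetical value
        "time": datetime.now(timezone.utc).isoformat(),  # optional, RFC 3339 timestamp
        "datacontenttype": "application/json",      # optional
        "data": change,                             # payload shape is an assumption
    }


event = to_cloudevent({"operation": "created", "entityType": "relationship"})
print(json.dumps(event, indent=2))
```

Because consumers only depend on the envelope, the transport underneath it can change without the consumer code having to know.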

Neo4j CDC and Hume Orchestra

For our Hume customers, listening to CDC events and acting on them is a matter of clicks rather than weeks of work.

The following screenshots show how to listen to CDC events produced by Neo4j, filter for relationship creation events only, and inspect the content of the event message produced.
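
For readers without access to Hume Orchestra, the filtering step can be approximated in a few lines of code. This is a sketch only: the event fields used here (`entityType`, `operation`, `type`) are a hypothetical schema chosen for the example, not the actual message format produced by the CDC plugin.

```python
from typing import Iterable, Iterator

# Hypothetical event shape, for illustration only.
SAMPLE_EVENTS = [
    {"entityType": "node", "operation": "created", "labels": ["Person"]},
    {"entityType": "relationship", "operation": "created", "type": "KNOWS"},
    {"entityType": "relationship", "operation": "deleted", "type": "KNOWS"},
    {"entityType": "relationship", "operation": "created", "type": "WORKS_AT"},
]


def relationship_creations(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the events that describe a newly created relationship."""
    for event in events:
        if event.get("entityType") == "relationship" and event.get("operation") == "created":
            yield event


created = list(relationship_creations(SAMPLE_EVENTS))
print([event["type"] for event in created])
```

In a real deployment the events would arrive from one of the supported connectors rather than from an in-memory list.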

Summary

CDC is an efficient, cloud-ready architectural pattern that lets third-party applications react to database change events. With GraphAware Hume, enterprises are ready to bring their solutions forward to drive business value.

If you’re interested in learning more about our offerings, feel free to reach out to info@graphaware.com or meet us online at one of the many Neo4j-related events.

