Expiring Data in Neo4j

March 15, 2016

At GraphAware, we help organisations in a wide range of verticals solve problems with graphs. Once we come across a requirement or use case two or three different times, we typically create an open-source Neo4j extension that addresses it. The latest addition to our product portfolio, introduced in this post, is a simple library that automatically expires data from the Neo4j graph database.

GraphAware Framework

Open-sourcing useful extensions helps us deliver solutions faster, lets people who prefer a DIY approach be more productive, and gives us valuable community feedback. That’s how our most popular products, such as the GraphAware Recommendation Engine, TimeTree, UUID, and others were born. They all have one thing in common – the GraphAware Framework.

Developed since early 2013, the GraphAware Framework is a mature platform for developing and testing transaction-driven functionality (“triggers”), timer-driven functionality, and custom REST APIs on top of Neo4j. The module we’re about to introduce is a nice example of how simple and efficient it is to develop production-ready Neo4j functionality with the framework. The meat of the module only contains a few hundred lines of code!

Neo4j Expire

There are use cases where particular pieces of data (nodes and relationships in a graph database) have a limited life-span. When they reach their end of life, it is sometimes required for practical or compliance reasons to remove them from the database. For example, we may want to remove quotations (nodes) that have expired, states (nodes) that are no longer valid, temporary memberships (relationships) that have expired, or other nodes and relationships that are no longer relevant in order to keep the database clean. For this purpose, we have built a simple GraphAware Framework Module called Neo4j Expire that will automatically delete data from Neo4j.

Expiration Date vs TTL

Users can configure the plugin to look at a specific property of nodes and relationships to determine when to delete them. This property can either represent a fixed expiry date, or a time-to-live (TTL). As in all our production-ready modules, we need to take care of various edge cases, so there are a couple of other things worth pointing out.

First of all, updates to nodes and relationships are handled automatically: if the expiry date changes, the module takes the new one into account. It is important to know, however, that when using TTL, updating the TTL on a node or relationship restarts the countdown to expiry from the time of the update.
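The difference between the two kinds of property can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the module's code; the function name `expires_at` and its parameters are made up for this example.

```python
def expires_at(now_ms: int, value_ms: int, is_ttl: bool) -> int:
    """Return the absolute expiry time in milliseconds since the epoch.

    Hypothetical helper: a fixed expiry date is used as-is, while a TTL is
    counted from the moment the property is written (now_ms).
    """
    return now_ms + value_ms if is_ttl else value_ms


created_at = 1_000_000                      # time of the first write
first_expiry = expires_at(created_at, 60_000, is_ttl=True)

updated_at = created_at + 45_000            # TTL written again 45 s later
second_expiry = expires_at(updated_at, 60_000, is_ttl=True)

# A fixed expiry date does not depend on when it was written.
fixed_expiry = expires_at(updated_at, 2_000_000, is_ttl=False)
```

Writing the same 60-second TTL again 45 seconds later pushes the expiry out to 60 seconds after the update, whereas a fixed expiry date stays where it is.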

Expiration Strategies

Secondly, let’s have a look at what actually happens when data is expired. In the case of relationships, there is only one strategy right now, which deletes the relationship. For nodes, two different strategies are provided. The “orphan” strategy, which is enabled by default, only deletes expired nodes once all their relationships have been deleted (or expired). It is possible to change this to the “force” strategy, which deletes expired nodes together with all their relationships.
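To make the two node strategies concrete, here is a minimal Python sketch on a toy in-memory graph. The `Graph` class and the `expire_node` function are hypothetical and exist only for this illustration. The sketch also assumes that under the “orphan” strategy a node that still has relationships is simply left in place for a later pass.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy graph: a set of node ids and a map of rel id -> (start, end)."""
    nodes: set = field(default_factory=set)
    rels: dict = field(default_factory=dict)

    def rels_of(self, node):
        return [rid for rid, (s, e) in self.rels.items() if node in (s, e)]


def expire_node(graph: Graph, node, strategy: str = "orphan") -> bool:
    """Try to delete an expired node; return True if it was deleted."""
    attached = graph.rels_of(node)
    if strategy == "force":
        # Delete the node's relationships first, then the node itself.
        for rid in attached:
            del graph.rels[rid]
    elif strategy == "orphan":
        # Only delete nodes that no longer have any relationships.
        if attached:
            return False
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    graph.nodes.discard(node)
    return True


g = Graph(nodes={"quote", "customer"}, rels={"r1": ("customer", "quote")})
orphan_result = expire_node(g, "quote")            # still attached, so kept
force_result = expire_node(g, "quote", "force")    # removes r1, then the node
```

With “orphan”, the expired node survives until its last relationship is gone; with “force”, the node and its relationships disappear together while neighbouring nodes stay.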

How Does It Work?

GraphAware Framework’s transaction-driven functionality is used to inspect each created or updated node and relationship in order to determine whether one of its properties matches the configured expiry date or TTL property. If so, the node or relationship gets indexed in Neo4j’s legacy index.

GraphAware Framework’s configurable timer-driven functionality is then used to continuously check whether there are any nodes or relationships to be deleted, and to perform the actual deletion. This happens in the background and can be configured to run either periodically on a best-effort basis, so that it does not interfere with regular transaction processing, or at a regular interval.
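The index-then-sweep pattern described above can be sketched as follows. This is a conceptual Python model under stated assumptions, not how the module or Neo4j’s legacy index is implemented: the `ExpiryIndex` class, its methods and the `batch_size` parameter are all invented for this example, and a heap stands in for the index.

```python
import heapq


class ExpiryIndex:
    """Hypothetical index of entity ids ordered by expiry time."""

    def __init__(self):
        self._heap = []       # (expires_at_ms, entity_id), smallest first
        self._current = {}    # entity_id -> most recent expires_at_ms

    def put(self, entity_id, expires_at_ms):
        """Called when a write sets or changes an entity's expiry."""
        self._current[entity_id] = expires_at_ms
        heapq.heappush(self._heap, (expires_at_ms, entity_id))

    def pop_expired(self, now_ms, batch_size=100):
        """Called by the timer: return up to batch_size expired entity ids."""
        expired = []
        while (self._heap and self._heap[0][0] <= now_ms
               and len(expired) < batch_size):
            expires_at, entity_id = heapq.heappop(self._heap)
            # Ignore stale entries left behind by an earlier expiry value.
            if self._current.get(entity_id) == expires_at:
                del self._current[entity_id]
                expired.append(entity_id)
        return expired


index = ExpiryIndex()
index.put("rel-1", 1_000)
index.put("node-7", 5_000)
index.put("rel-1", 9_000)                 # expiry of rel-1 was updated

first_sweep = index.pop_expired(6_000)    # only node-7 is due
second_sweep = index.pop_expired(10_000)  # now rel-1 is due as well
```

Capping each sweep at a batch size is one way a background task can avoid competing with regular transactions; whether the module does this is not stated here.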

Sample Configuration

Once you have the GraphAware Framework downloaded and configured, getting started with Neo4j Expire is a matter of downloading an appropriate release and adding a few extra lines to neo4j.properties.

For example, adding the following configuration

com.graphaware.runtime.enabled=true
com.graphaware.module.EM.1=com.graphaware.neo4j.expire.ExpirationModuleBootstrapper
com.graphaware.module.EM.relationshipTtlProperty=ttl

and then creating a relationship like this

MATCH (p:Person {name:'Michal'})
MATCH (o:Organisation {name:'ACM'})
MERGE (p)-[:MEMBER_OF {ttl:2592000000}]->(o)

will cause the relationship to expire 30 days (2,592,000,000 ms) after it is created, at which point the background task deletes it.
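Since the TTL is given in milliseconds, it is easy to be off by a factor of 1,000. A small helper such as the following Python sketch can compute the value before it is passed to a query; `ttl_millis` is a hypothetical name, not part of the module.

```python
from datetime import timedelta


def ttl_millis(**duration) -> int:
    """Convert a duration (days=, hours=, minutes=, ...) to whole milliseconds."""
    # Dividing two timedeltas with // gives an exact integer result.
    return timedelta(**duration) // timedelta(milliseconds=1)


thirty_days = ttl_millis(days=30)
one_hour = ttl_millis(hours=1)
```

Here `thirty_days` is the 2,592,000,000 used in the query above (30 × 24 × 60 × 60 × 1,000).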

Conclusion

The module has been released and we believe it is production-ready. As always, users of Neo4j Enterprise with a subscription purchased through GraphAware get 24/7 Premium Support for the GraphAware Framework and all its modules free of charge, provided by the world’s top Neo4j experts. Members of the community are welcome to use the software free of charge with no SLAs, to contribute, and to give us much-appreciated feedback.

