GraphAware Blog

Find out what's new in the Neo4j world

Node Degrees in Neo4j 2.1

Neo4j GraphAware Beginner 23 Jul 2014 by Michal Bachman

Efficient counting of relationships in Neo4j was the cornerstone of my Master Thesis and the reason the very first GraphAware Framework module called the Relationship Count Module was born. The improvements in Neo4j 2.1 around dense nodes and the addition of getDegree(…) methods on the Node interface made me eager to do some benchmarking around relationship counts again.

The improvements in Neo4j 2.1, indeed, make the RelCount module slightly less useful. We’ve decided, however, not to make it obsolete, since it is a useful reference implementation of a GraphAware Framework Transaction-Driven Runtime Module. Instead, we ported the module to Neo4j 2.1 and reused all the performance (and functional) tests that had been already developed to learn a few things about Neo4j 2.1, which we would like to share in this short blog post.

Relationship Counting Performance

Prior to Neo4j 2.1, counting a node’s relationships was an operation with O(n) complexity, where n is the number of relationships. This was because all of the node’s relationships had to be visited in order for them to be counted. Neo4j 2.1 added a few getDegree(…) methods to the Node interface, which allows callers to ask for the degree of a node, optionally limited to a specific relationships type and/or direction.

We ran a test in which we asked for the degree of nodes with varying degree in three different scenarios: nodes on disk (no caches involved), nodes in low-level cache (memory-mapped files), and nodes in high-level cache (on the heap). The results are plotted in the following figure.

Counting Node Degrees in Neo4j 2.1

As we can see, the method runs in constant time (O(1)) when data is in high level cache, which is a massive improvement over O(n). For the other two scenarios, the curve is closer to linear time complexity. This means that Neo4j users will see some performance improvements, but also that the Relationship Count Module module might not be as obsolete as it first seemed. Please refer to its documentation for more benchmarks.

Degree of Nodes with Loops

The second learning point is related to functional rather than performance aspect of the newly added API methods. In graph theory, a loop (i.e., a relationship of a node with itself) counts as 2 towards the total degree of the node. However, in Neo4j 2.1, getDegree() and getDegree(RelationshipType) will only count loops as 1. This is a “gotcha” that could have been guessed by looking at the Javadoc:

/**
 * Returns the number of relationships connected to this node regardless of
 * direction or type. This operation is always O(1).
 * @return the number of relationships connected to this node.
 */
 public int getDegree();

A loop seems to be regarded by Neo4j as a single relationship “connected to this node”. Luckily, the workaround isn’t too hard. If you need a true degree and there are loops present in your graph, just sum up the outdegree and indegree, which still is an O(1) operation. Have a look at the following test:

@Test
public void loopsAndDegrees() {
    GraphDatabaseService database = new TestGraphDatabaseFactory().newImpermanentDatabase();

    try (Transaction tx = database.beginTx()) {
        Node node = database.createNode();
        node.createRelationshipTo(node, DynamicRelationshipType.withName("TEST"));
        tx.success();
    }

    try (Transaction tx = database.beginTx()) {
        Node node = database.getNodeById(0);
        assertEquals(1, node.getDegree(INCOMING));
        assertEquals(1, node.getDegree(OUTGOING));
        assertEquals(1, node.getDegree()); //not really expected!
        assertEquals(2, node.getDegree(OUTGOING) + node.getDegree(INCOMING));  //use this instead
        tx.success();
    }
}

Conclusion

Neo4j 2.1 improves relationship counting performance for data in high-level cache. GraphAware Relationship Count Module is still useful, however, for data in low-level cache and on disk, and for scenarios in which relationships need to be counted based on a property value (potentially on top of relationship type and/or direction). To find out a node’s degree in Neo4j 2.1 in the presence of loops, sum incoming and outgoing relationships rather than using getDegree() directly.

Share this blog post:

comments powered by Disqus