Neo4j 4: Post-Union Processing Explained

January 17, 2020 · 5 min read

Many, many years ago, I requested a UNION clause in Cypher, and Andres Taylor graciously added it. Aseem Kishore followed with a request for post-union processing, which went on to collect a whopping 99 comments over time.

It is exciting to see support for a subset of the subqueries in openCypher, namely uncorrelated subqueries, in the soon-to-be-released Neo4j 4, finally bringing post-union processing to Cypher. Given its history, a short article is in order.

Post-Union Processing with Cypher in Neo4j 4

Union in 3.x

In pre-4.x versions of Neo4j, UNION combined the results of two or more queries into one result set.

For example, to return the list of people who Julie knows or works with, and the people who live in the same city as Julie and tag her, you could do:

MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
RETURN DISTINCT m.firstname
UNION
MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
WITH n, c
MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
RETURN DISTINCT m.firstname

There was no way to order or limit the entire result set, though.

This query:

MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
RETURN DISTINCT m.firstname
UNION
MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
WITH n, c
MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
RETURN DISTINCT m.firstname
ORDER BY m.firstname

would only order the results of the second part of the UNION, because the ORDER BY belongs to that part and not to the UNION as a whole. There was also no way to process the combined result set further, for example to filter or aggregate it; this had to be done in your application.

Union and post-processing in 4.x

Post-processing the results of a UNION is now possible with uncorrelated subqueries. We’ll look at some examples, and at some things to bear in mind when using subqueries. A simple example, taken from the original feature request, is a query for:

  • People that Julie knows or works with
  • People that work with those that Julie knows
  • People that tag Julie and also live in the same city as her
  • Ordered by name

CALL {
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
  RETURN DISTINCT m.firstname AS name
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN DISTINCT m.firstname AS name
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN DISTINCT m.firstname AS name
}
RETURN name
ORDER BY name
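
Limiting the whole union, the other thing 3.x could not do, follows the same pattern. Here is a minimal sketch that needs no data in the graph: the three names are made-up literals standing in for the three MATCH branches, and LIMIT is applied after the union has been ordered.

```cypher
// Each branch returns one literal row; the names are illustrative only.
CALL {
  RETURN "Zoe" AS name
  UNION
  RETURN "Adam" AS name
  UNION
  RETURN "Mia" AS name
}
// ORDER BY and LIMIT now apply to the combined result, not to the last branch.
RETURN name
ORDER BY name
LIMIT 2
```

On a Neo4j 4 instance this should return Adam and Mia, the first two names of the ordered union.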

Let’s see what a simple aggregation looks like:

CALL {
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
  RETURN n, m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN n, m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN n, m
}
RETURN n, COUNT(DISTINCT m) AS count

We can also now use the result of the union in further matches. Here is a simple filtering example:

CALL {
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN DISTINCT m
}
WITH m
MATCH (c:City {name: "Kimton"})
WHERE (m)-[:LIVES_IN]->(c)
RETURN m.firstname
ORDER BY m.firstname

Other Considerations

Now, if you decide to “optimise” and match Julie only once, beware!

MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})
CALL {
  MATCH (n)-[:KNOWS|WORKS_WITH]-(m)
  RETURN m
  UNION
  MATCH (n)-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN m
  UNION
  MATCH (city)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN m
}
RETURN m.firstname

This does not do what you expect, because 4.0 does not support correlated subqueries and the following restrictions apply:

  • A subquery cannot refer to variables from the enclosing query. In the example above, the n inside the subquery is not Julie at all; it is a new variable that matches every node in the graph
  • A subquery cannot return variables with the same names as variables in the enclosing query

As such, the following query:

MATCH (m:Person)
CALL {
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN DISTINCT m
}
RETURN m
ORDER BY m.firstname

will fail with

Variable `m` already declared 

Note that an enclosing query is allowed; it just cannot be correlated with the subquery. For example:

MATCH (c:City)
CALL {
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS|WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN DISTINCT m
}
RETURN c, count(m)

will return as many rows as there are cities, because the subquery is evaluated once for every incoming input row.
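
The per-row evaluation is easier to see without any graph data. In this minimal sketch, the list [1, 2, 3] stands in for the cities and the two literal branches stand in for the union of people; all of these values are made up for illustration.

```cypher
// Three incoming rows, playing the role of the three cities.
UNWIND [1, 2, 3] AS x
CALL {
  // Uncorrelated: the subquery never looks at x.
  RETURN "a" AS y
  UNION
  RETURN "b" AS y
}
// The subquery runs once per incoming row, so every x sees both y values.
RETURN x, count(y) AS ys
```

This should yield three rows, one per value of x, each with a count of 2.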

4.1 update

Neo4j 4.1 added support for correlated subqueries, so a subquery can now refer to variables from the enclosing query if they are explicitly imported, for example:

MATCH (n:Person {firstname: "Julie", lastname: "Corkery"})
CALL {
  WITH n
  MATCH (n)-[:KNOWS|WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  WITH n
  MATCH (n)-[:KNOWS]-()-[:WORKS_WITH]-(m)
  RETURN DISTINCT m
  UNION
  WITH n
  MATCH (n)-[:LIVES_IN]->(c)
  WITH n, c
  MATCH (c)<-[:LIVES_IN]-(m)-[:TAGS]->(n)
  RETURN DISTINCT m
}
RETURN n, count(m)
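
The importing WITH has to be repeated at the start of every branch of the UNION, which is easy to miss. A data-free sketch of just that shape, assuming Neo4j 4.1 or later (the list and the multipliers are made up for illustration):

```cypher
UNWIND [1, 2, 3] AS x
CALL {
  // Each UNION branch imports x on its own.
  WITH x
  RETURN x * 10 AS y
  UNION
  WITH x
  RETURN x * 100 AS y
}
// One row per x, with the two values computed from that x.
RETURN x, collect(y) AS ys
```

Each row's ys should contain the two values derived from that row's x, which is the correlated behaviour the 4.0 example above could not express.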

And there you have it: thanks to the Cypher team, post-union processing is now available in the brand new Neo4j 4 release!

