The Cypher query planner is quite advanced and mature, and you can mostly rely on it to pick the best plan for your query. However, there are rare cases, or bugs, that might want you looking for ways to influence that plan. This article demonstrates practical usage of an index hint.
Note that all queries were tested against Neo4j Enterprise 3.5.8
The graph model
This is the relevant portion of the graph model that is sufficient to demonstrate the issue.
Simple enough- we have many tweets, and tweets have keywords.
Our graph has two indexes, one on the
value of the
Keyword, and the other on
created_date of the
Vlasta, my colleague, noticed that a simple query started to perform slower over time, as the number of tweets in the graph grew. The innocuous query looked like
He assured us that only one
Keyword node in the graph had this particular value, and it had a handful of tweets related to it.
This sounded quite odd since we’d expect that the starting point for traversing the graph would be this lone
Keyword, and from there on, it would traverse a few relationships, retrieve the tweets which satisfied the
created_date condition, and return.
So we asked him for the query plan, and to our surprise, it looked like this (note that the query plan shown here is from a generated graph that represents the issue):
Approximately 100,000 tweets in the graph are found via the
Tweet index on
created_date, and subsequently, expanded to their keywords via the
CONTAINS_KEYWORDS relationship, and then filtered down to the single row that matches the keyword in our query!
We tried a variety of query rewrites, such as
and it produced the same plan
Only at this stage did we resort to an index hint, to tell the planner that we think it should use the Keyword index, knowing that we have fewer keywords as compared to tweets, and the keywords have unique values.
We provide the hint as follows, and the number of db hits now drops to 400005 from 2200002, and we see the
NodeIndexSeek being employed with the
Keyword index as we initially expected.
It might have been even more performant if somehow the planner expanded out through the single
Keyword to the single
but I couldn’t make that happen with all the tricks I know.
Note, that had we not created the index on
Tweet, our original query
would have worked just fine as shown below
We did however need this index for other use cases, and hence the query hint was the best solution at that point.
The fix is coming!
Thanks to good engineers at Neo4j, this issue will be fixed in 3.5.9
Update: As promised, the issue has been fixed in 3.5.9, producing the expected query plan
This was indeed a rare case which resulted in the usage of query hints.
As stated in the manual (and also, premature optimization is the root cause of all evil), this should be used with caution, and after you’ve tried everything else; making sure that you clearly understand the characteristics and shape of your graph and are confident enough to instruct the Cypher planner.