Deep Dive into Neo4j 3.5 Full Text Search
In this blog we will go over the Full Text Search capabilities available in the latest major release of Neo4j.
Unlike our usual posts, this one focuses on the underlying search engine used by Neo4j, namely Apache Lucene in version 5.5.5.
What exactly is Search ?
Search is an interaction between a user and a search engine. The user has an information need at hand and attempts to satisfy it by submitting a query with adequate constraints.
The search engine uses those constraints to collect matching results and return them to the user.
What is a Search Engine ?
A search engine’s purpose is to store, find and retrieve content. The underlying engine used by Neo4j is Apache Lucene, a free and open-source information retrieval software library.
There are some concepts that are key to search engines that will be detailed below.
Document
In search applications, the notion of a Document is central, because Documents are the items being stored, searched and returned. Documents correspond to content such as products in a catalog, content of books, the result of a pdf text extraction or people’s profiles.
A Document contains data fields, typically keys holding data values.
Inverted Index
An inverted index is the search engine's core data structure. Simply put, it maps terms to the documents that contain them, just like the index at the end of a book maps keywords to pages.
It is composed of two main pieces : a term dictionary and a postings list. The term dictionary is a sorted list of all terms that occur in a given field across the corpus, and it assigns a unique identifier to each term. The postings list is the mapping between each term (referred to by its id) and the list of documents in which it appears.
In order to serve relevant results, Lucene adds more data structures and metadata to the index; we will talk about some of them later in this blog. For the impatient, they are: doc frequency, term frequency, term positions, term offsets and so on.
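The two pieces described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene stores its index: the sample documents, the function names and the naive whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical corpus: document id -> content of one field
docs = {
    1: "graph database",
    2: "graph algorithms",
    3: "relational database",
}


def build_inverted_index(documents):
    """Return (term_dictionary, postings) for a tiny corpus."""
    term_to_docs = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():  # naive whitespace tokenization
            term_to_docs[term].add(doc_id)
    # Term dictionary: sorted terms, each assigned a unique id
    term_dictionary = {
        term: term_id for term_id, term in enumerate(sorted(term_to_docs))
    }
    # Postings: term id -> sorted list of ids of documents containing the term
    postings = {
        term_dictionary[term]: sorted(doc_ids)
        for term, doc_ids in term_to_docs.items()
    }
    return term_dictionary, postings


def lookup(term, term_dictionary, postings):
    """Return the ids of the documents containing the term (empty if unknown)."""
    term_id = term_dictionary.get(term)
    return [] if term_id is None else postings[term_id]


term_dictionary, postings = build_inverted_index(docs)
print(term_dictionary)
print(lookup("graph", term_dictionary, postings))
```

Looking up a term is then a dictionary access followed by reading its postings, which is why searching does not require scanning every document.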
Analysis
Analysis is the process of converting text into smaller, more precise units for the sake of searching: the tokens.
The analysis is composed of three steps : character filtering, tokenization and token filtering.
Let's go over each step and demonstrate end-to-end how we analyze the text "The GraphAware's fifth year anniversary at the Prague office in Žitná".
During the first step, character filtering, the characters of text fields are adjusted or filtered in different ways.
The next step is tokenization. As the name indicates, during this step, raw text is converted into tokens. The most straightforward way to tokenize a text is to split it on whitespace, but this is rarely the right approach, because you would end up with tokens containing punctuation, such as commas.
Instead, texts in English and most European languages typically use the standard tokenizer, which splits on whitespace and punctuation.
The last step is token filtering. Here the tokens are adjusted by adding, removing or changing them. To normalize the tokens from our example, a typical choice would be to lowercase them, remove common words such as 'the' and 'at' (usually called stopwords), and remove the possessive after GraphAware.
Once the analysis is completed, the data is saved into the inverted index as described above.
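To make the three steps concrete, here is a minimal Python sketch applied to the example sentence. It is not Lucene's implementation: the function names, the tiny stopword set and the choice of accent folding as the character-level adjustment are assumptions made for illustration.

```python
import re
import unicodedata

# Illustrative subset of English stopwords (assumption, not Lucene's full list)
STOPWORDS = {"the", "at", "in"}


def char_filter(text):
    """Character filtering: normalize apostrophes and strip accents."""
    text = text.replace("\u2019", "'")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Tokenization: split on whitespace and punctuation, keeping inner apostrophes."""
    return re.findall(r"\w+(?:'\w+)*", text)


def token_filter(tokens):
    """Token filtering: lowercase, drop the possessive 's, remove stopwords."""
    result = []
    for token in tokens:
        token = token.lower()
        if token.endswith("'s"):
            token = token[:-2]
        if token and token not in STOPWORDS:
            result.append(token)
    return result


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


tokens = analyze(
    "The GraphAware\u2019s fifth year anniversary at the Prague office in Žitná"
)
print(tokens)
```

With these assumptions, the sentence is reduced to the tokens graphaware, fifth, year, anniversary, prague, office and zitna, which are the units that end up in the index.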
Searching
Once the index is built, we can search that index using a Query and an IndexSearcher. The IndexSearcher is hidden in the Neo4j implementation, so we will only go over the Query syntax.
The query syntax used is the Apache Lucene classic query syntax. Let's go over some examples :
- hello (search for documents containing the term hello)
- title: neo4j (search for documents containing the term neo4j in the title field)
- graph* (search for documents containing terms starting with graph, such as graph, graphs, graphical, etc.)
The human-readable query is parsed by Lucene's query parser and transformed into a concrete implementation of the Query class. The most common implementations are worth knowing :
Query implementation | Purpose | Example |
---|---|---|
TermQuery | Matches a single term | neo4j |
PhraseQuery | Matches several terms in sequence, or in near vicinity to each other | "graph database" |
TermRangeQuery | Matches terms between a beginning and an ending term, including or excluding the end points | [A TO Z], {A TO Z} |
WildcardQuery | Matches terms against a pattern using ? (one character) and * (any number of characters) | g*p?, d??abase |
PrefixQuery | Matches all terms beginning with a specified string | algo* |
FuzzyQuery | Matches terms within a given edit (Damerau-Levenshtein) distance | cipher~ |
BooleanQuery | Combines other query instances into complex expressions | graph AND "shortest path" |
Full Text Search with Neo4j
We will now see how all of the above is available in Neo4j through dedicated Cypher procedures. To do so, we need to populate our database with some data, in this case, a list of book titles:
LOAD CSV WITH HEADERS FROM "https://bit.ly/fts-books" AS row
CREATE (n:Book {title: row.title, isbn: row.isbn, id: row.id, image: row.small_image_url, authors: row.authors})
Indexing
The first operation to do is to create a fulltext search index, with the help of the following procedure :
CALL db.index.fulltext.createNodeIndex('books', ['Book'], ['title', 'authors'])
The first argument is the name of the index; the second argument is a list of node labels that will be represented as documents in the books index. The last argument is the list of properties to be replicated as document fields. Note that, as of now, only text (string) properties are indexed.
There is an optional fourth argument that takes a configuration map, where you can specify the analyzer to be used. The analyzer is the class that splits the text into tokens; it primarily consists of a tokenizer and filters. Different analyzers combine different tokenizers and filters.
CALL db.index.fulltext.createNodeIndex('books', ['Book'], ['title'], {analyzer: "spanish"})
You can find the list of available analyzers with the following procedure :
CALL db.index.fulltext.listAvailableAnalyzers
The most commonly used analyzers are :
- StandardAnalyzer (a general-purpose analyzer: it tokenizes on word boundaries, lowercases the text and removes stopwords and punctuation; in Lucene 5.x, recognising emails and URLs as single tokens is handled by UAX29URLEmailAnalyzer rather than by StandardAnalyzer)
- StopAnalyzer (splits the text on non-letter characters, lowercases it and removes stopwords)
- KeywordAnalyzer (emits the whole input as a single token, useful for ids or zip codes)
You can check that the index has been created by issuing the :schema command :
Indexes
ON NODE:Book(title) ONLINE
No constraints
Querying
Now that our books index is created, we can query it and test our full text search queries. Let’s find all books containing the word “secret” in their title :
CALL db.index.fulltext.queryNodes('books', 'secret')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1327873635s/2998.jpg","ti│1.7604600191116333│
│tle":"The Secret Garden","isbn":"517189607","authors":"Frances Hodgson│ │
│ Burnett"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│1.4083679914474487│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
As you can see, the result of the procedure is not a list of documents, but a list of nodes instead.
There is a concept we did not cover yet, scoring. Let’s first show some examples of other queries before diving into it.
Let’s now search for secret life :
CALL db.index.fulltext.queryNodes('books', 'secret life')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│1.9917329549789429│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1320562005s/4214.jpg","ti│0.6224165558815002│
│tle":"Life of Pi","isbn":"770430074","authors":"Yann Martel"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1327873635s/2998.jpg","ti│0.6224165558815002│
│tle":"The Secret Garden","isbn":"517189607","authors":"Frances Hodgson│ │
│ Burnett"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
As you can see, the second and third results do not contain all of the search terms. This is because the parsed query is a BooleanQuery made of one TermQuery per term, combined with an implicit OR, so a document matching either term is returned.
To circumvent this, we can force the query to be understood as a PhraseQuery, by enclosing the terms in double quotes :
CALL db.index.fulltext.queryNodes('books', '"secret life"')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│2.8167359828948975│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
We can also search on a specific field :
CALL db.index.fulltext.queryNodes('books', 'authors: rowling')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1474154022s/3.jpg","title│1.7578392028808594│
│":"Harry Potter and the Sorcerer's Stone (Harry Potter, #1)","isbn":"4│ │
│39554934","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1387141547s/2.jpg","title│1.7578392028808594│
│":"Harry Potter and the Order of the Phoenix (Harry Potter, #5)","isbn│ │
│":"439358078","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1474169725s/15881.jpg","t│1.7578392028808594│
│itle":"Harry Potter and the Chamber of Secrets (Harry Potter, #2)","is│ │
│bn":"439064864","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1361482611s/6.jpg","title│1.7578392028808594│
│":"Harry Potter and the Goblet of Fire (Harry Potter, #4)","isbn":"439│ │
│139600","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1474171184s/136251.jpg","│1.7578392028808594│
│title":"Harry Potter and the Deathly Hallows (Harry Potter, #7)","isbn│ │
│":"545010225","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1361039191s/1.jpg","title│1.7578392028808594│
│":"Harry Potter and the Half-Blood Prince (Harry Potter, #6)","isbn":"│ │
│439785960","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1499277281s/5.jpg","title│1.3183794021606445│
│":"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","isbn"│ │
│:"043965548X","authors":"J.K. Rowling, Mary GrandPré, Rufus Beck"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
Or on more than one field :
CALL db.index.fulltext.queryNodes('books', 'authors: rowling AND title: goblet')
╒══════════════════════════════════════════════════════════════════════╤═════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪═════════════════╡
│{"image":"https://images.gr-assets.com/books/1361482611s/6.jpg","title│2.518252372741699│
│":"Harry Potter and the Goblet of Fire (Harry Potter, #4)","isbn":"439│ │
│139600","authors":"J.K. Rowling, Mary GrandPré"} │ │
└──────────────────────────────────────────────────────────────────────┴─────────────────┘
Fuzziness
Part of the power of full-text search is the ability to retrieve results even if the search query does not exactly match the text in the original corpus.
Several Query implementations offer such behavior; one of them is the FuzzyQuery :
CALL db.index.fulltext.queryNodes('books', 'garde~')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1408303130s/375802.jpg","│1.6505731344223022│
│title":"Ender's Game (Ender's Saga, #1)","isbn":"812550706","authors":│ │
│"Orson Scott Card"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1327873635s/2998.jpg","ti│1.5997971296310425│
│tle":"The Secret Garden","isbn":"517189607","authors":"Frances Hodgson│ │
│ Burnett"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1327656754s/11.jpg","titl│1.0181047916412354│
│e":"The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Ga│ │
│laxy, #1)","isbn":"345391802","authors":"Douglas Adams"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1439632243s/24178.jpg","t│0.8555957078933716│
│itle":"Charlotte's Web","isbn":"64410935","authors":"E.B. White, Garth│ │
│ Williams, Rosemary Wells"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1436732693s/13496.jpg","t│0.5999239683151245│
│itle":"A Game of Thrones (A Song of Ice and Fire, #1)","isbn":"5535884│ │
│86","authors":"George R.R. Martin"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
The tilde (~) triggers a fuzzy search for garde using the Damerau-Levenshtein distance algorithm. As you can see, some results, such as The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1), are not really relevant for our search. This is because, when no value is given, the Lucene 5.x query parser accepts any term within two edits of the search term. You can make the match stricter by specifying a value after the tilde, either a maximum number of edits (0, 1 or 2) or, as here, a legacy minimum similarity between 0 and 1 :
CALL db.index.fulltext.queryNodes('books', 'garde~0.7')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1327873635s/2998.jpg","ti│3.0637331008911133│
│tle":"The Secret Garden","isbn":"517189607","authors":"Frances Hodgson│ │
│ Burnett"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
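To see why the first fuzzy query returned so many titles, it helps to compute the edit distance by hand. The sketch below implements the optimal string alignment variant of the Damerau-Levenshtein distance (insertions, deletions, substitutions and adjacent transpositions each cost one edit). The function name is hypothetical, and the candidate terms are my reading of which word in each result (title or author) is closest to garde; the output does not say which term actually matched.

```python
def osa_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Candidate terms taken from the titles and authors in the results above
distances = {
    term: osa_distance("garde", term)
    for term in ["garden", "game", "guide", "garth"]
}
print(distances)
```

Only garden is a single edit away from garde; game, guide and garth each need two edits, which is consistent with the stricter query keeping only The Secret Garden.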
Proximity Search
If you think about the use case for the fuzzy search, you can imagine a similar need for PhraseQuery searches, where the sequence of terms provided in the query may not be exactly as it was in the original corpus.
The following search returns nothing, even though we have a book with the title The Secret Life of Bees :
CALL db.index.fulltext.queryNodes('books', '"secret bees"')
(no changes, no records)
You can specify the maximum distance allowed between the words of the phrase (called the slop in Lucene) by appending a tilde and a number to the quoted phrase, for example :
CALL db.index.fulltext.queryNodes('books', '"secret bees"~3')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│2.7131075859069824│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
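The distance involved can be estimated with a simplified Python sketch. It only handles query terms that appear in the document in the same order as in the query, it uses the first occurrence of each term, and it assumes that removed stopwords still occupy a position in the field; the function names are hypothetical and this is not Lucene's sloppy phrase matching code.

```python
def required_slop(query_terms, doc_terms):
    """Smallest slop for an in-order match, or None if there is no such match."""
    positions = []
    start = 0
    for term in query_terms:
        try:
            pos = doc_terms.index(term, start)  # first occurrence after the previous term
        except ValueError:
            return None
        positions.append(pos)
        start = pos + 1
    # How far each term sits from where an exact phrase would put it
    offsets = [doc_pos - query_pos for query_pos, doc_pos in enumerate(positions)]
    return max(offsets) - min(offsets)


def phrase_matches(query_terms, doc_terms, slop=0):
    needed = required_slop(query_terms, doc_terms)
    return needed is not None and needed <= slop


# Positions of the words in the title; stopwords keep their slot (assumption)
title = ["the", "secret", "life", "of", "bees"]
print(required_slop(["secret", "bees"], title))
print(phrase_matches(["secret", "bees"], title))
print(phrase_matches(["secret", "bees"], title, slop=3))
```

Under these assumptions, two positions separate bees from where an exact phrase would expect it, so the plain phrase does not match while a slop of 3 does.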
WildcardQuery
The last implementation we will cover is the WildcardQuery, where you can provide wildcards for your searches.
Use ? for a single-character wildcard search and * for a multiple-character wildcard search.
CALL db.index.fulltext.queryNodes('books', 'bee?')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│0.7071067690849304│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
CALL db.index.fulltext.queryNodes('books', 'secr*')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1474169725s/15881.jpg","t│0.7071067690849304│
│itle":"Harry Potter and the Chamber of Secrets (Harry Potter, #2)","is│ │
│bn":"439064864","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1473454532s/37435.jpg","t│0.7071067690849304│
│itle":"The Secret Life of Bees","isbn":"142001740","authors":"Sue Monk│ │
│ Kidd"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1327873635s/2998.jpg","ti│0.7071067690849304│
│tle":"The Secret Garden","isbn":"517189607","authors":"Frances Hodgson│ │
│ Burnett"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
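The semantics of the two wildcards can be illustrated by translating a pattern into a regular expression and testing it against a list of indexed terms. This is a sketch for intuition only: the function names and the sample terms are assumptions, and Lucene evaluates wildcards against its term dictionary rather than with Python regular expressions.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (exactly one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_terms(pattern, terms):
    """Return the terms that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]


# Hypothetical sample of terms from the index
terms = ["bee", "bees", "secret", "secrets", "garden"]
print(match_terms("bee?", terms))
print(match_terms("secr*", terms))
```

Note that bee? does not match bee itself, because ? stands for exactly one character, whereas * also accepts an empty sequence.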
Scoring
The default scoring function of Apache Lucene, at least in version 5.5.5, is based on a highly optimized Vector Space Model. That scoring function is more commonly known as TFIDF Similarity.
From Wikipedia :
In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
Term frequency
The term frequency is the raw count of a term in a document (the number of times the term t appears in document d).
Inverse document frequency
The inverse document frequency is a measure of how much information the word provides (i.e. whether it is common or rare across the corpus).
The formula for calculating the idf is the following :

$$ \mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} $$

where :
- N is the total number of documents in the corpus D
- |{d ∈ D : t ∈ d}| is the number of documents where the term t appears
Term-frequency Inverse Document Frequency
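The TF-IDF is calculated as the product of the two previous measures :

$$ \mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) $$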
There are some variations and adaptations in the concrete implementation of TF-IDF in Lucene, but you have the basic idea of the most common similarity computation function used in information retrieval. For a detailed explanation of TF-IDF in Lucene, you can refer to its Javadocs.
A small example sometimes explains it better :
CALL db.index.fulltext.queryNodes('books', 'sample')
╒═══════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞═══════════════════════════════════════════════════╪══════════════════╡
│{"title":"this is a sample"} │0.5945348143577576│
├───────────────────────────────────────────────────┼──────────────────┤
│{"title":"this is another sample in a longer text"}│0.2972674071788788│
└───────────────────────────────────────────────────┴──────────────────┘
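As you can see, the term sample carries more weight in the first result because its title is shorter than that of the second result. The term appears once in both titles, so the raw term frequency is the same; the difference comes from the length normalization that Lucene applies alongside tf.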
If we now create another 100 documents containing the term sample, the idf part of the score comes into play: it depends on the number of documents in the index and on how many of them contain the term, so the scores for sample change even though the first two titles did not.
UNWIND range(1,100) AS i
CREATE (n:Book {title: "This is book with sample " + i})
CALL db.index.fulltext.queryNodes('books', 'sample')
╒═══════════════════════════════════════════════════╤═══════════════════╕
│"node" │"score" │
╞═══════════════════════════════════════════════════╪═══════════════════╡
│{"title":"this is a sample"} │0.9902438521385193 │
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"this is another sample in a longer text"}│0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 1"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 2"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 3"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 4"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 5"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
│{"title":"This is book with sample 6"} │0.49512192606925964│
├───────────────────────────────────────────────────┼───────────────────┤
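The scores of the first two rows in both result tables can be reproduced with a short Python sketch. It assumes the formulas documented for Lucene's classic TF-IDF similarity (tf = sqrt(freq), idf = 1 + ln(N / (docFreq + 1)), length norm = 1 / sqrt(number of terms)), a single-term query for which query normalization cancels out, and stopwords (this, is, a, in) removed before counting the terms of a title. The function name and the term counts are assumptions made for the example.

```python
import math


def classic_score(term_freq, doc_freq, num_docs, field_length):
    """Simplified single-term score: tf * idf * length norm."""
    tf = math.sqrt(term_freq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    length_norm = 1.0 / math.sqrt(field_length)
    return tf * idf * length_norm


# Two documents, both containing "sample"
# "this is a sample"                        -> 1 term left after stopword removal
# "this is another sample in a longer text" -> 4 terms left
short_before = classic_score(1, doc_freq=2, num_docs=2, field_length=1)
long_before = classic_score(1, doc_freq=2, num_docs=2, field_length=4)

# After adding 100 more documents that all contain "sample"
short_after = classic_score(1, doc_freq=102, num_docs=102, field_length=1)
long_after = classic_score(1, doc_freq=102, num_docs=102, field_length=4)

print(round(short_before, 4), round(long_before, 4))
print(round(short_after, 4), round(long_after, 4))
```

Under these assumptions the sketch agrees with the scores shown above to within floating-point precision. The added titles are not modelled here because Lucene stores the length norm in a lossy, single-byte encoding, which the sketch ignores.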
Boosting
Users have the ability to influence the scoring of the matched results. Apache Lucene offers two types of boosting capabilities :
- index time boosting : adds a boost factor to a document before it is indexed (not possible in Neo4j)
- query time boosting : applies a boost to a query
Let’s say that you want the search on the author to be more important than the search on the book’s title: you can apply a boost by appending a caret (^) and a boost factor to the search terms for authors.
To demonstrate, let’s take a boolean query :
CALL db.index.fulltext.queryNodes('books', 'title: harry potter OR title:order of phoenix OR authors: rufus')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1387141547s/2.jpg","title│1.9725315570831299│
│":"Harry Potter and the Order of the Phoenix (Harry Potter, #5)","isbn│ │
│":"439358078","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1499277281s/5.jpg","title│1.0511908531188965│
│":"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","isbn"│ │
│:"043965548X","authors":"J.K. Rowling, Mary GrandPré, Rufus Beck"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1474169725s/15881.jpg","t│0.4153219163417816│
│itle":"Harry Potter and the Chamber of Secrets (Harry Potter, #2)","is│ │
│bn":"439064864","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1361482611s/6.jpg","title│0.4153219163417816│
│":"Harry Potter and the Goblet of Fire (Harry Potter, #4)","isbn":"439│ │
│139600","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1474171184s/136251.jpg","│0.4153219163417816│
│title":"Harry Potter and the Deathly Hallows (Harry Potter, #7)","isbn│ │
│":"545010225","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
The first result has a higher score because it matches all the title conditions, but we can boost the authors clause to give it more weight :
CALL db.index.fulltext.queryNodes('books', 'title: harry potter OR title:order of phoenix OR authors: rufus^5')
╒══════════════════════════════════════════════════════════════════════╤═══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪═══════════════════╡
│{"image":"https://images.gr-assets.com/books/1499277281s/5.jpg","title│1.2862814664840698 │
│":"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","isbn"│ │
│:"043965548X","authors":"J.K. Rowling, Mary GrandPré, Rufus Beck"} │ │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│{"image":"https://images.gr-assets.com/books/1387141547s/2.jpg","title│0.9179487228393555 │
│":"Harry Potter and the Order of the Phoenix (Harry Potter, #5)","isbn│ │
│":"439358078","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│{"image":"https://images.gr-assets.com/books/1474169725s/15881.jpg","t│0.19327662885189056│
│itle":"Harry Potter and the Chamber of Secrets (Harry Potter, #2)","is│ │
│bn":"439064864","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│{"image":"https://images.gr-assets.com/books/1361482611s/6.jpg","title│0.19327662885189056│
│":"Harry Potter and the Goblet of Fire (Harry Potter, #4)","isbn":"439│ │
│139600","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│{"image":"https://images.gr-assets.com/books/1474171184s/136251.jpg","│0.19327662885189056│
│title":"Harry Potter and the Deathly Hallows (Harry Potter, #7)","isbn│ │
│":"545010225","authors":"J.K. Rowling, Mary GrandPré"} │ │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
You can apply boosting to phrase queries as well :
CALL db.index.fulltext.queryNodes('books', 'title: "harry potter and the order of the phoenix" OR authors:"rufus beck"')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1499277281s/5.jpg","title│1.7385646104812622│
│":"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","isbn"│ │
│:"043965548X","authors":"J.K. Rowling, Mary GrandPré, Rufus Beck"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1387141547s/2.jpg","title│1.0253233909606934│
│":"Harry Potter and the Order of the Phoenix (Harry Potter, #5)","isbn│ │
│":"439358078","authors":"J.K. Rowling, Mary GrandPré"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
CALL db.index.fulltext.queryNodes('books', 'title: "harry potter and the order of the phoenix" OR authors:"rufus beck"^5')
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node" │"score" │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│{"image":"https://images.gr-assets.com/books/1499277281s/5.jpg","title│1.7385646104812622│
│":"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","isbn"│ │
│:"043965548X","authors":"J.K. Rowling, Mary GrandPré, Rufus Beck"} │ │
├──────────────────────────────────────────────────────────────────────┼──────────────────┤
│{"image":"https://images.gr-assets.com/books/1387141547s/2.jpg","title│1.0253233909606934│
│":"Harry Potter and the Order of the Phoenix (Harry Potter, #5)","isbn│ │
│":"439358078","authors":"J.K. Rowling, Mary GrandPré"} │ │
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
This concludes this article about the Full Text Search capabilities in Neo4j.
Suggested Reading
Introduction to Information Retrieval : Manning, Raghavan & Schütze, 2007
Relevant Search : Doug Turnbull and John Berryman, 2016
Neo4j Full Text Search Documentation
Conclusion
Search is an important part of any application. The recent release of Neo4j brings built-in support for it, which has been a long-standing feature request from the community.
GraphAware has been a pioneer of Graph-Aided Search, using graphs to help during relevance engineering, with implementations at Airbnb or the World Economic Forum.