So far, the Graph-Powered Machine Learning book has introduced us to graphs and machine learning. The second part of the book talks about recommendations. Recommender systems (RS) gather information about users and items and provide item suggestions, bringing great value to online stores: clothing stores, bookstores, you name it. Companies like Netflix base their entire businesses on high-performing recommender systems.
Recommender systems can help businesses:
- increase the number of items sold,
- sell more diverse items,
- increase customer satisfaction and loyalty, and
- get a better understanding of what the customers want.
Recommendations are one of the top use cases for graphs. Because graphs offer fast computations and flexible access patterns, a graph-based recommendation model can be both effective and efficient, and can even provide recommendations in real time.
There are many different approaches to providing recommendations. The author does not discuss all of them, but he’ll take you through the most common ones in the book. In this chapter, we’ll start with content-based recommendations. So buckle up and get ready to learn:
- what content-based recommendations are,
- how to model items and users for content-based recommender systems, and
- what the different approaches to providing content-based recommendations are.
Content-based recommender systems (CBRS) rely on item and user profiles. An item profile is a collection of item features, i.e. characteristics of the item such as the colour of an object, the authors of a book, or the actors in a movie. User profiles can be compiled from implicit or explicit information about user preferences. A CBRS matches each user profile with the item features that user likes and finds items with the same or similar features. This method of providing recommendations is suitable even if one has limited data.
CBRS have three main components:
- The item analyzer extracts item features from the items' contents or metadata.
- The user profile builder collects data about users and their preferences.
- The recommendation engine matches user interests with item features. The recommendations are provided based on the relevancy scores calculated for each item.
Modelling items and item features
The first step in providing content-based recommendations is modelling items. Items can be represented as a set of features. The features can be, for example, the genres of a movie, ingredients in a particular meal, or keywords from a text file. These features are usually collected from the item metadata.
Features can be represented in a graph. In a Labelled Property Graph (LPG), you can label nodes with classes and create node and relationship attributes. Take the example of a movie recommender system. There are two main ways to model data about movies in a graph. You could model the movies as nodes and movie features as node properties. While intuitive, this approach comes with many drawbacks. Alternatively, you can model both the items and the features as separate nodes connected via relationships. This allows you to:
- Avoid data duplication. Modelling features as nodes’ properties leads to data duplication. Modelling the features (actors, genres) as nodes prevents this - and thus saves space. Leveraging multiple labels, you can model, e.g., one person as both an actor and a director of a movie. Modelling features as nodes also helps tackle aliases, further cleaning your dataset.
- Find and correct errors easily. Duplicates in your data not only take up space but also make spotting errors more difficult. If you model features as separate nodes, errors such as misspelt names are easier to identify.
- Easily extend and enrich the graph. Modelling features as nodes makes it easy to group them and add more features or properties.
- Easily navigate the graph. As each node, and each relationship, can be an access point, it’s much faster to navigate the graph. Let’s say you don’t know what movie you would like to watch, but you know you’re in the mood for an action film. If you model features (genres) as nodes, you can access the “action” node and find all the movies with that feature. If you model features as properties of the movie nodes, you’d need a more complex, slower query to do this.
Thus the way you model your data in a graph affects the amount of space used, as well as the speed of the model.
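To make the difference between the two modelling options more concrete, here is a minimal Python sketch that uses plain dictionaries in place of a graph database. The movie titles, genres, and function names are all hypothetical and only serve as an illustration. With features stored as properties, finding all action movies means inspecting every movie. With features stored as their own nodes, the "action" node is a direct access point.

```python
from collections import defaultdict

# Hypothetical data: genres stored as a property on each movie.
movies_as_properties = {
    "Movie A": {"genres": ["action", "thriller"]},
    "Movie B": {"genres": ["comedy"]},
    "Movie C": {"genres": ["action"]},
}


def movies_with_genre_scan(movies, genre):
    """Features as properties: every movie has to be inspected."""
    return sorted(
        title for title, props in movies.items() if genre in props["genres"]
    )


def build_feature_nodes(movies):
    """Features as nodes: each genre exists once and links to its movies."""
    genre_to_movies = defaultdict(set)
    for title, props in movies.items():
        for genre in props["genres"]:
            genre_to_movies[genre].add(title)
    return genre_to_movies


genre_nodes = build_feature_nodes(movies_as_properties)

action_by_scan = movies_with_genre_scan(movies_as_properties, "action")
action_by_node = sorted(genre_nodes["action"])  # direct lookup via the feature node
```

Both lookups return the same movies, but the second one starts from the genre itself, which roughly mirrors how a feature node acts as an entry point into the graph.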
The next step in building a content-based recommendation engine is to model the users. This can be done by taking the graph model we already have and adding user nodes to it. The user nodes are connected to the features and/or items the users like.
The relationships between users and features/items can be obtained explicitly, by simply asking users to rate items, or implicitly, by inferring user interests from their interactions (purchases, views, etc.). In this stage, it doesn’t matter if a user explicitly rated a movie or if we only know they’ve seen it. We can draw connections between the item features and the user based on the user-item link.
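As a rough sketch of how user-feature connections can be derived from user-item links, the Python snippet below counts how often each feature appears among the movies a user has interacted with. The data, the user name, and the `user_feature_links` helper are hypothetical, and the sketch treats every interaction the same way, whether it was a rating or a view.

```python
from collections import Counter

# Hypothetical item profiles: movie -> set of features (genres).
movie_features = {
    "Movie A": {"action", "thriller"},
    "Movie B": {"comedy"},
    "Movie C": {"action"},
}

# Hypothetical user-item links (rated or simply watched).
user_movies = {"alice": ["Movie A", "Movie C"]}


def user_feature_links(user, user_movies, movie_features):
    """Derive user -> feature links, weighted by how many items carry the feature."""
    counts = Counter()
    for movie in user_movies.get(user, []):
        counts.update(movie_features.get(movie, ()))
    return counts


alice_links = user_feature_links("alice", user_movies, movie_features)
```

Here `alice_links` connects the user to "action" twice and to "thriller" once, which is the kind of information a user profile can be built from.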
Providing content-based recommendations
Once we have both user and item profiles, we’re ready to provide recommendations to the users. There are several ways of doing this.
Pure graph model
The pure graph model is one of the most straightforward approaches to providing recommendations. Considering its simplicity, the model offers high-quality recommendations. The only thing you need in this approach is correctly modelled data. A simple query can provide recommendations, with no algorithms or complex computations required. The simplicity of the model makes it fast, and its recommendations are available in real time because it immediately adapts to new nodes and new relationships. Finally, the model is extensible, meaning the graph can also contain other information. Taking more aspects, i.e. features, into consideration increases the quality of the recommendations further.
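The kind of query this approach relies on can be sketched in Python as a simple traversal: start at the user, collect the features of the movies they have seen, and return unseen movies that share those features. This is an illustrative sketch only. The data and the `recommend_by_shared_features` function are hypothetical, and ranking by the number of shared features is an assumption rather than the book's exact query.

```python
# Hypothetical graph data: movie -> features, user -> movies seen.
movie_features = {
    "Movie A": {"action", "thriller"},
    "Movie B": {"comedy"},
    "Movie C": {"action"},
    "Movie D": {"action", "thriller"},
    "Movie E": {"thriller", "drama"},
}
user_movies = {"alice": ["Movie A"]}


def recommend_by_shared_features(user, user_movies, movie_features, top_n=3):
    """Traverse user -> seen movies -> features -> other movies."""
    seen = set(user_movies.get(user, []))

    # Features reachable from the user through the movies they have seen.
    liked_features = set()
    for movie in seen:
        liked_features |= movie_features.get(movie, set())

    # Score each unseen movie by how many of those features it shares.
    scores = {}
    for movie, features in movie_features.items():
        if movie in seen:
            continue
        shared = len(features & liked_features)
        if shared:
            scores[movie] = shared

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda m: (-scores[m], m))[:top_n]


recommendations = recommend_by_shared_features("alice", user_movies, movie_features)
```

Because the result is computed directly from the current links, adding a new movie or a new user-movie relationship changes the outcome of the next call without any retraining step.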
Vector approach
The vector approach also considers the extent to which a user likes a particular feature. Thus, it provides more accurate recommendations than the pure graph approach. In this approach, the user-item similarity is computed from user and item vectors. To do so, you need the user and item profiles to be represented in a single model. A similarity function computes the user-item similarities, which are used to provide recommendations. Cosine similarity is the most commonly used function in this approach.
To use the vector approach for providing recommendations, you need to:
- Represent the items as vectors of their features. Each item becomes a vector over all the possible features an item can have, with binary (0 or 1) values indicating whether the item has each feature. Numerical values, by contrast, represent the extent to which a user likes a feature. These can be obtained explicitly (based on user ratings) or implicitly (calculated).
- Migrate the user-item and the user-feature metrics into a single vector space model.
- Count occurrences for boolean (binary) features, and compute averages for integer (numerical) features.
- Multiply the integer values by a scaling factor based on their weight. This ensures the integer values don’t over-dominate the calculation.
- Normalise each value. In other words, make sure the value represents a part of a whole. For example, having seen four action movies can be a little or a lot depending on whether you saw 5 or 100 movies in total.
- Provide a recommendation. Using cosine similarity, you can compute similarities between user profiles and items. Once ordered from highest to lowest, the top similarities turn into recommendations.
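The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the book's implementation: it covers only binary features, so the scaling step for numerical features is left out, and all names and data (`FEATURES`, `movie_features`, `user_vector`, `recommend`) are hypothetical. The user vector is built by counting feature occurrences among the movies seen and normalising by the total number of movies seen.

```python
import math

# Hypothetical feature space shared by users and items.
FEATURES = ["action", "comedy", "thriller"]

movie_features = {
    "Movie A": {"action", "thriller"},
    "Movie B": {"comedy"},
    "Movie C": {"action"},
    "Movie D": {"action", "comedy"},
    "Movie E": {"thriller"},
}


def item_vector(movie):
    """Binary vector: 1.0 if the movie has the feature, else 0.0."""
    return [1.0 if f in movie_features[movie] else 0.0 for f in FEATURES]


def user_vector(seen):
    """Count feature occurrences, then normalise by the number of movies seen."""
    total = len(seen)
    if total == 0:
        return [0.0] * len(FEATURES)
    return [sum(f in movie_features[m] for m in seen) / total for f in FEATURES]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(seen, top_n=2):
    """Rank unseen movies by cosine similarity to the user profile."""
    profile = user_vector(seen)
    candidates = [m for m in movie_features if m not in seen]
    ranked = sorted(candidates, key=lambda m: (-cosine(profile, item_vector(m)), m))
    return ranked[:top_n]


profile = user_vector(["Movie A", "Movie C"])
top_movies = recommend(["Movie A", "Movie C"])
```

In this example the user has seen two movies, both of them action and one of them a thriller, so the normalised profile weights action twice as heavily as thriller, and the ranking reflects that.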
Similarity-based retrieval approach
This approach finds items similar to the ones a user has already purchased. It can be used when item metadata is missing. The recommendations are thus computed not from user-feature data, but from user-item data.
This approach has three elements:
- User profile. Here users are represented by the connections to the items they have purchased/liked.
- Item representation/description. The way items are represented depends on the similarity function used.
- Similarity function. It computes the similarity between items.
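The three elements can be illustrated with a short Python sketch. Everything here is an assumption made for the example: items are described by hypothetical keyword sets, Jaccard similarity is chosen as the similarity function (other functions, with other item representations, would work as well), and a candidate's score is its best similarity to any item the user already owns.

```python
# Hypothetical item descriptions: item -> set of keywords.
item_keywords = {
    "Book A": {"space", "war", "empire"},
    "Book B": {"space", "empire", "rebellion"},
    "Book C": {"cooking", "pasta"},
    "Book D": {"war", "history"},
}

# Hypothetical user profile: the items the user has purchased.
purchases = {"bob": {"Book A"}}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(user, top_n=2):
    """Recommend unowned items most similar to something the user owns."""
    owned = purchases.get(user, set())
    scores = {}
    for candidate, description in item_keywords.items():
        if candidate in owned:
            continue
        best = max(
            (jaccard(description, item_keywords[o]) for o in owned),
            default=0.0,
        )
        if best > 0:
            scores[candidate] = best
    # Highest similarity first; ties are broken alphabetically.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


suggestions = similar_items("bob")
```

Swapping in a different similarity function would mainly require changing `jaccard` and, as noted above, possibly the way the items are represented.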
Building recommender systems is one of the top use cases for graphs. Content-based recommender systems rely on matching item and user profiles. CBRS match users with item features and find items similar to the ones a user has already purchased. Depending on the kind and the amount of data you have, you can choose the approach best suited to your use case.