Bridging similarity islands in recommendation systems with Neo4j

March 8, 2017 · 9 min read

Recommendation engines are a crucial element in the global trend towards a push-based web experience and away from a pull-based one. They personalize the content offered to each user by predicting how interested the user will be in the recommended items. This is not only a powerful business tool for content providers, but also a vital improvement to the user experience. In today’s world, where the volume, interdependence, variety and speed of information are overwhelming, recommendation engines can significantly reduce the gap between us and what we search for. Indeed, these engines are even used to enhance common text-based search (read more about graph-aided search in our blog).

The more similar the better?

Real-world recommender systems can be complex and diverse, combining content-based and collaborative filtering, graph theory and machine learning, with multiple algorithms competing and contributing to the end result. In all cases, the final recommendation is based on some sort of similarity or closeness measure (Leskovec et al. 2014).

Recommendations can be items similar to items previously chosen by the user, or items that have attributes that the user likes (content-based filtering); items chosen by others with taste/behavior similar to the user’s own (collaborative filtering); even items chosen by friends of the user (social recommendation).

+-------------------------------------+
|   User similarity table (example)   |
+-------+-------+---------------------+
| u1.id | u2.id |   cosine similarity |
+-------+-------+---------------------+
|    12 |    54 |                1.00 |
|    12 |  3465 |                0.92 |
|    12 |   893 |                0.91 |
|    12 |   943 |                0.76 |
|    12 |    14 |                0.65 |
+-------+-------+---------------------+

Similar items are items that have things in common (e.g. features, categories), or that co-occur as choices of the same user (or class of users). Similar users are users who chose the same items (or the same items’ attributes), or that share some similar features (e.g. age, city).
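To make the similarity measure in the table concrete, here is a minimal sketch of cosine similarity between users, treating each user as a binary vector over the items they chose. The user ids, item ids and function names are hypothetical and chosen only for illustration; a production engine would typically compute this inside the graph or in a batch job.

```python
from math import sqrt

# Hypothetical data: the set of item ids each user has chosen.
choices = {
    "u12": {"a", "b", "c", "d"},
    "u54": {"a", "b", "c", "d"},
    "u14": {"a", "b", "x", "y"},
}

def cosine_similarity(items_a, items_b):
    """Cosine similarity of two users viewed as binary item vectors."""
    if not items_a or not items_b:
        return 0.0
    shared = len(items_a & items_b)
    return shared / sqrt(len(items_a) * len(items_b))

def most_similar(user, choices):
    """Rank all other users by similarity to `user`, highest first."""
    scores = [
        (other, cosine_similarity(choices[user], items))
        for other, items in choices.items()
        if other != user
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(most_similar("u12", choices))
```

With binary vectors the dot product is simply the number of shared items, and each vector's norm is the square root of the number of items chosen, which is why the function reduces to a set intersection.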

Is it always true that the item most similar to the users’ preferences or closest to the users themselves is the best possible recommendation? I believe this assumption can sometimes be misleading. Let’s consider a couple of examples in the learning and fashion domains.

Examples of something different

Think about the task of recommending courses to users so that they can learn specific skills that are useful for their careers. For example, let’s say we are in the Human Capital Development department of a big enterprise, and we want our system to recommend online courses to our employees (you can read more about HCM with graphs in our blog). We could take into account their skills and their level of expertise, and try to match the needs of the company with the learning choices of users. Suppose that the company needs to increase the number of experts in some particular skill. We could find which employees are already on the way to mastering it, and suggest the next-level course appropriate to their needs.

Great, but let’s look at this issue from the employee’s perspective. Is it better that I learn a skill that all my colleagues already know, so that I keep up with them, or is it better that I learn a new skill: something that is trending and beneficial to the business and to my career, but that none of my colleagues have yet mastered? Some skill that sets me apart and opens up previously unexplored opportunities for the company and my career?

There is a lot of value in this type of recommendation, and it would be lost if we focus only on what is most similar to the user.

Let’s apply the same logic to another scenario. We are a fashion retailer and we are recommending clothing to our customers. Do we want to recommend items that others in the user’s social circles, maybe his closest friends, already bought? Or do we want to recommend something similar enough to his current style so that it still appeals to him but that none of his acquaintances have? Perhaps a brand that is not yet known in the user’s social circles, so that he could be the bridge of that brand into those communities?

Social network weak ties eating together

The theory behind these ideas is well known in the social sciences, and goes back to Mark Granovetter’s highly influential paper The Strength of Weak Ties published in 1973. People strongly tied to each other form close community structures, inside of which the same information tends to be repeatedly shared among members with high levels of redundancy. On the other hand, weak ties to people outside a strongly connected community can introduce fresh innovative information. Members connecting two communities act, as a matter of fact, as bridges of information.

The weak tie between Ego and his acquaintance, therefore, becomes not merely a trivial acquaintance tie but rather a crucial bridge between the two densely knit clumps of close friends. (Granovetter 1983: 202)

This theory had great success in the Social Networks literature. More recently the same ideas have been introduced in the Social Capital literature, which distinguishes bonding and bridging types of social capital and suggests that the integration of both is the best recipe for network performance (Burt 2001). Other strands of literature, such as social learning and communities of practice (with the concepts of core and periphery of the community), are closely aligned with these results (Wenger 2010).

Getting out of user neighborhoods in recommendation graphs

Example of a recommendation graph with users, items, categories, reviews...

A recommendation graph is more complex than a pure social network. Along with persons and groups, it stores different types of items, categories, and so on, and a multitude of different relationship types, including user interactions with items. Nevertheless, the structure of both types of graphs is similar in that they are not homogeneous. Structural holes separate internally highly connected sub-graphs that are only loosely connected to each other. Each user is located somewhere in this structure, and we can think of the user’s neighborhood as composed of the items she purchased, her reviews, her preferences (such as categories she likes), her friends, and so on.

Always recommending the things that best fit the user’s current neighborhood in terms of similarity can run into the problem of redundancy. The same items are recommended over and over, and it becomes difficult to step outside the recommendation set without giving up automatic, high-quality recommendations (as would happen, for example, with a random or curated recommendation).

In some cases it might be best to boost the scores of existing weak ties in the recommendation engine, according to the specific domain and business needs. Moreover, recommending a new item which (if chosen) would create a bridge between strongly connected sub-graphs can be a good strategy for leading the recommendation algorithms into unexplored parts of the graph. This has the potential to better match users’ tastes and help them discover new things, as well as to promote niches in the market and “selling in the long tail”.

The NOT recommendation: breaking free from similarity segregation with Neo4j

Bridging similarity islands in a recommendation graph

Fortunately, implementing “weak-ties strategies” in a Neo4j-based recommendation engine is easy and performant.

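// Parameters use the legacy {param} syntax; on Neo4j 4.0+ write $uid and $featureName instead.
// Items tagged with a feature the user likes...
MATCH (u:User {id: {uid}})-[:LIKED]->(f:FEATURE {name: {featureName}})<-[:TAGGED_WITH]-(reco:Item)
// ...that none of the user's Facebook friends have bought...
WHERE NOT (u)-[:FB_FRIENDS]->(:User)-[:BOUGHT]->(reco)
	// ...and that nobody who shares a purchase with the user has bought
	AND NOT (u)-[:BOUGHT]->()<-[:BOUGHT]-(:User)-[:BOUGHT]->(reco)
RETURN reco, count(DISTINCT f) AS frequency
ORDER BY frequency DESC
LIMIT 5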

Here, for example, we are finding items that have features liked by the user, excluding all items that were bought by the user’s Facebook friends or by anyone who bought any item bought by the user. We only needed to exclude a path using the WHERE clause and the NOT operator in our Cypher query. Real-world engines can improve on this simple example and implement the logic in different and more complex ways.
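For readers who want to experiment with the same exclusion logic without a database, the query above can be approximated in plain Python. This is only a sketch under stated assumptions: the dictionaries stand in for the LIKED, TAGGED_WITH, BOUGHT and FB_FRIENDS relationships, all names and sample data are hypothetical, and unlike the Cypher query it scores against all liked features and also drops items the user already owns.

```python
from collections import Counter

def not_recommendations(user, liked, tagged, bought, friends, limit=5):
    """Items sharing liked features with `user`, minus anything already
    bought inside the user's neighbourhood (friends and co-buyers)."""
    own = bought.get(user, set())
    # Co-buyers: other users who bought at least one item the user bought.
    co_buyers = {u for u, items in bought.items() if u != user and items & own}
    neighbours = friends.get(user, set()) | co_buyers
    # Assumption: the user's own purchases are excluded as well.
    excluded = set(own)
    for neighbour in neighbours:
        excluded |= bought.get(neighbour, set())
    # Score each remaining item by the number of liked features it carries.
    scores = Counter()
    for item, features in tagged.items():
        if item not in excluded:
            overlap = len(features & liked.get(user, set()))
            if overlap:
                scores[item] = overlap
    return scores.most_common(limit)

# Hypothetical sample data.
liked = {"ann": {"wool", "blue"}}
tagged = {
    "scarf": {"wool", "blue"},
    "gloves": {"blue"},
    "hat": {"wool"},
    "coat": {"wool", "blue"},
    "belt": {"leather"},
    "socks": {"wool"},
}
bought = {"ann": {"socks"}, "bob": {"coat"}, "cat": {"socks", "hat"}}
friends = {"ann": {"bob"}}

print(not_recommendations("ann", liked, tagged, bought, friends))
```

In this sample the coat is dropped because a friend bought it, the hat because a co-buyer of the socks bought it, and the belt because it shares no liked feature, leaving only items from outside the user's neighbourhood.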

The importance of innovative information in recommendation systems

Let me close with one last scenario where all this becomes very clear: scientific writing.

A bunch of researchers are writing a paper together. At a certain point they need new theoretical models or empirical data. They are out of ideas, so they need to go back to bibliographic research, and feed search engines with the best keywords they can think of. The problem is that the results will include many of the papers they already know, and which they already know do not fit their case. They need new research.

Wouldn’t it be amazing if they could filter out all the publications cited by each of them in their own papers? Maybe even adjust the strength of this filter with a slider bound to the number of times they cited each publication? And what about going a step further and filtering out publications cited in publications they cited? Or filtering out publications cited by each of their co-authors? With these “NOT recommendations” they might get closer to what they are searching for. This way they can unveil unexplored parts of the scientific network related to their research. Maybe they’ll find a paper whose language or author nationality kept it too far from their citation circles for them to know about it. It happens all the time.
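The filter described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the `min_citations` parameter playing the role of the slider, the `depth` parameter controlling whether publications cited by already-cited publications are also filtered out, and the sample data.

```python
from collections import Counter

def known_publications(team, cited_by_author, cited_by_paper,
                       min_citations=1, depth=1):
    """Publications the team can be assumed to know already."""
    # Total number of times the team as a whole cited each publication.
    totals = Counter()
    for author in team:
        totals.update(cited_by_author.get(author, {}))
    # The "slider": only publications cited often enough count as known.
    known = {paper for paper, n in totals.items() if n >= min_citations}
    # Optionally follow citations further: depth=2 adds what known papers cite.
    frontier = known
    for _ in range(depth - 1):
        frontier = {ref for paper in frontier
                    for ref in cited_by_paper.get(paper, ())} - known
        known |= frontier
    return known

def fresh_results(search_results, known):
    """Keep only search results outside the known set, preserving order."""
    return [paper for paper in search_results if paper not in known]

# Hypothetical sample data: citation counts per author and references per paper.
cited_by_author = {"ana": {"p1": 3, "p2": 1}, "ben": {"p2": 1, "p3": 1}}
cited_by_paper = {"p1": ["p4"], "p2": ["p5"]}
results = ["p1", "p2", "p3", "p4", "p5", "p6"]

known = known_publications(["ana", "ben"], cited_by_author, cited_by_paper, depth=2)
print(fresh_results(results, known))
```

Raising `min_citations` weakens the filter, so rarely cited publications reappear in the results, while raising `depth` strengthens it by pushing the search further from the team's citation circles. Filtering by co-authors would amount to adding them to `team`.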

Bibliographic search filter using the NOT recommendation and weak-ties strategies

Well, with Neo4j and GraphAware you can make this happen, in real-time!

Try out the power and simplicity of Neo4j as a recommendation engine

If you are interested in adding a powerful and flexible recommendation engine to your business, you can leverage our Framework and Reco module. Find out how by watching this video, or check out the open source software on GitHub. If you need graph-aided search, we have you covered as well! Visit our detailed collaborative filtering glossary page for more information.

References

Burt, Ronald S. 2001. “Structural Holes versus Network Closure as Social Capital.” In Social Capital: Theory and Research, edited by Nan Lin, Karen S. Cook, and Ronald S. Burt, 31–56. Aldine de Gruyter.

Granovetter, Mark S. 1973. “The Strength of Weak Ties.” American Journal of Sociology 78 (6): 1360–80.

————. 1983. “The Strength of Weak Ties: A Network Theory Revisited.” Sociological Theory 1: 201–33.

Leskovec, Jurij, Anand Rajaraman, and Jeffrey D. Ullman. 2014. Mining of Massive Datasets. 2nd ed. Cambridge, UK: Cambridge University Press.

Wenger, Etienne. 2010. “Conceptual Tools for CoPs as Social Learning Systems: Boundaries, Identity, Trajectories and Participation.” In Social Learning Systems and Communities of Practice, edited by Chris Blackmore, 125–44. London: Springer-Verlag and the Open University.

Meet the authors

Miro Marchi

Product Development

Dr. Miro Marchi holds a Ph.D. in Cultural Anthropology. With expertise in ethnographic analysis, graph data modelling with Neo4j, and JavaScript data visualisation, Miro brings a unique perspective to the field of graph technologies. Combining his diverse experiences, Miro plays a critical role in bridging the gap between human behaviour and technology. He currently focuses on product development to leverage his complex understanding of all related aspects.