Find out what's new in the Neo4j world
In the first part of this short series about random graph models, we talked about why they are useful and had a brief look at two of them: Erdos-Renyi graphs and Barabasi-Albert model. In this post, we take a look at the “small world” phenomenon and another network model, namely the Watts-Strogatz model.
There is an important property of random networks which we did not write about in the last blog post: the way the node separation scales with network size. Both Erdos-Renyi and Barabasi-Albert networks are “small world” models, meaning that the characteristic node separation scales logarithmically with number of nodes present in the graph.
What are the implications of such scaling? It means that the majority of nodes can be connected to all other nodes by a shortest path of length at most (order) ln(N), where N is the number of nodes in the network. Hence, it is relatively easy to reach any node from another one in such network; the path needs to cross only a few other (ln(N)) nodes before reaching the target. If one compares two networks of different scaling properties (say linear and small world) for N = 1,000,000, then the diameter of the network (the longest of all shortest paths) is about 1,000,000 for the linear case, compared to about 10 for the logarithmic case. What is quite intriguing is the fact that certain subsets of the Earth’s population network or other networks of interest (US electric grid, film actors collaboration network, etc.) obey exactly this property. That is what Milgram’s experiment tried to test in 1967.
The experiment was set in a way that a few selected people in Omaha (Nebraska) and Wichita (Kansas) were given packages to be sent to Cambridge (Massachusetts). The parcels carried name tags and rough locations only - the goal was for the people to use their contact network to propagate the package further towards Cambridge. The parcels could be swapped only between those who knew each other on a first name basis. Interestingly, the packages that arrived had been through about five intermediaries, which in turn led to the idea of six degrees of separation between any two contacts in the population network.
Try to think how many contacts could separate you (for example) from Stephen Hawking, a prominent cosmologist and physicists. Provided you do not study physics in Cambridge (UK), it might sound rather weird that the connection to this person should be that close to you. But say, for the sake of argument, that since you now know the author of this post by name, you could contact him when looking for that particular person. He, in turn, happens to know a person who knows prof. Hawking and the two seem to remain in contact from time to time. So interestingly enough, you can reach prof. Hawking in just three steps (two intermediaries).
Erdos himself was very well aware of this random network path characteristics and established another network experiment by inventing the so-called Erdos number. The Erdos number quantifies the number of separations from Erdos in an academic network. In such network, two people are neighbors if they co-authored a research paper. The closest collaborators had Erdos number 1, their collaborators have Erdos number 2, and so on. The median of Fields medal recipients is 3, for example.
However successful the most basic random network models seem to be at the first glimpse, they do not capture another important property frequently present in real-world networks: network clustering. To get an idea what it is, note that people tend to know one another in small, partially isolated groups - clusters. Most people tend to have friends who are friends of one another. This creates triangle relationships (friend 1 - you, you - friend 2, friend 1 - friend 2) in the network. Such clustering is not explicitly present in either Erdos-Renyi or Barabasi-Albert graphs that are created by random attachment of edges. This motivated the invention of the Watts-Strogatz model.
Watts-Strogatz is a small world model, based on an idea of regular network rewiring. As you probably suspect by now, grids and regular nets do not have the small world property. Increasing the number of streets and hence crossings on Manhattan would increase the largest shortest path linearly (or definitely not logarithmically), since the district is organized in a grid-like manner (hence the concept of “Manhattan distance”). Unlike the example of Manhattan though, we want to start with a regular net with many triangles and, thus, high clustering. It turns out that if one chooses few edges at random and replaces them by different ones that connect two arbitrary nodes (rewiring), the model starts to show the small world behavior rather rapidly. In contrast to Erdos-Renyi graphs, however, the clustering on the network remains high. Watts-Strogatz is therefore a good choice if the clustering is the dominant property one wants to reflect in the model.
In this blog post, we’ve introduced the small world phenomenon and the Watts-Strogatz random graph model. In the next (and final) post of the series, we will introduce the implementation of these models in Neo4j and demonstrate their usage.
Vojtěch Havlíček is an MSci student in Theoretical Physics at Imperial College London and an intern and graph theory consultant at GraphAware.
Networks: An Introduction, M. Newman, Oxford University Press, 2010 Complexity and Networks course notes - Networks notes, T. Evans, Imperial College London. Unpublished, 2013 Collective dynamics of ‘small-world’ networks, D.J. Watts & S. H. Strogatz, Nature, vol 393, 1998