A few years ago I decided that one day I would create a Graph Technology Landscape map, which would be useful for anyone who wants to discover the players around graph technologies. I started to collect companies and products, but my research never turned into a proper blog post. Until now. I am happy to announce that the first version of my landscape has been published, and I hope we can consider it the start of a long journey.
A little background about this post: when I took my first baby steps in the land of big data, I found Matt Turck's Big Data Landscape picture on the internet. I was impressed; I could spend hours staring at it on the office wall. It was like a treasure map with lots of new areas to discover! It was my best guide for following technology trends every year, and I always used that picture in my presentations to describe the role of graph technologies in the big data world. I realised that graph technologies should have a landscape too. It took a while, but it is here!
This landscape was not created by a market research company; I created it with passion and love, because I believe in graphs. It is highly probable that it is not perfect, and of course, we at GraphAware see the entire world through our “Neo4j expert” eyeglasses. We consider this just a starting point: the goal is to collect feedback from the community and incorporate it into our next graph technology landscape. As a colleague of mine used to say: “If you build it, they will come.”
“Fortunately”, it is a big challenge! That’s because graph databases are the fastest-growing database category (according to db-engines.com), and the landscape changes every day. We expect this to continue in 2019: graph databases and graph technology will be hot topics again this year, and we are looking forward to it.
Let me share a few thoughts about the picture and the categories I used. During the research phase I defined a few rules that I tried to follow throughout the whole process:
These were the general rules I applied. Now let’s say a few words about the categories.
I simply used two categories here: one for native graph databases, and one for all other, non-native graph solutions. So the “Multi-model / RDF” category can contain key-value, document, sparse matrix, and RDF storage models. I did not create a separate category for each storage model, because some products use a hybrid approach to store graphs.
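To make the storage-model distinction a bit more concrete, here is a minimal Python sketch of how one tiny graph could be laid out in key-value, document, sparse-matrix, and RDF-style triple form. The data is hypothetical and the layouts are simplified illustrations, not the internal format of any product on the landscape; real multi-model stores use far more elaborate structures.

```python
from collections import defaultdict

# Hypothetical example graph: who knows whom (directed edges).
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]

# Key-value layout: the key is a node id, the value is its list of outgoing neighbours.
key_value = defaultdict(list)
for src, dst in edges:
    key_value[src].append(dst)

# Document layout: each node is a self-contained record embedding its outgoing edges.
nodes = sorted({node for edge in edges for node in edge})
documents = [{"_id": node, "knows": key_value.get(node, [])} for node in nodes]

# Sparse-matrix layout: only the non-zero cells of the adjacency matrix are stored,
# keyed by (row, column) = (source index, target index).
index = {node: i for i, node in enumerate(nodes)}
sparse_matrix = {(index[src], index[dst]): 1 for src, dst in edges}

# RDF-style layout: every edge becomes a (subject, predicate, object) triple.
triples = [(src, "knows", dst) for src, dst in edges]

print(dict(key_value))
print(documents)
print(sparse_matrix)
print(triples)
```

All four hold the same three relationships; they differ in which lookups are cheap (e.g. the key-value form answers “whom does alice know?” with a single key read, while the triple form needs a scan or an index).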
If you decide to host your graph data in the cloud, you have several options. If you choose infrastructure-as-a-service (IaaS), you can select one of the major players and simply run a virtual private server (VPS) with your graph database on it; almost any cloud hosting provider will do. Here, however, we only list the vendors that, to our knowledge, support running a graph database as a droplet, cluster, or add-on. In the platform-as-a-service (PaaS) category we list the vendors that let you use a graph database in the cloud out of the box.
In this category we listed the data ingestion tools that are a perfect fit for a graph database. Some of them are Neo4j-specific, like GraphAware Databridge or the recently released new version of the Neo4j ETL tool, but there are other, more universal tools, such as Talend, Kafka, or Pentaho Kettle. Norconex has also released a Neo4j loader extension for its crawlers. We are pretty sure there are other integration frameworks out there, but these are the ones we usually see during our client projects.
This category could appear on any data technology landscape, not just the graph one. These are the major players in the data visualisation and dashboarding market, and they have already realised that they should focus on connected data as well. There are cool examples on the internet of how they work together with graph databases.
This is a pretty flexible category: I simply dropped all the graphy applications I found here. I think the common feature of these applications is that they provide insights into your connected data. They may use machine learning, they may visualise your data, or they may simply provide a solution in a specific domain with the help of graph technology. I suspect there are far more applications than I collected, and I hope this list starts a real community effort to extend it.
I created this category because there are products and solutions on the market focusing on this special area. Some of the databases appear here as well, because they do not just store textual information; they focus on knowledge and on how to derive it from the data. “A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.” We know this definition from here, and we are realising that almost all organizations are going to need to build large internal knowledge graphs to preserve their company wisdom. We are looking forward to this evolution, because it has recently become one of the hottest topics driving the graph technology space.
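The “reasoner” in that definition can be illustrated with a deliberately naive Python sketch: facts are stored as subject-predicate-object triples, and two RDFS-style rules (transitive `subClassOf`, and inheriting `type` along `subClassOf`) are applied repeatedly until nothing new can be derived. The facts are made up for illustration, and real reasoners are far more capable and efficient than this quadratic loop.

```python
# Hypothetical facts as (subject, predicate, object) triples.
facts = {
    ("Neo4j", "type", "GraphDatabase"),
    ("GraphDatabase", "subClassOf", "Database"),
    ("Database", "subClassOf", "Software"),
}


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Naive forward chaining until a fixed point is reached, using two rules:
    1. (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    2. (x type A),       (A subClassOf B)  =>  (x type B)
    """
    known = set(triples)
    while True:
        new = set()
        for s1, p1, o1 in known:
            for s2, p2, o2 in known:
                # Both rules share one shape: follow a subClassOf edge from o1.
                if p2 == "subClassOf" and o1 == s2 and p1 in ("subClassOf", "type"):
                    new.add((s1, p1, o2))
        if new <= known:  # nothing new derived: stop
            return known
        known |= new


print(sorted(infer(facts) - facts))
```

Run on the three facts above, the loop derives that GraphDatabase is a subclass of Software and that Neo4j is both a Database and a piece of Software, knowledge that was never stated explicitly.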
If you have ever introduced a new technology into a company, you already know it is never easy. Undoubtedly, every organization should focus on the relationships inside its data to gain insights, and there are several companies out there that can help with training and consultancy to adopt graph technologies faster.
Here I tried to collect the tools and libraries that can be used to develop applications on top of graph databases or platforms. These libraries provide useful tools and methods that make developers’ lives easier. These frameworks speed up development and provide a platform for building generic as well as domain-specific functionality, modelling tools, testing methods, analytical capabilities, graph algorithms, and much more.
In general, a query language is a computer language used to retrieve information from a database, and the Structured Query Language (SQL) is the best-known standard. Graph query languages let you express queries in terms of nodes and relationships, so queries against graph databases stay closer to how the data is modeled than queries written in other languages. There are numerous graph query languages, each trying to solve a particular problem. Because having a different language for every product does not help users much, there are initiatives to create a common, unified property graph query language, such as GQL.
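To illustrate what “closer to how the data is modeled” means, here is a small Python sketch of a friends-of-friends lookup over hypothetical data. In a graph query language such as Cypher, the two hops are written as one pattern (quoted in the docstring), whereas in SQL the same question would typically need a self-join on a relationships table. The function below simply walks the two hops over an adjacency list; it is not how any particular database executes the query.

```python
# Hypothetical KNOWS relationships, stored as an adjacency list (person -> people they know).
knows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": [],
}


def friends_of_friends(graph: dict[str, list[str]], start: str) -> list[str]:
    """Follow exactly two KNOWS hops from `start` and return the distinct end nodes.

    This mirrors the intent of a Cypher pattern such as
        MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
        RETURN DISTINCT fof.name
    """
    result = set()
    for friend in graph.get(start, []):    # first hop
        for fof in graph.get(friend, []):  # second hop
            result.add(fof)
    return sorted(result)


print(friends_of_friends(knows, "Alice"))  # ['Dave', 'Erin']
```

Dave is reachable through both Bob and Carol but appears only once, matching the `DISTINCT` in the pattern.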
This is a collection of books about graph technology. It is hard to draw the line, but we tried to list books about technology and leave out books about “pure” network science, because network science alone could fill a whole library, and we are interested in its implementation side: how graph algorithms and graph theory are turned into runnable code.
One of the best things about graph technology is the community. The people who work in this area have already realised that relationships matter, so they are always open to new connections, discussions, and ideas. And the best place to catch up with them is a graph conference. GraphConnect used to be held in both London and San Francisco; now there is a single annual edition in New York, but you can find a few other conferences as well. I am also pretty sure that any data science or analytics conference includes talks about graphs. Graphs are everywhere. We did not list local meetups here, but you will find plenty if you look around a bit.
Finally, I want to thank my colleagues for helping me create the first version of this landscape. They already have crazy ideas about how to improve it in a graphy way. I appreciate that so much! Enjoy, and let me know what you think! What should I fix or change in the next version?