As you know, our Chief Scientist, Dr. Alessandro Negro, recently published a book titled Graph-Powered Machine Learning. We are very proud of the Chief, and very excited about the book. We're even planning an event, where you'll be able to ask Alessandro anything about it!
But what is really in the book? Let me share what the first chapter is about so you know what you're in for - a treat! The book opens with an introduction to Machine Learning and Graphs. The first chapter covers Machine Learning, some of the challenges of Machine Learning, Graphs, and the role Graphs play in Machine Learning. So what did we learn in the first chapter?
What is Machine Learning?
The author starts the chapter by saying: "Machine Learning is the core branch of artificial intelligence." He explains that for artificial intelligence to know how to do anything, it needs to learn it first. Such learning is basically turning experience into expertise: after seeing examples or performing a task millions of times, machines can find patterns in what they saw or did and improve their ability to perform that task. At that point we can say that they know how to perform it. The input to machine learning is therefore data, and the output is expertise in the form of predictive models, computer programs, etc.
When you're deciding whether you should use machine learning, you should always ask yourself:
Is the specific task too complex to be programmed?
Take, for example, programming image recognition software: while it is easy for us to recognize our friends in pictures, it's not easy to program a computer to do so. It needs to learn how to do it.
Does the task require any sort of adaptivity throughout its life?
The importance of this question can be seen in the example of spam filters. A spam filter can be programmed easily if one lists all the words and phrases that are likely to indicate a spam email and writes software to block messages that contain them. However (leaving aside the fact that a human cannot list every "suspicious" word), the words and phrases change over time, which means that to keep the spam filter effective, you'd have to update it constantly. The alternative? Use machine learning: let the program itself learn, continuously identify new words, and improve its accuracy as it does its job.
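To make the contrast concrete, here is a minimal sketch in Python. It is not taken from the book: the word list, the `LearningFilter` class, and the simple voting rule are all assumptions made up for illustration. It shows a fixed keyword filter next to a toy learner that updates its word counts from labeled emails.

```python
from collections import Counter

# Hypothetical hand-written list; a real one would need constant upkeep.
STATIC_SPAM_WORDS = {"lottery", "winner", "prize"}


def static_filter(message: str) -> bool:
    """Flag a message if it contains any word from the fixed list."""
    return any(word in STATIC_SPAM_WORDS for word in message.lower().split())


class LearningFilter:
    """Toy learner: keeps per-word counts from labeled emails."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()

    def learn(self, message: str, is_spam: bool) -> None:
        """Update the word counts with one labeled email."""
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(message.lower().split())

    def is_spam(self, message: str) -> bool:
        """Each known word votes for the class it was seen in more often."""
        words = message.lower().split()
        spam_votes = sum(1 for w in words if self.spam_counts[w] > self.ham_counts[w])
        ham_votes = sum(1 for w in words if self.ham_counts[w] > self.spam_counts[w])
        return spam_votes > ham_votes


learner = LearningFilter()
learner.learn("claim your crypto airdrop now", is_spam=True)
learner.learn("meeting notes for monday", is_spam=False)
```

The static filter misses "crypto airdrop today" because none of those words were on its list, while the learner picks them up from the single labeled example it was shown. A real spam filter would use a proper statistical model, but the adaptivity argument is the same.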
Graphs can be extremely useful for machine learning projects
While graph databases are not always the most suitable type of database for machine learning projects, they have many advantages that make them ideal for certain projects. In particular, graphs are well suited to storing connected data, and they make accessing it very quick.
What does the machine learning project life cycle look like?
The machine learning project life cycle is a cyclical process composed of business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
To put it simply, in every machine learning project you need to understand why you are doing it. This is what understanding the business domain means: knowing what you are trying to achieve, when you will be able to consider yourself successful, what the goals and objectives of the project are, and how you can translate them into a machine learning problem definition. Once you understand the business domain, you are ready to look at the data, and understanding the data will in turn help you understand the business domain even better. In understanding your data, you should look at what kind of data you have, where it comes from, what initial insights and hypotheses you can draw from it, and so on. Preparing your data means gathering, merging, cleaning, and organizing it in one structure before it is processed by the machine learning algorithms. The actual processing of the data and building of prediction models is called modelling. The most appropriate prediction model is selected in the evaluation stage, and the solution is then deployed. However, the cycle does not end here: it is important to monitor and track the progress and success of the project.
What are the main machine learning challenges?
The source of truth
For a machine learning project to be successful and produce accurate predictions, you need to train it on a lot of data. Even for simple use cases, you need to provide your machine learners with a lot of data. What's more, this data needs to be of high quality, representative, and include only the relevant features.
The performance of your machine learning project can be assessed by the accuracy of the predictions produced, the time required to compute the model, and the response time needed to provide predictions. Here we're beginning to see how graphs can help machine learning projects. Graph databases not only provide storage for big data, they also allow quick access to it, which reduces the time needed to produce a prediction, and they offer algorithmic techniques that can improve the accuracy of predictions.
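The three performance measures mentioned above (accuracy, time to compute the model, and response time) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the "model" is a toy that always predicts the most common training label, and the `train` and `evaluate` names are hypothetical, not from the book.

```python
import time
from collections import Counter


def train(labels):
    """Toy 'model': always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def evaluate(train_labels, test_labels):
    """Measure accuracy, training time, and average response time."""
    start = time.perf_counter()
    model = train(train_labels)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model for _ in test_labels]
    response_time = (time.perf_counter() - start) / len(test_labels)

    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return {
        "accuracy": correct / len(test_labels),
        "training_time": training_time,   # seconds to compute the model
        "response_time": response_time,   # average seconds per prediction
    }


report = evaluate(["spam", "ham", "ham"], ["ham", "ham", "spam", "ham"])
```

With a real model the two timings are where storage and data access start to matter, which is the point the chapter makes about graphs.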
Storing the model
Storing the machine learning model and being able to query it quickly are big challenges in machine learning projects - again, graphs can help tackle these challenges.
As recommendations need to be produced very quickly and sometimes based on only a limited set of data (for example, recommendations for anonymous users of an online store), the learners need to be able to learn fast and predict quickly, that is, to process new data, learn from it, and provide new, more accurate predictions in milliseconds.
What are graphs?
As the author puts it: "A graph is a simple and quite old mathematical concept: a data structure consisting of a set of vertices (or nodes/points) and edges (or relationships/lines) that can be used to model relationships among a collection of objects."
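The definition maps almost directly onto code. Below is a minimal Python sketch of a graph as a set of vertices plus a set of labeled edges. The `Graph` class, its methods, and the example names are my own assumptions for illustration, not something the book prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A set of vertices and a set of (source, label, target) edges."""

    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, label, target) -> None:
        """Add a labeled edge, creating both vertices if needed."""
        self.vertices.update((source, target))
        self.edges.add((source, label, target))

    def neighbors(self, vertex) -> set:
        """Vertices reachable from `vertex` by one outgoing edge."""
        return {t for s, _, t in self.edges if s == vertex}


g = Graph()
g.add_edge("Alice", "FRIEND_OF", "Bob")
g.add_edge("Alice", "BOUGHT", "Book")
```

Here `g.neighbors("Alice")` returns both Bob and the book, which shows how a single structure can hold people, products, and the different relationships between them.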
What is the role of graphs in machine learning?
As I said above, graphs are collections of nodes and relationships that can be used to represent pretty much anything in today's world - everything that includes items connected to each other by different relationships can be represented in a graph. But what can graphs do for machine learning? Three things: manage data, analyze it, and visualize it.
Machine learning projects need to access, store, read, and manage large amounts of data - graphs can help with this. Thanks to graphs, you can: connect data from different sources into a single connected source of truth, organize it in a homogeneous structure in a knowledge graph, access the data quickly, enrich, clean, and merge it effortlessly thanks to their schemaless structure, and identify and select the relevant features quickly.
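As a rough sketch of what "connecting data from different sources" can look like, the Python snippet below merges two made-up sources into one in-memory graph. Everything here is hypothetical: the `crm_rows` and `order_rows` data, the `merge_node` and `add_relationship` helpers, and the use of an email address as the shared key. A real project would typically do this in a graph database instead.

```python
def merge_node(graph, node_id, **properties):
    """Create the node if missing, then add or overwrite its properties."""
    graph["nodes"].setdefault(node_id, {}).update(properties)


def add_relationship(graph, source, rel_type, target):
    """Record a typed relationship between two node ids."""
    graph["relationships"].add((source, rel_type, target))


graph = {"nodes": {}, "relationships": set()}

# Two hypothetical sources that describe the same customer.
crm_rows = [{"email": "ana@example.com", "name": "Ana"}]
order_rows = [{"email": "ana@example.com", "product": "laptop", "country": "IT"}]

for row in crm_rows:
    merge_node(graph, row["email"], name=row["name"])

for row in order_rows:
    # No fixed schema: new properties are simply added to the node.
    merge_node(graph, row["email"], country=row["country"])
    merge_node(graph, row["product"], kind="product")
    add_relationship(graph, row["email"], "BOUGHT", row["product"])
```

After both loops, the customer node carries the name from the first source and the country from the second, and it is linked to the product she bought.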
Graphs come with various graph algorithms that can enrich your data analysis. Furthermore, they help with cleaning the data before and during training, and they increase the quality of the predictions thanks to network dynamics and the ability to host different models, which can be accessed quickly in the same graph.
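One simple way a graph algorithm can enrich the data is by turning the structure itself into features. The Python sketch below computes in-degree and out-degree per node, which could then be fed to a learner as extra columns. The `degree_features` function and the edge list are assumptions for illustration. Degree counting is one of the simplest graph measures and merely stands in for the richer algorithms the book has in mind.

```python
from collections import defaultdict


def degree_features(edges):
    """Return in-degree and out-degree for every node in a directed edge list."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1

    nodes = set(in_degree) | set(out_degree)
    return {
        node: {
            "in_degree": in_degree.get(node, 0),
            "out_degree": out_degree.get(node, 0),
        }
        for node in nodes
    }


features = degree_features([("a", "b"), ("c", "b"), ("b", "a")])
```

In this tiny example node "b" receives two edges and sends one, so it would stand out as the most referenced node.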
Graph-powered data visualization can be surprisingly powerful: it helps with navigating the data, with human-brain analysis such as identifying patterns, and with communicating insights clearly (Labeled Property Graphs are incredibly intuitive and easy to read).
In the first chapter of his book, the author walked us through the concepts of machine learning and graphs and explained how they fit together. I can't wait to read more about Graph-Powered Machine Learning, but I am a slow reader, so just get your copy of the book now and carry on reading on your own ;)