Data Orchestration meets Graph Analytics
by Esther Bergmark
Organisations face a dilemma: the data they have gathered holds immense potential, yet leveraging it is a significant challenge. This blog post outlines an approach to solving this problem by combining graph technology with data orchestration.
How powerful your graph is depends very much on the data you feed it. And that is where the challenge begins: data exists in abundance. Organisations have long been aware that the data they collect holds great potential, whether through increased efficiency from digital transformation, the discovery of new revenue sources, or the uncovering of new insights. Large investments have been made to collect and store data. Most organisations have a sophisticated system landscape for data extraction, transformation and storage: databases, data warehouses, data hubs or data lakes are in place, and data pipelines have been built to tap into those stores. While data is now successfully extracted, cleaned and stored, it often sits in silos. Data silos have become one of the major issues preventing organisations from fully leveraging the potential of their data.
In recent years, data orchestration has gained recognition as one way of breaking down silos.
Each step within a data pipeline, from extracting data from different sources, through loading and transforming it, all the way to creating reports and other data products, can be understood as a workflow. A complete pipeline consists of several workflows that need to be managed to ensure efficiency and meet the requirements of business users in downstream processes.
Data orchestration is the process of coordinating the execution and monitoring of these workflows.
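As a rough illustration, coordinating and monitoring workflows can be sketched in a few lines of Python. The workflow names (extract, transform, load, report) and the run_pipeline helper are hypothetical and not taken from any particular orchestration tool; the sketch only shows workflows being run in dependency order while a status is recorded for each.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(workflows, dependencies):
    """Run workflows in dependency order and record a status for each.

    workflows:    dict mapping a workflow name to a zero-argument callable
    dependencies: dict mapping a workflow name to the set of names it waits for
    (both structures are assumptions made for this sketch)
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        # Skip a workflow if anything it depends on did not succeed.
        if any(status.get(dep) != "success" for dep in upstream):
            status[name] = "skipped"
            continue
        try:
            workflows[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


# Hypothetical four-step pipeline; each workflow just logs that it ran.
log = []
workflows = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
    "report": lambda: log.append("report"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}
status = run_pipeline(workflows, dependencies)
```

The returned status dictionary is the "monitoring" half of the picture: a failed workflow causes everything downstream of it to be marked as skipped rather than run on incomplete data.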
Orchestration, in many ways, is the practice of breaking down silos by making data accessible. At the heart of a data orchestration infrastructure is the authoring of data pipelines and workflows that move data from one location to another while coordinating the combination, verification and storage of that data to make it useful. Consequently, the task of modern orchestration has become to break down silos and make data more accessible and useful across the organisation. Modern orchestration is usually data-driven, whereas first-generation orchestration tools, such as Apache Airflow, are task-driven.
Graphs are another technology that has gained increasing recognition in recent years. We do not need to go into detail about the power of graphs for data analysis and investigations in this blog post; we have written about it extensively in the past (read our blogs about Graph Data Science and Clustering, or Virtual Relationships), and you are likely well aware of it. But how powerful the graph is, and what experience it provides to data scientists, analysts and investigators, depends on how it is connected to data sources and how data is ingested, transformed and surfaced to the user.
Law enforcement agencies, for example, have a large number of data sources that provide data in a wide range of formats, from phone call records to images or audio files. Officers use graphs to discover unknown unknowns, which enables them to gather intelligence to prevent crimes or to gather evidence of crimes already committed.
During an investigation, an officer will explore phone call records, location data gathered from cell phone towers, personal register records, etc. Once evidence is found, the officer will compile a report and share it with people and authorities that do not have access to the graph. All this data comes from different sources; it has been ingested, transformed, written to the graph database and finally surfaced to the officer in the graph visualisation.
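To make the ingestion step a little more concrete, here is a minimal sketch of how flat call records might be turned into graph nodes and relationships. The field names (caller, callee) and the sample phone numbers are invented for illustration, and plain Python collections stand in for a real graph database.

```python
from collections import defaultdict


def build_call_graph(call_records):
    """Turn flat call records into phone nodes and weighted CALLED relationships."""
    nodes = set()
    calls = defaultdict(int)  # (caller, callee) -> number of calls
    for record in call_records:
        caller, callee = record["caller"], record["callee"]
        nodes.update((caller, callee))
        calls[(caller, callee)] += 1
    return nodes, dict(calls)


def contacts_of(phone, calls):
    """Return every number that called, or was called by, the given phone."""
    contacts = set()
    for caller, callee in calls:
        if caller == phone:
            contacts.add(callee)
        elif callee == phone:
            contacts.add(caller)
    return contacts


# Invented sample data, for illustration only.
records = [
    {"caller": "555-0101", "callee": "555-0102"},
    {"caller": "555-0101", "callee": "555-0102"},
    {"caller": "555-0103", "callee": "555-0101"},
]
nodes, calls = build_call_graph(records)
neighbours = contacts_of("555-0101", calls)
```

Once records are shaped this way, a question such as "who has been in contact with this number?" becomes a neighbourhood lookup rather than a join across several source tables.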
Normally, a data engineer takes care of setting up the connections to the various sources and orchestrating the data. If a data-driven orchestration tool is used for this task, the engineer can easily create workflows that tap into several sources, extract the data and manipulate it in one consistent flow. In contrast to a task-driven approach, a data-driven flow knows which data will be transformed and how. If the orchestration tool supports stream processing, data can be continuously ingested and transformed, so the officer always has the latest data available in the visualisation.
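The idea of a flow that knows its data can be sketched as follows, under some loud assumptions: the Step class, its consumes and produces declarations, and the sample record are hypothetical and do not reflect the API of Orchestra or any other tool. Each step declares which fields it reads and which it adds, so the flow can be checked before any data moves, and records pass through one at a time via a generator, roughly as they would in a streaming pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Step:
    """Hypothetical data-aware step: it declares its inputs and outputs."""
    name: str
    consumes: frozenset  # fields the step reads
    produces: frozenset  # fields the step adds
    transform: Callable[[dict], dict]


def validate(steps, source_fields):
    """Check, before running, that every step's inputs will be available."""
    available = set(source_fields)
    for step in steps:
        missing = step.consumes - available
        if missing:
            raise ValueError(f"{step.name} is missing fields: {sorted(missing)}")
        available |= step.produces


def stream(records: Iterable[dict], steps) -> Iterator[dict]:
    """Lazily push each record through all steps as it arrives."""
    for record in records:
        for step in steps:
            record = step.transform(record)
        yield record


# Invented example: normalise a phone number, then label it for the graph.
steps = [
    Step("normalise", frozenset({"number"}), frozenset({"msisdn"}),
         lambda r: {**r, "msisdn": r["number"].replace("-", "")}),
    Step("label", frozenset({"msisdn"}), frozenset({"node_label"}),
         lambda r: {**r, "node_label": "Phone"}),
]
validate(steps, source_fields={"number"})
results = list(stream([{"number": "555-0101"}], steps))
```

A purely task-driven scheduler would only know that one task runs after another; here, a mis-ordered or mis-wired flow is rejected by validate before a single record is processed.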
Combine Data Orchestration and Graph Analytics
In Hume, we use Orchestra to create data-driven streaming workflows that leverage an ecosystem of sources and integrations with event-streaming services such as Apache Kafka, transform the data and make it available for exploration and investigation in the graph. Apart from connections to data sources and event-streaming services, Orchestra allows for integrations with any application. This way, data engineers can tap into all available data, transform it for business purposes and make it available to end users in one central place.
Combining a data-driven orchestration tool with the most powerful graph database and highly sophisticated graph visualisation gives us the opportunity to bridge silos and to support the organisations that work with us in fully tapping into the potential of their data. It resolves the dilemma of this potential being locked away in inaccessible silos.
Each of these technologies - Neo4j, Hume Visualisation and Orchestra - is powerful by itself. Combined, they enable us to gain insights into data we did not know existed.
Do you need to break your data silos? Reach out to our experts and find out how Hume Orchestra can help you combine and manage your data, creating a single source of truth for your organisation.