Monitoring Neo4j and Procedures with Prometheus and Grafana - Part 1


Database monitoring is a crucial aspect of any application deployment. After all, databases manage data and sit quite low in the stack. They are robust pieces of software, but their setup and maintenance need care and attention, since any problem has the potential to disrupt the business.

Lotus Elan Dashboard © Smiths Instruments

The Neo4j Graph Database provides a complete set of metrics that allow continuous monitoring of the status, performance and usage of everything important, from the transaction lifecycle, memory utilization and driver sessions all the way to cluster communication.

Neo4j metrics are exposed in different ways (as CSV files, as JMX beans, over the Graphite protocol or through a Prometheus endpoint), and they can be consumed by a variety of systems. Graphite is an application capable of storing and rendering metrics, and there are ready-to-use dashboards for the Graphite/Grafana combination, where the rendering is taken care of by the popular Grafana. Halin is a Graph App for monitoring Neo4j databases, developed by David Allen and now a Neo4j Labs project. Halin provides extremely useful out-of-the-box insights into how a Neo4j system is performing. You should definitely check it out if you haven’t done so yet.

When you need to integrate with an existing monitoring infrastructure, it is not always possible or desirable to install new tools. As Prometheus has become a widely used solution for storing and querying time series data, it is likely to be the preferred choice in many organizations. Grafana is also a very popular choice because it offers a single place to watch all metrics, whatever their source.

We didn’t find any ready-to-use Prometheus/Grafana templates for Neo4j, and this is the first reason for this tutorial. Besides, the Neo4j database you want to monitor is often part of a bigger data-centric architecture. Data flows into the graph database from other systems, gets connected in a graph model, and is then served to other systems for use. In these cases, the internal metrics alone might not give you all the monitoring support you need, and you may want to create additional metrics in your procedures to get a better view of the processes in which Neo4j is involved. Showing how to do this is the second reason for this blog post. So, if you are interested in this topic, read along: we’ll jump right into it!

The Prometheus Connection

Support for the powerful open source monitoring and alerting tool Prometheus was first introduced in Neo4j in version 3.4. This feature is only available in the Enterprise edition of Neo4j. You can enable it with these two entries in the neo4j.conf file (more detailed instructions in the Knowledge Base):

# Enable the Prometheus endpoint. Default is 'false'.
metrics.prometheus.enabled=true
# The IP and port the endpoint will bind to in the format <hostname or IP address>:<port number>.
# The default is localhost:2004.
metrics.prometheus.endpoint=localhost:2004

With this configuration, Neo4j starts a Prometheus endpoint where it exposes its metrics, and its job is done: it is Prometheus that polls, or scrapes, the metrics data from it.

After you install Prometheus, you just need to add a scrape configuration to prometheus.yml like this:

# A scrape configuration containing the Neo4j endpoint to scrape:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'neo4j-prometheus'
    scrape_interval: 10s
    static_configs:
      # target: ip address and port of the Neo4j Prometheus endpoint
      - targets: ['localhost:2004']

With this configuration, you can run Prometheus and it will start collecting Neo4j metrics. You can check that it is working by opening the Prometheus web UI at localhost:9090 and visiting the Status > Targets page, which should list your neo4j-prometheus job.

If you go to the Graph page, you can choose any Neo4j metric from the dropdown. Just type “neo4j_” to list all the options. Note that dots in the original Neo4j metric names are replaced by underscores. When you execute the query with the metric name, Prometheus will render a nice line chart of the timeseries. Congratulations, you just used PromQL, the Prometheus query language! We will explore it in detail in the next post.
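The renaming happens because Prometheus only accepts metric names made of ASCII letters, digits, underscores and colons. The small class below is our own illustrative helper, not code taken from Neo4j or from the Prometheus client; it sketches the mapping under the assumption that every disallowed character is simply replaced by an underscore, and it ignores the additional rule that a name must not start with a digit. The sample name is the custom metric we create later in this post.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: sketches how dotted metric names become valid Prometheus names. */
public class PrometheusNames {
    // Prometheus metric names may only contain [a-zA-Z0-9_:]; everything else becomes '_'.
    private static final Pattern INVALID_CHARS = Pattern.compile("[^a-zA-Z0-9:_]");

    public static String sanitize(String dottedName) {
        return INVALID_CHARS.matcher(dottedName).replaceAll("_");
    }

    public static void main(String[] args) {
        // A Dropwizard-style name registered from a procedure class
        System.out.println(sanitize("com.graphaware.neo4j.monitoring.MonitorProcedure.my_counter"));
        // prints: com_graphaware_neo4j_monitoring_MonitorProcedure_my_counter
    }
}
```

The sanitized name is what you type into the Prometheus expression field or use in a PromQL query.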

Prometheus UI graphing a Neo4j metric

Publish custom metrics from Neo4j procedures to Prometheus

Now that we have all Neo4j metrics in Prometheus, we have a good starting point for discovering how your graph database behaves in different scenarios. Usually, though, you want to monitor more than the built-in Neo4j metrics: you also want to see the behaviour of your custom database logic, for example the creation rate of some business entities.

If you have your business logic already implemented in stored procedures, then you can easily monitor them in Prometheus by defining your own custom metrics inside procedures.

Since the Neo4j server uses Dropwizard Metrics to collect the database-level metrics, it is a good idea to use the same library to capture your application-level metrics as well. Metrics is a Java library which provides measuring instruments for Java applications. It has several modules; Neo4j uses the metrics-core and metrics-graphite modules, but in our case we will use the metrics-core library only.

In our procedures, we register our metrics in a MetricRegistry, so we need to instantiate the MetricRegistry class. You can have several metric registries, depending on your reporting strategy: if you want to use different reporting methods for different metrics, you can keep them in separate registries. For example, if you want to report some metrics to the console, some to Prometheus and some to a CSV file, you can organise them into separate registries and attach the corresponding reporter to each one to produce the required output. Dropwizard provides a good collection of these reporters.

private static final MetricRegistry METRIC_REGISTRY = new MetricRegistry();

In Dropwizard Metrics you can use several common metric types, such as Meter, Gauge, Counter, Histogram and Timer. Neo4j publishes all its metric values as Gauges, the simplest metric type for publishing a particular value. A Gauge is not necessarily the best fit for every kind of measurement, so in your own code you can pick the type that suits each metric. In our example, we will use a Counter and a Timer.

A Counter records increments (and decrements). In our example, we register a Counter and use it to count the number of executions of the procedure.

Registering a Counter:

private static final Counter myCounter = METRIC_REGISTRY.counter(name(MonitorProcedure.class, "my_counter"));

Incrementing a Counter:

myCounter.inc();

A Timer keeps track of multiple timing durations, each represented by a Context object, and also provides statistics about them. We use a Timer to measure the execution time of our business method.

Registering a Timer:

private static final Timer myTimer = METRIC_REGISTRY.timer(name(MonitorProcedure.class, "my_timer"));

Using Timer to measure execution:

Timer.Context timerContext = myTimer.time();
try {
    // Do some business logic here
} finally {
    // Stop the Timer context: this records the elapsed time in myTimer
    timerContext.stop();
}
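Conceptually, every call to time() starts a separate measurement, and stopping its Context records one duration. The class below is a hypothetical, stdlib-only stand-in written to illustrate that lifecycle; it is not the Dropwizard implementation, which additionally keeps a sampling reservoir to compute the quantiles you will see in the exported metrics. It also shows try-with-resources as one way to make sure a measurement is recorded even when the timed logic throws.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, stdlib-only stand-in for a Dropwizard Timer (illustration only).
 * time() starts a measurement and returns a Context; stop() records the elapsed time.
 * The real Timer also keeps a sampling reservoir (for quantiles) and rate meters.
 */
public class SimpleTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** One in-flight measurement; remembers when it was started. */
    public final class Context implements AutoCloseable {
        private final long startNanos = System.nanoTime();

        /** Records the elapsed time and returns it in nanoseconds. */
        public long stop() {
            long elapsed = System.nanoTime() - startNanos;
            count.incrementAndGet();
            totalNanos.addAndGet(elapsed);
            return elapsed;
        }

        /** Lets the context be used with try-with-resources. */
        @Override
        public void close() {
            stop();
        }
    }

    public Context time() {
        return new Context();
    }

    public long getCount() {
        return count.get();
    }

    /** Mean duration of all recorded measurements, in milliseconds. */
    public double meanMillis() {
        long n = count.get();
        return n == 0 ? 0.0 : totalNanos.get() / (double) n / 1_000_000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleTimer timer = new SimpleTimer();
        for (int i = 0; i < 3; i++) {
            // The context is stopped when the block exits, even if the logic throws
            try (Context context = timer.time()) {
                Thread.sleep(5); // placeholder for the business logic being timed
            }
        }
        System.out.println("count=" + timer.getCount() + ", mean ms=" + timer.meanMillis());
    }
}
```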

With Dropwizard you can use different types of Reporters to publish your measurements. If you want to output your metrics to the console or a CSV file, then you can use the provided Reporters, such as ConsoleReporter, CsvReporter, Slf4jReporter, JmxReporter and so on.

In our case, we will not use any reporter; instead, we reuse the CollectorRegistry provided by the Prometheus client. To register a collector (DropwizardExports) with Prometheus, we only need a one-time registration when our procedure class is initialised. This way our metrics are published on the Prometheus endpoint provided by Neo4j (localhost:2004).

Registering our MetricRegistry to the collector:

CollectorRegistry.defaultRegistry.register(new DropwizardExports(METRIC_REGISTRY));

Once you have built your example plugin, install it on your Neo4j server and verify the metrics by opening http://localhost:2004. Note that your metrics appear among the default Neo4j metrics. To make them show up, you have to call your procedure at least once, for example with this Cypher query:

call example.monitoring

After that you can see your custom metrics in the monitoring endpoint (this is the result after 20 invocations):

# HELP com_graphaware_neo4j_monitoring_MonitorProcedure_my_counter Generated from Dropwizard metric import (metric=com.graphaware.neo4j.monitoring.MonitorProcedure.my_counter, type=com.codahale.metrics.Counter)
# TYPE com_graphaware_neo4j_monitoring_MonitorProcedure_my_counter gauge
com_graphaware_neo4j_monitoring_MonitorProcedure_my_counter 20.0
# HELP com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer Generated from Dropwizard metric import (metric=com.graphaware.neo4j.monitoring.MonitorProcedure.my_timer, type=com.codahale.metrics.Timer)
# TYPE com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer summary
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.5",} 1.3296000000000001E-5
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.75",} 1.5634E-5
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.95",} 2.0026E-5
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.98",} 7.17251E-4
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.99",} 7.17251E-4
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer{quantile="0.999",} 7.17251E-4
com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer_count 20.0
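If you want to check the endpoint programmatically, for instance from the integration test mentioned below, you need to pick individual samples out of this text format. The helper below is a hypothetical, stdlib-only sketch that handles only simple lines of the form name{labels} value, like the ones above (no timestamps, no spaces inside label values). It also converts the Timer median to microseconds, since DropwizardExports exports Timer durations in seconds: the 0.5 quantile above corresponds to roughly 13 microseconds.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (illustration only): collects samples from Prometheus
 * text exposition output, such as the body served at http://localhost:2004.
 * Supports only simple "name{labels} value" lines without timestamps.
 */
public class ExpositionSamples {

    /** Returns "name{labels}" -> value for every sample line whose name starts with prefix. */
    public static Map<String, Double> samples(String exposition, String prefix) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String rawLine : exposition.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines, # HELP / # TYPE comments and other metric families
            if (line.isEmpty() || line.startsWith("#") || !line.startsWith(prefix)) {
                continue;
            }
            // The value is the last space-separated token on the line
            int lastSpace = line.lastIndexOf(' ');
            if (lastSpace < 0) {
                continue; // malformed line without a value
            }
            result.put(line.substring(0, lastSpace).trim(),
                       Double.parseDouble(line.substring(lastSpace + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "com_graphaware_neo4j_monitoring_MonitorProcedure_my_timer";
        // Two lines copied from the endpoint output shown above
        String body = name + "{quantile=\"0.5\",} 1.3296000000000001E-5\n"
                + name + "_count 20.0\n";
        Map<String, Double> timer = samples(body, name);
        // Timer values are exported in seconds; convert the median to microseconds
        double medianMicros = timer.get(name + "{quantile=\"0.5\",}") * 1e6;
        System.out.println(String.format(Locale.ROOT, "median = %.1f us, count = %.0f",
                medianMicros, timer.get(name + "_count")));
        // prints: median = 13.3 us, count = 20
    }
}
```

To feed it live data, you could download the endpoint body with java.net.http.HttpClient (Java 11 or later) and pass the response string to samples().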

You can find the example project here.

With the approach we described, you can monitor any aspect of your graph business logic and keep a close eye on the performance and health of your Neo4j database.

We use Neo4j Testcontainers in the integration test, specifically the Neo4j Enterprise image. To make the test runnable, you have to add a container-license-acceptance.txt file to the root directory of your test resources, containing the text neo4j:3.5.0-enterprise on a single line. With this file you accept the license terms and conditions. You’ll find more information about licensing Neo4j here, or you can contact us directly at GraphAware.

We hope you enjoyed this post so far. Soon we will publish the second part of “Monitoring Neo4j and Procedures with Prometheus and Grafana”. This is going to be a detailed tutorial focused on how to create a perfect Neo4j Dashboard. Interested? Stay tuned!

Update: As promised, here is Monitoring Neo4j and Procedures with Prometheus and Grafana - Part 2

Janos Szendi-Varga

Miro Marchi

Product Development | Neo4j certification

Dr. Miro Marchi holds a Ph.D. in Cultural Anthropology. With expertise in ethnographic analysis, graph data modelling with Neo4j, and JavaScript data visualisation, Miro brings a unique perspective to the field of graph technologies. Combining his diverse experiences, Miro plays a critical role in bridging the gap between human behaviour and technology. He currently focuses on product development to leverage his complex understanding of all related aspects.