Prometheus has become the mainstream open source monitoring tool of choice in the container and microservice world.
It provides ready-to-use Exporters for all services that our system provides and implements a pull mechanism that is better suited to a microservice architecture.
It also comes with Alertmanager, a query language (PromQL) for retrieving the data, and works with the visualization tool Grafana.
Kubernetes Prometheus Stack
We are using the Kube-Prometheus Stack developed by the Prometheus community.
Installing the Helm chart
Make sure your kubectx is set to the Airy Core instance and you have Helm installed.
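A quick preflight check can catch a missing tool or a wrong context before anything is installed. The following is only a sketch: `require_cmd` is a hypothetical helper name, and it assumes a POSIX shell with `kubectl` and `helm` expected on the PATH.

```shell
# Return success if the given command is available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Collect the names of any required tools that are not installed.
missing=""
for tool in kubectl helm; do
  require_cmd "$tool" || missing="$missing $tool"
done

if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  # Print the active context so you can confirm it is the Airy Core cluster.
  kubectl config current-context || echo "No current kubectl context is set"
  helm version --short
fi
```

If the printed context is not your Airy Core instance, switch to it before running the commands below.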
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
In this section you can customize the Prometheus chart by changing the defaults in
infrastructure/tools/prometheus/values.yaml to ones that suit your needs.
To access Prometheus, Grafana, and Alertmanager from outside the cluster, put your hostname in the respective hosts: variable.
If you make Grafana publicly accessible, you should also set the adminPassword to something secure.
You can apply those changes by running:
helm upgrade prometheus prometheus-community/kube-prometheus-stack --values infrastructure/tools/prometheus/values.yaml
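As a rough illustration of the hostname and password settings described above, the snippet below writes a small override file. The key layout follows the defaults of the upstream kube-prometheus-stack chart as far as we know. The file location, `monitoring.example.com`, and the placeholder password are assumptions, and the values.yaml in your checkout may be organized differently.

```shell
# Hypothetical override file; path, hostname and password are placeholders.
VALUES_SKETCH="${TMPDIR:-/tmp}/values-sketch.yaml"

cat > "$VALUES_SKETCH" <<'EOF'
prometheus:
  ingress:
    enabled: true
    hosts:
      - monitoring.example.com
alertmanager:
  ingress:
    enabled: true
    hosts:
      - monitoring.example.com
grafana:
  adminPassword: change-me-to-something-secure
  ingress:
    enabled: true
    hosts:
      - monitoring.example.com
EOF

# Show the result so it can be compared against the real values.yaml.
cat "$VALUES_SKETCH"
```

Helm accepts several `--values` flags, with later files taking precedence, so a file like this could be passed in addition to the main values.yaml instead of editing it directly.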
Grafana is a very powerful visualization tool which can be used for all sorts of dashboarding and monitoring tasks.
For Grafana there is one more step to do before you can access it.
kubectl edit configmap prometheus-grafana
domain = <your_hostname>
root_url = <your_hostname>/grafana
serve_from_sub_path = true
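A manual ConfigMap edit may not survive later chart upgrades, so it can be worth keeping the same three settings in a values file. The sketch below assumes the Grafana subchart's `grafana.ini` key, which to our knowledge maps sections such as `server` into the rendered configuration. `GRAFANA_HOST`, its example value, and the output path are placeholders, not values from this guide.

```shell
# Placeholder hostname; replace with the hostname you use for Grafana.
GRAFANA_HOST="monitoring.example.com"
GRAFANA_SKETCH="${TMPDIR:-/tmp}/grafana-ini-sketch.yaml"

# Unquoted heredoc delimiter so that ${GRAFANA_HOST} is expanded.
cat > "$GRAFANA_SKETCH" <<EOF
grafana:
  grafana.ini:
    server:
      domain: ${GRAFANA_HOST}
      root_url: ${GRAFANA_HOST}/grafana
      serve_from_sub_path: true
EOF

cat "$GRAFANA_SKETCH"
```

Grafana reads its configuration at startup, so whichever way the settings are applied, the Grafana pod may need a restart before they take effect.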
Access Grafana under /grafana and log in with the adminPassword you set earlier.
Access Predefined Dashboards
If defaultDashboardsEnabled is set to true, you can find the default Kubernetes dashboards in your Grafana instance.
The Grafana website provides a lot more dashboards that can be added to your instance by importing them.
Here is a list of dashboards we recommend adding to monitor your Airy Core instance
To get notifications for the default alerts, all you have to do is set up an Alertmanager receiver. That way you can be notified on Slack or PagerDuty about issues such as crashing components and nodes running out of free storage space.
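To make the receiver setup more concrete, here is a minimal sketch of a Slack receiver written as a values override. It assumes the chart's `alertmanager.config` key and Alertmanager's `slack_configs` receiver type. The receiver name, channel, webhook URL, and file path are placeholders. The empty "null" receiver is kept on the assumption that the chart's default routes still refer to it.

```shell
# Hypothetical override file; webhook URL and channel are placeholders.
ALERT_SKETCH="${TMPDIR:-/tmp}/alertmanager-sketch.yaml"

cat > "$ALERT_SKETCH" <<'EOF'
alertmanager:
  config:
    route:
      # Send alerts to the Slack receiver unless a more specific route matches.
      receiver: slack-notifications
    receivers:
      - name: "null"
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOURS
            channel: "#alerts"
            send_resolved: true
EOF

cat "$ALERT_SKETCH"
```

A PagerDuty receiver would follow the same pattern with a different receiver type in place of `slack_configs`.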