Monitoring

Why Prometheus

Prometheus has become the mainstream open-source monitoring tool of choice in the container and microservice world.
It provides ready-to-use exporters for all the services our system runs. Its pull mechanism, in which Prometheus scrapes metrics from each service instead of each service pushing them, is better suited to a microservice architecture.
It also supports additional features such as Alertmanager, a query language (PromQL) for retrieving the data, and the visualization tool Grafana.

Kubernetes Prometheus Stack

We use the Kube-Prometheus Stack (the `kube-prometheus-stack` Helm chart) developed by the Prometheus community.

Installing the Helm chart

Make sure your kubectl context (for example, set with kubectx) points to the Airy Core instance and that you have Helm installed.

Customizing Prometheus

You can customize the Prometheus chart by changing the defaults in `infrastructure/tools/prometheus/values.yaml` to suit your requirements.
To access Prometheus, Grafana, and Alertmanager from outside the cluster, add your hostname to the respective `hosts: []` variable.
If you make Grafana publicly accessible, you should also set `adminPassword` to something secure.
You can apply those changes by running:
```shell
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --values infrastructure/tools/prometheus/values.yaml
```
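As a sketch, the host and password settings could also be kept in a separate override file instead of being edited into `values.yaml` directly. The script below generates a random Grafana admin password and fills in the three `hosts` lists. The file name `prometheus-overrides.yaml` and the hostname `airy.example.com` are hypothetical placeholders, and the script assumes your `values.yaml` already gives each ingress its own path (for example `/grafana`).

```shell
#!/usr/bin/env bash
# Sketch: write Helm value overrides for external access to Prometheus,
# Grafana and Alertmanager. File name and hostname are hypothetical.
set -euo pipefail

# Hypothetical hostname; export AIRY_HOST before running to change it.
AIRY_HOST="${AIRY_HOST:-airy.example.com}"

# 24 random bytes encode to a 32-character base64 password.
GRAFANA_ADMIN_PASSWORD="$(head -c 24 /dev/urandom | base64)"

# The ingresses must be enabled for the hosts lists to take effect.
cat > prometheus-overrides.yaml <<EOF
grafana:
  adminPassword: "${GRAFANA_ADMIN_PASSWORD}"
  ingress:
    enabled: true
    hosts:
      - ${AIRY_HOST}
prometheus:
  ingress:
    enabled: true
    hosts:
      - ${AIRY_HOST}
alertmanager:
  ingress:
    enabled: true
    hosts:
      - ${AIRY_HOST}
EOF

# The file contains a secret: restrict access and keep it out of version control.
chmod 600 prometheus-overrides.yaml
echo "Wrote prometheus-overrides.yaml"
```

Helm merges multiple `--values` files, giving priority to the rightmost one, so the overrides can be layered on top of the repository defaults: `helm upgrade prometheus prometheus-community/kube-prometheus-stack --values infrastructure/tools/prometheus/values.yaml --values prometheus-overrides.yaml`.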

Grafana Dashboards

Grafana is a powerful visualization tool that can be used for all sorts of dashboarding and monitoring tasks.
For Grafana, there is one more step before you can access it: edit its ConfigMap.
```shell
kubectl edit configmap prometheus-grafana
```
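The exact change to make in that ConfigMap is not listed here. Assuming it is the sub-path setup Grafana needs to be served under `/grafana`, the sketch below expresses the same settings as Helm values for the `grafana.ini` section of the Grafana subchart. Keeping them in values means a later `helm upgrade` cannot overwrite a manual ConfigMap edit. The file name `grafana-subpath-overrides.yaml` and the hostname `airy.example.com` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, assuming the ConfigMap edit is the sub-path configuration for
# serving Grafana under /grafana. File name and hostname are hypothetical.
set -euo pipefail

AIRY_HOST="${AIRY_HOST:-airy.example.com}"

# The Grafana subchart renders the grafana.ini map into its ConfigMap.
# root_url is Grafana's public URL; serve_from_sub_path makes Grafana
# itself serve its routes under the /grafana path taken from root_url.
cat > grafana-subpath-overrides.yaml <<EOF
grafana:
  grafana.ini:
    server:
      domain: "${AIRY_HOST}"
      root_url: "%(protocol)s://%(domain)s/grafana/"
      serve_from_sub_path: true
EOF

echo "Wrote grafana-subpath-overrides.yaml"
```

Pass the file as another `--values` argument to `helm upgrade`. If Grafana keeps serving the old settings, restarting its deployment (for example `kubectl rollout restart deployment prometheus-grafana`) makes it reload the configuration.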
Access Grafana under `/grafana` and log in with the `adminPassword` you set in `values.yaml`.

Access Predefined Dashboards

If `defaultDashboardsEnabled` is set to `true`, you can find the default Kubernetes dashboards under `/grafana/dashboards`.
The Grafana website offers many more dashboards that you can add to your instance by importing them.
Here is a list of dashboards we recommend adding to monitor your Airy Core instance:

Receiving alerts

To get notifications for the default alerts, all you have to do is set up a receiver as described here. That way, you can be notified on Slack or PagerDuty about issues such as crashing components or nodes running out of free storage space.
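For example, a Slack receiver could be added through the chart's `alertmanager.config` values. This is a minimal sketch: the receiver name `slack-notifications`, the channel `#alerts` and the webhook URL are hypothetical placeholders. Because Helm replaces lists instead of merging them, the sketch re-declares the chart's default `null` receiver, which the chart's default routes (such as the one for the `Watchdog` alert) still reference in recent chart versions.

```shell
#!/usr/bin/env bash
# Sketch: route alerts to a Slack channel via Alertmanager.
# Receiver name, channel and webhook URL are hypothetical placeholders.
set -euo pipefail

SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:-https://hooks.slack.com/services/REPLACE/ME}"

cat > alertmanager-overrides.yaml <<EOF
alertmanager:
  config:
    route:
      # Receiver for alerts that match no more specific child route.
      receiver: slack-notifications
    receivers:
      # Re-declared because the chart's default routes still point to it
      # and Helm replaces this list instead of merging it.
      - name: 'null'
      - name: slack-notifications
        slack_configs:
          - api_url: "${SLACK_WEBHOOK_URL}"
            channel: "#alerts"
            send_resolved: true
EOF

# The webhook URL is a secret: restrict access and keep it out of version control.
chmod 600 alertmanager-overrides.yaml
echo "Wrote alertmanager-overrides.yaml"
```

Apply it as another `--values` file in the `helm upgrade` command. A PagerDuty receiver follows the same pattern, using `pagerduty_configs` instead of `slack_configs`.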