Prometheus and Grafana
In this HashiQube DevOps lab you will get hands-on experience with Grafana and Prometheus.
We need a monitoring and alerting solution, and for this we have chosen Prometheus and Grafana.
Grafana
https://grafana.com/ Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.
Prometheus
Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support. This topic walks you through the steps to create a series of dashboards in Grafana to display system metrics for a server monitored by Prometheus.
Provision
In order to provision Prometheus and Grafana, you need basetools, docker and minikube as dependencies.
We enable Vault, Consul and Nomad because we monitor them with Prometheus, and we enable Minikube because we host Grafana and Prometheus on Minikube and deploy them using Helm.
Provision

To provision each component individually, run the following scripts:

bash docker/docker.sh
bash vault/vault.sh
bash consul/consul.sh
bash nomad/nomad.sh
bash minikube/minikube.sh
bash prometheus-grafana/prometheus-grafana.sh

With Vagrant:

vagrant up --provision-with basetools,docker,docsify,vault,consul,nomad,minikube,prometheus-grafana

With Docker Compose, exec into the container and run the provisioning scripts:

docker compose exec hashiqube /bin/bash
bash hashiqube/basetools.sh
bash docker/docker.sh
bash docsify/docsify.sh
bash vault/vault.sh
bash consul/consul.sh
bash nomad/nomad.sh
bash minikube/minikube.sh
bash prometheus-grafana/prometheus-grafana.sh
Prometheus http://localhost:9090
Alertmanager http://localhost:9093
Grafana http://localhost:3000 and log in with Username: admin and the Password displayed in the terminal
Look at the Minikube dashboard and the terminal output for progress updates.
...
hashiqube0.service.consul: ++++ Waiting for Prometheus to stabilize, sleep 30s
hashiqube0.service.consul: NAME READY STATUS RESTARTS AGE
hashiqube0.service.consul: grafana-557fc9455c-67h4s 1/1 Running 0 90s
hashiqube0.service.consul: hello-minikube-7bc9d7884c-fks85 1/1 Running 0 3m36s
hashiqube0.service.consul: prometheus-alertmanager-76b7444fc5-8b2sq 2/2 Running 0 100s
hashiqube0.service.consul: prometheus-kube-state-metrics-748fc7f64-hxcvj 1/1 Running 0 100s
hashiqube0.service.consul: prometheus-node-exporter-xm6fw 1/1 Running 0 100s
hashiqube0.service.consul: prometheus-pushgateway-5f478b75f7-j9tpj 1/1 Running 0 100s
hashiqube0.service.consul: prometheus-server-8c96d4966-bv24c 1/2 Running 0 100s
hashiqube0.service.consul: 5m23s Warning SystemOOM node/minikube System OOM encountered, victim process: prometheus, pid: 2375725
hashiqube0.service.consul: 5m23s Warning SystemOOM node/minikube System OOM encountered, victim process: prometheus, pid: 2385107
hashiqube0.service.consul: 5m23s Warning SystemOOM node/minikube System OOM encountered, victim process: prometheus, pid: 2394543
hashiqube0.service.consul: 5m22s Normal NodeHasSufficientMemory node/minikube Node minikube status is now: NodeHasSufficientMemory
hashiqube0.service.consul: The easiest way to access this service is to let kubectl forward the port:
hashiqube0.service.consul: kubectl port-forward service/prometheus-server-np 9090:9090
hashiqube0.service.consul: vagrant 1198180 0.4 0.3 751844 42116 ? Sl 22:25 0:01 kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 10888:80 --address=0.0.0.0
hashiqube0.service.consul: vagrant 1198888 0.2 0.3 751588 41420 ? Sl 22:26 0:00 kubectl port-forward -n default service/hello-minikube 18888:8080 --address=0.0.0.0
hashiqube0.service.consul: ++++ Prometheus http://localhost:9090
hashiqube0.service.consul: ++++ Grafana http://localhost:3000 and login with Username: admin Password:
hashiqube0.service.consul: N6as3Odq7bprqVdvWV5iFmwhOLs8QvutCJb8f2lS
hashiqube0.service.consul: ++++ You should now be able to access Prometheus http://localhost:9090 and Grafana http://localhost:3000 Please login to Grafana and add Prometheus as a Datasource, next please click on the + symbol in Grafana and import 6417 dashboard.
You can also open the Prometheus web interface and look at Status -> Targets.
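You can check the same thing from the command line: Prometheus exposes health and target-listing endpoints over HTTP. A minimal sketch, assuming the port-forwards set up by the provisioner are still running:

# Liveness and readiness probes
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9090/-/ready
# List active scrape targets and their health (jq is optional, for readability)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'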
Grafana Datasource
We have done this automatically during the provisioning step, in the grafana-values.yaml file shown below.
plugins:
  - digrich-bubblechart-panel
  - grafana-clock-panel
  - grafana-piechart-panel
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://10.9.99.10:9090
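You can confirm the datasource was provisioned via the Grafana HTTP API; a quick check, assuming the admin password comes from the Kubernetes secret (the same one the provisioner prints):

export GRAFANA_PASSWORD=$(kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode)
curl -s -u admin:$GRAFANA_PASSWORD http://localhost:3000/api/datasources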
To use Prometheus as a Datasource in Grafana, it needs to be added. The provisioning step has already done this automatically, but you can verify or redo it manually: head over to Grafana at http://localhost:3000 and log in with user: admin
and the password: TOKEN_IN_TERMINAL_OUTPUT
Click on Configuration -> Data sources
Click Add data source, select Prometheus and enter the URL of Prometheus; in this case we will use http://10.9.99.10:9090
Lastly, we can import a dashboard: click on the +
in the left menu and select Import,
now enter 6417
and click Import,
and you should be able to see some graphs.
Monitoring Vault
https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
https://developer.hashicorp.com/vault/docs/configuration/telemetry
In vault/vault.sh we enabled telemetry in the Vault config file:
# https://developer.hashicorp.com/vault/docs/configuration/telemetry
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
telemetry {
  disable_hostname = true
  prometheus_retention_time = "12h"
}
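With telemetry enabled, Vault serves Prometheus-formatted metrics on its own HTTP API, which is exactly what the scrape config below relies on. A quick manual check, assuming the root token stored in /etc/vault/init.file (the same file the provisioner reads):

export VAULT_TOKEN=$(grep 'Initial Root Token' /etc/vault/init.file | cut -d ':' -f2 | tr -d ' ')
curl -s -H "X-Vault-Token: $VAULT_TOKEN" "http://localhost:8200/v1/sys/metrics?format=prometheus" | head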
When we install Prometheus with Helm, we pass a prometheus-values.yaml file that specifies extraScrapeConfigs for...
You guessed it! Vault...
helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml
extraScrapeConfigs: |
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: http
    bearer_token: "VAULT_TOKEN"
    static_configs:
      - targets: ['10.9.99.10:8200']
  - job_name: consul
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/v1/agent/metrics'
    scheme: http
    params:
      format: ['prometheus']
    static_configs:
      - targets: ['10.9.99.10:8500']
  - job_name: nomad
    consul_sd_configs:
      - server: '10.9.99.10:8500'
        services: ['nomad-client', 'nomad']
    relabel_configs:
      - source_labels: ['__meta_consul_tags']
        regex: '(.*)http(.*)'
        action: keep
    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
  - job_name: 'docker'
    static_configs:
      - targets: ['10.9.99.10:9323']
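Once Prometheus has scraped the target, you can query it through the HTTP API as well as the web UI. A quick sanity check using the built-in up metric, which is 1 for every target whose last scrape succeeded:

curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="vault"}'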
You should now see the Vault target in the Prometheus web interface at http://localhost:9090/targets
We now need to add a Grafana Datasource of Type Prometheus based on these Targets
Please navigate to http://localhost:3000/connections/your-connections/datasources
Name: Prometheus
URL: http://10.9.99.10:9090
Now, let's import the Vault Grafana Dashboard. To do that, click on the + at the top right and select Import Dashboard
ref: https://grafana.com/grafana/dashboards/12904-hashicorp-vault/
Enter 12904
and click on Load
Navigating to Grafana -> Dashboards, you should now be able to see the HashiCorp Vault Grafana Dashboard
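If you prefer to script the import rather than click through the UI, the dashboard JSON can be downloaded from grafana.com and posted to Grafana's import API. A sketch, assuming the dashboard's datasource input is named DS_PROMETHEUS (the actual input name is defined inside the downloaded JSON):

export GRAFANA_PASSWORD=$(kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode)
# Fetch the latest revision of dashboard 12904 from grafana.com
curl -s https://grafana.com/api/dashboards/12904/revisions/latest/download -o /tmp/vault-dashboard.json
# Import it, mapping its datasource input to our Prometheus datasource
curl -s -u admin:$GRAFANA_PASSWORD -H 'Content-Type: application/json' \
  -d "{\"dashboard\": $(cat /tmp/vault-dashboard.json), \"overwrite\": true, \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\", \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}" \
  http://localhost:3000/api/dashboards/import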
Monitoring Nomad
https://developer.hashicorp.com/nomad/docs/configuration/telemetry
https://developer.hashicorp.com/nomad/docs/configuration/telemetry#prometheus
https://developer.hashicorp.com/nomad/docs/operations/monitoring-nomad
https://developer.hashicorp.com/nomad/tutorials/manage-clusters/prometheus-metrics
In nomad/nomad.sh we enabled telemetry in the Nomad config file:
# https://developer.hashicorp.com/nomad/docs/configuration/telemetry
# https://developer.hashicorp.com/nomad/docs/configuration/telemetry#prometheus
# https://developer.hashicorp.com/nomad/docs/operations/monitoring-nomad
# https://developer.hashicorp.com/nomad/tutorials/manage-clusters/prometheus-metrics
telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}
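As with Vault, you can hit the metrics endpoint directly before involving Prometheus; Nomad serves Prometheus-formatted metrics on its HTTP API (default port 4646):

curl -s "http://localhost:4646/v1/metrics?format=prometheus" | head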
When we install Prometheus with Helm, the same prometheus-values.yaml file shown above specifies an extraScrapeConfigs entry for...
You guessed it! Nomad...
helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml
Rather than static targets, the nomad job uses Consul service discovery to find the Nomad agents:
- job_name: nomad
  consul_sd_configs:
    - server: '10.9.99.10:8500'
      services: ['nomad-client', 'nomad']
  relabel_configs:
    - source_labels: ['__meta_consul_tags']
      regex: '(.*)http(.*)'
      action: keep
  scrape_interval: 5s
  metrics_path: /v1/metrics
  params:
    format: ['prometheus']
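Because the nomad job uses Consul service discovery rather than static targets, you can preview what Prometheus will discover by asking Consul directly:

# All services registered in Consul, then the addresses behind the nomad service
curl -s http://localhost:8500/v1/catalog/services
curl -s http://localhost:8500/v1/catalog/service/nomad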
You should now see the Nomad target in the Prometheus web interface at http://localhost:9090/targets
We now need to add a Grafana Datasource of type Prometheus based on this target.
Please navigate to http://localhost:3000/connections/your-connections/datasources
and add a Nomad Datasource:
Name: Nomad URL: http://10.9.99.10:9090
Now, let's import the Nomad Grafana Dashboard. To do that, click on the + at the top right and select Import Dashboard
ref: https://grafana.com/grafana/dashboards/12787-nomad-jobs/
Enter 12787
and click on Load
Navigating to Grafana -> Dashboards, you should now be able to see the HashiCorp Nomad Grafana Dashboard
Monitoring Consul
https://lvinsf.medium.com/monitor-consul-using-prometheus-and-grafana-1f2354cc002f
https://grafana.com/grafana/dashboards/13396-consul-server-monitoring/
https://developer.hashicorp.com/consul/docs/agent/telemetry
In consul/consul.sh we enabled telemetry in the Consul config file:
# https://lvinsf.medium.com/monitor-consul-using-prometheus-and-grafana-1f2354cc002f
# https://grafana.com/grafana/dashboards/13396-consul-server-monitoring/
# https://developer.hashicorp.com/consul/docs/agent/telemetry
telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
}
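Consul's own metrics endpoint can be checked the same way; it returns Prometheus format when requested, matching the consul scrape job shown earlier:

curl -s "http://localhost:8500/v1/agent/metrics?format=prometheus" | head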
Now, let's import the Consul Grafana Dashboard. To do that, click on the + at the top right and select Import Dashboard
ref: https://grafana.com/grafana/dashboards/10642-consul/
Enter 10642
and click on Load
Navigating to Grafana -> Dashboards, you should now be able to see the HashiCorp Consul Grafana Dashboard
Monitoring Docker
https://docs.docker.com/config/daemon/prometheus/
In docker/docker.sh we enabled the metrics endpoint in the Docker daemon config file:
# https://docs.docker.com/config/daemon/prometheus/
# Use sudo tee so the redirection itself runs with root privileges
echo '{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true,
  "storage-driver": "overlay2",
  "insecure-registries": ["10.9.99.10:5001", "10.9.99.10:5002", "localhost:5001", "localhost:5002"]
}' | sudo tee /etc/docker/daemon.json
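After restarting the Docker daemon so that daemon.json takes effect, the metrics endpoint is plain HTTP and needs no authentication; a quick check:

# Restart Docker to pick up daemon.json, then sample the exposed metrics
sudo systemctl restart docker
curl -s http://localhost:9323/metrics | head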
Now, let's import the Docker Grafana Dashboard. To do that, click on the + at the top right and select Import Dashboard
ref: https://grafana.com/grafana/dashboards/10619-docker-host-container-overview/
Enter 10619
and click on Load
Navigating to Grafana -> Dashboards, you should now be able to see the Docker Grafana Dashboard
Prometheus Grafana Provisioner
#!/bin/bash
# https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
# https://prometheus.io/docs/visualization/grafana/#using
# https://blog.marcnuri.com/prometheus-grafana-setup-minikube
cd ~/
# Determine CPU Architecture
arch=$(lscpu | grep "Architecture" | awk '{print $NF}')
if [[ $arch == x86_64* ]]; then
  ARCH="amd64"
elif [[ $arch == aarch64 ]]; then
  ARCH="arm64"
fi
echo -e '\e[38;5;198m'"CPU is $ARCH"
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Cleanup"
echo -e '\e[38;5;198m'"++++ "
sudo docker stop grafana prometheus
sudo docker rm grafana prometheus
yes | sudo docker system prune -a
yes | sudo docker system prune --volumes
for i in $(ps aux | grep kubectl | grep -ve sudo -ve grep -ve bin | grep -e grafana -e prometheus -e alertmanager | tr -s " " | cut -d " " -f2); do kill -9 $i; done
sudo --preserve-env=PATH -u vagrant helm list
sudo --preserve-env=PATH -u vagrant helm uninstall prometheus
sudo --preserve-env=PATH -u vagrant helm uninstall grafana
sudo --preserve-env=PATH -u vagrant helm list
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm version"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm version
# https://helm.sh/docs/intro/quickstart/#initialize-a-helm-chart-repository
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Helm add Prometheus repo"
echo -e '\e[38;5;198m'"++++ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm repo update"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo update
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm search repo prometheus-community"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm search repo prometheus-community
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Set Vault token in prometheus-values.yaml for prometheus for monitoring Vault"
echo -e '\e[38;5;198m'"++++ "
export VAULT_TOKEN=$(grep 'Initial Root Token' /etc/vault/init.file | cut -d ':' -f2 | tr -d ' ')
sed -i "s/bearer_token: .*/bearer_token: \"$VAULT_TOKEN\"/g" /vagrant/prometheus-grafana/prometheus-values.yaml
cat /vagrant/prometheus-grafana/prometheus-values.yaml
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm install prometheus prometheus-community/prometheus"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Helm add Grafana repo"
echo -e '\e[38;5;198m'"++++ helm repo add grafana https://grafana.github.io/helm-charts"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo add grafana https://grafana.github.io/helm-charts
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm repo update"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo update
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm search repo grafana"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm search repo grafana
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm install grafana grafana/grafana"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm install grafana grafana/grafana -f /vagrant/prometheus-grafana/grafana-values.yaml
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Waiting for Prometheus and Alertmanager and Grafana to become available.."
echo -e '\e[38;5;198m'"++++ See Minikube Dashboard for details: http://localhost:10888"
echo -e '\e[38;5;198m'"++++ "
attempts=0
max_attempts=15
while ! ( sudo --preserve-env=PATH -u vagrant kubectl get po | grep prometheus | tr -s " " | cut -d " " -f3 | grep Running ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ Waiting for Prometheus to become available, (${attempts}/${max_attempts}) sleep 60s"
  sudo --preserve-env=PATH -u vagrant kubectl get po
  sudo --preserve-env=PATH -u vagrant kubectl get events | grep -e Memory -e OOM
done
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Waiting for Prometheus to stabalize, sleep 30s"
echo -e '\e[38;5;198m'"++++ "
sleep 30;
sudo --preserve-env=PATH -u vagrant kubectl get po
sudo --preserve-env=PATH -u vagrant kubectl get events | grep -e Memory -e OOM
# https://gitlab.com/gitlab-org/charts/gitlab/-/issues/2572 Error 422
echo -e '\e[38;5;198m'"The easiest way to access this service is to let kubectl to forward the port:"
echo -e '\e[38;5;198m'"kubectl port-forward service/prometheus-server 9090:9090"
# https://stackoverflow.com/questions/67084554/how-to-kubectl-port-forward-gitlab-webservice
# https://github.com/kubernetes/kubernetes/issues/44371
attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 9090 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default prometheus-server 9090 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep prometheus-server | tr -s " " | cut -d " " -f1) 9090 --address="0.0.0.0" > /dev/null 2>&1 &
done
attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 9093 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default prometheus-alertmanager 9093 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep prometheus-alertmanager | tr -s " " | cut -d " " -f1) 9093 --address="0.0.0.0" > /dev/null 2>&1 &
done
attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 3000 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default grafana 3000 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep grafana | tr -s " " | cut -d " " -f1) 3000 --address="0.0.0.0" > /dev/null 2>&1 &
done
ps aux | grep kubectl | grep -ve sudo -ve grep -ve bin
# https://developer.hashicorp.com/vault/tutorials/monitoring/monitor-telemetry-grafana-prometheus
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Vault policy write prometheus-metrics path /sys/metrics"
echo -e '\e[38;5;198m'"++++ "
export VAULT_ADDR=http://127.0.0.1:8200
env | grep VAULT_ADDR
vault policy write prometheus-metrics - << EOF
path "/sys/metrics*" {
  capabilities = ["read", "list"]
}
EOF
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Reset Vault token in prometheus-values.yaml"
echo -e '\e[38;5;198m'"++++ "
sed -i "s/bearer_token: .*/bearer_token: \"VAULT_TOKEN\"/g" /vagrant/prometheus-grafana/prometheus-values.yaml
# https://github.com/grafana/grafana/issues/29296
echo -e '\e[38;5;198m'"++++ Prometheus http://localhost:9090"
echo -e '\e[38;5;198m'"++++ Alertmanager http://localhost:9093"
echo -e '\e[38;5;198m'"++++ Grafana http://localhost:3000 and login with Username: admin Password:"
sudo --preserve-env=PATH -u vagrant kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
echo -e '\e[38;5;198m'"++++ You should now be able to access Prometheus http://localhost:9090 \
and Grafana http://localhost:3000 Please login to Grafana and add Prometheus (http://10.9.99.10:9090) as a Datasource, next \
please click on the + symbol in Grafana and import 6417 dashboard."
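After the provisioner finishes, a short smoke test can confirm that all three services answer on their forwarded ports; a minimal sketch, assuming the default ports used above:

# Expect HTTP 200 from the Prometheus, Alertmanager and Grafana health endpoints
for url in http://localhost:9090/-/healthy http://localhost:9093/-/healthy http://localhost:3000/api/health; do
  echo -n "$url -> "
  curl -s -o /dev/null -w "%{http_code}\n" "$url"
done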