HashiQube - DevOps Lab

Prometheus and Grafana

In this HashiQube DevOps lab you will get hands on experience with Grafana and Prometheus.

We need a monitoring and alerting solution, and for this we have chosen Prometheus and Grafana.

Grafana

Grafana (https://grafana.com/) is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.

Grafana

Prometheus

Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support. This topic walks you through the steps to create a series of dashboards in Grafana to display system metrics for a server monitored by Prometheus.

Prometheus

Provision

In order to provision Prometheus and Grafana, you need basetools, docker, and minikube as dependencies.

💡 We enable Vault, Consul, and Nomad because we monitor them with Prometheus, and we enable Minikube because we host Grafana and Prometheus on Minikube and deploy them using Helm.

Provision

Open in GitHub Codespaces

bash docker/docker.sh
bash vault/vault.sh
bash consul/consul.sh
bash nomad/nomad.sh
bash minikube/minikube.sh
bash prometheus-grafana/prometheus-grafana.sh
vagrant up --provision-with basetools,docker,docsify,vault,consul,nomad,minikube,prometheus-grafana
docker compose exec hashiqube /bin/bash
bash hashiqube/basetools.sh
bash docker/docker.sh
bash docsify/docsify.sh
bash vault/vault.sh
bash consul/consul.sh
bash nomad/nomad.sh
bash minikube/minikube.sh
bash prometheus-grafana/prometheus-grafana.sh

Prometheus http://localhost:9090
Alertmanager http://localhost:9093
Grafana http://localhost:3000 and login with Username: admin Password: Password displayed in the Terminal
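Once everything is up, a quick way to confirm each service is answering is to hit its health endpoint. A minimal sketch; the exact response bodies vary by version, this only checks that each endpoint responds:

# Prometheus and Alertmanager both expose a /-/healthy endpoint
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9093/-/healthy
# Grafana reports its state as JSON on /api/health
curl -s http://localhost:3000/api/health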

Watch the Minikube dashboard and the terminal output for progress updates.

...
hashiqube0.service.consul: ++++ Waiting for Prometheus to stabilize, sleep 30s
hashiqube0.service.consul: NAME                                            READY   STATUS    RESTARTS   AGE
hashiqube0.service.consul: grafana-557fc9455c-67h4s                        1/1     Running   0          90s
hashiqube0.service.consul: hello-minikube-7bc9d7884c-fks85                 1/1     Running   0          3m36s
hashiqube0.service.consul: prometheus-alertmanager-76b7444fc5-8b2sq        2/2     Running   0          100s
hashiqube0.service.consul: prometheus-kube-state-metrics-748fc7f64-hxcvj   1/1     Running   0          100s
hashiqube0.service.consul: prometheus-node-exporter-xm6fw                  1/1     Running   0          100s
hashiqube0.service.consul: prometheus-pushgateway-5f478b75f7-j9tpj         1/1     Running   0          100s
hashiqube0.service.consul: prometheus-server-8c96d4966-bv24c               1/2     Running   0          100s
hashiqube0.service.consul: 5m23s       Warning   SystemOOM                                          node/minikube                                        System OOM encountered, victim process: prometheus, pid: 2375725
hashiqube0.service.consul: 5m23s       Warning   SystemOOM                                          node/minikube                                        System OOM encountered, victim process: prometheus, pid: 2385107
hashiqube0.service.consul: 5m23s       Warning   SystemOOM                                          node/minikube                                        System OOM encountered, victim process: prometheus, pid: 2394543
hashiqube0.service.consul: 5m22s       Normal    NodeHasSufficientMemory                            node/minikube                                        Node minikube status is now: NodeHasSufficientMemory
hashiqube0.service.consul: The easiest way to access this service is to let kubectl forward the port:
hashiqube0.service.consul: kubectl port-forward service/prometheus-server-np 9090:9090
hashiqube0.service.consul: vagrant  1198180  0.4  0.3 751844 42116 ?        Sl   22:25   0:01 kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 10888:80 --address=0.0.0.0
hashiqube0.service.consul: vagrant  1198888  0.2  0.3 751588 41420 ?        Sl   22:26   0:00 kubectl port-forward -n default service/hello-minikube 18888:8080 --address=0.0.0.0
hashiqube0.service.consul: ++++ Prometheus http://localhost:9090
hashiqube0.service.consul: ++++ Grafana http://localhost:3000 and login with Username: admin Password:
hashiqube0.service.consul: N6as3Odq7bprqVdvWV5iFmwhOLs8QvutCJb8f2lS
hashiqube0.service.consul: ++++ You should now be able to access Prometheus http://localhost:9090 and Grafana http://localhost:3000 Please login to Grafana and add Prometheus as a Datasource, next please click on the + symbol in Grafana and import 6417 dashboard.

Minikube Dashboard Pods

You can also open Prometheus web interface and look at Status -> Targets

Prometheus Targets
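The same target information is available from the Prometheus HTTP API, which is handy for scripting checks. A small sketch, assuming jq is installed:

# List every active scrape target with its job name, scrape URL and health
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | "\(.labels.job)\t\(.scrapeUrl)\t\(.health)"'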

Grafana Datasource

💡 We have done this automatically during the provisioning step, in the grafana-values.yaml file; see below.

plugins:
  - digrich-bubblechart-panel
  - grafana-clock-panel
  - grafana-piechart-panel

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://10.9.99.10:9090
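To confirm the datasource really landed in Grafana, you can query the Grafana HTTP API. A minimal sketch, assuming the admin password from the terminal output is exported as GRAFANA_PASSWORD:

# List all configured datasources; the pre-provisioned Prometheus entry should appear
curl -s -u "admin:${GRAFANA_PASSWORD}" http://localhost:3000/api/datasources | jq '.[].name'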

To use Prometheus as a Datasource in Grafana, we need to add it, so let's do that now. Head over to Grafana at http://localhost:3000 and log in with user admin and the password displayed in the terminal output.

Grafana Login

Click on Configuration -> Datasources

💡 We have done this automatically during the provisioning step

Click Add data source, select Prometheus, and enter the URL of Prometheus; in this case we use http://10.9.99.10:9090

Grafana Configuration Datasources

Lastly we can import a dashboard. Click on the + in the left menu, select Import, enter 6417, and click Import. You should now be able to see some graphs.
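If you prefer to script the import, grafana.com serves the dashboard JSON over its API; a small sketch, assuming the revisions/latest/download path it exposes for published dashboards:

# Fetch the same dashboard JSON that the Import dialog retrieves for ID 6417
curl -s https://grafana.com/api/dashboards/6417/revisions/latest/download -o dashboard-6417.json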

Grafana Dashboard Kubernetes Cluster (Prometheus)

Monitoring Vault

https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
https://developer.hashicorp.com/vault/docs/configuration/telemetry

In vault/vault.sh we enabled telemetry in the Vault config file:

# https://developer.hashicorp.com/vault/docs/configuration/telemetry
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
telemetry {
  disable_hostname = true
  prometheus_retention_time = "12h"
}
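You can hit the metrics endpoint directly to confirm telemetry is enabled. A quick sketch, assuming your root token is exported as VAULT_TOKEN; any token with read on sys/metrics will do:

# Ask Vault for Prometheus-format metrics
curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "http://localhost:8200/v1/sys/metrics?format=prometheus" | head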

When we install Prometheus with Helm, we pass a prometheus-values.yaml file that specifies extraScrapeConfigs. You guessed it: Vault...

helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml

extraScrapeConfigs: |
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: http
    bearer_token: "VAULT_TOKEN"
    static_configs:
    - targets: ['10.9.99.10:8200']
  - job_name: consul
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/v1/agent/metrics'
    scheme: http
    params:
      format: ['prometheus']
    static_configs:
    - targets: ['10.9.99.10:8500']
  - job_name: nomad
    consul_sd_configs:
    - server: '10.9.99.10:8500'
      services: ['nomad-client', 'nomad']
    relabel_configs:
    - source_labels: ['__meta_consul_tags']
      regex: '(.*)http(.*)'
      action: keep
    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
  - job_name: 'docker'
    static_configs:
    - targets: ['10.9.99.10:9323']

You should now see the Vault target in the Prometheus web interface at http://localhost:9090/targets

Prometheus Vault Target
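You can check the same thing from the command line with the targets API, filtering for the vault job. A small sketch, assuming jq:

# Show the vault target's scrape URL, health and last error (if any)
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.labels.job=="vault") | {scrapeUrl, health, lastError}'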

Grafana Datasource

💡 We have done this automatically during the provisioning step, in the grafana-values.yaml file; see below.

plugins:
  - digrich-bubblechart-panel
  - grafana-clock-panel
  - grafana-piechart-panel

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://10.9.99.10:9090

We now need to add a Grafana Datasource of type Prometheus based on these targets.

Please navigate to http://localhost:3000/connections/your-connections/datasources

Name: Prometheus
URL: http://10.9.99.10:9090

Now let's import the Vault Grafana Dashboard. To do that, click on the top-right + and select Import Dashboard (ref: https://grafana.com/grafana/dashboards/12904-hashicorp-vault/)

Enter 12904 and click on Load

Navigating to Grafana -> Dashboards, you should now see the Hashicorp Vault Grafana Dashboard

Grafana Hashicorp Vault Dashboard

Monitoring Nomad

https://developer.hashicorp.com/nomad/docs/configuration/telemetry
https://developer.hashicorp.com/nomad/docs/configuration/telemetry#prometheus
https://developer.hashicorp.com/nomad/docs/operations/monitoring-nomad
https://developer.hashicorp.com/nomad/tutorials/manage-clusters/prometheus-metrics

In nomad/nomad.sh we enabled telemetry in the Nomad config file:

# https://developer.hashicorp.com/nomad/docs/configuration/telemetry
# https://developer.hashicorp.com/nomad/docs/configuration/telemetry#prometheus
# https://developer.hashicorp.com/nomad/docs/operations/monitoring-nomad
# https://developer.hashicorp.com/nomad/tutorials/manage-clusters/prometheus-metrics
telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}
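As with Vault, you can curl the endpoint to confirm metrics are flowing, assuming Nomad's default HTTP port 4646:

# Nomad serves Prometheus-format metrics (add an X-Nomad-Token header if ACLs are enabled)
curl -s "http://localhost:4646/v1/metrics?format=prometheus" | head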

When we install Prometheus with Helm, we pass a prometheus-values.yaml file that specifies extraScrapeConfigs. You guessed it: Nomad...

helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml

extraScrapeConfigs: |
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: http
    bearer_token: "VAULT_TOKEN"
    static_configs:
    - targets: ['10.9.99.10:8200']
  - job_name: consul
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/v1/agent/metrics'
    scheme: http
    params:
      format: ['prometheus']
    static_configs:
    - targets: ['10.9.99.10:8500']
  - job_name: nomad
    consul_sd_configs:
    - server: '10.9.99.10:8500'
      services: ['nomad-client', 'nomad']
    relabel_configs:
    - source_labels: ['__meta_consul_tags']
      regex: '(.*)http(.*)'
      action: keep
    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
  - job_name: 'docker'
    static_configs:
    - targets: ['10.9.99.10:9323']

You should now see the Nomad target in the Prometheus web interface at http://localhost:9090/targets

Prometheus Nomad Target
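Because the nomad job uses Consul service discovery rather than static targets, it can help to confirm that Consul knows about the Nomad services Prometheus will resolve. A small sketch against the Consul catalog API:

# List the addresses and tags Consul advertises for the nomad service;
# the relabel rule above keeps only targets whose tags include "http"
curl -s http://localhost:8500/v1/catalog/service/nomad \
  | jq -r '.[] | "\(.ServiceAddress):\(.ServicePort) \(.ServiceTags | join(","))"'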

We now need to add a Grafana Datasource of type Prometheus based on this target.

Please navigate to http://localhost:3000/connections/your-connections/datasources

And add a Nomad Datasource

Name: Nomad
URL: http://10.9.99.10:9090

Now let's import the Nomad Grafana Dashboard. To do that, click on the top-right + and select Import Dashboard (ref: https://grafana.com/grafana/dashboards/12787-nomad-jobs/)

Enter 12787 and click on Load

Navigating to Grafana -> Dashboards, you should now see the Hashicorp Nomad Grafana Dashboard

Grafana Hashicorp Nomad Dashboard

Monitoring Consul

https://lvinsf.medium.com/monitor-consul-using-prometheus-and-grafana-1f2354cc002f
https://grafana.com/grafana/dashboards/13396-consul-server-monitoring/
https://developer.hashicorp.com/consul/docs/agent/telemetry

In consul/consul.sh we enabled telemetry in the Consul config file:

# https://lvinsf.medium.com/monitor-consul-using-prometheus-and-grafana-1f2354cc002f
# https://grafana.com/grafana/dashboards/13396-consul-server-monitoring/
# https://developer.hashicorp.com/consul/docs/agent/telemetry
telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
}
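Consul's agent endpoint can be queried the same way to confirm telemetry is on, assuming the default HTTP port 8500 (add an X-Consul-Token header if ACLs are enabled):

# The same metrics_path the consul scrape job uses, straight from the agent
curl -s "http://localhost:8500/v1/agent/metrics?format=prometheus" | head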

Now let's import the Consul Grafana Dashboard. To do that, click on the top-right + and select Import Dashboard (ref: https://grafana.com/grafana/dashboards/10642-consul/)

Enter 10642 and click on Load

Navigating to Grafana -> Dashboards, you should now see the Hashicorp Consul Grafana Dashboard

Grafana Hashicorp Consul Dashboard

Monitoring Docker

https://docs.docker.com/config/daemon/prometheus/

In docker/docker.sh we enabled the Prometheus metrics endpoint in the Docker daemon config file:

# https://docs.docker.com/config/daemon/prometheus/
# https://docs.docker.com/config/daemon/prometheus/
# Write the daemon config via sudo tee, so the redirect itself runs with root privileges
echo '{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true,
  "storage-driver": "overlay2",
  "insecure-registries": ["10.9.99.10:5001", "10.9.99.10:5002", "localhost:5001", "localhost:5002"]
}' | sudo tee /etc/docker/daemon.json
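After restarting the daemon, the metrics endpoint should answer on port 9323. A quick check, assuming a systemd-managed Docker:

# Apply the new daemon.json and confirm the engine exports metrics
sudo systemctl restart docker
curl -s http://localhost:9323/metrics | head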

Now let's import the Docker Grafana Dashboard. To do that, click on the top-right + and select Import Dashboard (ref: https://grafana.com/grafana/dashboards/10619-docker-host-container-overview/)

Enter 10619 and click on Load

Navigating to Grafana -> Dashboards, you should now see the Docker Grafana Dashboard

Grafana Docker Dashboard

Prometheus Grafana Provisioner

#!/bin/bash

# https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
# https://prometheus.io/docs/visualization/grafana/#using
# https://blog.marcnuri.com/prometheus-grafana-setup-minikube

cd ~/
# Determine CPU Architecture
arch=$(lscpu | grep "Architecture" | awk '{print $NF}')
if [[ $arch == x86_64* ]]; then
  ARCH="amd64"
elif  [[ $arch == aarch64 ]]; then
  ARCH="arm64"
fi
echo -e '\e[38;5;198m'"CPU is $ARCH"

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Cleanup"
echo -e '\e[38;5;198m'"++++ "
sudo docker stop grafana prometheus
sudo docker rm grafana prometheus
yes | sudo docker system prune -a
yes | sudo docker system prune --volumes
for i in $(ps aux | grep kubectl | grep -ve sudo -ve grep -ve bin | grep -e grafana -e prometheus -e alertmanager | tr -s " " | cut -d " " -f2); do kill -9 $i; done
sudo --preserve-env=PATH -u vagrant helm list
sudo --preserve-env=PATH -u vagrant helm uninstall prometheus
sudo --preserve-env=PATH -u vagrant helm uninstall grafana
sudo --preserve-env=PATH -u vagrant helm list

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm version"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm version

# https://helm.sh/docs/intro/quickstart/#initialize-a-helm-chart-repository
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Helm add Prometheus repo"
echo -e '\e[38;5;198m'"++++ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm repo update"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo update
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm search repo prometheus-community"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm search repo prometheus-community

# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Set Vault token in prometheus-values.yaml for prometheus for monitoring Vault"
echo -e '\e[38;5;198m'"++++ "
export VAULT_TOKEN=$(grep 'Initial Root Token' /etc/vault/init.file | cut -d ':' -f2 | tr -d ' ')
sed -i "s/bearer_token: .*/bearer_token: \"$VAULT_TOKEN\"/g" /vagrant/prometheus-grafana/prometheus-values.yaml
cat /vagrant/prometheus-grafana/prometheus-values.yaml

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm install prometheus prometheus-community/prometheus"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm install prometheus prometheus-community/prometheus -f /vagrant/prometheus-grafana/prometheus-values.yaml

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Helm add Grafana repo"
echo -e '\e[38;5;198m'"++++ helm repo add grafana https://grafana.github.io/helm-charts"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo add grafana https://grafana.github.io/helm-charts
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm repo update"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm repo update
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm search repo grafana"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm search repo grafana

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ helm install grafana grafana/grafana"
echo -e '\e[38;5;198m'"++++ "
sudo --preserve-env=PATH -u vagrant helm install grafana grafana/grafana -f /vagrant/prometheus-grafana/grafana-values.yaml

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Waiting for Prometheus and Alertmanager and Grafana to become available.."
echo -e '\e[38;5;198m'"++++ See Minikube Dashboard for details: http://localhost:10888"
echo -e '\e[38;5;198m'"++++ "

attempts=0
max_attempts=15
while ! ( sudo --preserve-env=PATH -u vagrant kubectl get po | grep prometheus | tr -s " " | cut -d " " -f3 | grep Running ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ Waiting for Prometheus to become available, (${attempts}/${max_attempts}) sleep 60s"
  sudo --preserve-env=PATH -u vagrant kubectl get po
  sudo --preserve-env=PATH -u vagrant kubectl get events | grep -e Memory -e OOM
done

echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Waiting for Prometheus to stabalize, sleep 30s"
echo -e '\e[38;5;198m'"++++ "
sleep 30;
sudo --preserve-env=PATH -u vagrant kubectl get po
sudo --preserve-env=PATH -u vagrant kubectl get events | grep -e Memory -e OOM

# https://gitlab.com/gitlab-org/charts/gitlab/-/issues/2572 Error 422
echo -e '\e[38;5;198m'"The easiest way to access this service is to let kubectl to forward the port:"
echo -e '\e[38;5;198m'"kubectl port-forward service/prometheus-server 9090:9090"
# https://stackoverflow.com/questions/67084554/how-to-kubectl-port-forward-gitlab-webservice
# https://github.com/kubernetes/kubernetes/issues/44371

attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 9090 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default prometheus-server 9090 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep prometheus-server | tr -s " " | cut -d " " -f1) 9090 --address="0.0.0.0" > /dev/null 2>&1 &
done

attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 9093 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default prometheus-alertmanager 9093 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep prometheus-alertmanager | tr -s " " | cut -d " " -f1) 9093 --address="0.0.0.0" > /dev/null 2>&1 &
done

attempts=0
max_attempts=20
while ! ( sudo netstat -nlp | grep 3000 ) && (( $attempts < $max_attempts )); do
  attempts=$((attempts+1))
  sleep 60;
  echo -e '\e[38;5;198m'"++++ "
  echo -e '\e[38;5;198m'"++++ kubectl port-forward -n default grafana 3000 --address=\"0.0.0.0\", (${attempts}/${max_attempts}) sleep 60s"
  echo -e '\e[38;5;198m'"++++ "
  sudo --preserve-env=PATH -u vagrant kubectl port-forward --namespace default $(sudo --preserve-env=PATH -u vagrant kubectl get po -n default | grep grafana | tr -s " " | cut -d " " -f1) 3000 --address="0.0.0.0" > /dev/null 2>&1 &
done

ps aux | grep kubectl | grep -ve sudo -ve grep -ve bin

# https://developer.hashicorp.com/vault/tutorials/monitoring/monitor-telemetry-grafana-prometheus
# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Vault policy write prometheus-metrics path /sys/metrics"
echo -e '\e[38;5;198m'"++++ "
export VAULT_ADDR=http://127.0.0.1:8200
env | grep VAULT_ADDR
vault policy write prometheus-metrics - << EOF
path "/sys/metrics*" {
  capabilities = ["read", "list"]
}
EOF

# https://developer.hashicorp.com/vault/docs/configuration/telemetry#prometheus
echo -e '\e[38;5;198m'"++++ "
echo -e '\e[38;5;198m'"++++ Reset Vault token in prometheus-values.yaml"
echo -e '\e[38;5;198m'"++++ "
sed -i "s/bearer_token: .*/bearer_token: \"VAULT_TOKEN\"/g" /vagrant/prometheus-grafana/prometheus-values.yaml

# https://github.com/grafana/grafana/issues/29296
echo -e '\e[38;5;198m'"++++ Prometheus http://localhost:9090"
echo -e '\e[38;5;198m'"++++ Alertmanager http://localhost:9093"
echo -e '\e[38;5;198m'"++++ Grafana http://localhost:3000 and login with Username: admin Password:"
sudo --preserve-env=PATH -u vagrant kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
echo -e '\e[38;5;198m'"++++ You should now be able to access Prometheus http://localhost:9090 \
and Grafana http://localhost:3000 Please login to Grafana and add Prometheus (http://10.9.99.10:9090) as a Datasource, next \
please click on the + symbol in Grafana and import 6417 dashboard."
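The provisioner reuses the initial root token as the Prometheus bearer_token. A least-privilege alternative, sketched here as a manual hardening step rather than part of the script, is to mint a token scoped to the prometheus-metrics policy the script writes:

# Create a renewable token limited to the prometheus-metrics policy (hypothetical hardening step)
export VAULT_ADDR=http://127.0.0.1:8200
vault token create -policy=prometheus-metrics -period=24h -field=token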