
All pages

  • Monitoring with Prometheus, Loki, Grafana and Kubernetes. Part 1. Kubernetes cluster: basic configuration of the Kubernetes monitoring cluster
  • Monitoring with Prometheus, Loki, Grafana and Kubernetes. Part 2. SNMP: about SNMP O_O
  • Monitoring with Prometheus, Loki, Grafana and Kubernetes. Part 3. GitLab Agent: how to connect a Kubernetes cluster to GitLab
  • Monitoring with Prometheus, Loki, Grafana and Kubernetes. Part 4. Prometheus exporters: exporting Prometheus metrics
  • Monitoring with Prometheus, Loki, Grafana and Kubernetes. Part 5. kube-prometheus-stack: migration to kube-prometheus-stack

There is no single specific purpose here; the series is about monitoring services both inside and outside a Kubernetes cluster. This part is about the basic configuration of the Kubernetes monitoring cluster.

Kubernetes cluster

Kubernetes is a solution for creating development, testing and production environments. I will build a Kubernetes cluster on a desktop and use minikube to create a Node. I’ll call it the development environment.

Why Kubernetes?

Why Kubernetes? Container scaling? Yes, but for my simple cluster it’s the convenience of configuration management and automated updates.

Prometheus, Loki, Grafana integration into Kubernetes cluster

All configuration files can be taken from the repository.

Before creating Kubernetes objects for each image, you need to decide which data should be stored regardless of the Pod state:

  • Prometheus: the prometheus.yml and prometheus_rules.yml configuration files and the Prometheus TSDB.
  • Grafana: the prometheus.yaml and loki.yaml datasource configuration files and the Grafana storage with the grafana.db database (sqlite3 by default) and the alerting, plugins, csv, file-collections, png directories.
  • Loki: the loki.yaml and loki_rules.yml configuration files and the Loki storage.

For configuration files Kubernetes provides ConfigMaps; for databases, Persistent Volumes.

I will describe resources using classic Kubernetes manifests. I won’t use Helm charts or Kubernetes Operators (for example, prometheus-operator) this time.

ConfigMap

I use a ConfigMap to take the configuration out of the container image and store it as a separate Kubernetes object. To update the configuration, it is enough to update the YAML file, apply it, and restart the Pod in the Prometheus Deployment.
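
For example, a minimal sketch of that flow, using the object and file names that appear later in this part:

root@devbox:~/Projects/k8s$ kubectl apply -f prometheus/prometheus-config-map.yaml
root@devbox:~/Projects/k8s$ kubectl rollout restart deployment prometheus-deployment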

And here are my ConfigMap objects with comments:

Prometheus ConfigMap:

# prometheus-config-map.yaml
apiVersion: v1
# type of object 
kind: ConfigMap
metadata:
  # ConfigMap name
  # used in the Deployment object template
  name: prometheus-config
data:
  # there are no scrape targets and alerting rules
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - /etc/prometheus/prometheus_rules.yml
  prometheus_rules.yml: |-
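
The scrape targets and alerting rules are intentionally left empty here; they appear in later parts. As a rough sketch only (the job name and target below are placeholders, not something running in this cluster), a scrape job in prometheus.yml could look like:

scrape_configs:
  # placeholder job; real exporters are added in later parts
  - job_name: example-exporter
    static_configs:
      - targets: ['example-exporter:9100']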

Loki ConfigMap:

# loki-config-map.yaml
# loki parameters - https://grafana.com/docs/loki/latest/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
data:
  loki.yaml: |-
    auth_enabled: false

    server:
      http_listen_port: 3100

    common:
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        kvstore:
          store: inmemory

    schema_config:
      configs:
        - from: 2023-02-15
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      rule_path: /tmp/loki/rules-temp
      # there is no alertmanager in the cluster yet
      alertmanager_url: http://localhost:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true

    # see 'Grafana dashboard shows "too many outstanding requests" 
    # after upgrade to v2.4.2' issue
    # https://github.com/grafana/loki/issues/5123
    # and from docs:
    # https://grafana.com/docs/loki/latest/configuration/#querier
    # https://grafana.com/docs/loki/latest/configuration/#query_scheduler
    querier:
      max_concurrent: 2048
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048

# loki-rules-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-rules-config
data:
  # there are no alerting rules
  loki_rules.yml: |-

Grafana ConfigMap:

# grafana-config-datasources-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config-datasources
  labels:
    version: "1"
data:
  # 2 datasources: prometheus and loki
  prometheus.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
               "access":"proxy",
                "editable": true,
                "name": "prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus-cluster-ip-service.default.svc.cluster.local:9090/prometheus/",
                "version": 1
            }
        ]
    }
  loki.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
               "access":"proxy",
                "editable": true,
                "name": "loki",
                "orgId": 1,
                "type": "loki",
                "url": "http://loki-cluster-ip-service.default.svc.cluster.local:3100/",
                "version": 1
            }
        ]
    }

PVC (Persistent Volume Claim)

Docker volumes allow you to mount a file system (a directory on disk or in another container) from the host machine into a container; in Kubernetes things are somewhat different. There are Volumes (a Pod directory mounted inside each container of that Pod; Volume lifetime = Pod lifetime), Persistent Volumes, or PVs (a piece of storage in the cluster that has a lifecycle independent of any individual Pod that uses it), and Persistent Volume Claims, or PVCs (a request for PV storage with particular properties, such as size or access mode).

In my case, the PVCs will use the default StorageClass, whose provisioner provides a local filesystem resource:

root@devbox:~$ kubectl get storageclass
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  127d

And here are my PVC objects with comments:

Prometheus PVC:

# prometheus-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # PVC name
  # use in template of Deployment object
  name: prometheus-persistent-volume-claim
spec:
  # see Access Modes - 
  # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
  # 1Gi is just for testing in a small desktop cluster
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Loki PVC:

# loki-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi 

Grafana PVC:

# grafana-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi 
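
After the manifests are applied (see "Apply and check Kubernetes cluster" below), you can check that each claim is bound to a dynamically provisioned volume:

root@devbox:~/Projects/k8s$ kubectl get pvc
root@devbox:~/Projects/k8s$ kubectl get pv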

Deployment

A Deployment is an object that can contain one Pod or a group of Pods. A Pod is an object that can contain one container or a group of containers.

Why Deployment and not a Pod?

  • The Deployment object allows you to flexibly change fields in the configuration file, unlike the Pod object. For example, for an already created Pod it is not possible to update ports.containerPort; this produces the error The Pod "pod-name" is invalid: spec: Forbidden: pod updates may not change fields other than 'spec.containers[*].image', 'spec.initContainers[*].image', ... (see the rollout example after this list)
  • Deployment monitors the status of each Pod and restarts a Pod if it breaks
  • Deployment is suitable for both development and production
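
As an illustration of the update workflow, a Deployment lets you roll out a new image and watch the rollout; the image tag below is only an example:

root@devbox:~/Projects/k8s$ kubectl set image deployment/prometheus-deployment prometheus=prom/prometheus:v2.45.0
root@devbox:~/Projects/k8s$ kubectl rollout status deployment/prometheus-deployment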

And here are my Deployment objects with comments:

Prometheus Deployment:

# prometheus-deployment.yaml
apiVersion: apps/v1
# type of object
kind: Deployment
metadata:
  # Deployment object name
  name: prometheus-deployment
spec:
  # number of Pods created based on the template below
  replicas: 1
  selector:
    matchLabels:
      component: prometheus
  # configuration of each replica (each Pod) that will be created inside the Deployment object
  template:
    metadata:
      labels:
      # label is used to identify the Pod in the Deployment object,
      # for example, to communicate with the Service ClusterIP object
      # the label can be arbitrary
      # there can be as many labels as you want
        component: prometheus
    spec:
      volumes:
      # define the available volumes
      # for PVC and ConfigMap objects
        - name: prometheus-config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-persistent-volume-claim
          persistentVolumeClaim:
            claimName: prometheus-persistent-volume-claim
      containers:
        - name: prometheus
          image: prom/prometheus
          # arguments when starting a container with Prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--web.external-url=https://localhost:9090/prometheus/"
          ports:
            # an opened port in a container with Prometheus
            - containerPort: 9090
          volumeMounts:
              # mounting point of ConfigMap configuration files
              # inside a Prometheus container
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
              # PVC mounting point inside a Prometheus container
            - name: prometheus-persistent-volume-claim
              mountPath: /prometheus/
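
After applying, the Pods created by this Deployment can be found by the label from the template:

root@devbox:~/Projects/k8s$ kubectl get pods -l component=prometheus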

Loki Deployment:

# loki-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: loki
  template:
    metadata:
      labels:
        component: loki
    spec:
      volumes:
        - name: loki-config-volume
          configMap:
            name: loki-config
        - name: loki-rules-config-volume
          configMap:
            name: loki-rules-config
        - name: loki-persistent-volume-claim
          persistentVolumeClaim:
            claimName: loki-persistent-volume-claim
      containers:
        - name: loki
          image: grafana/loki
          args:
            - "--config.file=/etc/loki/loki.yaml"
          ports:
            - containerPort: 3100
          volumeMounts:
            - name: loki-config-volume
              mountPath: /etc/loki/
            - name: loki-rules-config-volume
              mountPath: /etc/loki/rules/fake
            - name: loki-persistent-volume-claim
              mountPath: /loki/

Grafana Deployment:

# grafana-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: grafana
  template:
    metadata:
      labels:
        component: grafana
    spec:
      volumes:
        - name: grafana-config-datasources-volume
          configMap:
            name: grafana-config-datasources
        - name: grafana-persistent-volume-claim
          persistentVolumeClaim:
            claimName: grafana-persistent-volume-claim
      containers:
        - name: grafana
          image: grafana/grafana
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-config-datasources-volume
              mountPath: /etc/grafana/provisioning/datasources
              readOnly: false
            - name: grafana-persistent-volume-claim
              mountPath: /var/lib/grafana
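
Once the Grafana Pod is running, you can verify that the datasource provisioning files from the ConfigMap were actually mounted, for example:

root@devbox:~/Projects/k8s$ kubectl exec deploy/grafana-deployment -- ls /etc/grafana/provisioning/datasources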

ClusterIP Service

Why use Services? The IP address of a Pod object is assigned automatically; if you delete the Pod and create it again, the address may change. The Service object, together with coredns, solves this problem and allows you to access any Pod by the name of the Service object.

What about ClusterIP and NodePort objects? The NodePort object forwards ports to the Pods for access both from outside the cluster and from inside it; it is not usually used in production clusters. The ClusterIP object links the Deployment object with other objects in the cluster: first of all, it passes incoming traffic to the Pods inside the Deployment object. If incoming traffic is not needed, you can use a Deployment object alone, without NodePort or ClusterIP Service objects.
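
Inside the cluster each ClusterIP Service gets a DNS name of the form <service-name>.<namespace>.svc.cluster.local; these are exactly the names used in the Grafana datasource URLs above:

# <service-name>.<namespace>.svc.cluster.local:<port>
http://prometheus-cluster-ip-service.default.svc.cluster.local:9090
http://loki-cluster-ip-service.default.svc.cluster.local:3100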

And here are my ClusterIP objects with comments:

Prometheus ClusterIP:

# prometheus-cluster-ip-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    # label for communication with the Deployment object
    component: prometheus
  ports:
    # port exposed by the Service inside the cluster
    - port: 9090
      # port inside the container
      targetPort: 9090

Loki ClusterIP:

# loki-cluster-ip-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: loki-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: loki
  ports:
    - port: 3100
      targetPort: 3100

Grafana ClusterIP:

# grafana-cluster-ip-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: grafana
  ports:
    - port: 3000
      targetPort: 3000

Ingress

I’ll use github.com/kubernetes/ingress-nginx, a community-led project, as the entry point to the cluster. It can probably be called best practice. If the number of Pod replicas in a Deployment object is > 1, ingress-nginx will balance traffic between them. In my case there is 1 Pod replica in each Deployment, so the main task of the Ingress here is routing user traffic.
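
On minikube, one way to get the ingress-nginx controller (and the one assumed here) is the built-in addon:

root@devbox:~$ minikube addons enable ingress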

# ingress-service.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    kubernetes.io/ingress.class: 'nginx'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  rules:
    - http:
        paths:
          # link with /prometheus will be addressed to prometheus ClusterIP
          # for example, http://minikube_ip/prometheus/graph
          - path: /(prometheus.*)
            pathType: Prefix
            backend:
              service:
                name: prometheus-cluster-ip-service
                port:
                  number: 9090
          # link with /loki will be addressed to loki ClusterIP
          # for example, http://minikube_ip/loki/api/v1/status/buildinfo
          - path: /(loki.*)
            pathType: Prefix
            backend:
              service:
                name: loki-cluster-ip-service
                port:
                  number: 3100
          # any other links will be addressed to grafana ClusterIP
          # for example, http://minikube_ip
          - path: /(.*)
            pathType: Prefix
            backend:
              service:
                name: grafana-cluster-ip-service
                port:
                  number: 3000
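
After applying, the routing rules and the address assigned to the Ingress can be inspected with:

root@devbox:~/Projects/k8s$ kubectl get ingress
root@devbox:~/Projects/k8s$ kubectl describe ingress ingress-service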

Apply and check Kubernetes cluster

Apply configuration

root@devbox:~/Projects/k8s$ ls
grafana  ingress-service.yaml  loki  prometheus

root@devbox:~/Projects/k8s$ kubectl apply -f prometheus
service/prometheus-cluster-ip-service created
configmap/prometheus-config created
deployment.apps/prometheus-deployment created
persistentvolumeclaim/prometheus-persistent-volume-claim created

root@devbox:~/Projects/k8s$ kubectl apply -f loki
service/loki-cluster-ip-service created
configmap/loki-config created
deployment.apps/loki-deployment created
persistentvolumeclaim/loki-persistent-volume-claim created
configmap/loki-rules-config created

root@devbox:~/Projects/k8s$ kubectl apply -f grafana
service/grafana-cluster-ip-service created
configmap/grafana-config-datasources created
deployment.apps/grafana-deployment configured
persistentvolumeclaim/grafana-persistent-volume-claim created

root@devbox:~/Projects/k8s$ kubectl apply -f .
ingress.networking.k8s.io/ingress-service created

Check the availability of services

Because the cluster runs in a minikube VM, to access services from outside the cluster you need to use the minikube VM IP:

root@devbox:~/Projects/k8s$ minikube ip
192.168.49.2

HTTP routing is configured on the Ingress. All links containing /(prometheus.*) and /(loki.*) are routed to the corresponding Prometheus and Loki ClusterIP Services (port 9090 for Prometheus and 3100 for Loki), and everything else, /(.*), is routed to the Grafana ClusterIP Service (port 3000). At the same time, each service must expect requests on the corresponding path. For Prometheus this is handled by the container argument --web.external-url=https://localhost:9090/prometheus/. For Loki you need to define the path_prefix directive in the ConfigMap (when it is defined, the given prefix appears in front of the endpoint paths), after which, for example, the path https://localhost:3100/loki/api/v1/status/buildinfo becomes active. Grafana uses the default settings.

Checking accessibility from the browser:

# Prometheus
https://192.168.49.2/prometheus
# Loki
https://192.168.49.2/loki/api/v1/status/buildinfo
# Grafana
https://192.168.49.2
# Grafana datasources
https://192.168.49.2/datasources
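
The same checks can be done from the host with curl; -k is needed because minikube’s ingress serves a self-signed certificate by default (assuming the default addon setup):

root@devbox:~/Projects/k8s$ curl -k https://192.168.49.2/prometheus/graph
root@devbox:~/Projects/k8s$ curl -k https://192.168.49.2/loki/api/v1/status/buildinfo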

Debugging

To debug HTTP services inside the Kubernetes cluster, it is useful to have a Pod with curl on board. I run such a Pod:

root@devbox:~/Projects/k8s$ kubectl run curlpod --image=curlimages/curl
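
With the default entrypoint this Pod can exit almost immediately (which is why it shows the Completed status below). To keep it running you can, for example, override the command; sleep here is an assumption about the tools available in the image:

root@devbox:~/Projects/k8s$ kubectl run curlpod --image=curlimages/curl --command -- sleep 3600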

For example, to check any changes in the Loki configuration while bypassing the Ingress, you can:

find out the ClusterIP address of the Loki Service:

root@devbox:~/Projects/k8s$ kubectl get service -o wide
NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE    SELECTOR
grafana-cluster-ip-service         ClusterIP   10.99.166.87     <none>        3000/TCP   69d    component=grafana
kubernetes                         ClusterIP   10.96.0.1        <none>        443/TCP    126d   <none>
loki-cluster-ip-service            ClusterIP   10.100.46.66     <none>        3100/TCP   61d    component=loki
prometheus-cluster-ip-service      ClusterIP   10.107.49.254    <none>        9090/TCP   69d    component=prometheus

open sh in curlpod:

root@devbox:~/Projects/k8s$ kubectl get pod
NAME                                       READY   STATUS      RESTARTS       AGE
curlpod                                    0/1     Completed   0              22m
grafana-deployment-5d8ccc6d46-kzw7j        1/1     Running     8 (32h ago)    45d
loki-deployment-567bc6d974-dvtl5           1/1     Running     13 (30h ago)   33d
prometheus-deployment-7b4d56c8bc-6tc2d     1/1     Running     3 (30h ago)    15d

root@devbox:~/Projects/k8s$ kubectl exec -it curlpod -- sh
/ $

check the Loki ClusterIP DNS name via Kubernetes coredns from the running curlpod:

/ $ nslookup 10.100.46.66
Server:     10.96.0.10
Address:    10.96.0.10:53

66.46.100.10.in-addr.arpa   name = loki-cluster-ip-service.default.svc.cluster.local

send a GET request with curl from curlpod:

/ $ curl http://loki-cluster-ip-service.default.svc.cluster.local:3100/metrics
...
ring_member_tokens_to_own{name="compactor"} 1
ring_member_tokens_to_own{name="scheduler"} 1

Logs

To debug events inside a Pod, you can view its log, for example:

root@devbox:~/Projects/k8s$ kubectl get pod
NAME                                       READY   STATUS    RESTARTS       AGE
curlpod                                    1/1     Running   1 (17m ago)    39m
grafana-deployment-5d8ccc6d46-kzw7j        1/1     Running   8 (32h ago)    45d
loki-deployment-567bc6d974-dvtl5           1/1     Running   13 (30h ago)   33d
prometheus-deployment-7b4d56c8bc-6tc2d     1/1     Running   3 (30h ago)    15d

root@devbox:~/Projects/k8s$ kubectl logs loki-deployment-567bc6d974-dvtl5
...

For a Pod everything is simple. But how do you view events on the Ingress?

find the ingress-nginx-controller Pod and namespace:

root@devbox:~$ kubectl get pod --all-namespaces
NAMESPACE       NAME                                        READY   STATUS      RESTARTS        AGE
default         curlpod                                     1/1     Running     1 (28m ago)     50m
default         grafana-deployment-5d8ccc6d46-kzw7j         1/1     Running     8 (32h ago)     45d
default         loki-deployment-567bc6d974-dvtl5            1/1     Running     13 (30h ago)    33d
default         prometheus-deployment-7b4d56c8bc-6tc2d      1/1     Running     3 (30h ago)     15d
ingress-nginx   ingress-nginx-admission-create-qvcrr        0/1     Completed   0               73d
ingress-nginx   ingress-nginx-admission-patch-rp9pc         0/1     Completed   1               73d
ingress-nginx   ingress-nginx-controller-5959f988fd-ptglc   1/1     Running     35 (30h ago)    73d
kube-system     coredns-565d847f94-5ph98                    1/1     Running     30 (32h ago)    126d
kube-system     etcd-minikube                               1/1     Running     30 (32h ago)    126d
kube-system     kube-apiserver-minikube                     1/1     Running     30 (32h ago)    126d
kube-system     kube-controller-manager-minikube            1/1     Running     30 (30h ago)    126d
kube-system     kube-proxy-rzndh                            1/1     Running     29 (30h ago)    126d
kube-system     kube-scheduler-minikube                     1/1     Running     30 (30h ago)    126d
kube-system     storage-provisioner                         1/1     Running     107 (30h ago)   126d

check the log of that Pod in its namespace:

root@devbox:~$ kubectl logs -n ingress-nginx ingress-nginx-controller-5959f988fd-ptglc
...

Ingress troubleshooting is described in the ingress-nginx documentation.