|
|
## About this project
|
|
|
The purpose of this project is to test various solutions for monitoring a Kubernetes (k8s) environment. Inside a k8s cluster we can observe various metrics, for example: interactions between pods, whether pods are up or down, utilization of system resources inside pods, and many more. It is useful to track these metrics to get a sense of the overall health of the cluster and prevent failures in advance.
|
|
|
We will use the `minikube` tool, which creates a local single-node k8s cluster.
|
|
|
It is recommended to run this project in a VM (virtual machine).
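Bringing up such a cluster typically looks like the following (assuming `minikube` and a container runtime such as Docker are already installed; the driver and resource values are illustrative, not requirements of this project):

```shell
# Start a local single-node cluster; flags shown are illustrative defaults
minikube start --driver=docker --cpus=2 --memory=4096

# Verify that the node is ready
kubectl get nodes
```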
|
|
|
|
|
|
|
|
|
|
|
|
## Web app
|
|
|
If we wish to compare monitoring solutions, we should generate some traffic inside the cluster. I decided to create a simple web application consisting of a `frontend` and a `backend`. The frontend is a container based on the `node` image, using the `Vue.js` framework. The backend uses `flask`, a micro framework for `python`. Both of these building blocks have their own k8s pod and service. There is an `nginx` proxy inside the `frontend` pod which redirects URLs starting with `/api` to the `backend` pod; other requests are served from the `frontend` pod.
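A minimal sketch of that proxy rule might look like this (the `backend` service hostname and port are assumptions for illustration, not the project's exact configuration):

```nginx
server {
    listen 80;

    # Forward API calls to the backend service (hostname/port assumed)
    location /api {
        proxy_pass http://backend:5000;
    }

    # Serve the built frontend for everything else
    location / {
        root /usr/share/nginx/html;
        try_files $uri $uri/ /index.html;
    }
}
```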
|
|
|
The API of the `backend` pod is used to collect information about and manipulate system resources. There is an issue, however: since we are running everything locally using `minikube`, it is impossible to separate resources used by the pod from resources used outside of the k8s cluster.
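As an illustration of the kind of collection the backend performs, the memory statistics could be gathered from the `free` tool roughly like this (a sketch under assumed names; the project's actual parsing code may differ):

```python
import subprocess


def parse_free_output(output):
    """Parse the 'Mem:' row of `free` output into a dict.

    Field order follows procps `free`: total, used, free, shared,
    buff/cache, available. Sketch only, not the project's exact code.
    """
    for line in output.splitlines():
        if line.startswith("Mem:"):
            fields = line.split()
            return {
                "total": int(fields[1]),
                "used": int(fields[2]),
                "free": int(fields[3]),
                "available": int(fields[6]),
            }
    raise RuntimeError("unexpected `free` output")


def get_memory_stats(fmt="mb"):
    """Collect memory statistics via `free -k` or `free -m`."""
    flag = "-k" if fmt == "kb" else "-m"
    return parse_free_output(subprocess.check_output(["free", flag], text=True))
```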
|
|
|
|
|
|
|
|
|
|
|
|
#### Structure of Web app
|
|
|
|
|
|
- **/**, **/host**
|
|
|
- shows information about the container the `backend` app runs on
|
|
|
|
|
|
- **/cpu**
|
|
|
- shows current cpu load, allows creation of cpu-stressing processes and displays those processes
|
|
|
|
|
|
- **/mem**
|
|
|
- shows total, used, free and available memory and allows creation and display of memory-stressing processes
|
|
|
|
|
|
|
|
|
#### Api Reference
|
|
|
|
|
|
- host
|
|
|
- GET /api/host - returns information about the container the `backend` app runs on
|
|
|
|
|
|
- cpu
|
|
|
- GET /api/cpu - returns the current cpu load collected by the [mpstat](https://linux.die.net/man/1/mpstat) tool. The collection takes 1 second to complete
|
|
|
- POST /api/cpu/create - creates a [stress-ng](https://wiki.ubuntu.com/Kernel/Reference/stress-ng) process that will cause the cpu load to increase by the specified amount.
|
|
|
- params:
|
|
|
- load: the amount in % by which to stress the cpu
|
|
|
- GET /api/cpu/running - returns json with all the currently running `stress-ng` processes that are stressing the cpu
|
|
|
- POST /api/cpu/kill - kills one or all `stress-ng` cpu processes
|
|
|
- params:
|
|
|
- pid: pid of process to kill
|
|
|
- all: if all=true then kill all processes
|
|
|
|
|
|
- memory
|
|
|
- GET /api/memory - returns total, used, free and available memory collected by the [free](https://man7.org/linux/man-pages/man1/free.1.html) tool
|
|
|
- params:
|
|
|
- format - one of ("kb", "mb") - the format in which the statistics should be displayed. Default is "mb"
|
|
|
- POST /api/memory/create - spawns a `stress-ng` process that occupies the specified amount of RAM.
|
|
|
- params:
|
|
|
- size - the amount of memory the process should occupy
|
|
|
- format - one of ("kb", "mb") - the unit to append to size
|
|
|
- GET /api/memory/running - returns json with all the currently running `stress-ng` processes that are stressing memory
|
|
|
- POST /api/memory/kill - kills one or all `stress-ng` memory processes
|
|
|
- params:
|
|
|
- pid: pid of process to kill
|
|
|
- all: if all=true then kill all processes
|
|
|
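With the service exposed locally, the API can be exercised with `curl`. The base URL below is a placeholder; the real one can be obtained, for example, from `minikube service frontend --url`:

```shell
# Placeholder: replace with the URL reported by `minikube service frontend --url`
BASE=http://192.168.49.2:30000

curl "$BASE/api/host"                              # container info
curl "$BASE/api/cpu"                               # current cpu load (takes ~1 s)
curl -X POST "$BASE/api/cpu/create" -d "load=50"   # add ~50 % cpu load
curl "$BASE/api/cpu/running"                       # list cpu stressors
curl -X POST "$BASE/api/cpu/kill" -d "all=true"    # kill all cpu stressors
```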
|
|
|
|
|
|
## Prometheus
|
|
|
|
|
|
#### Overview
|
|
|
|
|
|
[Prometheus](https://prometheus.io) is an open-source system monitoring tool. It is specifically targeted at highly dynamic container environments such as `kubernetes`. It offers monitoring of many aspects of the environment and automatic alerting. Prometheus works by collecting metrics from monitored targets, scraping their `/metrics` HTTP endpoint and storing the results. We can then use the PromQL language to query the stored metrics. While Prometheus offers some basic visualization tools, we will use Grafana to visualize the collected metrics.
|
|
|
|
|
|
|
|
|
#### Setup
|
|
|
|
|
|
When installing Prometheus, we have two options: manually or using the [helm](https://helm.sh/) package manager. Since Prometheus is a stateful application (it saves the data from the current session to storage), it is not recommended to set it up manually (unless we need some customizable feature), as it would be difficult. Instead, we will use `helm`, the package manager for k8s, for the initial setup. In general, using `helm` with official charts (the packaging format used by `helm`) is the preferred way of installing kubernetes components.
|
|
|
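With `helm` available, the community kube-prometheus-stack chart (which bundles Prometheus, Alertmanager and Grafana) can be installed along these lines; the release name and namespace below are our own choices:

```shell
# Add the community chart repository and refresh the index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install Prometheus (plus Grafana and Alertmanager) into its own namespace
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Check the deployed pods
kubectl get pods -n monitoring
```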
|
|
|
|
|
|
#### Config
|
|
|
Before we install Prometheus, we need to enable the `/metrics` endpoint on our `backend` and `frontend` pods so Prometheus can collect metrics. This could be done manually, but in practice we are going to use Prometheus [exporters](https://prometheus.io/docs/instrumenting/exporters/), which help export metrics from third-party services to Prometheus. They come either as language-specific libraries or as Docker images, and there are plenty of officially supported exporters for the most popular services. For our `backend` pod, which runs a Flask webserver, we are going to use [prometheus-flask-exporter](https://github.com/rycus86/prometheus_flask_exporter). Enabling basic metrics scraping with this exporter is as easy as two lines of code:
|
|
|
|
|
|
|
|
|
|
|
|
```python
|
|
|
from flask import Flask
|
|
|
from prometheus_flask_exporter import PrometheusMetrics
|
|
|
|
|
|
app = Flask(__name__)
|
|
|
metrics = PrometheusMetrics(app)
|
|
|
|
|
|
```
|
|
|
|
|
|
Now we need to make Prometheus aware of this new target. For this we will use a ServiceMonitor object, which will be deployed into the same namespace as our webapp and will monitor the service given by its `selector` property. By default, Prometheus only monitors objects in its own namespace; we need to add a `serviceMonitorNamespaceSelector` that matches common labels on all namespaces we want to monitor.
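A ServiceMonitor for the web app might look roughly like this (the namespace, label and port names are assumptions and must match the actual service definitions):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-monitor
  namespace: webapp            # same namespace as the web app (assumed name)
  labels:
    release: prometheus        # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: backend             # must match the backend service's labels (assumed)
  endpoints:
    - port: http               # named service port exposing /metrics (assumed)
      path: /metrics
```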
|
|