Kubernetes
Overview
The Kubernetes Collection Agent plugin collects logs, metrics, and events from OpenShift and Kubernetes.
Prerequisites
The Kubernetes Collection Agent plugin requires the following versions of Geneos components.
- Gateway and Netprobe 5.1.x or higher. If you use this plugin with Netprobe 5.2.x or higher (which contains Collection Agent 2.1.0 or higher), you must also upgrade to Gateway 5.2.x or higher.
- Collection Agent 2.2.x or higher.
For more information about installing Collection Agent, see Collection Agent setup.
Note
This plugin also requires an additional licence to use. Please contact your ITRS Account Manager or ITRS Sales.
Permissions
The Kubernetes plugin requires the following permissions:
- Access to the Kubernetes API with permission to read pods and watch events in specific or all namespaces.
- Read-only volume mounts for the following host directories:
  - /var/log/containers
  - /var/log/pods
  - /var/lib/docker/containers
- If disk persistence is enabled, a read and write persistent volume is required. You can configure the required size for this volume.
- In OpenShift, the Collection Agent container must run in privileged mode in order to use HostPorts and to access the host volume mounts.
Configuration reference
The following example YAML file may need changes to suit your project's configuration:
collectors:
  - type: plugin
    name: kube-metrics
    className: KubernetesMetricsCollector
    # The namespaces and namespaceSelectors settings restrict the collection by namespace.
    # If both are undefined, all namespaces are collected. If both are defined,
    # namespaces will have a higher priority, and namespaceSelectors will be ignored.
    # These settings can be defined here (which applies to both events and metrics),
    # or under the events and metrics sections separately. If defined in both,
    # the effective value is the union of both settings.
    # Restrict collection to specific namespaces.
    namespaces:
      - geneos
    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
      - purpose=Production
      - department in (Engineering)
    # Whether to collect metrics/events for nodes and other non-namespaced resources. Defaults to false.
    excludeNonNamespaced: false
    # Events module configuration
    events:
      # Whether events collection is enabled. Defaults to true.
      enabled: true
      # The namespaces and namespaceSelectors settings restrict the collection by namespace.
      # If both are undefined, all namespaces are collected. If both are defined,
      # namespaces will have a higher priority, and namespaceSelectors will be ignored.
      # If values are listed here and above, the effective value is the union of both settings.
      # Restrict collection to specific namespaces.
      namespaces:
        - ns1
      # Restrict collection to filtered namespaces based on label selectors.
      # In the case of multiple label selectors, a logical AND will be used to combine them.
      namespaceSelectors:
        - purpose=Events
      # Name of the data point. Default value shown.
      dataPointName: kubernetes_event
    # Metrics module configuration
    metrics:
      # Whether metrics collection is enabled. Defaults to true.
      enabled: true
      # Number of milliseconds between reporting intervals. Default value shown.
      reportingInterval: 10000
      # The namespaces and namespaceSelectors settings restrict the collection by namespace.
      # If both are undefined, all namespaces are collected. If both are defined,
      # namespaces will have a higher priority, and namespaceSelectors will be ignored.
      # If values are listed here and above, the effective value is the union of both settings.
      # Restrict collection to specific namespaces.
      namespaces:
        - ns2
      # Restrict collection to filtered namespaces based on label selectors.
      # In the case of multiple label selectors, a logical AND will be used to combine them.
      namespaceSelectors:
        - purpose=Metrics
  - type: plugin
    name: kube-logs
    className: KubernetesLogCollector
    # Container log directory.
    # Required. On a Kubernetes or OpenShift node, logs are usually in /var/log/containers.
    logDirectory: /var/log/containers
    # Directory where the collector will save position files for each container log.
    # Required. Must have read/write privileges to this directory.
    persistenceDirectory: /var/lib/itrs/collection-agent/log-collector
    # Whether to read newly discovered log files from the beginning of the file.
    # If false, only lines written to the log after the collector starts will be read.
    # Defaults to false.
    readFromBeginning: false
    # Number of worker threads (i.e. concurrent log readers). Increasing this may improve
    # performance, especially if there are several very active log files.
    # Default value shown.
    workerThreads: 5
    # Number of milliseconds to wait before pausing a worker that is blocking other workers from running.
    # Default value shown.
    longRunningWorkerThreshold: 30000
    # Number of milliseconds between log processing intervals, i.e. how long to wait before checking
    # if a log has new data to read.
    # Default value shown.
    processingInterval: 5000
    # The namespaces and namespaceSelectors settings restrict the collection by namespace.
    # If both are undefined, all namespaces are collected. If both are defined,
    # namespaces will have a higher priority, and namespaceSelectors will be ignored.
    # Restrict log collection to specific namespaces. Defaults to all namespaces.
    namespaces:
      - ns1
      - ns2
    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
      - purpose=Production
      - department in (Engineering)
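The interaction between collector-level and module-level namespace settings can be easier to follow as code. The following Python sketch is one reading of the configuration comments, not the plugin's actual implementation: it assumes the union of the two levels is taken first and the priority rule (namespaces over namespaceSelectors) is applied afterwards. The function name `effective_namespace_filter` and the returned dictionary shape are hypothetical.

```python
def effective_namespace_filter(collector: dict, module: dict) -> dict:
    """Combine collector-level and module-level namespace settings.

    Assumption: the union of both levels is computed before the
    namespaces-over-selectors priority rule is applied.
    """

    def union(key: str) -> list:
        # Order-preserving union of the two lists, dropping duplicates.
        merged = collector.get(key, []) + module.get(key, [])
        return list(dict.fromkeys(merged))

    namespaces = union("namespaces")
    selectors = union("namespaceSelectors")
    if namespaces:
        # Explicit namespaces take priority; selectors are ignored.
        return {"mode": "namespaces", "values": namespaces}
    if selectors:
        # Multiple selectors are combined with a logical AND.
        return {"mode": "selectors", "values": selectors}
    # Nothing defined at either level: all namespaces are collected.
    return {"mode": "all", "values": []}


collector = {
    "namespaces": ["geneos"],
    "namespaceSelectors": ["purpose=Production", "department in (Engineering)"],
}
events = {"namespaces": ["ns1"], "namespaceSelectors": ["purpose=Events"]}
print(effective_namespace_filter(collector, events))
```

Under these assumptions, the events module in the example configuration would collect from the geneos and ns1 namespaces, and every label selector would be ignored because explicit namespaces are present.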
Docker logging configuration
Log collection is supported only when using Docker with the json-file driver. For example, in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
Set max-size large enough that logs are not rotated too frequently, because rapid rotation may cause the collector to miss data. This is critical if applications in the cluster log at a high frequency.
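To judge whether a given max-size is large enough, it can help to estimate how often a busy container fills one log file and compare that with the log collector's processingInterval. The Python sketch below is only a rough estimate under stated assumptions: size suffixes are treated as binary multiples (1m = 1024 * 1024 bytes), the write rate of 2 MiB per second is a hypothetical figure, and the helper names are illustrative.

```python
def parse_docker_size(size: str) -> int:
    """Convert a size string such as "100m" to bytes (binary multiples assumed)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    suffix = size[-1].lower()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)


def seconds_between_rotations(max_size: str, bytes_per_second: float) -> float:
    """Estimate how long a container takes to fill one log file."""
    return parse_docker_size(max_size) / bytes_per_second


rate = 2 * 1024**2  # hypothetical container writing 2 MiB of logs per second
period = seconds_between_rotations("100m", rate)
processing_interval = 5000 / 1000  # default processingInterval, in seconds
print(f"rotation every {period:.0f}s, about {period / processing_interval:.0f} polls per file")
```

If the estimated rotation period approaches the processing interval, consider increasing max-size.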
Label Selector configuration
The namespaceSelectors setting follows the Label Selector syntax described in the Kubernetes documentation. It supports both equality-based and set-based requirements.
Equality-based requirement
namespaceSelectors:
  - environment = production
  - tier != frontend
Set-based requirement
namespaceSelectors:
  - environment in (production, qa)
  - tier notin (frontend, backend)
  - partition
  - !partition
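The way these requirements select a namespace can be illustrated with a small matcher. The following Python sketch follows the general Kubernetes label selector semantics (for example, `!=` and `notin` also match objects that lack the key); it is a simplified illustration, not the parser the plugin uses, and the function names are hypothetical.

```python
import re

# key in (a, b) / key notin (a, b)
_SET_RE = re.compile(r"^(?P<key>[^\s!=]+)\s+(?P<op>in|notin)\s+\((?P<values>[^)]*)\)$")
# key = value / key == value / key != value
_EQ_RE = re.compile(r"^(?P<key>[^\s!=]+)\s*(?P<op>==|!=|=)\s*(?P<value>\S+)$")


def matches(requirement: str, labels: dict) -> bool:
    """Return True if the labels satisfy a single selector requirement."""
    req = requirement.strip()
    m = _SET_RE.match(req)
    if m:
        key = m.group("key")
        values = {v.strip() for v in m.group("values").split(",")}
        if m.group("op") == "in":
            return key in labels and labels[key] in values
        # notin also matches objects that do not carry the key at all.
        return key not in labels or labels[key] not in values
    m = _EQ_RE.match(req)
    if m:
        key, value = m.group("key"), m.group("value")
        if m.group("op") == "!=":
            # != also matches objects that do not carry the key at all.
            return labels.get(key) != value
        return labels.get(key) == value
    if req.startswith("!"):
        # !key: the label key must be absent.
        return req[1:].strip() not in labels
    # key: the label key must be present, whatever its value.
    return req in labels


def namespace_selected(requirements: list, labels: dict) -> bool:
    """Multiple requirements are combined with a logical AND."""
    return all(matches(r, labels) for r in requirements)


labels = {"environment": "production", "tier": "backend"}
print(namespace_selected(["environment = production", "tier != frontend"], labels))
print(namespace_selected(["environment in (production, qa)", "partition"], labels))
```

The first call selects the namespace because both requirements hold. The second call rejects it because the namespace has no partition label.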
Collection of Kubernetes object labels
The labels for all monitored objects are collected as Attribute data points. The dimensions of an attribute will correspond exactly to the dimensions of the monitored object.
Additionally, an attribute indicating the object kind is also published for each object:
kubernetes.itrsgroup.com/kind = [Node|Pod|etc...]
All attributes are sent periodically: first 30 seconds after startup, then every 5 minutes.
Load an include file
A sample kubernetes_mapping.xml include file for the Kubernetes Collection Agent plugin is provided in the /templates directory of the downloaded Gateway binaries. To load an include file into the Gateway Setup Editor:
- Open the Gateway Setup Editor.
- In the Navigation panel, click Includes to create a new file.
- Enter the location of the file to include in the Location field.
- Update the Priority field. This can be any value except 1. If you enter a priority of 1, an error is returned.
- Expand the file location in the Include section.
- Select Click to load.
- Click Yes to load the new include file and save your setup.
Collected metrics
All metrics are collected from the Summary API and cAdvisor of each node.
Note
Certain container and pod metrics collected from cAdvisor will subsequently be moved to CRI metric collection or potentially deprecated. For more information, see Kubernetes enhancements.
Namespace metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_namespace_status | status | | namespace | Describes the current state of the namespace. Possible values are Active and Terminating. |
Node metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_node_conditions | status | | node | Comma-delimited list of conditions of the node. Possible conditions are Ready, DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable. |
kube_node_cpu_capacity | gauge | millicores | node | Number of CPU cores on a node. |
kube_node_cpu_allocatable | gauge | millicores | node | Number of allocatable CPU cores on a node. |
kube_node_cpu_usage | gauge | % | node | Percentage of CPU usage from allocatable CPU cores of the node. |
kube_node_cpu_core_usage | counter | nanocores | node | CPU usage in nanocores (sum of all cores). |
kube_node_cpu_usage_time | counter | nanoseconds | node | CPU usage in time (sum of all cores). |
kube_node_kubelet_version | attribute | | node | Version of kubelet. |
kube_node_kubeproxy_version | attribute | | node | Version of kube-proxy. |
kube_node_mem_capacity | gauge | bytes | node | Bytes of memory on a node. |
kube_node_mem_allocatable | gauge | bytes | node | Bytes of allocatable memory on a node. |
kube_node_mem_used | gauge | bytes | node | Total memory in use. |
kube_node_memory_free | gauge | bytes | node | Available memory for use. |
kube_node_net_rx | counter | bytes | node, interface | Windowed count of bytes received since last sample. |
kube_node_net_rx_rate | counter | bytes/sec | node, interface | Windowed rate of bytes received since last sample. |
kube_node_net_rx_errors | counter | | node, interface | Windowed count of errors received since the last sample. |
kube_node_net_rx_error_rate | gauge | per sec | node, interface | Windowed rate of errors received since the last sample. |
kube_node_net_tx | counter | bytes | node, interface | Windowed count of bytes sent since last sample. |
kube_node_net_tx_rate | gauge | bytes/sec | node, interface | Windowed rate of bytes sent since last sample. |
kube_node_net_tx_errors | counter | | node, interface | Windowed count of errors sent since last sample. |
kube_node_net_tx_error_rate | gauge | per sec | node, interface | Windowed rate of errors sent since last sample. |
kube_node_fs_size | gauge | bytes | node, volume | Size of the filesystem |
kube_node_fs_used | gauge | bytes | node, volume | Number of bytes used. |
kube_node_fs_usage | gauge | % | node, volume | Percentage of the filesystem used. The percentage is calculated by dividing |
kube_node_fs_free | gauge | bytes | node, volume | Number of bytes free. |
kube_node_fs_inodes_used | gauge | | node, volume | Number of used inodes by the filesystem. Total number of inodes may not equal |
kube_node_fs_inodes_free | gauge | | node, volume | Number of free inodes. |
kube_node_taints | attribute | | node | Comma-delimited list of taints. Taints are described in key=<value>:effect format. |
Note
Filesystem metrics for a node represent the root filesystem, whose volume_name dimension is fs by default.
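The network metrics in the node table come in pairs: a windowed count (for example kube_node_net_rx) and a windowed rate (for example kube_node_net_rx_rate). The Python sketch below shows how such a pair would relate if the window equals the metrics reportingInterval; that equivalence is an assumption made for illustration, and the function name is hypothetical.

```python
def windowed_rate(count_in_window: float, window_ms: int) -> float:
    """Convert a count observed over one sampling window to a per-second rate."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return count_in_window / (window_ms / 1000)


# 1,500,000 bytes received during one 10000 ms window (the default reportingInterval).
print(windowed_rate(1_500_000, 10_000))
```

With these figures the rate is 150000 bytes per second.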
Pod metrics
Pod filesystem metrics come from different dimensions:
- ephemeral-storage: reports the total filesystem usage for the containers and emptyDir-backed volumes in the measured Pod.
- Volumes: stats pertaining to volume usage of filesystem resources, whose dimension is the volume_name.
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_pod_containers_ready | gauge | | node, namespace, pod | Number of ready containers. |
kube_pod_containers_running | gauge | | node, namespace, pod | Number of running containers. |
kube_pod_containers_terminated | gauge | | node, namespace, pod | Number of terminated containers. |
kube_pod_containers_waiting | gauge | | node, namespace, pod | Number of waiting containers. |
kube_pod_cpu_cfs_periods | counter | | node, namespace, pod | Number of elapsed enforcement period intervals of the pod. This is acquired using the cAdvisor. |
kube_pod_cpu_cfs_throttled_periods | counter | | node, namespace, pod | Number of throttled period intervals of the pod. This is acquired using the cAdvisor. |
kube_pod_cpu_cfs_throttled_seconds | counter | seconds | node, namespace, pod | Total time duration the pod has been throttled. This is acquired using the cAdvisor. |
kube_pod_cpu_core_usage | gauge | nanocores | node, namespace, pod | CPU usage in nanocores (sum of all cores). |
kube_pod_cpu_usage | gauge | % | node, namespace, pod | Percentage of CPU usage from allocatable CPU cores of the node. |
kube_pod_cpu_usage_time | counter | nanoseconds | node, namespace, pod | CPU usage in time (sum of all cores). |
kube_pod_created | attribute | epoch_milliseconds | node, namespace, pod | Pod creation timestamp. |
kube_pod_fs_free | gauge | bytes | node, namespace, volume | Number of bytes free. |
kube_pod_fs_inodes_free | gauge | | node, namespace, volume | Number of free inodes. |
kube_pod_fs_inodes_used | gauge | | node, namespace, volume | Number of used inodes in the filesystem. Total number of inodes may not equal because this filesystem may share inodes with other filesystems. For |
kube_pod_fs_size | gauge | bytes | node, namespace, volume | Size of the filesystem. |
kube_pod_fs_usage | gauge | % | node, namespace, volume | Percentage of the filesystem used. The percentage is calculated by dividing |
kube_pod_fs_used | gauge | bytes | node, namespace, volume | Number of bytes used. For |
kube_pod_ip | attribute | | node, namespace, pod | Default IP address of the pod. |
kube_pod_mem_free | gauge | bytes | node, namespace, pod | Available memory for use. |
kube_pod_mem_used | gauge | bytes | node, namespace, pod | Memory in use. |
kube_pod_net_rx | counter | bytes | node, namespace, interface | Windowed count of bytes received since last sample. |
kube_pod_net_rx_errors | counter | | node, namespace, interface | Windowed count of errors received since the last sample. |
kube_pod_net_rx_error_rate | gauge | per sec | node, namespace, interface | Windowed rate of errors received since the last sample. |
kube_pod_net_rx_rate | counter | bytes/sec | node, namespace, interface | Windowed rate of bytes received since last sample. |
kube_pod_net_tx | counter | bytes | node, namespace, interface | Windowed count of bytes sent since last sample. |
kube_pod_net_tx_rate | gauge | bytes/sec | node, namespace, interface | Windowed rate of bytes sent since last sample. |
kube_pod_netw_tx_error_rate | gauge | bytes/sec | node, namespace, interface | Windowed rate of errors sent since last sample. |
kube_pod_netw_tx_errors | counter | | node, namespace, interface | Windowed count of errors sent since last sample. |
kube_pod_oom_events | counter | | node, namespace, pod | Count of out of memory events observed in the pod. This is acquired using the cAdvisor. |
kube_pod_status | status | | node, namespace, pod | Status of the pod’s deployment. Values: |
kube_pod_status_condition | attribute | | node, namespace, pod | Latest status condition of the pod. Possible values are PodScheduled, ContainersReady, Initialized, and Ready. |
kube_pod_status_condition_reason | attribute | | node, namespace, pod | Reason for the latest status condition of the pod. |
Container metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_container_cpu_cfs_periods | counter | | node, namespace, pod, container | Number of elapsed enforcement period intervals of the container. This is acquired using the cAdvisor. |
kube_container_cpu_cfs_throttled_periods | counter | | node, namespace, pod, container | Number of throttled period intervals of the container. This is acquired using the cAdvisor. |
kube_container_cpu_cfs_throttled_seconds | counter | seconds | node, namespace, pod, container | Total time duration the container has been throttled. This is acquired using the cAdvisor. |
kube_container_cpu_core_usage | gauge | nanocores | node, namespace, pod, container | CPU usage in nanocores (sum of all cores). |
kube_container_cpu_limit | gauge | millicores | node, namespace, pod, container | CPU resource limit. See Kubernetes documentation for resource configuration details. |
kube_container_cpu_limit_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured CPU resource limit. See Kubernetes documentation for resource configuration details. |
kube_container_cpu_request | gauge | millicores | node, namespace, pod, container | CPU resource request. See Kubernetes documentation for resource configuration details. |
kube_container_cpu_request_usage | gauge | % | node, namespace, pod, container | Percentage of the configured CPU resource request. See Kubernetes documentation for resource configuration details. |
kube_container_cpu_usage | gauge | % | node, namespace, pod, container | Percentage of CPU usage from allocatable CPU cores of the node. |
kube_container_cpu_usage_time | counter | nanoseconds | node, namespace, pod, container | CPU usage in time (sum of all cores). |
kube_container_fs_free | gauge | bytes | node, namespace, pod, container, volume | Number of bytes free. |
kube_container_fs_inodes_free | gauge | | node, namespace, pod, container, volume | Number of free inodes. |
kube_container_fs_inodes_used | gauge | | node, namespace, pod, container, volume | Number of used inodes in the filesystem. Total number of inodes may not equal because this filesystem may share inodes with other filesystems. For |
kube_container_fs_size | gauge | bytes | node, namespace, pod, container, volume | Size of the filesystem. |
kube_container_fs_usage | gauge | % | node, namespace, pod, container, volume | Percentage of the filesystem used. The percentage is calculated by dividing |
kube_container_fs_used | gauge | bytes | node, namespace, pod, container, volume | Number of bytes used. For |
kube_container_mem_free | gauge | bytes | node, namespace, pod, container | Available memory for use. |
kube_container_mem_limit | gauge | bytes | node, namespace, pod, container | Memory resource limit. See Kubernetes documentation for resource configuration details. |
kube_container_mem_limit_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured memory resource limit. See Kubernetes documentation for resource configuration details. |
kube_container_mem_request | gauge | bytes | node, namespace, pod, container | Memory resource request. See Kubernetes documentation for resource configuration details. |
kube_container_mem_request_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured memory resource request. See Kubernetes documentation for resource configuration details. |
kube_container_mem_used | gauge | bytes | node, namespace, pod, container | Memory in use. |
kube_container_oom_events | counter | | node, namespace, pod, container | Count of out of memory events observed in the container. This is acquired using the cAdvisor. |
kube_container_status | status | | node, namespace, pod, container, volume | Current state of the container. Values: |
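The limit and request usage percentages in the table above combine metrics that use different units: CPU usage is reported in nanocores, while CPU limits and requests are in millicores. The Python sketch below shows how the units relate (1 millicore = 1,000,000 nanocores). It is an illustration of the arithmetic only; the plugin's exact calculation is not described here, and the function names are hypothetical.

```python
from typing import Optional

NANOCORES_PER_MILLICORE = 1_000_000


def cpu_usage_percent(core_usage_nanocores: float, reference_millicores: float) -> Optional[float]:
    """Percentage of a CPU limit or request (millicores) consumed by usage (nanocores)."""
    if reference_millicores <= 0:
        # No limit or request configured, so a percentage is undefined.
        return None
    return 100.0 * core_usage_nanocores / (reference_millicores * NANOCORES_PER_MILLICORE)


def mem_usage_percent(used_bytes: float, reference_bytes: float) -> Optional[float]:
    """Percentage of a memory limit or request consumed; both values are in bytes."""
    if reference_bytes <= 0:
        return None
    return 100.0 * used_bytes / reference_bytes


# A container using 250,000,000 nanocores against a 500 millicore limit.
print(cpu_usage_percent(250_000_000, 500))
# A container using 256 MiB against a 1 GiB memory limit.
print(mem_usage_percent(256 * 1024**2, 1024**3))
```

In this example the container is at 50% of its CPU limit and 25% of its memory limit.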
ResourceQuota metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_resource_quota_hard | gauge | millicores/bytes/none | namespace, quota, resource | Configured hard limit. |
kube_resource_quota_used | gauge | millicores/bytes/none | namespace, quota, resource | Quota used amount. |
kube_resource_quota_used_percent | gauge | % | namespace, quota, resource | Quota used percent. |
Workload/Deployment metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_deployment_spec_replicas | gauge | | namespace, deployment | Number of desired pods. |
kube_deployment_status_replicas | gauge | | namespace, deployment | Total number of non-terminated pods targeted by the deployment. |
kube_deployment_status_replicas_ready | gauge | | namespace, deployment | Total number of ready pods targeted by the deployment. |
kube_deployment_status_replicas_available | gauge | | namespace, deployment | Total number of available pods, which are ready for at least minReadySeconds, targeted by the deployment. |
kube_deployment_status_replicas_unavailable | gauge | | namespace, deployment | Total number of unavailable pods targeted by the deployment. This is the total number of pods still required for the deployment to have 100% available capacity. The pods may either be running but not yet available or have not been created yet. |
kube_deployment_status_condition | status | | namespace, deployment | Describes the current state of the deployment. |
Workload/DaemonSet metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_daemonset_status_number_available | gauge | | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have one or more running and available daemon pods. |
kube_daemonset_status_number_unavailable | gauge | | namespace, daemonset | Number of nodes that are expected to run the daemon pod but have no running and available daemon pod. |
kube_daemonset_status_current_number_scheduled | gauge | | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have at least one running daemon pod. |
kube_daemonset_status_desired_number_scheduled | gauge | | namespace, daemonset | Total number of nodes expected to run the daemon pod. |
kube_daemonset_status_number_misscheduled | gauge | | namespace, daemonset | Number of nodes that are not expected to run the daemon pod but have a running daemon pod. |
kube_daemonset_status_number_ready | gauge | | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have one or more running and ready daemon pods. |
kube_daemonset_status_condition | status | | namespace, daemonset | Describes the current state of the DaemonSet. |
Workload/ReplicaSet metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_replicaset_spec_replicas | gauge | | namespace, replicaset_name | Number of desired replicas. |
kube_replicaset_status | gauge | | namespace, replicaset_name | Most recently observed number of replicas. |
kube_replicaset_status_replicas_available | gauge | | namespace, replicaset_name | Number of available replicas, which are ready for at least minReadySeconds, in the replica set. |
kube_replicaset_status_replicas_ready | gauge | | namespace, replicaset_name | Number of ready replicas for this replica set. |
kube_replicaset_status_condition | attribute | | namespace, replicaset_name | Describes the current state of the replica set. |
Workload/StatefulSet metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_statefulset_spec_replicas | gauge | | namespace, statefulset | Desired number of replicas for the given template. |
kube_statefulset_status_replicas_available | gauge | | namespace, statefulset | Number of pods created by the StatefulSet controller. |
kube_statefulset_status_replicas_current | gauge | | namespace, statefulset | Number of pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision. |
kube_statefulset_status_replicas_ready | gauge | | namespace, statefulset | Number of pods created by the StatefulSet controller that have a Ready condition. |
kube_statefulset_status_condition | status | | namespace, statefulset | Describes the current state of the stateful set. |
Workload/Job metrics
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
kube_job_spec_completions | gauge | | namespace, job | Desired number of successfully finished pods that should run with the job. |
kube_job_spec_parallelism | gauge | | namespace, job | Maximum desired number of pods that should run with the job at any given time. |
kube_job_status_active | gauge | | namespace, job | Number of actively running pods. |
kube_job_status_succeeded | gauge | | namespace, job | Number of successful pods. |
kube_job_status_failed | gauge | | namespace, job | Number of failed pods. |
kube_job_status_start_time | gauge | epoch_milliseconds | namespace, job | Time when the job was acknowledged by the job controller. |
kube_job_status_completion_time | gauge | epoch_milliseconds | namespace, job | Time when the job was completed. |
kube_job_status_condition | status | | namespace, job | Describes the current state of the job. |
Kubernetes log rotation
The following table lists which log rotation schemes the log collector supports:
Log rotation scheme | Description |
---|---|
Docker JSON driver | Supported |
Logrotate create mode | Supported |
Logrotate copy mode | Not supported |
Logrotate copytruncate mode | Not supported |
Collecting from compressed log files | Not supported |