Kubernetes

Overview

The Kubernetes Collection Agent plugin collects logs, metrics, and events from OpenShift and Kubernetes.

Prerequisites

The Kubernetes Collection Agent plugin requires the following versions of Geneos components.

- Gateway and Netprobe 5.1.x or higher. If you use this plugin with Netprobe 5.2.x or higher (which contains Collection Agent 2.1.0 or higher), you must also upgrade to Gateway 5.2.x or higher.
- Collection Agent 2.2.x or higher.

For more information about installing Collection Agent, see Collection Agent setup.

Note
This plugin also requires an additional licence to use. Please contact your ITRS Account Manager or ITRS Sales.

Permissions

The Kubernetes plugin requires the following permissions:

- Access to the Kubernetes API with permission to read pods and watch events in specific or all namespaces.
- Read-only volume mounts for the following host directories:
  - /var/log/containers
  - /var/log/pods
  - /var/lib/docker/containers
- If disk persistence is enabled, a read and write persistent volume is required. You can configure the required size for this volume.
- In OpenShift, the Collection Agent container must run in privileged mode in order to use HostPorts and to access the host volume mounts.
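
As a rough illustration of the permissions listed above, the manifest below sketches a ClusterRole that allows reading pods and watching events, followed by a Pod spec fragment that mounts the three host directories read-only. The object names (`collection-agent-reader`, `collection-agent`), the image placeholder, and the volume names are hypothetical and are not taken from the product. A binding of the role to the agent's service account would also be needed and is omitted here. Treat this as a sketch to adapt, not as the supported deployment manifest.

```yaml
# Hypothetical sketch: all names and the image are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: collection-agent-reader        # assumed name
rules:
- apiGroups: [""]                      # core API group
  resources: ["pods", "events"]
  verbs: ["get", "list", "watch"]      # read-only access
---
apiVersion: v1
kind: Pod
metadata:
  name: collection-agent               # assumed name
spec:
  containers:
  - name: collection-agent
    image: "<collection-agent-image>"  # placeholder, replace with your image
    securityContext:
      privileged: true                 # required on OpenShift per the list above
    volumeMounts:
    - name: var-log-containers
      mountPath: /var/log/containers
      readOnly: true
    - name: var-log-pods
      mountPath: /var/log/pods
      readOnly: true
    - name: docker-containers
      mountPath: /var/lib/docker/containers
      readOnly: true
  volumes:
  - name: var-log-containers
    hostPath:
      path: /var/log/containers
  - name: var-log-pods
    hostPath:
      path: /var/log/pods
  - name: docker-containers
    hostPath:
      path: /var/lib/docker/containers
```

The same container and volume fragment could equally sit inside a DaemonSet pod template. If collection is limited to specific namespaces, a namespaced Role may be enough in place of the ClusterRole.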

Configuration reference

Below is an example YAML file which may require some changes for your project’s configuration:

collectors:
- type: plugin
  name: kube-metrics
  className: KubernetesMetricsCollector

  # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
  # If both are undefined, all namespaces are collected. If both are defined, 
  # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
  # These settings can be defined here (which applies to both events and metrics), 
  # or under the events and metrics sections separately. If defined in both, 
  # the effective value is the union of both settings.

  # Restrict collection to specific namespaces.
  namespaces:
  - geneos       

  # Restrict collection to filtered namespaces based on label selectors.
  # In the case of multiple label selectors, a logical AND will be used to combine them.
  namespaceSelectors:
  - purpose=Production
  - department in (Engineering)

  # Whether to exclude nodes and other non-namespaced resources from metrics/events collection. Defaults to false.
  excludeNonNamespaced: false

  # Events module configuration
  events:
  
    # Whether events collection is enabled.  Defaults to true.
    enabled: true

    # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
    # If both are undefined, all namespaces are collected. If both are defined, 
    # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
    # If values are listed here and above, the effective value is the union of both settings.
    
    # Restrict collection to specific namespaces.
    namespaces:
    - ns1

    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
    - purpose=Events

    # Name of the data point.  Default value shown.
    dataPointName: kubernetes_event
  
  # Metrics module configuration
  metrics:
  
    # Whether metrics collection is enabled.  Defaults to true.
    enabled: true
    
    # Number of milliseconds between reporting intervals.  Default value shown.
    reportingInterval: 10000

    # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
    # If both are undefined, all namespaces are collected. If both are defined, 
    # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
    # If values are listed here and above, the effective value is the union of both settings.

    # Restrict collection to specific namespaces.
    namespaces:
    - ns2

    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
    - purpose=Metrics

- type: plugin
  name: kube-logs
  className: KubernetesLogCollector
  
  # Container log directory.
  # Required.  On a Kubernetes or OpenShift node, logs are usually in /var/log/containers.
  logDirectory: /var/log/containers
  
  # Directory where the collector will save position files for each container log.
  # Required.  Must have read/write privileges to this directory.
  persistenceDirectory: /var/lib/itrs/collection-agent/log-collector
  
  # Whether to read newly discovered log files from the beginning of the file.
  # If false, only lines written to the log after the collector starts will be read.
  # Defaults to false.
  readFromBeginning: false
  
  # Number of worker threads (i.e. concurrent log readers).  Increasing this may improve 
  # performance, especially if there are several very active log files.
  # Default value shown.
  workerThreads: 5
  
  # Number of milliseconds to wait before pausing a worker that is blocking other workers from running.
  # Default value shown.
  longRunningWorkerThreshold: 30000

  # Number of milliseconds between log processing intervals, i.e. how long to wait before checking
  # if a log has new data to read. 
  # Default value shown.
  processingInterval: 5000

  # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
  # If both are undefined, all namespaces are collected. If both are defined, 
  # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
  
  # Restrict log collection to specific namespaces. Defaults to all namespaces.
  namespaces:
  - ns1
  - ns2

  # Restrict collection to filtered namespaces based on label selectors.
  # In the case of multiple label selectors, a logical AND will be used to combine them.
  namespaceSelectors:
  - purpose=Production
  - department in (Engineering)
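
To make the namespace rules in the comments above more concrete, the trimmed-down fragment below defines `namespaces` both at the top level and under `metrics`. Reading the reference comments literally, the metrics module would collect from the union of `geneos` and `ns2`, while the events module, which has no setting of its own, would use `geneos` only. The `namespaceSelectors` entry is included only to show a setting that would be ignored because `namespaces` is defined at the same level. This is an interpretation of the comments in the reference, not verified plugin behaviour.

```yaml
collectors:
- type: plugin
  name: kube-metrics
  className: KubernetesMetricsCollector

  # Applies to both events and metrics.
  namespaces:
  - geneos

  # Ignored here: namespaces is defined at the same level and takes priority.
  namespaceSelectors:
  - purpose=Production

  events:
    enabled: true
    # No namespace settings of its own: effective namespaces = geneos.

  metrics:
    enabled: true
    # Union with the top-level list: effective namespaces = geneos, ns2.
    namespaces:
    - ns2
```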

Log collection

The Kubernetes Collection Agent plugin supports log collection on Kubernetes clusters that use the containerd or CRI-O container runtimes.

Label Selector configuration

The namespaceSelectors setting follows the label selector syntax described in the Kubernetes documentation.

It supports both equality-based and set-based requirements.

Equality-based requirement

namespaceSelectors:
- environment = production
- tier != frontend

Set-based requirement

namespaceSelectors:
- environment in (production, qa)
- tier notin (frontend, backend)
- partition
- "!partition"
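
Because multiple selectors are combined with a logical AND, equality-based and set-based requirements can be mixed in one list, as the configuration reference already does. The fragment below is a sketch with made-up label keys and values; the comments describe how each line would be read under standard Kubernetes label selector semantics.

```yaml
namespaceSelectors:
- environment in (production, qa)   # environment is production or qa
- tier != frontend                  # AND tier is not frontend, or the tier label is absent
- partition                         # AND a partition label exists, with any value
```

A namespace labelled `environment=qa, partition=a` would satisfy all three requirements, whereas one labelled `environment=qa, tier=frontend, partition=a` would not.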

Collection of Kubernetes object labels

The labels for all monitored objects are collected as Attribute data points. The dimensions of an attribute will correspond exactly to the dimensions of the monitored object.

Additionally, an attribute indicating the object kind is also published for each object: kubernetes.itrsgroup.com/kind = [Node|Pod|etc...]

Attributes are sent periodically: first 30 seconds after startup, then every 5 minutes.

Load an include file

A sample kubernetes_mapping.xml include file for the Kubernetes Collection Agent plugin is provided in the /templates directory of the downloaded Gateway binaries. To load an include file into the Gateway Setup Editor:

1. Open the Gateway Setup Editor.
2. In the Navigation panel, click Includes to create a new file.
3. Enter the location of the file to include in the Location field.
4. Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
5. Expand the file location in the Include section.
6. Select Click to load.
7. Click Yes to load the new include file and save your setup.

Collected metrics

All metrics are collected from the Summary API and cAdvisor of each node.

Note
Certain container and pod metrics collected from cAdvisor will subsequently be moved to CRI metric collection or potentially deprecated. For more information, see Kubernetes enhancements.

Namespace metrics

Metric	Type	Unit	Dimensions	Description
kube_namespace_status	status		namespace	Describes the current state of the namespace. Possible values are `Active` and `Terminating`.

Node metrics

Metric	Type	Unit	Dimensions	Description
kube_node_conditions	status		node	Comma-delimited list of conditions of the node. Possible conditions are `Ready`, `DiskPressure`, `MemoryPressure`, `PIDPressure`, and `NetworkUnavailable`.
kube_node_cpu_capacity	gauge	millicores	node	Number of CPU cores on a node.
kube_node_cpu_allocatable	gauge	millicores	node	Number of allocatable CPU cores on a node.
kube_node_cpu_usage	gauge	%	node	Percentage of CPU usage from allocatable CPU cores of the node.
kube_node_cpu_core_usage	counter	nanocores	node	CPU usage in nanocores (sum of all cores).
kube_node_cpu_usage_time	counter	nanoseconds	node	CPU usage in time (sum of all cores).
kube_node_kubelet_version	attribute		node	Version of kubelet.
kube_node_kubeproxy_version	attribute		node	Version of kube-proxy.
kube_node_mem_capacity	gauge	bytes	node	Bytes of memory on a node.
kube_node_mem_allocatable	gauge	bytes	node	Bytes of allocatable memory on a node.
kube_node_mem_used	gauge	bytes	node	Total memory in use.
kube_node_memory_free	gauge	bytes	node	Available memory for use.
kube_node_net_rx	counter	bytes	node, interface	Windowed count of bytes received since last sample.
kube_node_net_rx_rate	gauge	bytes/sec	node, interface	Windowed rate of bytes received since last sample.
kube_node_net_rx_errors	counter		node, interface	Windowed count of errors received since the last sample.
kube_node_net_rx_error_rate	gauge	per sec	node, interface	Windowed rate of errors received since the last sample.
kube_node_net_tx	counter	bytes	node, interface	Windowed count of bytes sent since last sample.
kube_node_net_tx_rate	gauge	bytes/sec	node, interface	Windowed rate of bytes sent since last sample.
kube_node_net_tx_errors	counter		node, interface	Windowed count of errors sent since last sample.
kube_node_net_tx_error_rate	gauge	per sec	node, interface	Windowed rate of errors sent since last sample.
kube_node_fs_size	gauge	bytes	node, volume	Size of the filesystem.
kube_node_fs_used	gauge	bytes	node, volume	Number of bytes used.
kube_node_fs_usage	gauge	%	node, volume	Percentage of the filesystem used. The percentage is calculated by dividing `kube_node_fs_used` by `kube_node_fs_size`. Possible values can be any number between `0` and `100`.
kube_node_fs_free	gauge	bytes	node, volume	Number of bytes free.
kube_node_fs_inodes_used	gauge		node, volume	Number of used inodes by the filesystem. Total number of inodes may not equal `kube_node_fs_inodes_free + kube_node_fs_inodes_used` because this filesystem may share inodes with other filesystems.
kube_node_fs_inodes_free	gauge		node, volume	Number of free inodes.
kube_node_taints	attribute		node	Comma-delimited list of taints. Taints are described in `key=<value>:effect` format.

Note
Filesystem metrics for a node represent the root filesystem whose volume_name dimension is fs by default.

Pod metrics

Pod filesystem metrics are reported with different values of the volume dimension:

- ephemeral-storage: reports the total filesystem usage for the containers and emptyDir-backed volumes in the measured Pod.
- Volumes: stats pertaining to volume usage of filesystem resources, whose dimension is the volume_name.

Metric	Type	Unit	Dimensions	Description
kube_pod_containers_ready	gauge		node, namespace, pod	Number of ready containers.
kube_pod_containers_running	gauge		node, namespace, pod	Number of running containers.
kube_pod_containers_terminated	gauge		node, namespace, pod	Number of terminated containers.
kube_pod_containers_waiting	gauge		node, namespace, pod	Number of waiting containers.
kube_pod_cpu_cfs_periods	counter		node, namespace, pod	Number of elapsed enforcement period intervals of the pod. This is acquired using the cAdvisor.
kube_pod_cpu_cfs_throttled_periods	counter		node, namespace, pod	Number of throttled period intervals of the pod. This is acquired using the cAdvisor.
kube_pod_cpu_cfs_throttled_seconds	counter	seconds	node, namespace, pod	Total time duration the pod has been throttled. This is acquired using the cAdvisor.
kube_pod_cpu_core_usage	gauge	nanocores	node, namespace, pod	CPU usage in nanocores (sum of all cores).
kube_pod_cpu_usage	gauge	%	node, namespace, pod	Percentage of CPU usage from allocatable CPU cores of the node.
kube_pod_cpu_usage_time	counter	nanoseconds	node, namespace, pod	CPU usage in time (sum of all cores).
kube_pod_created	attribute	epoch_milliseconds	node, namespace, pod	Pod creation timestamp.
kube_pod_fs_free	gauge	bytes	node, namespace, volume	Number of bytes free.
kube_pod_fs_inodes_free	gauge		node, namespace, volume	Number of free inodes.
kube_pod_fs_inodes_used	gauge		node, namespace, volume	Number of used inodes in the filesystem. Total number of inodes may not equal `kube_pod_fs_inodes_free + kube_pod_fs_inodes_used` because this filesystem may share inodes with other filesystems. For the `ephemeral-storage` volume, it reports the sum of `kube_container_fs_inodes_used` for every container `rootfs` volume in the current pod.
kube_pod_fs_size	gauge	bytes	node, namespace, volume	Size of the filesystem.
kube_pod_fs_usage	gauge	%	node, namespace, volume	Percentage of the filesystem used. The percentage is calculated by dividing `kube_pod_fs_used` by `kube_pod_fs_size`. Possible values can be any number between `0` and `100`.
kube_pod_fs_used	gauge	bytes	node, namespace, volume	Number of bytes used. For `ephemeral-storage` volume, this is the sum of `kube_container_fs_used` from every container `rootfs` and `logs` storage plus the sum of `kube_pod_fs_used` for every volume of type `emptyDir`. For other volume types, it represents used bytes on the corresponding volume. See PodStats documentation.
kube_pod_ip	attribute		node, namespace, pod	Default IP address of the pod.
kube_pod_mem_free	gauge	bytes	node, namespace, pod	Available memory for use.
kube_pod_mem_used	gauge	bytes	node, namespace, pod	Memory in use.
kube_pod_net_rx	counter	bytes	node, namespace, interface	Windowed count of bytes received since last sample.
kube_pod_net_rx_errors	counter		node, namespace, interface	Windowed count of errors received since the last sample.
kube_pod_net_rx_error_rate	gauge	per sec	node, namespace, interface	Windowed rate of errors received since the last sample.
kube_pod_net_rx_rate	gauge	bytes/sec	node, namespace, interface	Windowed rate of bytes received since last sample.
kube_pod_net_tx	counter	bytes	node, namespace, interface	Windowed count of bytes sent since last sample.
kube_pod_net_tx_rate	gauge	bytes/sec	node, namespace, interface	Windowed rate of bytes sent since last sample.
kube_pod_netw_tx_error_rate	gauge	per sec	node, namespace, interface	Windowed rate of errors sent since last sample.
kube_pod_netw_tx_errors	counter		node, namespace, interface	Windowed count of errors sent since last sample.
kube_pod_oom_events	counter		node, namespace, pod	Count of out of memory events observed in the pod. This is acquired using the cAdvisor.
kube_pod_status	status		node, namespace, pod	Status of the pod’s deployment. Possible values are `Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`, and `Deleted`. `Deleted` is a phase that this plugin uses to report that a pod has been successfully deleted. It is not used by the Kubernetes Summary API.
kube_pod_status_condition	attribute		node, namespace, pod	Latest status condition of the pod. Possible values are `PodScheduled`, `ContainersReady`, `Initialized`, and `Ready`.
kube_pod_status_condition_reason	attribute		node, namespace, pod	Reason for the latest status condition of the pod.

Container metrics

Metric	Type	Unit	Dimensions	Description
kube_container_cpu_cfs_periods	counter		node, namespace, pod, container	Number of elapsed enforcement period intervals of the container. This is acquired using the cAdvisor.
kube_container_cpu_cfs_throttled_periods	counter		node, namespace, pod, container	Number of throttled period intervals of the container. This is acquired using the cAdvisor.
kube_container_cpu_cfs_throttled_seconds	counter	seconds	node, namespace, pod, container	Total time duration the container has been throttled. This is acquired using the cAdvisor.
kube_container_cpu_core_usage	gauge	nanocores	node, namespace, pod, container	CPU usage in nanocores (sum of all cores).
kube_container_cpu_limit	gauge	millicores	node, namespace, pod, container	CPU resource limit. See Kubernetes documentation for resource configuration details.
kube_container_cpu_limit_usage	gauge	%	node, namespace, pod, container	Percentage used of the configured CPU resource limit. See Kubernetes documentation for resource configuration details.
kube_container_cpu_request	gauge	millicores	node, namespace, pod, container	CPU resource request. See Kubernetes documentation for resource configuration details.
kube_container_cpu_request_usage	gauge	%	node, namespace, pod, container	Percentage of the configured CPU resource request. See Kubernetes documentation for resource configuration details.
kube_container_cpu_usage	gauge	%	node, namespace, pod, container	Percentage of CPU usage from allocatable CPU cores of the node.
kube_container_cpu_usage_time	counter	nanoseconds	node, namespace, pod, container	CPU usage in time (sum of all cores).
kube_container_fs_free	gauge	bytes	node, namespace, pod, container, volume	Number of bytes free.
kube_container_fs_inodes_free	gauge		node, namespace, pod, container, volume	Number of free inodes.
kube_container_fs_inodes_used	gauge		node, namespace, pod, container, volume	Number of used inodes in the filesystem. Total number of inodes may not equal `kube_container_fs_inodes_free + kube_container_fs_inodes_used` because this filesystem may share inodes with other filesystems. For `rootfs`, this is the number of inodes used only by that container and does not count inodes used by other containers.
kube_container_fs_size	gauge	bytes	node, namespace, pod, container, volume	Size of the filesystem.
kube_container_fs_usage	gauge	%	node, namespace, pod, container, volume	Percentage of the filesystem used. The percentage is calculated by dividing `kube_container_fs_used` by `kube_container_fs_size`. Possible values can be any number between `0` and `100`.
kube_container_fs_used	gauge	bytes	node, namespace, pod, container, volume	Number of bytes used. For the `rootfs` volume, this is the number of bytes used for the container write layer; see Docker documentation. For `logs`, this is the number of bytes used for the container logs. For example, run `sudo ls -l --block-size=1 /var/lib/docker/containers/<container_id>/` and sum the sizes.
kube_container_mem_free	gauge	bytes	node, namespace, pod, container	Available memory for use.
kube_container_mem_limit	gauge	bytes	node, namespace, pod, container	Memory resource limit. See Kubernetes documentation for resource configuration details.
kube_container_mem_limit_usage	gauge	%	node, namespace, pod, container	Percentage used of the configured memory resource limit. See Kubernetes documentation for resource configuration details.
kube_container_mem_request	gauge	bytes	node, namespace, pod, container	Memory resource request. See Kubernetes documentation for resource configuration details.
kube_container_mem_request_usage	gauge	%	node, namespace, pod, container	Percentage used of the configured memory resource request. See Kubernetes documentation for resource configuration details.
kube_container_mem_rss	gauge	bytes	node, namespace, pod, container	Resident set size (RSS) memory in use.
kube_container_mem_used	gauge	bytes	node, namespace, pod, container	Memory in use.
kube_container_mem_working_set	gauge	bytes	node, namespace, pod, container	Working set memory in use.
kube_container_oom_events	counter		node, namespace, pod, container	Count of out of memory events observed in the container. This is acquired using the cAdvisor.
kube_container_status	status		node, namespace, pod, container, volume	Current state of the container. Possible values are `Running`, `Terminated`, `Waiting`, and `Unknown`.

ResourceQuota metrics

Metric	Type	Unit	Dimensions	Description
kube_resource_quota_hard	gauge	millicores/bytes/none	namespace, quota, resource	Configured hard limit.
kube_resource_quota_used	gauge	millicores/bytes/none	namespace, quota, resource	Quota used amount.
kube_resource_quota_used_percent	gauge	%	namespace, quota, resource	Quota used percent.

Workload/Deployment metrics

Metric	Type	Dimensions	Description
kube_deployment_spec_replicas	gauge	namespace, deployment	Number of desired pods.
kube_deployment_status_replicas	gauge	namespace, deployment	Total number of non-terminated pods targeted by the deployment.
kube_deployment_status_replicas_ready	gauge	namespace, deployment	Total number of ready pods targeted by the deployment.
kube_deployment_status_replicas_available	gauge	namespace, deployment	Total number of available pods, which are ready for at least `minReadySeconds`, targeted by the deployment.
kube_deployment_status_replicas_unavailable	gauge	namespace, deployment	Total number of unavailable pods targeted by the deployment. This is the required total number of pods for the deployment to have 100% available capacity. The pods may either be running but not yet available or have not been created yet.
kube_deployment_status_condition	status	namespace, deployment	Describes the current state of the deployment.

Workload/DaemonSet metrics

Metric	Type	Dimensions	Description
kube_daemonset_status_number_available	gauge	namespace, daemonset	Number of nodes that are expected to run the daemon pod and have one or more running and available daemon pods.
kube_daemonset_status_number_unavailable	gauge	namespace, daemonset	Number of nodes that are expected to run the daemon pod but do not have a running and available daemon pod.
kube_daemonset_status_current_number_scheduled	gauge	namespace, daemonset	Number of nodes that are expected to run the daemon pod and have at least one running daemon pod.
kube_daemonset_status_desired_number_scheduled	gauge	namespace, daemonset	Total number of nodes expected to run the daemon pod.
kube_daemonset_status_number_misscheduled	gauge	namespace, daemonset	Number of nodes that are not expected to run the daemon pod but have a running daemon pod.
kube_daemonset_status_number_ready	gauge	namespace, daemonset	Number of nodes that are expected to run the daemon pod and have one or more running and ready daemon pods.
kube_daemonset_status_condition	status	namespace, daemonset	Describes the current state of the DaemonSet.

Workload/ReplicaSet metrics

Metric	Type	Dimensions	Description
kube_replicaset_spec_replicas	gauge	namespace, replicaset_name	Number of desired replicas.
kube_replicaset_status	gauge	namespace, replicaset_name	Most recently observed number of replicas.
kube_replicaset_status_replicas_available	gauge	namespace, replicaset_name	Number of available replicas, which are ready for at least `minReadySeconds`, in the replica set.
kube_replicaset_status_replicas_ready	gauge	namespace, replicaset_name	Number of ready replicas for this replica set.
kube_replicaset_status_condition	attribute	namespace, replicaset_name	Describes the current state of the replica set.

Workload/StatefulSet metrics

Metric	Type	Dimensions	Description
kube_statefulset_spec_replicas	gauge	namespace, statefulset	Desired number of replicas for the given template.
kube_statefulset_status_replicas_available	gauge	namespace, statefulset	Number of pods created by the StatefulSet controller.
kube_statefulset_status_replicas_current	gauge	namespace, statefulset	Number of pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision.
kube_statefulset_status_replicas_ready	gauge	namespace, statefulset	Number of pods created by the StatefulSet controller that have a `Ready` condition.
kube_statefulset_status_condition	status	namespace, statefulset	Describes the current state of the stateful set.

Workload/Job metrics

Metric	Type	Unit	Dimensions	Description
kube_job_spec_completions	gauge		namespace, job	Desired number of successfully finished pods that should run with the job.
kube_job_spec_parallelism	gauge		namespace, job	Maximum desired number of pods that should run with the job at any given time.
kube_job_status_active	gauge		namespace, job	Number of actively running pods.
kube_job_status_succeeded	gauge		namespace, job	Number of successful pods.
kube_job_status_failed	gauge		namespace, job	Number of failed pods.
kube_job_status_start_time	gauge	epoch_milliseconds	namespace, job	Time when the job was acknowledged by the job controller.
kube_job_status_completion_time	gauge	epoch_milliseconds	namespace, job	Time when the job was completed.
kube_job_status_condition	status		namespace, job	Describes the current state of the job.

Kubernetes log rotation

The following table shows which log rotation schemes the log collector supports:

Log rotation scheme	Support
Log rotation for containers that implement CRI	Supported
Logrotate create mode	Supported
Logrotate copy mode	Not supported
Logrotate copytruncate mode	Not supported
Collecting from compressed log files	Not supported
