Kubernetes

Overview

The Kubernetes Collection Agent plugin collects logs, metrics, and events from OpenShift and Kubernetes.

Prerequisites

The Kubernetes Collection Agent plugin requires the following versions of Geneos components.

For more information about installing Collection Agent, see Collection Agent setup.

Note

This plugin also requires an additional licence to use. Please contact your ITRS Account Manager or ITRS Sales.

Permissions

The Kubernetes plugin requires the following permissions:

Configuration reference

Below is an example YAML file which may require some changes for your project’s configuration:

collectors:
- type: plugin
  name: kube-metrics
  className: KubernetesMetricsCollector

  # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
  # If both are undefined, all namespaces are collected. If both are defined, 
  # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
  # These settings can be defined here (which applies to both events and metrics), 
  # or under the events and metrics sections separately. If defined in both, 
  # the effective value is the union of both settings.

  # Restrict collection to specific namespaces.
  namespaces:
  - geneos       

  # Restrict collection to filtered namespaces based on label selectors.
  # In the case of multiple label selectors, a logical AND will be used to combine them.
  namespaceSelectors:
  - purpose=Production
  - department in (Engineering)

  # Whether to exclude metrics/events for nodes and other non-namespaced resources. Defaults to false.
  excludeNonNamespaced: false

  # Events module configuration
  events:
  
    # Whether events collection is enabled.  Defaults to true.
    enabled: true

    # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
    # If both are undefined, all namespaces are collected. If both are defined, 
    # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
    # If values are listed here and above, the effective value is the union of both settings.
    
    # Restrict collection to specific namespaces.
    namespaces:
    - ns1

    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
    - purpose=Events

    # Name of the data point.  Default value shown.
    dataPointName: kubernetes_event
  
  # Metrics module configuration
  metrics:
  
    # Whether metrics collection is enabled.  Defaults to true.
    enabled: true
    
    # Number of milliseconds between reporting intervals.  Default value shown.
    reportingInterval: 10000

    # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
    # If both are undefined, all namespaces are collected. If both are defined, 
    # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
    # If values are listed here and above, the effective value is the union of both settings.

    # Restrict collection to specific namespaces.
    namespaces:
    - ns2

    # Restrict collection to filtered namespaces based on label selectors.
    # In the case of multiple label selectors, a logical AND will be used to combine them.
    namespaceSelectors:
    - purpose=Metrics

- type: plugin
  name: kube-logs
  className: KubernetesLogCollector
  
  # Container log directory.
  # Required.  On a Kubernetes or OpenShift node, logs are usually in /var/log/containers.
  logDirectory: /var/log/containers
  
  # Directory where the collector will save position files for each container log.
  # Required.  The collector must have read/write privileges to this directory.
  persistenceDirectory: /var/lib/itrs/collection-agent/log-collector
  
  # Whether to read newly discovered log files from the beginning of the file.
  # If false, only lines written to the log after the collector starts will be read.
  # Defaults to false.
  readFromBeginning: false
  
  # Number of worker threads (i.e. concurrent log readers).  Increasing this may improve 
  # performance, especially if there are several very active log files.
  # Default value shown.
  workerThreads: 5
  
  # Number of milliseconds to wait before pausing a worker that is blocking other workers from running.
  # Default value shown.
  longRunningWorkerThreshold: 30000

  # Number of milliseconds between log processing intervals, i.e. how long to wait before checking
  # if a log has new data to read. 
  # Default value shown.
  processingInterval: 5000

  # The namespaces and namespaceSelectors settings restrict the collection by namespace. 
  # If both are undefined, all namespaces are collected. If both are defined, 
  # namespaces will have a higher priority, and namespaceSelectors will be ignored. 
  
  # Restrict log collection to specific namespaces. Defaults to all namespaces.
  namespaces:
  - ns1
  - ns2

  # Restrict collection to filtered namespaces based on label selectors.
  # In the case of multiple label selectors, a logical AND will be used to combine them.
  namespaceSelectors:
  - purpose=Production
  - department in (Engineering)
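
The namespace rules described in the comments of the example configuration can be modelled with a short Python sketch. This is an illustration only, not the plugin's implementation: the function name effective_filter and its return shape are hypothetical, and it assumes that the union of the top-level and module-level settings is taken before the priority of namespaces over namespaceSelectors is applied.

```python
def effective_filter(top, module):
    """Model how top-level and module-level namespace settings combine.

    `top` and `module` are dicts that may contain 'namespaces' and
    'namespaceSelectors' lists, mirroring the YAML keys in the example.
    """
    # Union of the values defined at both levels (order kept, duplicates dropped).
    namespaces = list(dict.fromkeys(
        top.get("namespaces", []) + module.get("namespaces", [])))
    selectors = list(dict.fromkeys(
        top.get("namespaceSelectors", []) + module.get("namespaceSelectors", [])))

    if namespaces:
        # namespaces has the higher priority; namespaceSelectors is ignored.
        return ("namespaces", namespaces)
    if selectors:
        return ("selectors", selectors)
    # Neither setting is defined: all namespaces are collected.
    return ("all", [])
```

With the example values, effective_filter({"namespaces": ["geneos"]}, {"namespaces": ["ns1"]}) yields the namespaces geneos and ns1 for the events module, and the selectors are not consulted.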

Docker logging configuration

Log collection is supported only when using Docker with the json-file logging driver. For example, in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}

Set max-size large enough that logs are not rotated too frequently, because rapid rotation may cause the collector to miss data. This is critical if applications in the cluster log at a high frequency.
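
To judge whether a max-size value is large enough, it can help to estimate how long one log file lasts at a given write rate. The Python sketch below is a rough aid, not part of the plugin: the function names are hypothetical, and it assumes that Docker size suffixes (k, m, g) are binary multiples and that the write rate is steady.

```python
def parse_size(text):
    """Convert a Docker-style size such as '100m' to bytes.

    Assumption: the k, m and g suffixes are binary multiples (1024-based).
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def seconds_per_rotation(max_size, bytes_per_second):
    """Approximate time one log file lasts before it is rotated."""
    return parse_size(max_size) / bytes_per_second


# A container writing 1 MiB/s fills a 100m file in about 100 seconds.
print(seconds_per_rotation("100m", 1024 * 1024))  # 100.0
```

The result should be comfortably larger than the collector's processingInterval (5000 ms by default in the example configuration), so that each file is read several times before it is rotated.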

Label Selector configuration

The namespaceSelectors setting follows the label selector syntax described in the Kubernetes documentation.

This setting supports both equality-based and set-based requirements.

Equality-based requirement

namespaceSelectors:
- environment = production
- tier != frontend

Set-based requirement

namespaceSelectors:
- environment in (production, qa)
- tier notin (frontend, backend)
- partition
- "!partition"  # Quoted so that YAML does not interpret the leading "!" as a tag.
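
The Python sketch below shows how the selector forms above would evaluate against a namespace's labels, following the standard Kubernetes label selector semantics. It covers only the forms shown on this page and is neither the Kubernetes parser nor the plugin's code. The function names matches and namespace_selected are hypothetical.

```python
import re

# Set-based form: "<key> in (a, b)" or "<key> notin (a, b)".
_SET_RE = re.compile(r"^\s*(\S+)\s+(in|notin)\s+\(([^)]*)\)\s*$")
# Equality-based form: "<key> = v", "<key> == v" or "<key> != v".
_EQ_RE = re.compile(r"^\s*([^\s=!]+)\s*(==|!=|=)\s*(\S+)\s*$")


def matches(selector, labels):
    """Return True if a single selector requirement is satisfied by `labels`."""
    m = _SET_RE.match(selector)
    if m:
        key, op, raw = m.groups()
        values = {v.strip() for v in raw.split(",")}
        if op == "in":
            return labels.get(key) in values
        # notin also matches objects that do not have the key at all.
        return labels.get(key) not in values
    m = _EQ_RE.match(selector)
    if m:
        key, op, value = m.groups()
        if op == "!=":
            return labels.get(key) != value
        return labels.get(key) == value
    selector = selector.strip()
    if selector.startswith("!"):
        return selector[1:].strip() not in labels  # key must be absent
    return selector in labels  # key must be present


def namespace_selected(selectors, labels):
    """Multiple selectors are combined with a logical AND."""
    return all(matches(s, labels) for s in selectors)


labels = {"environment": "production", "tier": "backend"}
print(namespace_selected(["environment = production", "tier != frontend"], labels))  # True
print(namespace_selected(["tier notin (frontend, backend)"], labels))                # False
```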

Collection of Kubernetes object labels

The labels for all monitored objects are collected as Attribute data points. The dimensions of an attribute will correspond exactly to the dimensions of the monitored object.

Additionally, an attribute indicating the object kind is also published for each object: kubernetes.itrsgroup.com/kind = [Node|Pod|etc...]

Attributes are sent 30 seconds after startup, and then every 5 minutes.
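
As a small worked example of this schedule, the following Python sketch lists the first few send times in seconds after startup. The function name is hypothetical, and it assumes the 5-minute period is measured from the first send rather than from startup.

```python
def attribute_send_times(count, initial_delay_s=30, period_s=300):
    """Seconds after startup at which the first `count` attribute sends occur."""
    return [initial_delay_s + i * period_s for i in range(count)]


print(attribute_send_times(3))  # [30, 330, 630]
```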

Load an include file

A sample kubernetes_mapping.xml include file for the Kubernetes Collection Agent plugin is provided in the /templates directory of the downloaded Gateway binaries. To load an include file into the Gateway Setup Editor:

  1. Open the Gateway Setup Editor.
  2. In the Navigation panel, click Includes to create a new file.
  3. Enter the location of the file to include in the Location field.
  4. Update the Priority field. This can be any value except 1. If you enter a priority of 1, the Gateway Setup Editor returns an error.
  5. Expand the file location in the Include section.
  6. Select Click to load.
  7. Click Yes to load the new include file and save your setup.

Collected metrics

All metrics are collected from the Summary API and cAdvisor of each node.

Note

Certain container and pod metrics collected from cAdvisor will subsequently be moved to CRI metric collection or potentially deprecated. For more information, see Kubernetes enhancements.

Namespace metrics

Metric | Type | Unit | Dimensions | Description
kube_namespace_status | status |  | namespace | Describes the current state of the namespace. Possible values are Active and Terminating.

Node metrics

Metric | Type | Unit | Dimensions | Description
kube_node_conditions | status |  | node | Comma-delimited list of conditions of the node. Possible conditions are Ready, DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable.
kube_node_cpu_capacity | gauge | millicores | node | Number of CPU cores on a node.
kube_node_cpu_allocatable | gauge | millicores | node | Number of allocatable CPU cores on a node.
kube_node_cpu_usage | gauge | % | node | Percentage of CPU usage from allocatable CPU cores of the node.
kube_node_cpu_core_usage | counter | nanocores | node | CPU usage in nanocores (sum of all cores).
kube_node_cpu_usage_time | counter | nanoseconds | node | CPU usage in time (sum of all cores).
kube_node_kubelet_version | attribute |  | node | Version of kubelet.
kube_node_kubeproxy_version | attribute |  | node | Version of kube-proxy.
kube_node_mem_capacity | gauge | bytes | node | Bytes of memory on a node.
kube_node_mem_allocatable | gauge | bytes | node | Bytes of allocatable memory on a node.
kube_node_mem_used | gauge | bytes | node | Total memory in use.
kube_node_memory_free | gauge | bytes | node | Available memory for use.
kube_node_net_rx | counter | bytes | node, interface | Windowed count of bytes received since the last sample.
kube_node_net_rx_rate | counter | bytes/sec | node, interface | Windowed rate of bytes received since the last sample.
kube_node_net_rx_errors | counter |  | node, interface | Windowed count of errors received since the last sample.
kube_node_net_rx_error_rate | gauge | per sec | node, interface | Windowed rate of errors received since the last sample.
kube_node_net_tx | counter | bytes | node, interface | Windowed count of bytes sent since the last sample.
kube_node_net_tx_rate | gauge | bytes/sec | node, interface | Windowed rate of bytes sent since the last sample.
kube_node_net_tx_errors | counter |  | node, interface | Windowed count of errors sent since the last sample.
kube_node_net_tx_error_rate | gauge | per sec | node, interface | Windowed rate of errors sent since the last sample.
kube_node_fs_size | gauge | bytes | node, volume | Size of the filesystem.
kube_node_fs_used | gauge | bytes | node, volume | Number of bytes used.
kube_node_fs_usage | gauge | % | node, volume | Percentage of the filesystem used, calculated as kube_node_fs_used divided by kube_node_fs_size and expressed as a percentage. Possible values can be any number between 0 and 100.
kube_node_fs_free | gauge | bytes | node, volume | Number of bytes free.
kube_node_fs_inodes_used | gauge |  | node, volume | Number of inodes used by the filesystem. The total number of inodes may not equal kube_node_fs_inodes_free + kube_node_fs_inodes_used because this filesystem may share inodes with other filesystems.
kube_node_fs_inodes_free | gauge |  | node, volume | Number of free inodes.
kube_node_taints | attribute |  | node | Comma-delimited list of taints. Taints are described in key=<value>:effect format.

Note

Filesystem metrics for a node represent the root filesystem whose volume_name dimension is fs by default.
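
The filesystem usage percentage described for kube_node_fs_usage can be reproduced from the two underlying metrics. The Python sketch below is a minimal illustration with a hypothetical function name. The handling of a zero-sized filesystem is an assumption, because the documentation does not say what the plugin reports in that case.

```python
def fs_usage_percent(used_bytes, size_bytes):
    """Percentage of the filesystem used, in the range 0 to 100."""
    if size_bytes <= 0:
        # Assumption: report 0% when the filesystem size is unknown or zero.
        return 0.0
    return used_bytes / size_bytes * 100.0


# Example: 25 GiB used on a 100 GiB root filesystem.
print(fs_usage_percent(25 * 1024 ** 3, 100 * 1024 ** 3))  # 25.0
```

The same calculation applies to the pod and container filesystem usage metrics, using their respective used and size metrics.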

Pod metrics

Pod filesystem metrics use different dimensions from the other pod metrics:

Metric | Type | Unit | Dimensions | Description
kube_pod_containers_ready | gauge |  | node, namespace, pod | Number of ready containers.
kube_pod_containers_running | gauge |  | node, namespace, pod | Number of running containers.
kube_pod_containers_terminated | gauge |  | node, namespace, pod | Number of terminated containers.
kube_pod_containers_waiting | gauge |  | node, namespace, pod | Number of waiting containers.
kube_pod_cpu_cfs_periods | counter |  | node, namespace, pod | Number of elapsed enforcement period intervals of the pod. This is acquired from cAdvisor.
kube_pod_cpu_cfs_throttled_periods | counter |  | node, namespace, pod | Number of throttled period intervals of the pod. This is acquired from cAdvisor.
kube_pod_cpu_cfs_throttled_seconds | counter | seconds | node, namespace, pod | Total time duration the pod has been throttled. This is acquired from cAdvisor.
kube_pod_cpu_core_usage | gauge | nanocores | node, namespace, pod | CPU usage in nanocores (sum of all cores).
kube_pod_cpu_usage | gauge | % | node, namespace, pod | Percentage of CPU usage from allocatable CPU cores of the node.
kube_pod_cpu_usage_time | counter | nanoseconds | node, namespace, pod | CPU usage in time (sum of all cores).
kube_pod_created | attribute | epoch_milliseconds | node, namespace, pod | Pod creation timestamp.
kube_pod_fs_free | gauge | bytes | node, namespace, volume | Number of bytes free.
kube_pod_fs_inodes_free | gauge |  | node, namespace, volume | Number of free inodes.
kube_pod_fs_inodes_used | gauge |  | node, namespace, volume | Number of used inodes in the filesystem. The total number of inodes may not equal kube_pod_fs_inodes_free + kube_pod_fs_inodes_used because this filesystem may share inodes with other filesystems. For an ephemeral-storage volume, it reports the sum of kube_container_fs_inodes_used for every container rootfs volume in the current pod.
kube_pod_fs_size | gauge | bytes | node, namespace, volume | Size of the filesystem.
kube_pod_fs_usage | gauge | % | node, namespace, volume | Percentage of the filesystem used, calculated as kube_pod_fs_used divided by kube_pod_fs_size and expressed as a percentage. Possible values can be any number between 0 and 100.
kube_pod_fs_used | gauge | bytes | node, namespace, volume | Number of bytes used. For an ephemeral-storage volume, this is the sum of kube_container_fs_used from every container rootfs and logs storage plus the sum of kube_pod_fs_used for every volume of type emptyDir. For other volume types, it represents used bytes on the corresponding volume. See PodStats documentation.
kube_pod_ip | attribute |  | node, namespace, pod | Default IP address of the pod.
kube_pod_mem_free | gauge | bytes | node, namespace, pod | Available memory for use.
kube_pod_mem_used | gauge | bytes | node, namespace, pod | Memory in use.
kube_pod_net_rx | counter | bytes | node, namespace, interface | Windowed count of bytes received since the last sample.
kube_pod_net_rx_errors | counter |  | node, namespace, interface | Windowed count of errors received since the last sample.
kube_pod_net_rx_error_rate | gauge | per sec | node, namespace, interface | Windowed rate of errors received since the last sample.
kube_pod_net_rx_rate | counter | bytes/sec | node, namespace, interface | Windowed rate of bytes received since the last sample.
kube_pod_net_tx | counter | bytes | node, namespace, interface | Windowed count of bytes sent since the last sample.
kube_pod_net_tx_rate | gauge | bytes/sec | node, namespace, interface | Windowed rate of bytes sent since the last sample.
kube_pod_netw_tx_error_rate | gauge | bytes/sec | node, namespace, interface | Windowed rate of errors sent since the last sample.
kube_pod_netw_tx_errors | counter |  | node, namespace, interface | Windowed count of errors sent since the last sample.
kube_pod_oom_events | counter |  | node, namespace, pod | Count of out of memory events observed in the pod. This is acquired from cAdvisor.
kube_pod_status | status |  | node, namespace, pod | Status of the pod’s deployment. Values: Pending, Running, Succeeded, Failed, Unknown, Deleted. Deleted is a phase that this plugin uses to report that a pod has been successfully deleted. It is not used by the Kubernetes Summary API.
kube_pod_status_condition | attribute |  | node, namespace, pod | Latest status condition of the pod. Possible values are PodScheduled, ContainersReady, Initialized, and Ready.
kube_pod_status_condition_reason | attribute |  | node, namespace, pod | Reason for the latest status condition of the pod.

Container metrics

Metric | Type | Unit | Dimensions | Description
kube_container_cpu_cfs_periods | counter |  | node, namespace, pod, container | Number of elapsed enforcement period intervals of the container. This is acquired from cAdvisor.
kube_container_cpu_cfs_throttled_periods | counter |  | node, namespace, pod, container | Number of throttled period intervals of the container. This is acquired from cAdvisor.
kube_container_cpu_cfs_throttled_seconds | counter | seconds | node, namespace, pod, container | Total time duration the container has been throttled. This is acquired from cAdvisor.
kube_container_cpu_core_usage | gauge | nanocores | node, namespace, pod, container | CPU usage in nanocores (sum of all cores).
kube_container_cpu_limit | gauge | millicores | node, namespace, pod, container | CPU resource limit. See Kubernetes documentation for resource configuration details.
kube_container_cpu_limit_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured CPU resource limit. See Kubernetes documentation for resource configuration details.
kube_container_cpu_request | gauge | millicores | node, namespace, pod, container | CPU resource request. See Kubernetes documentation for resource configuration details.
kube_container_cpu_request_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured CPU resource request. See Kubernetes documentation for resource configuration details.
kube_container_cpu_usage | gauge | % | node, namespace, pod, container | Percentage of CPU usage from allocatable CPU cores of the node.
kube_container_cpu_usage_time | counter | nanoseconds | node, namespace, pod, container | CPU usage in time (sum of all cores).
kube_container_fs_free | gauge | bytes | node, namespace, pod, container, volume | Number of bytes free.
kube_container_fs_inodes_free | gauge |  | node, namespace, pod, container, volume | Number of free inodes.
kube_container_fs_inodes_used | gauge |  | node, namespace, pod, container, volume | Number of used inodes in the filesystem. The total number of inodes may not equal kube_container_fs_inodes_free + kube_container_fs_inodes_used because this filesystem may share inodes with other filesystems. For rootfs, this is the number of inodes used only by that container and does not count inodes used by other containers.
kube_container_fs_size | gauge | bytes | node, namespace, pod, container, volume | Size of the filesystem.
kube_container_fs_usage | gauge | % | node, namespace, pod, container, volume | Percentage of the filesystem used, calculated as kube_container_fs_used divided by kube_container_fs_size and expressed as a percentage. Possible values can be any number between 0 and 100.
kube_container_fs_used | gauge | bytes | node, namespace, pod, container, volume | Number of bytes used. For the rootfs volume, this is the number of bytes used for the container write layer; see Docker documentation. For logs, this is the number of bytes used for the container logs. For example, run sudo ls -l --block-size=1 /var/lib/docker/containers/<container_id>/ and then total the sizes.
kube_container_mem_free | gauge | bytes | node, namespace, pod, container | Available memory for use.
kube_container_mem_limit | gauge | bytes | node, namespace, pod, container | Memory resource limit. See Kubernetes documentation for resource configuration details.
kube_container_mem_limit_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured memory resource limit. See Kubernetes documentation for resource configuration details.
kube_container_mem_request | gauge | bytes | node, namespace, pod, container | Memory resource request. See Kubernetes documentation for resource configuration details.
kube_container_mem_request_usage | gauge | % | node, namespace, pod, container | Percentage used of the configured memory resource request. See Kubernetes documentation for resource configuration details.
kube_container_mem_rss | gauge | bytes | node, namespace, pod, container | Resident set size (RSS) memory in use.
kube_container_mem_used | gauge | bytes | node, namespace, pod, container | Memory in use.
kube_container_mem_working_set | gauge | bytes | node, namespace, pod, container | Working set memory in use.
kube_container_oom_events | counter |  | node, namespace, pod, container | Count of out of memory events observed in the container. This is acquired from cAdvisor.
kube_container_status | status |  | node, namespace, pod, container, volume | Current state of the container. Values: Running, Terminated, Waiting, Unknown
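
The CPU limit and request usage percentages relate a usage figure in nanocores to a limit or request in millicores. The documentation does not state how the plugin derives them, so the Python sketch below shows only the unit conversion involved, under the assumption that usage is compared directly with the configured value. The function name is hypothetical.

```python
NANOCORES_PER_MILLICORE = 1_000_000  # 1 core = 1000 millicores = 10**9 nanocores


def cpu_percent_of(usage_nanocores, reference_millicores):
    """CPU usage as a percentage of a configured limit or request.

    Returns None when no limit or request is configured (assumed behaviour).
    """
    if reference_millicores <= 0:
        return None
    return usage_nanocores / (reference_millicores * NANOCORES_PER_MILLICORE) * 100.0


# A container using 250,000,000 nanocores against a 500 millicore limit.
print(cpu_percent_of(250_000_000, 500))  # 50.0
```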

ResourceQuota metrics

Metric | Type | Unit | Dimensions | Description
kube_resource_quota_hard | gauge | millicores/bytes/none | namespace, quota, resource | Configured hard limit.
kube_resource_quota_used | gauge | millicores/bytes/none | namespace, quota, resource | Quota used amount.
kube_resource_quota_used_percent | gauge | % | namespace, quota, resource | Quota used percent.

Workload/Deployment metrics

Metric | Type | Unit | Dimensions | Description
kube_deployment_spec_replicas | gauge |  | namespace, deployment | Number of desired pods.
kube_deployment_status_replicas | gauge |  | namespace, deployment | Total number of non-terminated pods targeted by the deployment.
kube_deployment_status_replicas_ready | gauge |  | namespace, deployment | Total number of ready pods targeted by the deployment.
kube_deployment_status_replicas_available | gauge |  | namespace, deployment | Total number of available pods, which are ready for at least minReadySeconds, targeted by the deployment.
kube_deployment_status_replicas_unavailable | gauge |  | namespace, deployment | Total number of unavailable pods targeted by the deployment. This is the required total number of pods for the deployment to have 100% available capacity. The pods may either be running but not yet available or have not been created yet.
kube_deployment_status_condition | status |  | namespace, deployment | Describes the current state of the deployment.

Workload/DaemonSet metrics

Metric | Type | Unit | Dimensions | Description
kube_daemonset_status_number_available | gauge |  | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have one or more running and available daemon pods.
kube_daemonset_status_number_unavailable | gauge |  | namespace, daemonset | Number of nodes that are expected to run the daemon pod but have no running and available daemon pod.
kube_daemonset_status_current_number_scheduled | gauge |  | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have at least one running daemon pod.
kube_daemonset_status_desired_number_scheduled | gauge |  | namespace, daemonset | Total number of nodes expected to run the daemon pod.
kube_daemonset_status_number_misscheduled | gauge |  | namespace, daemonset | Number of nodes that are not expected to run the daemon pod but have a running daemon pod.
kube_daemonset_status_number_ready | gauge |  | namespace, daemonset | Number of nodes that are expected to run the daemon pod and have one or more running and ready daemon pods.
kube_daemonset_status_condition | status |  | namespace, daemonset | Describes the current state of the DaemonSet.

Workload/ReplicaSet metrics

Metric | Type | Unit | Dimensions | Description
kube_replicaset_spec_replicas | gauge |  | namespace, replicaset_name | Number of desired replicas.
kube_replicaset_status | gauge |  | namespace, replicaset_name | Most recently observed number of replicas.
kube_replicaset_status_replicas_available | gauge |  | namespace, replicaset_name | Number of available replicas, which are ready for at least minReadySeconds, in the replica set.
kube_replicaset_status_replicas_ready | gauge |  | namespace, replicaset_name | Number of ready replicas for this replica set.
kube_replicaset_status_condition | attribute |  | namespace, replicaset_name | Describes the current state of the replica set.

Workload/StatefulSet metrics

Metric | Type | Unit | Dimensions | Description
kube_statefulset_spec_replicas | gauge |  | namespace, statefulset | Desired number of replicas for the given template.
kube_statefulset_status_replicas_available | gauge |  | namespace, statefulset | Number of pods created by the StatefulSet controller.
kube_statefulset_status_replicas_current | gauge |  | namespace, statefulset | Number of pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision.
kube_statefulset_status_replicas_ready | gauge |  | namespace, statefulset | Number of pods created by the StatefulSet controller that have a Ready condition.
kube_statefulset_status_condition | status |  | namespace, statefulset | Describes the current state of the stateful set.

Workload/Job metrics

Metric | Type | Unit | Dimensions | Description
kube_job_spec_completions | gauge |  | namespace, job | Desired number of successfully finished pods that should run with the job.
kube_job_spec_parallelism | gauge |  | namespace, job | Maximum desired number of pods that should run with the job at any given time.
kube_job_status_active | gauge |  | namespace, job | Number of actively running pods.
kube_job_status_succeeded | gauge |  | namespace, job | Number of successful pods.
kube_job_status_failed | gauge |  | namespace, job | Number of failed pods.
kube_job_status_start_time | gauge | epoch_milliseconds | namespace, job | Time when the job was acknowledged by the job controller.
kube_job_status_completion_time | gauge | epoch_milliseconds | namespace, job | Time when the job was completed.
kube_job_status_condition | status |  | namespace, job | Describes the current state of the job.

Kubernetes log rotation

This table lists the log rotation schemes and whether the log collector supports each one:

Log rotation scheme | Description
Docker JSON driver | Supported. See the configuration notes below this table.
Logrotate create mode | Supported
Logrotate copy mode | Not supported
Logrotate copytruncate mode | Not supported
Collecting from compressed log files | Not supported

Configuration notes for the Docker JSON driver:

  • For OpenShift, see the official documentation to find out how to configure logs.
  • For Kubernetes, configure log rotation in /etc/docker/daemon.json:
    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m",
        "max-file": "10"
      }
    }