Kubernetes

Overview

The Kubernetes Collection Agent plugin collects logs, metrics, and events from OpenShift and Kubernetes.

Prerequisites

The Kubernetes Collection Agent plugin requires the following versions of Geneos components.

  • Gateway and Netprobe 5.1.x or higher. If you use this plugin with Netprobe 5.2.x or higher (which contains Collection Agent 2.1.0 or higher), you must also upgrade to Gateway 5.2.x or higher.

  • Collection Agent 2.2.x or higher.

For more information about installing Collection Agent, see Collection Agent setup.

Note: This plugin requires an additional licence. Contact your ITRS Account Manager or ITRS Sales to obtain one.

Permissions

The Kubernetes plugin requires the following permissions:

  • Access to the Kubernetes API with permission to read pods and watch events in specific or all namespaces.
  • Read-only volume mounts for the following host directories:
    • /var/log/containers
    • /var/log/pods
    • /var/lib/docker/containers
  • If disk persistence is enabled, a read and write persistent volume is required. You can configure the required size for this volume.
  • In OpenShift, the Collection Agent container must run in privileged mode in order to use HostPorts and to access the host volume mounts.
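
The API permissions listed above can be expressed as a Kubernetes RBAC ClusterRole. The following is a minimal sketch in Python that builds such a manifest and prints it as JSON, which kubectl accepts as well as YAML. The role name is hypothetical, and the rule covers only the two permissions stated in this section (reading pods and watching events). The plugin's full rule set is not listed here, so treat this as a starting point rather than the definitive manifest.

```python
import json

# Hypothetical name; choose one that fits your cluster's conventions.
ROLE_NAME = "collection-agent-reader"


def build_cluster_role(name: str) -> dict:
    """Build a read-only ClusterRole covering pods and events."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # "" is the core API group, which contains pods and events.
                "apiGroups": [""],
                "resources": ["pods", "events"],
                # Read-only verbs; a watch is normally preceded by a list.
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_role(ROLE_NAME), indent=2))
```

To restrict access to specific namespaces instead of the whole cluster, the same rules can be placed in a namespaced Role and bound with a RoleBinding in each namespace.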

Configuration reference

The following example YAML file shows the available settings. You may need to adjust it for your project’s configuration:

Source: https://github.com/ITRS-Group/kubernetes-plugin/tree/support/2.x

collectors:
- type: plugin
  name: kube-metrics
  class-name: KubernetesMetricsCollector
  
  # Restrict collection to specific namespaces.  If undefined, all namespaces are collected.
  # This setting can be defined here (in which case it applies to both events and metrics), and also
  # can be defined under the events and metrics sections.  If namespaces are defined in both,
  # the effective value is the union of both settings.
  namespaces:
  - geneos
  
  # Whether to exclude metrics/events for nodes and other non-namespaced resources. Defaults to false.
  exclude-non-namespaced: false
      
  # Events module configuration
  events:
  
    # Whether events collection is enabled.  Defaults to true.
    enabled: true

    # Restrict collection to specific namespaces.  If undefined, all namespaces are collected.
    # If namespaces are listed here and above, the effective value is the union of both settings.
    namespaces:
    - ns1
    
    # Name of the data point.  Default value shown.
    data-point-name: kubernetes_event
  
  # Metrics module configuration
  metrics:
  
    # Whether metrics collection is enabled.  Defaults to true.
    enabled: true
    
    # Number of milliseconds between reporting intervals.  Default value shown.
    reporting-interval: 10000

    # Restrict collection to specific namespaces.  If undefined, all namespaces are collected.
    # If namespaces are listed here and above, the effective value is the union of both settings.
    namespaces:
    - ns2
    
- type: plugin
  name: kube-logs
  class-name: KubernetesLogCollector

  # Container log directory.
  # Required.  On a Kubernetes or OpenShift node, logs are usually in /var/log/containers.
  log-directory: /var/log/containers

  # Directory where the collector will save position files for each container log.
  # Required.  Must have read/write privileges to this directory.
  persistence-directory: /var/lib/itrs/collection-agent/log-collector

  # Whether to read newly discovered log files from the beginning of the file.
  # If false, only lines written to the log after the collector starts will be read.
  # Defaults to false.
  read-from-beginning: false

  # Number of worker threads (i.e. concurrent log readers).  Increasing this may improve
  # performance, especially if there are several very active log files.
  # Default value shown.
  worker-threads: 5

  # Number of milliseconds to wait before pausing a worker that is blocking other workers from running.
  # Default value shown.
  long-running-worker-threshold: 30000

  # Number of milliseconds between log processing intervals, i.e. how long to wait before checking
  # if a log has new data to read.
  # Default value shown.
  processing-interval: 5000

  # Glob patterns to include specific logs in log-directory for processing.
  # Defaults to undefined (include all).
  includes:
  - "*namespace1*"
  - "*namespace2*"

  # Glob patterns to exclude specific logs in log-directory from processing.
  # If a log file matches both an include and exclude, the exclusion will take precedence.
  # Defaults to undefined (no exclusions).
  excludes:
  - "*namespace3*"

workflow:
  common:
    processors:
    - type: plugin
      name: kube-enricher
      class-name: KubernetesEnricher
      
      # Name of the enriched dimension that represents the application name.  Default value shown.
      app-dimension: kubernetes_app_name
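
Two rules in the comments of the example above are easier to follow in code: the namespace lists are combined as a union, and an exclude pattern wins over an include pattern. The following Python sketch models both rules. It is not the plugin's implementation: the function names are hypothetical, Python's fnmatch is used as a stand-in for whatever glob matcher the plugin uses, and treating one undefined list as empty (rather than as "all namespaces") is an assumption.

```python
from fnmatch import fnmatchcase


def effective_namespaces(top_level, module_level):
    """Union of the collector-level and module-level namespace lists.

    None means "undefined". Returns None (collect all namespaces) only
    when both lists are undefined; that reading is an assumption.
    """
    if top_level is None and module_level is None:
        return None
    return sorted(set(top_level or []) | set(module_level or []))


def is_log_selected(file_name, includes=None, excludes=None):
    """Return True if a log file should be processed.

    A file must match at least one include pattern (or includes must be
    undefined), and must not match any exclude pattern.
    """
    included = not includes or any(fnmatchcase(file_name, p) for p in includes)
    excluded = any(fnmatchcase(file_name, p) for p in (excludes or []))
    return included and not excluded


if __name__ == "__main__":
    # Values taken from the example configuration.
    print(effective_namespaces(["geneos"], ["ns1"]))  # events module
    print(effective_namespaces(["geneos"], ["ns2"]))  # metrics module
    includes = ["*namespace1*", "*namespace2*"]
    excludes = ["*namespace3*"]
    print(is_log_selected("web_namespace1_app.log", includes, excludes))
```

With the example values, the events module would cover geneos and ns1, and the metrics module would cover geneos and ns2.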

Docker logging configuration

Log collection is supported only when Docker uses the json-file logging driver. For example, in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}

Set max-size large enough that logs are not rotated too frequently, because rapid rotation can cause the collector to miss data. This is critical if applications in the cluster log at a high rate.
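
To judge whether a max-size value is large enough, it can help to estimate how long a log file lasts before it is rotated and compare that with the collector's processing-interval (5000 ms by default in the configuration reference). The following Python sketch does that arithmetic. The function names and the write rate are hypothetical, the size parser handles only the k, m, and g suffixes, and it assumes Docker interprets them as binary multiples (1m = 1024 * 1024 bytes).

```python
def parse_size(value: str) -> int:
    """Convert a size such as "100m" to bytes (binary multiples assumed)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def seconds_until_rotation(max_size: str, bytes_per_second: float) -> float:
    """Estimate how long one log file lasts at a steady write rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return parse_size(max_size) / bytes_per_second


def rotates_faster_than_poll(max_size: str, bytes_per_second: float,
                             processing_interval_ms: int = 5000) -> bool:
    """True if a file could rotate before the collector's next check."""
    interval_s = processing_interval_ms / 1000.0
    return seconds_until_rotation(max_size, bytes_per_second) < interval_s


if __name__ == "__main__":
    one_mib = 1024 ** 2
    print(seconds_until_rotation("100m", one_mib))        # seconds per file
    print(rotates_faster_than_poll("10m", 5 * one_mib))   # small file, busy log
```

A result of True from rotates_faster_than_poll suggests that max-size should be increased. This is a rough heuristic, not a guarantee about the collector's behaviour.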

Load an include file

Example include files are provided for Gateway configuration. To load an include file into the Gateway Setup Editor:

  1. Open the Gateway Setup Editor.
  2. In the Navigation panel, click Includes to create a new file.
  3. Enter the location of the file to include in the Location field.
  4. Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
  5. Expand the file location in the Include section.
  6. Select Click to load.
  7. Click Yes to load the new include file and save your setup.

A sample include file for the Kubernetes Collection Agent plugin is provided in the downloaded binaries at /include/kubernetes.xml.

Collected metrics

All the metrics are collected from the Summary API of each node.

Namespace metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_namespace_status | attribute | | namespace | Describes the current state of the namespace. Possible values are Active and Terminating. |

Node metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_node_num_cores | attribute | cores | node_name | Number of allocatable CPU cores on a node. |
| kubernetes_node_conditions | attribute | | node_name | Comma-delimited list of conditions of the node. Possible conditions are Ready, DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable. |
| kubernetes_node_cpu_capacity | attribute | cores | node_name | Number of CPU cores on a node. |
| kubernetes_node_cpu_usage | gauge | nanocores | node_name | Average of total CPU usage (sum of all cores). |
| kubernetes_node_cpu_percentage | gauge | % | node_name | Percentage of CPU usage from allocatable CPU cores of the node. |
| kubernetes_node_cpu_time | counter | nanoseconds | node_name | Cumulative CPU usage (sum of all cores). |
| kubernetes_node_kubelet_version | attribute | | node_name | Version of kubelet. |
| kubernetes_node_kubeproxy_version | attribute | | node_name | Version of kube-proxy. |
| kubernetes_node_memory_capacity | gauge | bytes | node | Bytes of memory on a node. |
| kubernetes_node_memory_allocatable | gauge | bytes | node | Bytes of allocatable memory on a node. |
| kubernetes_node_memory_usage | gauge | bytes | node_name | Total memory in use. |
| kubernetes_node_memory_available | gauge | bytes | node_name | Available memory for use. |
| kubernetes_node_network_rx_bytes | counter | bytes | node_name, interface_name | Cumulative count of bytes received. |
| kubernetes_node_network_rx_errors | counter | | node_name, interface_name | Cumulative count of receive errors encountered. |
| kubernetes_node_network_tx_bytes | counter | bytes | node_name, interface_name | Cumulative count of bytes transmitted. |
| kubernetes_node_network_tx_errors | counter | | node_name, interface_name | Cumulative count of transmit errors encountered. |
| kubernetes_node_fs_capacity | gauge | bytes | node_name, volume_name | Total capacity of the filesystem's underlying storage. |
| kubernetes_node_fs_available | gauge | bytes | node_name, volume_name | Remaining storage space available for the filesystem. |
| kubernetes_node_fs_used | gauge | bytes | node_name, volume_name | Storage space used on the filesystem. |
| kubernetes_node_fs_inodes_free | gauge | | node_name, volume_name | Number of free inodes in the filesystem. |
| kubernetes_node_fs_inodes_used | gauge | | node_name, volume_name | Number of inodes used by the filesystem. Total number of inodes may not equal kubernetes_node_fs_inodes_free + kubernetes_node_fs_inodes_used because this filesystem may share inodes with other filesystems. |
| kubernetes_node_taints | attribute | | node_name | Comma-delimited list of taints. Taints are described in key=<value>:effect format. |

Note: Filesystem metrics for a node represent the root filesystem whose volume_name dimension is fs by default.
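
Metrics of type counter, such as kubernetes_node_network_rx_bytes and kubernetes_node_cpu_time, are cumulative, so a per-second rate has to be derived from two consecutive samples. The following Python sketch shows one way to do that. The function name is hypothetical, the sample values are made up, and returning None when the counter goes backwards (for example after a restart) is a design choice of this sketch, not documented plugin behaviour.

```python
from typing import Optional


def counter_rate(previous: int, current: int, interval_ms: int) -> Optional[float]:
    """Per-second rate between two samples of a cumulative counter.

    Returns None when the counter went backwards, because the delta
    between the samples is then meaningless.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if current < previous:
        return None
    return (current - previous) / (interval_ms / 1000.0)


if __name__ == "__main__":
    # Two samples of a byte counter taken 10000 ms apart, which is the
    # default reporting-interval in the configuration reference.
    print(counter_rate(1_000_000, 1_500_000, 10_000))  # bytes per second
```

Applied to kubernetes_node_cpu_time, the same calculation gives nanoseconds of CPU time per second. Dividing that by 1,000,000,000 gives the average number of cores in use over the interval.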

Pod metrics

Pod filesystem metrics are reported for the following volume_name dimension values:

  • ephemeral-storage: reports the total filesystem usage for the containers and emptyDir-backed volumes in the measured pod.

  • Volumes: statistics on each volume's usage of filesystem resources, with volume_name set to the name of the volume.

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_pod_containers_ready | gauge | | node_name, pod_name, namespace | Number of ready containers. |
| kubernetes_pod_containers_running | gauge | | node_name, pod_name, namespace | Number of running containers. |
| kubernetes_pod_containers_terminated | gauge | | node_name, pod_name, namespace | Number of terminated containers. |
| kubernetes_pod_containers_waiting | gauge | | node_name, pod_name, namespace | Number of waiting containers. |
| kubernetes_pod_cpu_usage | gauge | nanocores | node_name, pod_name, namespace | Average of total CPU usage (sum of all cores). |
| kubernetes_pod_cpu_percentage | gauge | % | node_name, pod_name, namespace | Percentage of CPU usage from allocatable CPU cores of the node. |
| kubernetes_pod_cpu_time | counter | nanoseconds | node_name, pod_name, namespace | Cumulative CPU usage (sum of all cores). |
| kubernetes_pod_created | attribute | | node_name, pod_name, namespace | Pod creation timestamp. |
| kubernetes_pod_memory_usage | gauge | bytes | node_name, pod_name, namespace | Total memory in use. |
| kubernetes_pod_memory_available | gauge | bytes | node_name, pod_name, namespace | Available memory for use. |
| kubernetes_pod_network_rx_bytes | counter | bytes | node_name, pod_name, namespace, interface_name | Cumulative count of bytes received. |
| kubernetes_pod_network_rx_errors | counter | | node_name, pod_name, namespace, interface_name | Cumulative count of receive errors encountered. |
| kubernetes_pod_network_tx_bytes | counter | bytes | node_name, pod_name, namespace, interface_name | Cumulative count of bytes transmitted. |
| kubernetes_pod_network_tx_errors | counter | | node_name, pod_name, namespace, interface_name | Cumulative count of transmit errors encountered. |
| kubernetes_pod_fs_capacity | gauge | bytes | node_name, pod_name, namespace, volume_name | Total capacity of the filesystem's underlying storage. |
| kubernetes_pod_fs_available | gauge | bytes | node_name, pod_name, namespace, volume_name | Remaining storage space available for the filesystem. |
| kubernetes_pod_fs_used | gauge | bytes | node_name, pod_name, namespace, volume_name | Storage space used for a specific task on the filesystem. This may differ from the total bytes used on the filesystem and may not equal [CapacityBytes - AvailableBytes]. For the ephemeral-storage volume, this is the sum of kubernetes_container_fs_used from every container's rootfs and logs storage plus the sum of kubernetes_pod_fs_used for every volume of type emptyDir. For other volume types, it represents used bytes on the corresponding volume. See PodStats documentation. |
| kubernetes_pod_fs_inodes_free | gauge | | node_name, pod_name, namespace, volume_name | Number of free inodes in the filesystem. |
| kubernetes_pod_fs_inodes_used | gauge | | node_name, pod_name, namespace, volume_name | Number of used inodes in the filesystem. Total number of inodes may not equal kubernetes_pod_fs_inodes_free + kubernetes_pod_fs_inodes_used because this filesystem may share inodes with other filesystems. For the ephemeral-storage volume, it reports the sum of kubernetes_container_fs_inodes_used for every container rootfs volume in the current pod. |
| kubernetes_pod_status | gauge | | node_name, pod_name, namespace | Status of the pod's deployment. Values: Pending, Running, Succeeded, Failed, Unknown, Deleted. Deleted is a phase that this plugin uses to report that a pod has been successfully deleted. It is not used by the Kubernetes Summary API. |
| kubernetes_pod_status_condition | attribute | | node_name, pod_name, namespace | Latest status condition of the pod. Possible values are PodScheduled, ContainersReady, Initialized, and Ready. |
| kubernetes_pod_status_condition_reason | attribute | | node_name, pod_name, namespace | Reason for the latest status condition of the pod. |
| kubernetes_pod_ip | attribute | | node_name, pod_name, namespace | Default IP address of the pod. |

Container metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_container_cpu_usage | gauge | nanocores | node_name, container_name, pod_name, namespace | Average of total CPU usage (sum of all cores). |
| kubernetes_container_cpu_percentage | gauge | % | node_name, container_name, pod_name, namespace | Percentage of CPU usage from allocatable CPU cores of the node. |
| kubernetes_container_cpu_time | counter | nanoseconds | node_name, container_name, pod_name, namespace | Cumulative CPU usage (sum of all cores). |
| kubernetes_container_memory_usage | gauge | bytes | node_name, container_name, pod_name, namespace | Total memory in use. |
| kubernetes_container_memory_available | gauge | bytes | node_name, container_name, pod_name, namespace | Available memory for use. |
| kubernetes_container_fs_capacity | gauge | bytes | node_name, container_name, pod_name, namespace, volume_name | Total capacity of the filesystem's underlying storage. |
| kubernetes_container_fs_available | gauge | bytes | node_name, container_name, pod_name, namespace, volume_name | Remaining storage space available for the filesystem. |
| kubernetes_container_fs_used | gauge | bytes | node_name, container_name, pod_name, namespace, volume_name | Storage space used for a specific task on the filesystem. This may differ from the total bytes used on the filesystem and may not equal [CapacityBytes - AvailableBytes]. For the rootfs volume, this is the number of bytes used by the container's writable layer; see Docker documentation. For logs, this is the number of bytes used for the container logs. For example, run sudo ls -l --block-size=1 /var/lib/docker/containers/<container_id>/ and total the sizes. |
| kubernetes_container_fs_inodes_free | gauge | | node_name, container_name, pod_name, namespace, volume_name | Number of free inodes in the filesystem. |
| kubernetes_container_fs_inodes_used | gauge | | node_name, container_name, pod_name, namespace, volume_name | Number of used inodes in the filesystem. Total number of inodes may not equal kubernetes_container_fs_inodes_free + kubernetes_container_fs_inodes_used because this filesystem may share inodes with other filesystems. For rootfs, this is the number of inodes used only by that container and does not count inodes used by other containers. |
| kubernetes_container_status | gauge | | node_name, container_name, pod_name, namespace | Current state of the container. Values: Running, Terminated, Waiting, Unknown. |
| kubernetes_container_cpu_request | gauge | millicores | node_name, container_name, pod_name, namespace | CPU resource request. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_cpu_request_percentage | gauge | % | node_name, container_name, pod_name, namespace | Percentage used of the configured CPU resource request. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_cpu_limit | gauge | millicores | node_name, container_name, pod_name, namespace | CPU resource limit. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_cpu_limit_percentage | gauge | % | node_name, container_name, pod_name, namespace | Percentage used of the configured CPU resource limit. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_memory_request | gauge | bytes | node_name, container_name, pod_name, namespace | Memory resource request. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_memory_request_percentage | gauge | % | node_name, container_name, pod_name, namespace | Percentage used of the configured memory resource request. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_memory_limit | gauge | bytes | node_name, container_name, pod_name, namespace | Memory resource limit. See Kubernetes documentation for resource configuration details. |
| kubernetes_container_memory_limit_percentage | gauge | % | node_name, container_name, pod_name, namespace | Percentage used of the configured memory resource limit. See Kubernetes documentation for resource configuration details. |
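
The request and limit percentages relate a usage metric to a configured value, and for CPU the two are in different units: usage is reported in nanocores, while requests and limits are in millicores. The following Python sketch shows the unit conversion involved. The function names are hypothetical, and the formulas are an assumption about how such percentages are typically derived, not a statement of the plugin's exact calculation.

```python
NANOCORES_PER_MILLICORE = 1_000_000


def cpu_limit_percentage(usage_nanocores: int, limit_millicores: int) -> float:
    """CPU usage as a percentage of a limit (or request) in millicores."""
    if limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    limit_nanocores = limit_millicores * NANOCORES_PER_MILLICORE
    return 100.0 * usage_nanocores / limit_nanocores


def memory_limit_percentage(usage_bytes: int, limit_bytes: int) -> float:
    """Memory usage as a percentage of a limit (or request) in bytes."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return 100.0 * usage_bytes / limit_bytes


if __name__ == "__main__":
    # A container using a quarter of a core against a 500m CPU limit.
    print(cpu_limit_percentage(250_000_000, 500))
    # A container using 256 MiB against a 1 GiB memory limit.
    print(memory_limit_percentage(256 * 1024 ** 2, 1024 ** 3))
```

The same functions apply to the request percentages by passing the configured request instead of the limit.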

         

ResourceQuota metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kube_resource_quota_hard | gauge | millicores/bytes/none | namespace, quota, resource | Configured hard limit. |
| kube_resource_quota_used | gauge | millicores/bytes/none | namespace, quota, resource | Quota used amount. |
| kube_resource_quota_used_percent | gauge | % | namespace, quota, resource | Quota used percent. |

Workload/Deployment metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_deployment_spec_replicas | gauge | | namespace, deployment_name | Number of desired pods. |
| kubernetes_deployment_status_replicas | gauge | | namespace, deployment_name | Total number of non-terminated pods targeted by the deployment. |
| kubernetes_deployment_status_replicas_ready | gauge | | namespace, deployment_name | Total number of ready pods targeted by the deployment. |
| kubernetes_deployment_status_replicas_available | gauge | | namespace, deployment_name | Total number of available pods, which are ready for at least minReadySeconds, targeted by the deployment. |
| kubernetes_deployment_status_replicas_unavailable | gauge | | namespace, deployment_name | Total number of unavailable pods targeted by the deployment. This is the number of pods still required for the deployment to have 100% available capacity. These pods may be running but not yet available, or may not have been created yet. |
| kubernetes_deployment_status_condition | attribute | | namespace, deployment_name | Describes the current state of the deployment. |

Workload/DaemonSet metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_daemonset_status_number_available | gauge | | namespace, daemonset_name | Number of nodes that are expected to run the daemon pod and have one or more running and available daemon pods. |
| kubernetes_daemonset_status_number_unavailable | gauge | | namespace, daemonset_name | Number of nodes that are expected to run the daemon pod but have no running and available daemon pods. |
| kubernetes_daemonset_status_current_number_scheduled | gauge | | namespace, daemonset_name | Number of nodes that are expected to run the daemon pod and have at least one running daemon pod. |
| kubernetes_daemonset_status_desired_number_scheduled | gauge | | namespace, daemonset_name | Total number of nodes expected to run the daemon pod. |
| kubernetes_daemonset_status_number_misscheduled | gauge | | namespace, daemonset_name | Number of nodes that are not expected to run the daemon pod but have a running daemon pod. |
| kubernetes_daemonset_status_number_ready | gauge | | namespace, daemonset_name | Number of nodes that are expected to run the daemon pod and have one or more running and ready daemon pods. |
| kubernetes_daemonset_status_condition | attribute | | namespace, daemonset_name | Describes the current state of the DaemonSet. |

Workload/ReplicaSet metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_replicaset_spec_replicas | gauge | | namespace, replicaset_name | Number of desired replicas. |
| kubernetes_replicaset_status | gauge | | namespace, replicaset_name | Most recently observed number of replicas. |
| kubernetes_replicaset_status_replicas_available | gauge | | namespace, replicaset_name | Number of available replicas, which are ready for at least minReadySeconds, in the replica set. |
| kubernetes_replicaset_status_replicas_ready | gauge | | namespace, replicaset_name | Number of ready replicas for this replica set. |
| kubernetes_replicaset_status_condition | attribute | | namespace, replicaset_name | Describes the current state of the replica set. |

Workload/StatefulSet metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_statefulset_spec_replicas | gauge | | namespace, statefulset_name | Desired number of replicas for the given Template. |
| kubernetes_statefulset_status_replicas_available | gauge | | namespace, statefulset_name | Number of pods created by the StatefulSet controller. |
| kubernetes_statefulset_status_replicas_current | gauge | | namespace, statefulset_name | Number of pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision. |
| kubernetes_statefulset_status_replicas_ready | gauge | | namespace, statefulset_name | Number of pods created by the StatefulSet controller that have a Ready condition. |
| kubernetes_statefulset_status_condition | attribute | | namespace, statefulset_name | Describes the current state of the stateful set. |

Workload/Job metrics

| Metric | Type | Unit | Dimensions | Description |
| --- | --- | --- | --- | --- |
| kubernetes_job_spec_completions | gauge | | namespace, job_name | Desired number of successfully finished pods that should run with the job. |
| kubernetes_job_spec_parallelism | gauge | | namespace, job_name | Maximum desired number of pods that should run with the job at any given time. |
| kubernetes_job_status_active | gauge | | namespace, job_name | Number of actively running pods. |
| kubernetes_job_status_succeeded | gauge | | namespace, job_name | Number of successful pods. |
| kubernetes_job_status_failed | gauge | | namespace, job_name | Number of failed pods. |
| kubernetes_job_status_start_time | attribute | | namespace, job_name | Time when the job was acknowledged by the job controller. |
| kubernetes_job_status_completion_time | attribute | | namespace, job_name | Time when the job was completed. |
| kubernetes_job_status_condition | attribute | | namespace, job_name | Describes the current state of the job. |

Kubernetes log rotation

The following table lists the log rotation schemes and whether the log collector supports them:

| Log rotation scheme | Description |
| --- | --- |
| Docker JSON driver | Supported. See the configuration notes below the table. |
| Logrotate create mode | Supported |
| Logrotate copy mode | Not supported |
| Logrotate copytruncate mode | Not supported |
| Collecting from compressed log files | Not supported |

Configuration notes for the Docker JSON driver:

  • For OpenShift, see the official documentation to find out how to configure logs.
  • For Kubernetes, configure log rotation in /etc/docker/daemon.json:
    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m",
        "max-file": "10"
      }
    }