
Troubleshooting in an orchestrated environment

Overview

This guide is intended to help you troubleshoot your Netprobe for Orchestrated Environments instances.

Upgrading Collection Agent

In orchestrated environments, you must deploy both Netprobes and Collection Agents to monitor the environment. For best results, update the environment to use the latest version of each.

The format used for workflow stores changed in Collection Agent version 2.0.0. Workflow stores created by Collection Agent version 1.0.0 are not compatible with version 2.0.0 onwards.

As a result, you must take additional steps when upgrading a monitoring environment that uses Collection Agent version 1.0.0.

To perform an upgrade:

  1. Remove all Netprobe for Orchestrated Environments DaemonSets.
  2. Remove the Kubernetes metrics deployment.
  3. On each node, delete or archive the existing workflow stores created by the Collection Agent:
    cd /var/lib/geneos/collection-agent
    mv Workflow Workflow.old   # Alternatively, use the rm command to delete Workflow
  4. Upgrade any connected Gateways. For more information about upgrading Gateways, see the Gateway Installation Guide.
  5. Perform the standard installation procedures. For more information about installing Netprobe for Orchestrated Environments, see Installation in a Kubernetes or OpenShift environment.

Note: To retain continuity in the logging performed by the KubernetesLogCollector, specify the same log directories in the new ConfigMaps as in the previous installation.
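
The archive option in step 3 can be wrapped in a small helper for anyone who needs to repeat it on several nodes. The following is a minimal sketch, not part of the product: the function name archive_workflow_store and the timestamp suffix are illustrative choices, and the kubectl resource names in the comments are placeholders for the names used by your installation.

```shell
#!/bin/sh
# Sketch only: archive the Collection Agent workflow store on one node (step 3).
# Steps 1 and 2 are run separately against the cluster. The resource names and
# namespace below are placeholders, not names defined by the product:
#   kubectl delete daemonset <netprobe-daemonset> -n <namespace>
#   kubectl delete deployment <kubernetes-metrics-deployment> -n <namespace>

# archive_workflow_store [BASE_DIR]
# Renames BASE_DIR/Workflow to BASE_DIR/Workflow.old.<timestamp> and prints
# the new path. BASE_DIR defaults to the directory given in the procedure.
archive_workflow_store() {
    base_dir="${1:-/var/lib/geneos/collection-agent}"
    store="$base_dir/Workflow"

    if [ ! -e "$store" ]; then
        echo "no workflow store found at $store" >&2
        return 1
    fi

    # The timestamp suffix is an illustrative addition (the procedure uses a
    # plain "Workflow.old"); it avoids colliding with an earlier archive.
    archived="$store.old.$(date +%Y%m%d%H%M%S)"
    mv "$store" "$archived" && echo "$archived"
}
```

Calling archive_workflow_store with no argument acts on /var/lib/geneos/collection-agent. Passing a different directory as the first argument allows a trial run against a scratch copy before touching a live node.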

Procedures

Datagrams sent from the applications do not reach Collection Agent after it is restarted

When Collection Agent is restarted, StatsD data published by instrumented applications that were already running before the restart may no longer reach the Collection Agent. Data sent from applications that were started after the restart is not affected.

To solve this issue, do either of the following:

  • Manually flush the connection tracking (conntrack) cache after restarting the Netprobe for Orchestrated Environments pod:
    conntrack -D -p udp --dport 8125
  • Use StatsD over TCP instead of UDP. This requires:
    • StatsD plug-in 1.1.0
    • StatsD client Java library 1.1.0
    • StatsD client Python library 1.1.0
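
The first option above can be sketched as a helper that lists the matching conntrack entries before deleting them, which leaves a record of what was flushed. This is a minimal sketch under stated assumptions: the function name flush_statsd_conntrack and the CONNTRACK override variable are hypothetical, and the script is assumed to run as root on the node hosting the restarted pod with the conntrack tool installed.

```shell
#!/bin/sh
# Sketch only: list, then delete, conntrack entries for UDP traffic to the
# StatsD port. Intended to run as root on the node that hosts the restarted pod.

# flush_statsd_conntrack [PORT]
# PORT defaults to 8125, the port used in the command above.
# CONNTRACK may be set to an alternative binary path (a convenience of this sketch).
flush_statsd_conntrack() {
    port="${1:-8125}"
    conntrack_bin="${CONNTRACK:-conntrack}"

    # Show the matching entries first so there is a record of what is removed.
    # A failure here is ignored so that the delete is still attempted.
    "$conntrack_bin" -L -p udp --dport "$port" || true

    # Delete the matching entries; the function returns this command's exit status.
    "$conntrack_bin" -D -p udp --dport "$port"
}
```

Running flush_statsd_conntrack with no argument is equivalent to the conntrack -D command shown above, preceded by a listing of the same entries.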

This issue occurs only in certain versions of Kubernetes and OpenShift. For more information about the affected versions, see the GitHub documentation and Red Hat documentation.