
Elasticsearch

Overview

Elasticsearch monitoring is a Gateway configuration file that enables monitoring of an Elasticsearch cluster through the Toolkit plug-in.

Elasticsearch is a distributed search and analytics engine that scales horizontally: you can add more nodes to the cluster as demand grows. This means that it can search and analyze large volumes of data.

The elements that make Elasticsearch work are defined as follows:

  • Node is a running instance of Elasticsearch that knows where documents are located in the cluster.
  • Cluster consists of one or more nodes with the same cluster name that can share their data and load.

Track the following key areas when using Elasticsearch monitoring:

Key Area Description
Search performance Determines how search performs over time by monitoring query operations, load and latency, the fielddata cache, and evictions.
Indexing performance Each shard in the index can be updated through the flush and refresh processes.

A shard is a container for data and can be either a primary or a replica shard. Shards are how Elasticsearch distributes data across the cluster.

  • Index refresh - creates a new in-memory segment, making newly indexed documents searchable.
  • Index flush - new documents are added to the in-memory buffer, the segments are committed, and the transaction log is cleared.
Cluster health and node availability Monitors the current state of all clusters and nodes.
Resource utilisation Provides information on thread pool queues and rejections for operations such as bulk, index, and merge.
System and network metrics Shows information about every node in the cluster, including resource and memory usage, and the connections opened over time.
   

In this Elasticsearch monitoring template, you will see these metrics in your dataview:

  • Cluster health
  • Indexing performance
  • Search performance
  • Node and resource information
  • Thread pool

Intended audience

This guide is intended for users who are setting up, configuring, troubleshooting and maintaining this integration. This is also intended for users who will be using Active Console to monitor data from Elasticsearch. Once the integration is set up, the samplers providing the dataviews become available to that Gateway.

As a user, you should be familiar with Linux and with the administration of Elasticsearch services.

Prerequisites

The following requirements must be met prior to the installation and setup of the template:

  • The machine running the Netprobe must have access to the host where the Elasticsearch instance is installed and to the port that Elasticsearch is listening on.
  • Netprobe 4.6 or higher.
  • Gateway 4.8 or higher.
  • Python 2.7 or 3.6 installation on the machine where the Netprobe resides.
  • Elasticsearch 6.1.2.
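
Two of the requirements above (the Python version and network access to the Elasticsearch port) can be checked from the Netprobe machine before installation. The following is a minimal sketch only, not part of the integration package; the function names are hypothetical, and the host and port used at the bottom are simply the defaults of the template variables.

```python
import socket
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True for the interpreter versions listed above (2.7 or 3.6)."""
    return tuple(version_info[:2]) in ((2, 7), (3, 6))


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # localhost:9200 are the template defaults; replace with your own values.
    print("Python version supported: %s" % python_version_supported())
    print("Elasticsearch port reachable: %s" % port_reachable("localhost", 9200))
```

A successful TCP connection only shows that something is listening on the port; it does not confirm the Elasticsearch version.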

Installation procedure

Ensure that you have read and can follow the system requirements prior to installation and setup of this integration template.

  1. Download the integration package geneos-integration-elasticsearch-<version>.zip from the ITRS Downloads site.
  2. Open Gateway Setup Editor.
  3. In the Navigation panel, click Includes to create a new file.
  4. Enter the location of the file to include in the Location field. In this example, it is include/ElasticsearchMonitoring.xml.
  5. Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
  6. Expand the file location in the Include section.
  7. Select Click to load.
  8. Click Yes to load the new Elasticsearch include file.
  9. Click Managed entities in the Navigation panel.
  10. Add the Elasticsearch type to the Managed Entity section that you will use to monitor Elasticsearch.
  11. Click Validate current document to check your configuration.
  12. Click Save current document to apply the changes.

Set up the samplers

These are the pre-configured samplers available to use in include/ElasticsearchMonitoring.xml.

Configure the required fields by referring to the table below:

Samplers
Elasticsearch-ClusterHealth
Elasticsearch-ThreadPool
Elasticsearch-Resource
Elasticsearch-NodeInfo
Elasticsearch-SearchPerf-ByIndex
Elasticsearch-SearchPerf-ByNode
Elasticsearch-IndexingPerf-ByIndex
Elasticsearch-IndexingPerf-ByNode
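
Each sampler above produces one dataview through the Toolkit plug-in. As an illustration of the general idea, the sketch below renders rows as comma-separated text with the column headings on the first line. This assumes the usual Toolkit convention of a header row followed by one line per row; the function name and the sample row are hypothetical, and the scripts shipped in the integration package may work differently.

```python
def render_toolkit_output(columns, rows):
    """Render a header line followed by one comma-separated line per row.

    Assumes that no value contains a comma; this sketch does no escaping.
    """
    lines = [",".join(columns)]
    for row in rows:
        lines.append(",".join(str(row.get(col, "")) for col in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data shaped like the cluster health dataview.
    columns = ["cluster", "status", "nodeTotal"]
    rows = [{"cluster": "es-demo", "status": "green", "nodeTotal": 3}]
    print(render_toolkit_output(columns, rows))
```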

Set up the variables

The include/ElasticsearchMonitoring.xml template provides the following variables that are set in the Environments section:

Variable Description
ELASTICSEARCHMON_GROUP Sampler group name.
 
Default: Elasticsearch-Monitoring
ELASTICSEARCHMON_HOST IP/Hostname of the Elasticsearch Node.
 
Default: localhost
ELASTICSEARCHMON_PORT Port assigned to the Elasticsearch HTTP service.
 
Default: 9200
ELASTICSEARCHMON_PYTHON_EXE Name of the executable script that calls the python code.
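
To illustrate how the host and port variables combine with their defaults, here is a minimal sketch that resolves them and builds the base address of the Elasticsearch HTTP service. The helper names are hypothetical, and the use of plain http:// is an assumption; the template may pass these values to its scripts in a different way.

```python
# Defaults as listed in the table above.
DEFAULTS = {
    "ELASTICSEARCHMON_GROUP": "Elasticsearch-Monitoring",
    "ELASTICSEARCHMON_HOST": "localhost",
    "ELASTICSEARCHMON_PORT": "9200",
}


def resolve_variables(overrides=None):
    """Return the defaults with any user-supplied overrides applied."""
    resolved = dict(DEFAULTS)
    resolved.update(overrides or {})
    return resolved


def base_url(variables):
    """Build the HTTP address of the node (plain HTTP is an assumption)."""
    return "http://{0}:{1}".format(
        variables["ELASTICSEARCHMON_HOST"], variables["ELASTICSEARCHMON_PORT"]
    )


if __name__ == "__main__":
    print(base_url(resolve_variables()))
    print(base_url(resolve_variables({"ELASTICSEARCHMON_HOST": "es-node-1"})))
```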
   

Set up the rules

The ElasticsearchMonitoring-SampleRules.xml template also provides a separate set of sample rules that you can configure in the Gateway Setup Editor.

Your configuration rules must be set in the Includes section. In the Navigation panel, click Rules.

The table below shows the sample rules included in the configuration file:

Rules Sample Rules
Resource Elasticsearch-Diskspace
Elasticsearch-FileDesc
Elasticsearch-Cpu
ClusterHealth Elasticsearch-ClusterStatus
Indexing Elasticsearch-IndexingLatency
Elasticsearch-RefreshLatency
Elasticsearch-FlushLatency
Search Elasticsearch-QueryLatency
Elasticsearch-FetchLatency
   

Metrics and dataviews

Elasticsearch cluster health

This monitors the overall health of the cluster by indicating how it is functioning:

Column Name Description
cluster Name of the cluster.
status

Health status of the cluster:

  • Green - all primary and replica shards are active.
  • Yellow - at least one replica shard is missing or not properly allocated.
  • Red - at least one primary shard is missing, which can cause data loss.
nodeTotal Total number of nodes in the cluster.
nodeData Total number of nodes in the cluster that can store data.
shardsTotal Total number of shards.
shardsInitializing Number of initialising shards.
shardsUnassigned Number of unassigned shards.
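
The columns above correspond closely to the fields returned by the Elasticsearch cluster health API (GET _cluster/health). The sketch below shows one possible mapping from that JSON response to a dataview row. It is an assumption that the template reads this endpoint, and mapping shardsTotal to active_shards is a guess; the function name and sample values are hypothetical.

```python
def cluster_health_row(health):
    """Map a _cluster/health style response onto the dataview columns."""
    return {
        "cluster": health["cluster_name"],
        "status": health["status"],
        "nodeTotal": health["number_of_nodes"],
        "nodeData": health["number_of_data_nodes"],
        # Assumption: shardsTotal is taken from the active shard count.
        "shardsTotal": health["active_shards"],
        "shardsInitializing": health["initializing_shards"],
        "shardsUnassigned": health["unassigned_shards"],
    }


if __name__ == "__main__":
    # Hypothetical response for a single-node cluster with unallocated replicas.
    sample = {
        "cluster_name": "es-demo",
        "status": "yellow",
        "number_of_nodes": 1,
        "number_of_data_nodes": 1,
        "active_shards": 5,
        "initializing_shards": 0,
        "unassigned_shards": 5,
    }
    print(cluster_health_row(sample))
```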
   

Elasticsearch indexingPerf-ByIndex

This dataview monitors indexing performance by index. Data is grouped per index:

Column Name Description
index Name of the index.
indexingIndexTotal Total number of indexing operations.
indexingIndexTime Time spent in indexing.
 
Unit: millisecond (ms)
indexingIndexCurrent Number of current indexing operations.
refreshTotal Total number of refreshes.
refreshTime Time spent in refresh operations.
 
Unit: millisecond (ms)
flushTotal Total number of flushes.
flushTotalTime Time spent in flushes.
 
Unit: millisecond (ms)
averageIndexingLatency Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal.
 
Unit: millisecond (ms) per indexing operation
averageRefreshLatency Average time spent in refresh operations. This is computed from refreshTime / refreshTotal.
 
Unit: millisecond (ms) per refresh
averageFlushLatency Average time spent in flush operations. This is computed from flushTotalTime / flushTotal.
 
Unit: millisecond (ms) per flush
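
The three average latency columns all use the same calculation: cumulative time divided by the cumulative operation count. A minimal sketch of that calculation follows. The function name is hypothetical, and returning 0.0 when no operations have run yet is an assumption, since the guide does not say how the template handles a zero count.

```python
def average_latency(total_time_ms, total_operations):
    """Return the average time per operation in milliseconds.

    Returns 0.0 when no operations have been recorded (assumed behaviour).
    """
    if not total_operations:
        return 0.0
    return float(total_time_ms) / total_operations


if __name__ == "__main__":
    # Example: indexingIndexTime = 1500 ms over indexingIndexTotal = 300 operations.
    print(average_latency(1500, 300))
    # Example: refreshTime = 0 ms with refreshTotal = 0 refreshes.
    print(average_latency(0, 0))
```

Because both inputs are cumulative totals, the result is an average over the whole period covered by the counters rather than over the most recent sample interval.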
   

Elasticsearch indexingPerf-ByNode

This monitors indexing performance by node. Data is grouped per node:

Column Name Description
nodeID Unique node ID.
name Name of the node.
indexingIndexTotal Total number of indexing operations.
indexingIndexTime Time spent in indexing.
 
Unit: millisecond (ms)
indexingIndexCurrent Number of current indexing operations.
refreshTotal Total number of refreshes.
refreshTime Time spent in refresh operations.
 
Unit: millisecond (ms)
flushTotal Total number of flushes.
flushTotalTime Time spent in flushes.
 
Unit: millisecond (ms)
averageIndexingLatency Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal.
 
Unit: millisecond (ms) per indexing operation
averageRefreshLatency Average time spent in refresh operations. This is computed from refreshTime / refreshTotal.
 
Unit: millisecond (ms) per refresh
averageFlushLatency Average time spent in flush operations. This is computed from flushTotalTime / flushTotal.
 
Unit: millisecond (ms) per flush
   

Elasticsearch nodeInfo

This displays information about the nodes in the cluster:

Column Name Description
nodeID Unique node ID.
name Name of the node.
IP IP address.
port Bound transport port.
http Bound HTTP address and port.
version Elasticsearch version.
build Elasticsearch build hash.
jdk JDK version.
nodeRole

Role of the node. This can have more than one value:

  • m - master eligible node.
  • d - data node.
  • i - ingest node.
master

Current master node in the cluster:

  • * (asterisk) - current master.
  • - (hyphen) - non-master.
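
Because nodeRole can hold several letters at once, it helps to decode it character by character. The sketch below expands the nodeRole and master values using only the meanings listed above; the function name and the returned structure are hypothetical.

```python
# Role letters as described in the nodeRole column.
ROLE_NAMES = {
    "m": "master eligible node",
    "d": "data node",
    "i": "ingest node",
}


def describe_node(node_role, master):
    """Expand the nodeRole letters and the master marker into readable form."""
    roles = [ROLE_NAMES.get(flag, "unknown role '%s'" % flag) for flag in node_role]
    return {"roles": roles, "is_current_master": master == "*"}


if __name__ == "__main__":
    print(describe_node("mdi", "*"))
    print(describe_node("d", "-"))
```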
   

Elasticsearch resource

This monitors the resources of each node in the cluster:

Column Name Description
nodeID Unique node ID.
name Name of the node.
cpu CPU usage in percentage (%).
heapCurrent Current heap usage.
 
Unit: bytes
heapPercent Percent used heap.
ramCurrent Current RAM usage.
 
Unit: bytes
ramPercent Percent RAM used.
diskUsed Used disk space.
 
Unit: bytes
diskAvail Available disk space.
diskUsedPercent Percent disk used.
fileDescriptorCurrent Number of used file descriptors.
fileDescriptorPercent Percent file descriptors used.
   

Elasticsearch SearchPerf-ByIndex

This monitors search performance by index. Data is grouped per index:

Column Name Description
index Name of the index.
searchQueryTotal Number of query phase operations.
searchQueryTime Time spent in query phase.
 
Unit: millisecond (ms)
searchQueryCurrent Number of current query phase operations.
searchFetchTotal Number of fetch phase operations.
searchFetchTime Time spent in fetch phase.
 
Unit: millisecond (ms)
searchFetchCurrent Number of current fetch phase operations.
fielddataMemory Used fielddata cache.
fielddataEvictions Number of fielddata evictions.
averageQueryLatency Average time spent in query phase. This is computed from searchQueryTime / searchQueryTotal.
 
Unit: millisecond (ms) per query
averageFetchLatency Average time spent in fetch phase. This is computed from searchFetchTime / searchFetchTotal.
 
Unit: millisecond (ms) per fetch
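
The search columns resemble the search and fielddata sections of the Elasticsearch index stats API (GET _stats/search,fielddata). The sketch below builds one row per index from a response of that shape. It is an assumption that the template uses this endpoint, and the function names and sample values are hypothetical.

```python
def _ratio(total_time_ms, total_operations):
    """Average time per operation; 0.0 when nothing has run yet (assumed)."""
    return float(total_time_ms) / total_operations if total_operations else 0.0


def search_perf_rows(stats):
    """Build one row per index from an index-stats style response."""
    rows = []
    for index_name in sorted(stats["indices"]):
        total = stats["indices"][index_name]["total"]
        search, fielddata = total["search"], total["fielddata"]
        rows.append({
            "index": index_name,
            "searchQueryTotal": search["query_total"],
            "searchQueryTime": search["query_time_in_millis"],
            "searchQueryCurrent": search["query_current"],
            "searchFetchTotal": search["fetch_total"],
            "searchFetchTime": search["fetch_time_in_millis"],
            "searchFetchCurrent": search["fetch_current"],
            "fielddataMemory": fielddata["memory_size_in_bytes"],
            "fielddataEvictions": fielddata["evictions"],
            "averageQueryLatency": _ratio(
                search["query_time_in_millis"], search["query_total"]),
            "averageFetchLatency": _ratio(
                search["fetch_time_in_millis"], search["fetch_total"]),
        })
    return rows


if __name__ == "__main__":
    # Hypothetical response containing a single index.
    sample = {"indices": {"logs": {"total": {
        "search": {
            "query_total": 40, "query_time_in_millis": 200, "query_current": 0,
            "fetch_total": 0, "fetch_time_in_millis": 0, "fetch_current": 0,
        },
        "fielddata": {"memory_size_in_bytes": 1024, "evictions": 2},
    }}}}
    for row in search_perf_rows(sample):
        print(row)
```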
   

Elasticsearch searchPerf-ByNode

This monitors search performance by node. Data is grouped per node:

Column Name Description
nodeID Unique node ID.
name Name assigned to the node.
searchQueryTotal Number of query phase operations.
searchQueryTime Time spent in query phase.
 
Unit: millisecond (ms)
searchQueryCurrent Number of current query phase operations.
searchFetchTotal Number of fetch phase operations.
searchFetchTime Time spent in fetch phase.
 
Unit: millisecond (ms)
searchFetchCurrent Number of current fetch phase operations.
fielddataMemory Used fielddata cache.
fielddataEvictions Number of fielddata evictions.
averageQueryLatency Average time spent in query phase that is computed from searchQueryTime/searchQueryTotal.
 
Unit: millisecond (ms) per query
averageFetchLatency Average time spent in fetch phase that is computed from searchFetchTime/searchFetchTotal.
 
Unit: millisecond (ms) per fetch
   

Elasticsearch ThreadPool

This monitors the bulk, index, and search thread pools of each node in the cluster:

Column Name Description
node_id/name Node ID/Thread Pool Name.
node_name Name of the node.
name Thread Pool name.
type Thread Pool Type.
active Number of active threads.
queue Number of tasks currently in queue.
rejected Number of rejected tasks.
size Number of threads.
queue_size Maximum number of pending requests that the queue can hold while no thread is available to execute them.
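
The column names above match those of the Elasticsearch cat thread pool API (for example GET _cat/thread_pool?format=json&h=node_id,node_name,name,type,active,queue,rejected,size,queue_size). As a sketch, the function below keeps only the bulk, index, and search pools and keys each row by node ID and pool name, mirroring the node_id/name column. It is an assumption that the template reads this endpoint; the function name and sample entries are hypothetical.

```python
NUMERIC_COLUMNS = ("active", "queue", "rejected", "size", "queue_size")


def thread_pool_rows(entries, pools=("bulk", "index", "search")):
    """Return rows keyed by "<node_id>/<pool name>" for the selected pools."""
    rows = {}
    for entry in entries:
        if entry["name"] not in pools:
            continue
        key = "{0}/{1}".format(entry["node_id"], entry["name"])
        row = {
            "node_name": entry["node_name"],
            "name": entry["name"],
            "type": entry["type"],
        }
        # The cat API returns numbers as strings when format=json is used.
        for column in NUMERIC_COLUMNS:
            row[column] = int(entry[column])
        rows[key] = row
    return rows


if __name__ == "__main__":
    # Hypothetical entries for one node.
    sample = [
        {"node_id": "abc1", "node_name": "node-1", "name": "bulk", "type": "fixed",
         "active": "1", "queue": "3", "rejected": "0", "size": "4", "queue_size": "200"},
        {"node_id": "abc1", "node_name": "node-1", "name": "flush", "type": "scaling",
         "active": "0", "queue": "0", "rejected": "0", "size": "1", "queue_size": "-1"},
    ]
    print(thread_pool_rows(sample))
```

A steadily increasing rejected count for a pool usually means that its queue is filling faster than its threads can drain it.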