
Hadoop

Overview

The Hadoop integration is a Gateway configuration file that enables monitoring of the Hadoop cluster, nodes, and daemons through the JMX and Toolkit plug-ins.

This Hadoop integration template consists of the following components:

  • Hadoop Distributed File System (HDFS) 
  • Yet Another Resource Negotiator (YARN)

The Hadoop Distributed File System (HDFS) provides scalable data storage that can be deployed on commodity hardware and is optimised for large datasets.

The other component, Yet Another Resource Negotiator (YARN), assigns the computation resources for executing applications:

  • YARN ResourceManager - tracks the available resources and allocates them to running applications.
  • YARN NodeManagers - monitor resource usage and communicate with the ResourceManager.

Intended audience

This guide is intended for users who set up, configure, troubleshoot, and maintain this integration, as well as for users who monitor Hadoop data through the Active Console. Once the integration is set up, the samplers providing the dataviews become available to that Gateway.

As a user, you should be familiar with Java and with the administration of the Hadoop services.

Prerequisites

The following requirements must be met before the installation and setup of the template:

  • A machine running the Netprobe must have access to the host where the Hadoop instance is installed and to the port Hadoop is listening on (see the connectivity sketch after this list).
  • Netprobe 4.6 or higher.
  • Gateway 4.8 or higher.
  • Hadoop 3.0.0 or higher.
  • Python 2.7/3.6 or higher.
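
The following is a minimal connectivity check you can run from the Netprobe host before proceeding. It is not part of the integration package; the hostnames are placeholders and the ports are the default web UI ports documented later in this guide (9870, 9864, 8088), so adjust both to your environment.

```python
#!/usr/bin/env python
"""Reachability check from the Netprobe host to the Hadoop daemons.

Hostnames below are placeholders; the ports are the default web UI ports
documented in this guide. Replace both with your own values.
"""
import socket

ENDPOINTS = [
    ("namenode.example.com", 9870),          # Namenode web UI
    ("datanode.example.com", 9864),          # Datanode web UI
    ("resourcemanager.example.com", 8088),   # ResourceManager web UI
]

def is_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except (socket.error, OSError):
        return False

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "OK" if is_reachable(host, port) else "UNREACHABLE"
        print("%s:%d %s" % (host, port, status))
```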

Installation procedure

Ensure that you have read and met the system requirements before installing and setting up this integration template.

  1. Download the integration package geneos-integration-hadoop-<version>.zip from the ITRS Downloads site.
  2. Open Gateway Setup Editor.
  3. In the Navigation panel, click Includes to create a new file.
  4. Enter the location of the file to include in the Location field. In this example, it is include/HadoopMonitoring.xml.
  5. Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
  6. Expand the file location in the Includes section.
  7. Select Click to load.
  8. Click Yes to load the new Hadoop include file.
  9. Click Managed entities in the Navigation panel.
  10. Add the Hadoop-Cluster and Hadoop-Node types to the Managed Entity section that you will use to monitor Hadoop.
  11. Click Validate current document to check your configuration.
  12. Click Save current document to apply the changes.

Set up the samplers

These are the pre-configured samplers available to use in HadoopMonitoring.xml. Configure the required fields by referring to the list below:

  • Hadoop-HDFS-NamenodeInfo
  • Hadoop-HDFS-NamenodeCluster
  • Hadoop-HDFS-SecondaryNamenodeInfo
  • Hadoop-HDFS-DatanodesSummary
  • Hadoop-HDFS-DatanodeVolumeInfo
  • Hadoop-YARN-ResourceManager
  • Hadoop-YARN-NodeManagersSummary
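
Each sampler reads metrics that Hadoop publishes over JMX. The sketch below is not one of the packaged sampler scripts; it is a minimal Python 3 illustration, assuming Hadoop's standard /jmx HTTP servlet on the Namenode web UI port and the Hadoop:service=NameNode,name=NameNodeInfo MBean, of the kind of query a Toolkit sampler performs and the comma-separated output a Toolkit plug-in consumes.

```python
#!/usr/bin/env python3
"""Toolkit-style sketch: query a Hadoop /jmx endpoint and print CSV rows.

The URL and attribute names are assumptions for illustration; the samplers
shipped in HadoopMonitoring.xml encapsulate their own queries.
"""
import json
import urllib.request

NAMENODE_JMX = ("http://namenode.example.com:9870/jmx"
                "?qry=Hadoop:service=NameNode,name=NameNodeInfo")

def fetch_first_bean(url, timeout=10):
    """Return the first MBean returned by Hadoop's /jmx servlet as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    return beans[0] if beans else {}

if __name__ == "__main__":
    bean = fetch_first_bean(NAMENODE_JMX)
    # Toolkit plug-ins read comma-separated output: one header row, then data rows.
    print("name,softwareVersion")
    print("%s,%s" % (bean.get("tag.Hostname", "namenode"),
                     bean.get("SoftwareVersion", "")))
```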
 

Set up the variables

The HadoopMonitoring.xml template provides the following variables, which are set in the Environments section:

Variable | Description
HADOOP_HOST_NAMENODE | IP address or hostname where the Namenode daemon is running.
HADOOP_HOST_SECONDARYNAMENODE | IP address or hostname where the Secondarynamenode daemon is running.
HADOOP_HOST_DATANODE | IP address or hostname where the specific Datanode daemon is running.
HADOOP_HOST_RESOURCEMANAGER | IP address or hostname where the ResourceManager is running.
HADOOP_PORT_JMX_NAMENODE | Namenode JMX port.
HADOOP_PORT_JMX_SECONDARYNAMENODE | Secondarynamenode JMX port.
HADOOP_PORT_WEBJMX_DATANODE | Datanode web UI port. Default: 9864.
HADOOP_PORT_JMX_RESOURCEMANAGER | ResourceManager JMX port.
HADOOP_PORT_WEBJMX_NAMENODE | Namenode web UI port. Default: 9870.
HADOOP_PORT_WEBJMX_RESOURCEMANAGER | ResourceManager web UI port. Default: 8088.
PYTHON_EXECUTABLE_PATH | Path to the Python executable that runs the scripts.
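
The Gateway makes these values available to the integration scripts as environment variables. As a rough sketch only (the packaged scripts read the variables themselves), a script might combine them into JMX URLs like this; the localhost fallbacks are illustrative, not documented defaults.

```python
#!/usr/bin/env python3
"""Sketch of how the Environments variables can be consumed by a script."""
import os

# Documented defaults exist only for the web UI ports; the host fallbacks
# below are placeholders for illustration.
namenode_host = os.environ.get("HADOOP_HOST_NAMENODE", "localhost")
namenode_web_port = os.environ.get("HADOOP_PORT_WEBJMX_NAMENODE", "9870")
resourcemanager_host = os.environ.get("HADOOP_HOST_RESOURCEMANAGER", "localhost")
resourcemanager_web_port = os.environ.get("HADOOP_PORT_WEBJMX_RESOURCEMANAGER", "8088")

namenode_jmx_url = "http://%s:%s/jmx" % (namenode_host, namenode_web_port)
resourcemanager_jmx_url = "http://%s:%s/jmx" % (resourcemanager_host, resourcemanager_web_port)

print("Namenode JMX endpoint:        %s" % namenode_jmx_url)
print("ResourceManager JMX endpoint: %s" % resourcemanager_jmx_url)
```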
   

Set up the rules

The HadoopMonitoring-SampleRules.xml template also provides a separate set of sample rules that you can apply in the Gateway Setup Editor.

Your configuration rules must be set in the Includes section.

The table below shows the rules included in the configuration file:

Sample Rules | Description
Hadoop-NameNodeCluster-Disk-Remaining | Checks the remaining disk ratio of the entire Hadoop cluster.
Hadoop-DataNode-Disk-Remaining | Checks the remaining disk ratio of a single datanode. HADOOP_RULE_DISK_REMAINING_THRESHOLD: possible values 1.0 to 100.
Hadoop-Datanodes-In-Errors | Checks the number of datanodes with errors. HADOOP_RULE_DATANODES_ERROR_THRESHOLD: integer value.
Hadoop-Blocks-In-Error | Checks the number of blocks with errors. HADOOP_RULE_BLOCKS_ERROR_THRESHOLD: integer value.
Hadoop-Nodemanager-In-Error | Checks the number of nodemanagers with errors. HADOOP_RULE_NODEMANAGER_ERROR_THRESHOLD: integer value.
Hadoop-Applications-In-Error | Checks the number of applications with errors. HADOOP_RULE_APPLICATION_ERROR_THRESHOLD: integer value.
Hadoop-SecondaryNamenode-Status | Checks the connection status of the JMX plug-in to the Secondarynamenode service.
Hadoop-NodeManager-State | Checks the state of the nodemanager. HADOOP_RULE_NODEMANAGER_UNHEALTHY: default value UNHEALTHY.
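
The sample rules themselves are Gateway rule definitions, but the intent behind the two disk-remaining rules can be pictured as a simple ratio check. The sketch below only illustrates that arithmetic, assuming the threshold arrives through HADOOP_RULE_DISK_REMAINING_THRESHOLD; it is not how the Gateway evaluates rules, and the example capacity figures and the 20.0 fallback are made up.

```python
#!/usr/bin/env python3
"""Arithmetic behind the Hadoop-*-Disk-Remaining sample rules (illustration only)."""
import os

def remaining_ratio(capacity_remaining_gb, capacity_total_gb):
    """Percentage of capacity still free, as used by the disk-remaining checks."""
    if capacity_total_gb <= 0:
        return 0.0
    return 100.0 * capacity_remaining_gb / capacity_total_gb

if __name__ == "__main__":
    # Threshold between 1.0 and 100, per the sample rules; 20.0 is an example fallback.
    threshold = float(os.environ.get("HADOOP_RULE_DISK_REMAINING_THRESHOLD", "20.0"))
    # Example figures; in Geneos these come from the namenode cluster dataview.
    ratio = remaining_ratio(capacity_remaining_gb=150.0, capacity_total_gb=1000.0)
    state = "WARNING" if ratio < threshold else "OK"
    print("%s: %.1f%% of cluster capacity remaining (threshold %.1f%%)"
          % (state, ratio, threshold))
```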
   

Metrics and dataviews

Hadoop HDFS namenode info

Column Name | Description
Name | Name of the service.
SoftwareVersion | Namenode service version.
SecurityEnabled | Indicates whether security is enabled.
State | Namenode service active state.

Hadoop HDFS namenode cluster

Column Name | Description
Name | Name of the service.
CapacityUsedGB | Current used capacity across all datanodes.
CapacityRemainingGB | Current remaining capacity.
CapacityTotalGB | Current raw capacity.
FilesTotal | Number of files and directories.
TotalLoad | Number of connections.
NumLiveDataNodes | Number of live datanodes.
NumStaleDataNodes | Number of datanodes marked stale due to delayed heartbeats.
NumDeadDataNodes | Number of dead datanodes.
BlocksTotal | Number of allocated blocks in the system.
BlockCapacity | Total block capacity.
CorruptBlocks | Number of blocks with corrupt replicas.
UnderReplicatedBlocks | Number of under-replicated blocks.
MissingBlocks | Number of missing blocks.

Hadoop HDFS secondaryNamenode info

Column Name | Description
Name | Name of the service.
CheckpointDirectories | Secondarynamenode checkpoint directories.
CheckpointEditlogDirectories | Secondarynamenode checkpoint edit log directories.
SoftwareVersion | Secondarynamenode service version.

Hadoop HDFS datanodes summary

Column Name | Description
name | Datanode name and (dfs) port address.
infoAddr | Datanode web UI address.
usedSpaceGB | Datanode used capacity.
nonDfsUsedSpaceGB | Datanode non-DFS used capacity.
capacityGB | Datanode raw capacity.
remainingGB | Datanode remaining capacity.
numBlocks | Number of blocks in the datanode.
version | Datanode service version.
volFails | Number of failed volumes in the datanode.

Note: The number of rows displayed is equal to the number of datanodes in the cluster.

Hadoop YARN resource manager

Column Name | Description
Name | Name of the service.
NumActiveNMs | Number of active nodemanagers.
NumDecommissionedNMs | Number of decommissioned nodemanagers.
NumLostNMs | Number of nodemanagers lost due to missed heartbeats.
NumUnhealthyNMs | Number of unhealthy nodemanagers.
AppsRunning | Number of running applications.
AppsFailed | Total number of failed applications.
AllocatedMB | Current allocated memory in MB.
AvailableMB | Available memory in MB.
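
These columns correspond to metrics the ResourceManager publishes over JMX. As a rough illustration, assuming the standard /jmx servlet on the ResourceManager web UI port and the Hadoop:service=ResourceManager,name=ClusterMetrics MBean, the nodemanager counts could be read like this (the packaged Hadoop-YARN-ResourceManager sampler performs its own queries):

```python
#!/usr/bin/env python3
"""Illustrative fetch of ResourceManager nodemanager counts over /jmx."""
import json
import urllib.request

RM_JMX = ("http://resourcemanager.example.com:8088/jmx"
          "?qry=Hadoop:service=ResourceManager,name=ClusterMetrics")

def fetch_first_bean(url, timeout=10):
    """Return the first MBean returned by the /jmx servlet as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    return beans[0] if beans else {}

if __name__ == "__main__":
    metrics = fetch_first_bean(RM_JMX)
    for column in ("NumActiveNMs", "NumDecommissionedNMs", "NumLostNMs", "NumUnhealthyNMs"):
        print("%s = %s" % (column, metrics.get(column, "n/a")))
```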
   

Hadoop YARN nodeManagers summary

Column Name | Description
Hostname | Hostname where the nodemanager service is running.
State | Current nodemanager state.
NodeID | Nodemanager node ID.
NodeHTTPAddress | Nodemanager web UI address.
NodeManagerVersion | Nodemanager service version.
HealthReport | Nodemanager health report.

Note: The number of rows displayed is equal to the number of nodemanagers running.

Hadoop node metrics dataview

Hadoop HDFS datanode volume info

Column Name | Description
dir | Path of the volume directory.
numBlocks | Current number of blocks in the datanode volume.
usedSpace | Used space in the datanode volume.
freeSpace | Free space in the datanode volume.
reservedSpace | Reserved space for the datanode volume.
storageType | Type of datanode volume storage.
reservedSpaceForReplicas | Reserved space for replicas.

Note: The number of rows displayed is equal to the number of volumes in a single datanode.
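
A minimal sketch of where these per-volume figures could come from, assuming the /jmx servlet on the datanode web UI port and that the Hadoop:service=DataNode,name=DataNodeInfo MBean exposes a VolumeInfo attribute as a JSON string keyed by volume directory (an assumption for this illustration; the packaged Hadoop-HDFS-DatanodeVolumeInfo sampler does the real work):

```python
#!/usr/bin/env python3
"""Illustrative parse of per-volume datanode information from /jmx."""
import json
import urllib.request

DATANODE_JMX = ("http://datanode.example.com:9864/jmx"
                "?qry=Hadoop:service=DataNode,name=DataNodeInfo")

def fetch_volume_info(url, timeout=10):
    """Return {volume directory: per-volume info dict}, or {} if unavailable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    if not beans:
        return {}
    # VolumeInfo is assumed here to be a JSON-encoded string attribute of the bean.
    return json.loads(beans[0].get("VolumeInfo", "{}"))

if __name__ == "__main__":
    for directory, info in sorted(fetch_volume_info(DATANODE_JMX).items()):
        print("%s usedSpace=%s freeSpace=%s reservedSpace=%s" % (
            directory, info.get("usedSpace"), info.get("freeSpace"),
            info.get("reservedSpace")))
```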