Gateway Hub

Troubleshooting

Overview

This guide is intended to help you troubleshoot your Gateway Hub instance.

Many troubleshooting operations require MapR commands. Before running them, set the MAPR_TICKETFILE_LOCATION environment variable:

export MAPR_TICKETFILE_LOCATION=/opt/mapr/conf/mapruserticket
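Before running maprcli, it can help to confirm that the ticket file actually exists. The following is a minimal bash sketch. The function name check_mapr_ticket is hypothetical, and it only checks that the file is readable, not that the ticket inside it is still valid.

```shell
# Hypothetical helper: verify that MAPR_TICKETFILE_LOCATION is set and points
# to a readable file before running maprcli commands.
check_mapr_ticket() {
    local ticket="${MAPR_TICKETFILE_LOCATION:-}"
    if [ -z "$ticket" ]; then
        echo "MAPR_TICKETFILE_LOCATION is not set" >&2
        return 1
    fi
    if [ ! -r "$ticket" ]; then
        echo "Ticket file not readable: $ticket" >&2
        return 1
    fi
    echo "Using MapR ticket file: $ticket"
}

# Usage:
# export MAPR_TICKETFILE_LOCATION=/opt/mapr/conf/mapruserticket
# check_mapr_ticket && maprcli service list -node <node_hostname>
```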

Obtain diagnostics

Check your Gateway Hub licence status

  1. Access your Web Console using your browser.
  2. Click Administration > Licence to navigate to the Licence page.

The status of your Gateway Hub licence is displayed in the General section.

Check your MapR licence status

  1. Log in to the MCS interface using your browser. See Log in to the MapR Control System (MCS) interface.
  2. Navigate to Admin > Cluster Settings using the toolbar at the top of the page.
  3. On the Admin / Cluster Settings page, select the Licenses tab.

The status of your licences is displayed in a table.

Obtain logs

Gateway Hub stores log files at the following locations:

  • /opt/mapr/logs/
  • /opt/mapr/hadoop/hadoop-2.7.0/logs/
  • /opt/kafka/current/logs/
  • /var/log/

Additional logs are created by individual services. To find the status of all services on a node and the location of their log files, run:

 maprcli service list -node <node_hostname>
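When you are not sure which log is relevant, listing the files that changed recently narrows the search. This is a minimal bash sketch: the function name recent_logs is hypothetical, it defaults to the log directories listed above, and it assumes a find that supports -mmin (GNU find, as shipped with RHEL, does).

```shell
# Hypothetical helper: list files under the given directories that were
# modified in the last N minutes. With no directories given, it searches the
# Gateway Hub log locations listed above.
recent_logs() {
    local minutes="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /opt/mapr/logs /opt/mapr/hadoop/hadoop-2.7.0/logs \
               /opt/kafka/current/logs /var/log
    fi
    # -mmin -N matches files modified less than N minutes ago.
    # Errors for missing or unreadable directories are discarded.
    find "$@" -type f -mmin "-$minutes" -print 2>/dev/null
}

# Usage: list log files touched in the last 15 minutes.
# recent_logs 15
```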

Log retention policies

You can adjust the default log retention policies by editing the log configuration files.

Service Default retention Configuration file
hub-svc-kafka log.retention.hours: 72 /opt/kafka/current/config/server.properties
cldb log4j.appender.R.MaxFileSize: 100 MB; log4j.appender.R.MaxBackupIndex: 9 /opt/mapr/conf/log4j.cldb.properties
hadoop hadoop.log.maxfilesize: 256 MB; hadoop.log.maxbackupindex: 20 /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/log4j.properties
spark spark.history.fs.cleaner.maxAge: 7 days /opt/mapr/spark/spark-2.3.2/conf/spark-defaults.conf
yarn yarn.nodemanager.localizer.cache.target-size-mb: 2000 /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml
hub-svc-* maxHistory: 60 days; totalSizeCap: 3GB /usr/share/hub-svc-*/conf/logback.xml

Note: The asterisk (*) is a wildcard that can match any value. For example, hub-svc-* matches hub-svc-snapshotd.
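For the key=value files in the table, such as the Kafka server.properties file, a change can be scripted. This is a minimal bash sketch, assuming GNU sed (for sed -i), entries written as key=value with no spaces around the equals sign, and simple values without | or & characters. The function name set_property is hypothetical. It does not handle XML files such as yarn-site.xml or logback.xml.

```shell
# Hypothetical helper: set key=value in a .properties file, replacing an
# existing entry or appending one if the key is absent.
# A backup copy is written to <file>.bak first.
set_property() {
    local file="$1" key="$2" value="$3"
    cp -p "$file" "$file.bak" || return 1
    # Escape dots so the key is matched literally in the regular expression.
    local key_re
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${key_re}=" "$file"; then
        sed -i "s|^${key_re}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage: reduce Kafka log retention to 48 hours.
# set_property /opt/kafka/current/config/server.properties log.retention.hours 48
```

Kafka reads server.properties when the broker starts, so the hub-svc-kafka service will typically need to be restarted for the new value to take effect.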

Obtain an info file

An info file containing basic information about your Gateway Hub installation can be sent to ITRS support to help diagnose problems with your Gateway Hub instance. You obtain this file using your Web Console.

For an introduction to the Web Console, see Geneos Web Console.

To obtain an info file, follow these steps:

  1. Access your Web Console using your browser.
  2. Click About ITRS Geneos to open the About page.
  3. Click the Get Diagnostic Info button to start the download.

This creates an Info.txt file in your default downloads folder.

Obtain a diagnostic file from the command line

You can create a comprehensive diagnostics file that packages the Gateway Hub log files from each node in the cluster as well as system information about the cluster and attached storage.

To obtain a diagnostic file from the command line, on any node run:

hubctl diagnostics <path_to_installation.json>

This creates a temporary file on the node. The location of the file is printed to stdout.
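Because the file location is printed only to stdout, it is easy to lose in a busy terminal. The following minimal bash sketch keeps a copy of the output. The function name run_diagnostics, the default output path, and the HUBCTL override variable are hypothetical; it does not assume anything about the format of the hubctl output.

```shell
# Hypothetical wrapper: run hubctl diagnostics and save a copy of its stdout,
# which includes the location of the generated file, to a log file.
# Set HUBCTL=echo to print the arguments instead of running hubctl.
run_diagnostics() {
    local install_json="$1" out_log="${2:-./hub-diagnostics.out}"
    "${HUBCTL:-hubctl}" diagnostics "$install_json" | tee "$out_log"
    # Return the exit status of hubctl rather than that of tee.
    return "${PIPESTATUS[0]}"
}

# Usage:
# run_diagnostics <path_to_installation.json> /tmp/hub-diagnostics.out
```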

Procedures

Verify the REST endpoint is reachable

Use a browser, a dedicated client such as Postman, or curl -k on the command line to query the REST address followed by /v0/admin/info. The default REST address is https://<hostname>:8080.

If the REST endpoint is reachable, this returns output similar to the following:

{
  "buildDateTime" : "2018-07-31T15:50:31.02Z",
  "version" : "1.0.0-EA",
  "gitCommit" : "b27b5dadde830029cdb50c1ea834a34a0663ff62",
  "gitBranch" : "release/1.0.0",
  "javaInfo" : {
    "vendor" : "Oracle Corporation",
    "version" : {
      "major" : 1,
      "minor" : 8,
      "patch" : 0,
      "update" : 181,
      "arch" : "x64"
    },
    "vm" : "OpenJDK 64-Bit Server VM"
  },
  "os" : {
    "name" : "Linux(3.10.0-693.el7.x86_64)",
    "other" : [ "NAME=\"Red Hat Enterprise Linux Server\"", "VERSION=\"7.4 (Maipo)\"", "ID=\"rhel\"", "ID_LIKE=\"fedora\"", "VARIANT=\"Server\"", "VARIANT_ID=\"server\"", "VERSION_ID=\"7.4\"", "PRETTY_NAME=\"Red Hat Enterprise Linux Server 7.4 (Maipo)\"", "ANSI_COLOR=\"0;31\"", "CPE_NAME=\"cpe:/o:redhat:enterprise_linux:7.4:GA:server\"", "HOME_URL=\"https://www.redhat.com/\"", "BUG_REPORT_URL=\"https://bugzilla.redhat.com/\"", "REDHAT_BUGZILLA_PRODUCT=\"Red Hat Enterprise Linux 7\"", "REDHAT_BUGZILLA_PRODUCT_VERSION=7.4", "REDHAT_SUPPORT_PRODUCT=\"Red Hat Enterprise Linux\"", "REDHAT_SUPPORT_PRODUCT_VERSION=\"7.4\"", "Red Hat Enterprise Linux Server release 7.4 (Maipo)", "Linux version 3.10.0-693.el7.x86_64 (mockbuild@x86-038.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017" ]
  }
}
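The check can also be scripted. This is a minimal bash sketch: the function names rest_status and info_version are hypothetical, and info_version uses grep and sed rather than a JSON parser, so it assumes that the first "version" key with a string value is the top-level one, as in the sample output above.

```shell
# Hypothetical helpers for checking the Gateway Hub REST endpoint.
# -k skips TLS certificate verification, as with curl -k above.

# Print the HTTP status code returned by <base_url>/v0/admin/info.
# curl prints 000 if no HTTP response was received.
rest_status() {
    local base_url="$1"   # e.g. https://<hostname>:8080
    curl -k -s -o /dev/null -w '%{http_code}' "${base_url}/v0/admin/info"
}

# Read /v0/admin/info JSON on stdin and print the top-level "version" string.
# The nested javaInfo "version" is an object, so the pattern skips it.
info_version() {
    grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage:
# rest_status https://<hostname>:8080
# curl -k -s https://<hostname>:8080/v0/admin/info | info_version
```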

Fix a broken SSL installation

In a production environment, Gateway Hub requires a key store and a trust store to connect using TLS/SSL. Gateway Hub may fail to start if the key store or trust store password is incorrect.

Validation

You can use the Java keytool utility to check whether your passwords are correct.

  1. To validate the key_store password, run the following command:

    keytool -list -v -keystore /path/to/key_store

    You will be prompted for a password. If the correct one is supplied, information about the key store is shown.

  2. To validate the trust_store password, run the following command:

    keytool -list -v -keystore /path/to/trust_store

    You will be prompted for a password. If the correct one is supplied, information about the trust store is shown.

  3. To validate the key_password you must export the keystore from the JKS format to the PKCS12 format by running:

    keytool -importkeystore \
        -srckeystore </path/to/ssl_keystore> \
        -srcstorepass <ssl_keystore_password> \
        -srckeypass <private_key_password> \
        -srcalias <key_alias> \
        -destkeystore </path/to/ssl_keystore.p12> \
        -deststoretype PKCS12 \
        -deststorepass <dest_keystore_password> \
        -destkeypass <dest_private_key_password>

    If the passwords are correct, the export creates the ssl_keystore.p12 file.

    If the key_password or key_store password is incorrect, an error is displayed.
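Steps 1 and 2 prompt for the password interactively. To test a candidate password from a script instead, keytool accepts it through -storepass. This minimal bash sketch uses a hypothetical function name, check_store_password, and only checks the store password; the key password still needs the export in step 3. Passing a password on the command line makes it visible to other local users, for example in ps output.

```shell
# Hypothetical helper: return 0 if keytool can open the store with the
# given password, non-zero otherwise (wrong password, missing file, or
# keytool not installed). keytool output is discarded.
check_store_password() {
    local store="$1" password="$2"
    keytool -list -keystore "$store" -storepass "$password" >/dev/null 2>&1
}

# Usage:
# check_store_password /path/to/key_store '<key_store_password>' && echo "key_store password OK"
# check_store_password /path/to/trust_store '<trust_store_password>' && echo "trust_store password OK"
```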

Update the Gateway Hub

Once you have determined the correct passwords, update Gateway Hub by editing the JSON configuration file and running the installation scripts again. For more information on using the installation scripts, see Install.

Manage services

Gateway Hub uses three types of services:

  • Data and process management services.
  • Resource management services.
  • ITRS services.

List services

To find the status of all services on a node, run:

 maprcli service list -node <node_hostname>

The state of each service is indicated with a status number using the following scheme:

State Definition
0 Not configured — the package for the service is not installed and/or the service is not configured (configure.sh has not run). This state is also returned for all the services if you run the command without specifying a node.
1 Configured — the package for the service is installed and configured.
2 Running — the service is installed, started by the warden, and is currently running.
3 Stopped — the service is installed and configure.sh has run, but the service is not running.
4 Failed — the service is installed and configured, but not running.
5 Standby — the service is installed and is in standby mode, waiting to take over in case of failure of another instance.
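When scripting health checks, it can help to turn these numbers into readable labels. This minimal bash sketch uses a hypothetical function name, service_state_name, and maps only the codes in the table above; it does not parse maprcli output itself.

```shell
# Hypothetical helper: translate a numeric service state into the label
# used in the table above. Unknown codes return a non-zero status.
service_state_name() {
    case "$1" in
        0) echo "Not configured" ;;
        1) echo "Configured" ;;
        2) echo "Running" ;;
        3) echo "Stopped" ;;
        4) echo "Failed" ;;
        5) echo "Standby" ;;
        *) echo "Unknown ($1)"; return 1 ;;
    esac
}

# Usage:
# service_state_name 4    # prints: Failed
```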

Verify that your node services are running

The MapR Control System (MCS) interface provides detailed information about the status and health of your nodes.

To log in to the MapR Control System (MCS) interface, follow these steps:

  1. Enter https://<hostname>:8443 in a web browser, replacing <hostname> with the hostname of the Gateway Hub server.
  2. On the login screen, enter the username and password of your Gateway Hub runtime user.
  3. Click Log In.

The Overview tab contains several sections that provide information about your MapR instance.

If your instance is healthy:

  • All nodes in the Node Health section are blue.
  • There are no alarms in the Active Alarms section.

Note: High memory usage in the Cluster Utilization section is expected.

Restart a service

To restart a service, run:

maprcli node services -nodes <node_hostname> -name <service_name> -action restart
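To restart several services on the same node, the command can be wrapped in a loop. This minimal bash sketch uses a hypothetical function name, restart_services, and a hypothetical MAPRCLI override variable for dry runs; it restarts the services one at a time and stops at the first failure.

```shell
# Hypothetical helper: restart one or more services on a node in order,
# stopping at the first maprcli failure.
# Set MAPRCLI=echo to print the maprcli arguments instead of running them.
restart_services() {
    local node="$1"
    shift
    local svc
    for svc in "$@"; do
        "${MAPRCLI:-maprcli}" node services -nodes "$node" -name "$svc" \
            -action restart || return 1
    done
}

# Usage (dry run):
# MAPRCLI=echo restart_services <node_hostname> hub-svc-apid hub-svc-publisherd
```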

Services

Gateway Hub uses the following services:

Category Name Managed by Description
Hadoop historyserver Warden Archives MapReduce application metrics and metadata.
ITRS hub-svc-normaliserd Warden Parses and normalises the messages sent by the Gateways to Gateway Hub.
ITRS hub-svc-persistenced Warden Writes the normalised data to durable storage.
ITRS hub-svc-apid Warden Exposes the Gateway Hub REST API.
ITRS hub-svc-errord Warden Records data ingestion or system errors to durable storage.
ITRS hub-svc-snapshotd Warden Updates Kafka with the latest known metrics and events for each entity.
ITRS hub-svc-schedulerd Warden Executes scheduled internal jobs. This is used to archive data from hot to cold storage.
ITRS hub-svc-publisherd Warden Publishes data externally.
ITRS hub-svc-gateway-validatord Warden Validates Gateway setup files stored using centralised configuration.
ITRS hub-svc-gateway-coordinatord Warden Coordinates Gateway setup validations.
Kafka hub-svc-kafka Warden Main message bus.
MapR Core mapr-warden Systemd Runs on every node and manages each node's contribution to the cluster. Warden is also responsible for managing the service state and its resource allocations on that node.
MapR Core mapr-zookeeper Systemd Provides coordination services. Enables high availability (HA) and fault tolerance for MapR clusters.
MapR Core cldb Warden Container Location Database (CLDB) service tracks the location, size and usage of containers and volumes in MapR-FS.
MapR Core fileserver Warden Manages disk storage for MapR-FS and MapR-DB on each node.
MapR Core apiserver Warden Allows you to perform cluster administration programmatically and supports the MapR Control System.
MapR Core hoststats Warden Collects node metrics.
Spark livy Warden REST service for submitting Spark jobs.
Spark spark-historyserver Warden Archives Spark application metrics and metadata.
YARN nodemanager Warden Manages node resources and monitors the health of the node. It works with the ResourceManager to manage YARN containers that run on the node.
YARN resourcemanager Warden Manages cluster resources and tracks resource usage and node health.
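The "Managed by" column determines how a service is restarted: Warden-managed services use the maprcli command shown in Restart a service, while the Systemd-managed services are controlled by systemd. This minimal bash sketch prints the appropriate command. The function name restart_command is hypothetical, and it assumes the systemd unit names match the service names in the table. The systemctl command must be run on the node itself with root privileges.

```shell
# Hypothetical helper: print the restart command for a service, based on the
# "Managed by" column of the table above.
restart_command() {
    local service="$1" node="$2"
    case "$service" in
        # Managed by Systemd: run on the node itself, as root.
        mapr-warden|mapr-zookeeper)
            echo "systemctl restart $service" ;;
        # Managed by Warden: can be run from any node with maprcli access.
        *)
            echo "maprcli node services -nodes $node -name $service -action restart" ;;
    esac
}

# Usage:
# restart_command hub-svc-apid <node_hostname>
# restart_command mapr-zookeeper <node_hostname>
```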