Troubleshooting

Overview

This guide is intended to help you troubleshoot your Gateway Hub instance.

Many troubleshooting operations require MapR commands. Before you run them, set the MAPR_TICKETFILE_LOCATION environment variable:

export MAPR_TICKETFILE_LOCATION=/opt/mapr/conf/mapruserticket
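
Before running MapR commands, it can be useful to confirm that the ticket file is readable. The following is a minimal sketch, not part of Gateway Hub: the helper name use_mapr_ticket is hypothetical, and the default path is the one shown above.

```shell
# Hypothetical helper: export MAPR_TICKETFILE_LOCATION only if the ticket
# file is readable, so that a missing ticket is reported early.
use_mapr_ticket() {
  ticket="${1:-/opt/mapr/conf/mapruserticket}"
  if [ ! -r "$ticket" ]; then
    echo "MapR ticket file not readable: $ticket" >&2
    return 1
  fi
  export MAPR_TICKETFILE_LOCATION="$ticket"
}
```

Call use_mapr_ticket once in your shell session, then run the MapR commands described in this guide.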

Obtain diagnostics

Check your Gateway Hub licence status

  1. Access your Web Console using your browser.
  2. Click Administration > Licence to navigate to the Licence page.

The status of your Gateway Hub licence is displayed in the General section.

Check your MapR licence status

  1. Log in to the MCS interface using your browser. See Log in to the MapR Control System (MCS) interface.
  2. Navigate to Admin > Cluster Settings using the toolbar at the top of the page.
  3. On the Admin / Cluster Settings page, select the Licenses tab.

The status of your licences is displayed in a table.

Obtain logs

Gateway Hub stores log files at the following locations:

  • /opt/mapr/logs/
  • /opt/mapr/hadoop/hadoop-2.7.0/logs/
  • /opt/kafka/current/logs/
  • /var/log/

Additional logs are created by individual services. To find the status of all services on a node and the location of their log files, run:

 maprcli service list -node <node_hostname>

Log retention policies

You can adjust the default log retention policies by editing the log configuration files.

Service | Default retention | Configuration file
hub-svc-kafka | log.retention.hours: 72 | /opt/kafka/current/config/server.properties
cldb | log4j.appender.R.MaxFileSize: 100 MB; log4j.appender.R.MaxBackupIndex: 9 | /opt/mapr/conf/log4j.cldb.properties
hadoop | hadoop.log.maxfilesize: 256 MB; hadoop.log.maxbackupindex: 20 | /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/log4j.properties
yarn | yarn.nodemanager.localizer.cache.target-size-mb: 2000 | /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml
hub-svc-* | maxHistory: 60 days; totalSizeCap: 3 GB | /usr/share/hub-svc-*/conf/logback.xml

Note: The asterisk (*) is a wildcard that stands for any service name suffix, for example hub-svc-snapshotd.
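
To see which retention value is currently set in one of the properties-style files, you can search the file for the key. This is a minimal sketch; the helper name show_setting is hypothetical, and it only suits key/value files such as server.properties or the log4j files, not the XML files.

```shell
# Hypothetical helper: print the non-comment lines of a properties-style
# file that mention the given key.
show_setting() {
  key="$1"
  file="$2"
  grep -v '^[[:space:]]*#' "$file" | grep -F -- "$key"
}
```

For example, show_setting log.retention.hours /opt/kafka/current/config/server.properties prints the Kafka retention line if one is set.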

Obtain an info file

An info file containing basic information about your Gateway Hub installation can be sent to ITRS support to help diagnose problems with your Gateway Hub instance. You obtain this file using your Web Console.

For an introduction to the Web Console, see Geneos Web Console.

To obtain an info file, follow these steps:

  1. Access your Web Console using your browser.
  2. Click About ITRS Geneos to open the About page.
  3. Click the Get Diagnostic Info button to start the download.

This creates an Info.txt file in your default downloads folder.

Obtain a diagnostic file from the command line

You can create a comprehensive diagnostics file that packages the Gateway Hub log files from each node in the cluster as well as system information about the cluster and attached storage.

To obtain a diagnostic file from the command line, on any node run:

hubctl diagnostics <path_to_installation.json>

This creates a temporary file on the node. The location of the file is printed to stdout.

Procedures

Verify the REST endpoint is reachable

Use a browser, a dedicated client such as Postman, or curl -k on the command line to query the REST address followed by /v0/admin/info. The default REST address is https://<hostname>:8080.

If the REST endpoint is reachable, this returns output similar to the following:

{
  "buildDateTime" : "2018-07-31T15:50:31.02Z",
  "version" : "1.0.0-EA",
  "gitCommit" : "b27b5dadde830029cdb50c1ea834a34a0663ff62",
  "gitBranch" : "release/1.0.0",
  "javaInfo" : {
    "vendor" : "Oracle Corporation",
    "version" : {
      "major" : 1,
      "minor" : 8,
      "patch" : 0,
      "update" : 181,
      "arch" : "x64"
    },
    "vm" : "OpenJDK 64-Bit Server VM"
  },
  "os" : {
    "name" : "Linux(3.10.0-693.el7.x86_64)",
    "other" : [ "NAME=\"Red Hat Enterprise Linux Server\"", "VERSION=\"7.4 (Maipo)\"", "ID=\"rhel\"", "ID_LIKE=\"fedora\"", "VARIANT=\"Server\"", "VARIANT_ID=\"server\"", "VERSION_ID=\"7.4\"", "PRETTY_NAME=\"Red Hat Enterprise Linux Server 7.4 (Maipo)\"", "ANSI_COLOR=\"0;31\"", "CPE_NAME=\"cpe:/o:redhat:enterprise_linux:7.4:GA:server\"", "HOME_URL=\"https://www.redhat.com/\"", "BUG_REPORT_URL=\"https://bugzilla.redhat.com/\"", "REDHAT_BUGZILLA_PRODUCT=\"Red Hat Enterprise Linux 7\"", "REDHAT_BUGZILLA_PRODUCT_VERSION=7.4", "REDHAT_SUPPORT_PRODUCT=\"Red Hat Enterprise Linux\"", "REDHAT_SUPPORT_PRODUCT_VERSION=\"7.4\"", "Red Hat Enterprise Linux Server release 7.4 (Maipo)", "Linux version 3.10.0-693.el7.x86_64 (mockbuild@x86-038.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017" ]
  }
}
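
The check described above can be scripted with curl. The following is a minimal sketch: the helper names info_url and check_rest are hypothetical, and the 10-second timeout is an arbitrary choice.

```shell
# Build the info URL from a hostname and an optional port. The default
# port 8080 is the default REST address given above.
info_url() {
  printf 'https://%s:%s/v0/admin/info\n' "$1" "${2:-8080}"
}

# Query the endpoint. -k skips certificate verification, -s hides progress
# output, -f makes curl exit non-zero on an HTTP error status, and
# --max-time bounds the wait if the host does not respond.
check_rest() {
  curl -k -s -f --max-time 10 "$(info_url "$@")"
}
```

Running check_rest with your hostname prints the JSON document on success and returns a non-zero exit status otherwise, which makes it usable in scripts.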

Obtain the Subject Alternative Name of a certificate

You can extract the Subject Alternative Name from a certificate using the OpenSSL command line tool. This lets you confirm that it matches the Gateway Hub domain. For more information, see Configure transport layer security.

To extract the Subject Alternative Name, run:

openssl x509 -in <certificate_file> -text -noout

This returns output similar to:

X509v3 Key Usage:
    Digital Signature, Non Repudiation, Key Encipherment
X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Alternative Name:
    DNS:DNS-name-1, DNS:DNS-name-2, ...
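
Because the full text output is long, you may want to print only the Subject Alternative Name entries. This is a minimal sketch that filters the OpenSSL text output; the helper name san_from_text is hypothetical.

```shell
# Hypothetical helper: read `openssl x509 -text` output on stdin and print
# the line that follows the Subject Alternative Name header, with leading
# whitespace removed. Prints nothing if the header is absent.
san_from_text() {
  awk '/X509v3 Subject Alternative Name/ {
         if ((getline line) > 0) { sub(/^[ \t]+/, "", line); print line }
         exit
       }'
}
```

Pipe the output of the openssl command shown above into san_from_text to get a single line of DNS entries.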

Add Gateway Hub certificate authority to Grafana

In order for Grafana to connect to Gateway Hub securely, the certificate authority (CA) that has signed the TLS/SSL certificate used by Gateway Hub must be trusted by the system running Grafana.

If Gateway Hub is installed using self-signed certificates or certificates signed by a non-trusted certificate authority, you must add the relevant CA certificate to the trust store of the Grafana host. If Gateway Hub has been configured using production certificates that are trusted across your organisation, this is not required.

If you attempt to connect Grafana and Gateway Hub using non-trusted certificates, the connection will fail and Grafana will receive no data. The server logs will include a Failed to get access token error and state certificate signed by unknown authority.

To add Gateway Hub to a Linux system's recognised certificate authorities:

  1. Locate the CA certificate used to sign Gateway Hub certificates. In a default installation, using self-signed certificates, this is /opt/hub/self-signed/CA.crt.
  2. Copy the CA certificate to the trust store of the Grafana host. On a CentOS or Red Hat system, this is located at /etc/pki/ca-trust/source/anchors/.
  3. To update the recognised certificate authorities, run:
      update-ca-trust extract
     You can verify the updated list by running:
      trust list
  4. Restart Grafana.
  5. In the Grafana web interface, open the ITRS Geneos Gateway Hub Datasource settings and disable Skip TLS Verify.
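
The copy step can be sketched as a small shell helper. This is an illustration under assumptions, not an ITRS tool: the helper name stage_ca is hypothetical, and the service name grafana-server in the comment is an assumption that depends on how Grafana was installed.

```shell
# Hypothetical helper: copy a CA certificate into a trust anchor directory
# after checking that the file looks like a PEM certificate.
stage_ca() {
  src="$1"
  anchors="${2:-/etc/pki/ca-trust/source/anchors}"
  if ! grep -q 'BEGIN CERTIFICATE' "$src"; then
    echo "Not a PEM certificate: $src" >&2
    return 1
  fi
  cp "$src" "$anchors/"
}

# Typical sequence on the Grafana host, run as root
# (grafana-server is an assumed service name):
#   stage_ca CA.crt && update-ca-trust extract && systemctl restart grafana-server
```

Checking the file before copying avoids adding a key or an empty file to the trust anchors by mistake.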

Find the Kafka broker IDs of nodes

In a default installation, Gateway Hub assigns each node a Kafka broker ID sequentially, starting from zero. This ensures that each node has a unique ID.

You can also manually specify the Kafka broker ID of a node during a local installation. If you have previously specified broker IDs manually, you must also specify them when adding nodes.

To determine the Kafka broker IDs of existing nodes, run:

/opt/mapr/zookeeper/zookeeper-3.4.11/bin/zkCli.sh -server localhost:5181 ls /brokers/ids | tail -n1

This will return a list of IDs.

For a default installation of a three node cluster you should see:

[0,1,2]
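
When you add a node to a cluster with manually assigned IDs, you need an ID that is not already in use. One simple option, shown here as a sketch, is to take the highest existing ID plus one. The helper name next_broker_id is hypothetical, and this selection rule is an assumption rather than a Gateway Hub requirement.

```shell
# Hypothetical helper: given the ID list printed by zkCli.sh (for example
# "[0,1,2]" or "[0, 1, 2]"), print the highest ID plus one.
# An empty list "[]" yields 0.
next_broker_id() {
  max="$(printf '%s\n' "$1" | tr -d '[] ' | tr ',' '\n' | sort -n | tail -n1)"
  echo $(( ${max:--1} + 1 ))
}
```

Pass the last line of the zkCli.sh output shown above as the argument.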

Specify Kafka broker IDs

To specify the Kafka broker ID of a node, add the following to the installation JSON file:

"hub": {
    "user": "hub",
    "install": {
        "kafka_broker_id": 4
    }
}

Manage PostgreSQL

Local access to the built-in PostgreSQL database is available. The default administrator user is postgres.

To log in as an administrator, run:

psql -U postgres

Gateway Hub accesses the PostgreSQL database with the user specified in the configuration JSON file. The default Gateway Hub user is hub.

To log in as Gateway Hub, run:

psql -U hub -d hub

For more information, see the official PostgreSQL documentation.

Manage services

Gateway Hub uses three types of services:

  • Data and process management services.
  • Resource management services.
  • ITRS services.

List services

To find the status of all services on a node, run:

 maprcli service list -node <node_hostname>

The state of each service is indicated with a status number using the following scheme:

State | Definition
0 | Not configured: the package for the service is not installed and/or the service is not configured (configure.sh has not run). This state is also returned for all the services if you run the command without specifying a node.
1 | Configured: the package for the service is installed and configured.
2 | Running: the service is installed, started by the warden, and is currently running.
3 | Stopped: the service is installed and configure.sh has run, but the service is not running.
4 | Failed: the service is installed and configured, but not running.
5 | Standby: the service is installed and is in standby mode, waiting to take over in case of failure of another instance.
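
When reading the command output in scripts, it can help to translate the state number into its label. This is a minimal sketch of the mapping in the table above; the helper name service_state_name is hypothetical.

```shell
# Hypothetical helper: translate a maprcli service state number into the
# label used in the table above.
service_state_name() {
  case "$1" in
    0) echo "Not configured" ;;
    1) echo "Configured" ;;
    2) echo "Running" ;;
    3) echo "Stopped" ;;
    4) echo "Failed" ;;
    5) echo "Standby" ;;
    *) echo "Unknown ($1)" ;;
  esac
}
```

For example, service_state_name 4 prints Failed.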

Verify that your node services are running

The MapR Control System (MCS) interface provides detailed information about the status and health of your nodes.

To log in to the MapR Control System (MCS) interface, follow these steps:

  1. Enter https://<hostname>:8443 in a web browser, replacing <hostname> with the hostname of the Gateway Hub server.
  2. On the login screen, enter the username and password of your Gateway Hub runtime user.
  3. Click Log In.

In the Overview tab, there are several sections that provide you with information regarding your MapR instance.

If your instance is healthy:

  • All nodes in the Node Health section are blue.
  • There are no alerts in your Active Alarms section.

Note: High memory usage in the Cluster Utilization section is expected.

Restart a service

To restart a service, run:

maprcli node services -nodes <node_hostname> -name <service_name> -action restart
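
To restart several services on one node in sequence, you can wrap the command in a loop. This is a minimal sketch; the helper name restart_services and the MAPRCLI override variable are hypothetical. The override only exists so that you can substitute echo and review the commands before running them.

```shell
# Hypothetical helper: restart each named service on a node, one at a time,
# stopping at the first failure. Set MAPRCLI=echo to print the commands
# instead of running them.
restart_services() {
  node="$1"
  shift
  for svc in "$@"; do
    "${MAPRCLI:-maprcli}" node services -nodes "$node" -name "$svc" -action restart || return 1
  done
}
```

For example, MAPRCLI=echo restart_services <node_hostname> hub-svc-apid hub-svc-publisherd prints the two restart commands without executing them.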

Services

Gateway Hub uses the following services:

Category | Name | Managed by | Description
Hadoop | historyserver | Warden | Archives MapReduce application metrics and metadata.
ITRS | hub-svc-normaliserd | Warden | Parses and normalises the messages sent by the Gateways to Gateway Hub.
ITRS | hub-svc-persistenced | Warden | Writes the normalised data to durable storage.
ITRS | hub-svc-apid | Warden | Exposes the Gateway Hub REST API.
ITRS | hub-svc-errord | Warden | Records data ingestion or system errors to durable storage.
ITRS | hub-svc-snapshotd | Warden | Updates Kafka with the latest known metrics and events for each entity.
ITRS | hub-svc-schedulerd | Warden | Executes scheduled internal jobs. This is used to archive data from hot to cold storage.
ITRS | hub-svc-publisherd | Warden | Publishes data externally.
ITRS | hub-svc-gateway-validatord | Warden | Validates Gateway setup files stored using centralised configuration.
ITRS | hub-svc-gateway-coordinatord | Warden | Coordinates Gateway setup validations.
Kafka | hub-svc-kafka | Warden | Main message bus.
MapR Core | mapr-warden | Systemd | Runs on every node and manages each node's contribution to the cluster. Warden is also responsible for managing the service state and its resource allocations on that node.
MapR Core | mapr-zookeeper | Systemd | Provides coordination services. Enables high availability (HA) and fault tolerance for MapR clusters.
MapR Core | cldb | Warden | The Container Location Database (CLDB) service tracks the location, size, and usage of containers and volumes in MapR-FS.
MapR Core | fileserver | Warden | Manages disk storage for MapR-FS and MapR-DB on each node.
MapR Core | apiserver | Warden | Allows you to perform cluster administration programmatically and supports the MapR Control System.
MapR Core | hoststats | Warden | Collects node metrics.
YARN | nodemanager | Warden | Manages node resources and monitors the health of the node. It works with the ResourceManager to manage YARN containers that run on the node.
YARN | resourcemanager | Warden | Manages cluster resources and tracks resource usage and node health.