Install

Overview

Before installing Gateway Hub, make sure you meet the installation requirements.

Installation of Gateway Hub is performed through the command line on your installation machine, using information from a JSON file. The installation copies the binaries from the installation machine to your servers and configures them. The sections below describe the stages of the installation.

Download and unpack Gateway Hub

The Gateway Hub binaries are downloaded and unpacked onto your installation machine.

Perform the following:

  1. Download the Gateway Hub binaries from the ITRS group website:
    • The binaries are packaged as a .tar.gz file named gateway-hub-<version>.tar.gz.
  2. Move the gateway-hub-<version>.tar.gz file into your chosen directory.
  3. Unpack the Gateway Hub binary using the command line, as shown in the example after this list.
    • This creates a folder called hub.
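
For example, assuming the archive is in your chosen directory, you could unpack it with a command such as the following (substitute your actual version number):

tar -xzf gateway-hub-<version>.tar.gz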

Configure the JSON file

The installation uses a set of scripts called hubctl. hubctl is located in the hub directory as part of the download package.

These scripts read information from a JSON file. You must create and configure this JSON file. You can use the same JSON file for all stages of the installation and configuration.

Different parts of this JSON file are used by the different commands. However, some information is relevant to all commands for installing and configuring Gateway Hub. You must specify a connection method, installation user, and a list of hosts (this cannot include the installation machine).

Note: Remember that all of the information in the JSON file is contained within a single set of curly brackets { }.

When you are finished with the install, store the configuration file safely in a known location. This JSON configuration file will be re-used when you upgrade Gateway Hub.

Add the connection method and installation user

The connection information is specified as key/value pairs in an object under the connection key. To add it to your JSON file, perform the following:

  1. Add the following section to your JSON configuration file:
  2. {
    	"connection" : {
    		"user": "<change_me>",
    		"private_key" : "<change_me>",
    		"ask_pass" : false
    	}
    }
    
  3. Change the value of user to the installation user on your servers.
  4. Caution: You must have already created this user. The user must be on the sudoers list with passwordless access on the target machine (see the example sudoers entry after this list). See Users. If you cannot provide passwordless access, see Additional configuration for installation user without passwordless sudo access.

  5. Complete one of the following:
    1. If you are using passwordless SSH:
      1. Change the value of private_key to the location of the private key file.
      2. Delete the line with ask_pass.
    2. If you want to be prompted for the password:
      1. Change the value of ask_pass to true.
      2. Delete the line with private_key.
  6. Save your file.
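
As an illustration only, a sudoers entry giving the installation user hub-setup passwordless sudo access might look like the following; the exact policy shown here is an assumption, so follow the requirements in Users and your own security standards:

hub-setup ALL=(ALL) NOPASSWD: ALL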

Additional configuration for installation user without passwordless sudo access

This section is only relevant if you cannot provide passwordless sudo access for the Gateway Hub installation user. If this is the case, you must add the following to your JSON file:

{
	"ansible" : {
		"flags" : ["--ask-become-pass"]
	}
}

This configuration makes Ansible request the sudo password from you on install. This is in addition to the regular SSH password if you are using ask_pass.

Examples

For example, if the host has a setup user hub-setup and you want to use passwordless SSH with the key located at ~/.ssh/HUB-SETUP-KEY.pem, the JSON is the following:

{
	"connection" : {
		"user": "hub-setup-user",
		"private_key" : "~/.ssh/HUB_SETUP_KEY.pem"
	}
}

For example, if the host has a setup user hub-setup, and you want to be prompted for the password of the user, the JSON is the following:

{
	"connection" : {
		"user": "hub-setup-user",
		"ask_pass" : true
	}
}

For example, if the host has a setup user hub-setup, you want to be prompted for the password of the user, and your user does not have passwordless sudo access, the JSON is the following:

{
	"connection" : {
		"user": "hub-setup-user",
		"ask_pass" : true
	}
	"ansible" : {
		"flags" : ["--ask-become-pass"]
	}
}

Add the hosts information

To specify the hosts that you are installing Gateway Hub on, add one of the following key/value pairs to your JSON file:

  • hosts — value must be an array of the addresses of the target machines.
  • hosts_file — value must be a path to a newline-delimited list of hosts.

All hosts must be specified using a fully qualified domain name (FQDN); using an IP address will result in a failed installation.

Examples

In the examples below, there are two hosts: gwh-host1 and gwh-host2.

If we want to specify the hosts in the configuration file, the JSON is the following:

"hosts" : ["gwh-host1", "gwh-host2" ]

If we want to specify the hosts in a file at the location /tmp/hosts, the JSON is the following:

"hosts_file" : "/tmp/hosts"

Add install information to JSON file

To install Gateway Hub, you must add another section to the JSON file. This section contains information regarding the runtime user for Gateway Hub and the location of the MapR disks.

To configure your JSON file, perform the following:

  1. Add the following section to your JSON configuration file:
  2. {
    	"hub": {
    		"user": "<change_me>",
    		"install": {
    			"mapr_disks": ["<change_me>"],
    			"disk_pool_size": "3",
    			"kafka_partition_logs": ["<change_me>"],
    			"zk_txn_log_path": "<change_me>"
    		}
    	}
    }
  3. Change the value of user to the runtime user on your servers. See Users.
  4. Change the value of mapr_disks to the location of your MapR disks.
  5. If you want to change the size of the MapR storage pools, change the value of disk_pool_size from the default. For more information, see Appendix: Storage pool size.
  6. Change the value of kafka_partition_logs to the location of your Kafka partitions or disks.
    • These must be folders that exist and are empty (see the example commands after this list).
  7. Change the value of zk_txn_log_path to the path for your Zookeeper transaction logs. We recommend using a separate disk for these logs.
  8. Save your file.
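
For illustration only, assuming /var/kafkaa and /var/kafkab are the intended Kafka partition log locations and hub is the runtime user (both are example values), you could create the empty folders on each host with commands such as:

sudo mkdir -p /var/kafkaa /var/kafkab
sudo chown hub:hub /var/kafkaa /var/kafkab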

Additional configuration for Spark

For larger Geneos estates, we recommend providing a separate disk for the Spark intermediate files. If you skip this step, the location of the Spark intermediate files defaults to /tmp on the OS disk.

To configure the JSON file, follow these steps:

  1. Add the following to the hub.install section in your JSON configuration file:
    • "spark_local_dir_path" : "<change_me>"
  2. Change the value to the desired location for your Spark intermediate files.
  3. Save your file.

See Example 2 below for a full JSON configuration including this setting.

Additional configuration for ports

You can modify the default ports used for some services in Gateway Hub by adding another section to the JSON file. The table below shows the ports that can be modified and their corresponding key in the JSON file.

Service                    Key in JSON file
Gateway Hub REST API       api
Web Console                ui
SSO Agent                  sso
MapR Monitoring Console    mcs
Gateway Hub message bus    kafka_listener

See Ports for the default ports used by Gateway Hub.

To configure the JSON file, follow these steps:

  1. Add the following to the hub section in your JSON configuration file:
    • "ports": {
      	"api": <change_me>,
      	"ui": <change_me>,
      	"sso": <change_me>,
      	"mcs": <change_me>,
      	"kafka_listener": <change_me>           
              }
  2. Change the value of each setting to the desired port number. Refer to the table above for which settings correspond to which service.
  3. Save your file.

See Example 3 below for a full JSON configuration including this setting.

Examples

Example 1

In the example below:

  • The SSH key for the setup user is located at ~/.ssh/HUB-SETUP-KEY.pem, specified with connection.private_key.
  • The host has an installation user hub-setup.
  • We are installing Gateway Hub to the host hub.example.com.
  • There is a runtime user on the host called hub.
  • The MapR storage disk is /dev/sdb.
  • The storage pool size is 1.
  • The Kafka disk is /home/kafka.
  • The path for the Zookeeper transaction logs is /var/zk.

The JSON configuration file is the following:

{
	"connection": {
		"private_key": "~/.ssh/HUB-SETUP-KEY.pem",
		"user": "hub-setup"
	},
	"hosts": ["hub.example.com"],
	"hub": {
		"user": "hub",				
		"install": {
			"mapr_disks": ["/dev/sdb"],
			"disk_pool_size" :"1", 
			"kafka_partition_logs" : ["/home/kafka"],
			"zk_txn_log_path": "/var/zk"
		}
	}
}

Example 2

In this example:

  • We have included the value connection.ask_pass, so we are prompted for the password of the setup user.
  • Every host has an installation user hub-setup.
  • We are installing Gateway Hub to multiple hosts.
  • The hosts are specified in a list in a newline-delimited file at /tmp/hosts.
  • There is a runtime user on every host called hub.
  • The MapR storage disks on each host are /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf, and /dev/sdg.
  • The storage pool size is 3.
  • The Kafka partitions are /var/kafkaa and /var/kafkab.
  • The path for the Zookeeper transaction logs is /var/zk.
  • We are specifying the location for the Spark intermediate files as a separate disk, /var/spark.

The JSON configuration file is the following:

{
	"connection": {
		"ask_pass": true,
		"user": "hub-setup"
	},
	"hosts_file": "/tmp/hosts",
	"hub": {
		"user": "hub",				
		"install": {
			"mapr_disks": ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"],
			"disk_pool_size" :"3", 			
			"kafka_partition_logs" : ["/var/kafkaa", "/var/kafkab"],
			"zk_txn_log_path": "/var/zk",
			"spark_local_dir_path" : "/var/spark"		
		}
	}
}

Example 3

In this example:

  • We have included the value connection.ask_pass, so we are prompted for the password of the setup user.
  • Every host has an installation user hub-setup.
  • We are installing Gateway Hub to multiple hosts.
  • The hosts are specified in a list in a newline-delimited file at /tmp/hosts.
  • There is a runtime user on every host called hub.
  • The MapR storage disks on each host are /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf, and /dev/sdg.
  • The storage pool size is 3.
  • The Kafka partitions are /var/kafkaa and /var/kafkab.
  • The path for the Zookeeper transaction logs is /var/zk.
  • We are specifying the location for the Spark intermediate files as a separate disk, /var/spark.
  • We are specifying the port numbers for:
    • Gateway Hub REST API
    • Web Console
    • SSO Agent
    • MapR Monitoring Console
    • Gateway Hub message bus

The JSON configuration file is the following:

{
	"connection": {
		"ask_pass": true,
		"user": "hub-setup"
	},
	"hosts_file": "/tmp/hosts",
	"hub": {
		"user": "hub",				
		"install": {
			"mapr_disks": ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"],
			"disk_pool_size" :"3", 			
			"kafka_partition_logs" : ["/var/kafkaa", "/var/kafkab"],
			"zk_txn_log_path": "/var/zk",
			"spark_local_dir_path" : "/var/spark",
		}
		"ports": {
			"api": 30001,
			"ui": 30002,
			"sso": 30003,
			"mcs": 30004,
			"kafka_listener": 30005           
		}				
	}
}

Note: Remember that the users must have the same UID and GID on all hosts. Care must be taken if creating users on each node manually.
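For illustration only, if you create the runtime user manually on each node, you might fix the UID and GID explicitly so they match across hosts; the user name and numeric IDs below are example values, so follow the requirements in Users:

sudo groupadd -g 2000 hub
sudo useradd -u 2000 -g 2000 hub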

How to install Gateway Hub on a server

The steps here assume you have already unpacked the Gateway Hub binaries to your chosen folder.

Check you have configured your JSON file with the correct information. See Add install information to JSON file.

You can run hubctl <command> --help to display the command line help at any point during the installation.

To install Gateway Hub to a server, follow these steps:

  1. Go to the folder called hub in your chosen directory.
    • This folder was created when you unpacked Gateway Hub.
  2. Run hubctl setup install <JSON file>, replacing <JSON file> with the location of your JSON file (see the example after this list).
  3. Wait for the installation to finish.
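
For example, assuming your configuration file is saved at /tmp/hub-install.json (an example path), the command is:

hubctl setup install /tmp/hub-install.json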

If successful, the PLAY RECAP output on the command line states a number of ok or changed configurations, and zero unreachable or failed, similar to the example below:

PLAY RECAP *******************************************************************
node1.example.itrs.com : ok=287  changed=183  unreachable=0    failed=0
localhost              : ok=6    changed=0    unreachable=0    failed=0


Appendix: Storage pool size

MapR storage architecture consists of multiple storage pools that reside on each node in a cluster. A storage pool consists of one or more disks grouped together.

In a storage pool:

  • The total size of the pool is the sum of the sizes of all the disks.
  • Write operations are striped across disks to improve write performance.
  • All disks are used simultaneously and are treated as a single unit.
  • Failure of one disk in the storage pool requires the contents of the entire storage pool to be re-replicated from other nodes.

By default, MapR stripes three disks per storage pool. This default represents a compromise between performance and reliability.

Distribution of disks in storage pools

Disks are allocated to storage pools to produce the maximum number of full-sized pools. The table below shows how disks are allocated with a default size of three:

Number of disks on a node    Disk distribution    Number of storage pools
3                            3                    1
4                            3:1                  2
5                            3:2                  2
6                            3:3                  2
7                            3:3:1                3
8                            3:3:2                3
9                            3:3:3                3
10                           3:3:3:1              4
...                          ...                  ...

Changing the disk pool size

You can change the size of the MapR storage pools used by Gateway Hub by changing the value of disk_pool_size from the default in the JSON install file. See Add install information to JSON file.

If you are contemplating increasing the storage pool size to more than three disks, consider the trade-off between the performance of a wider stripe, reliability, and rebuild time. Increasing the storage pool size results in:

  • Increased performance. More disks improve the data throughput of a node.
  • Decreased reliability. A storage pool is treated as a single unit, and the failure of one disk requires the contents of the entire storage pool to be re-replicated from other nodes.
  • Increased rebuild time on failure.

Note: There is no single best storage pool size. It depends on your environment, the disk workload, and the recovery required in case of failure.