Plan the cluster


A Gateway Hub installation consists of a number of individual servers called nodes. Together, a set of nodes forms a cluster. Gateway Hub creates a custom file system across the cluster, and all Gateway Hub applications run on this file system.

Using a custom file system provides significant advantages in speed and resiliency. However, this means that the Gateway Hub installation process is complex and you must ensure that all prerequisites are fully met.

The first step in deploying Gateway Hub is planning how many servers are required to form the cluster. The number depends on the work that Gateway Hub is expected to perform and on the specifications of the nodes.

To ensure resilience, at least 5 nodes are recommended. A minimum of 3 nodes is required.
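
The node counts above can be expressed as a simple pre-flight check. The following is a minimal sketch and is not part of Gateway Hub; the function name and the returned labels are hypothetical and chosen only for illustration.

```python
MIN_NODES = 3          # minimum required, per the text above
RECOMMENDED_NODES = 5  # recommended for resilience, per the text above


def check_cluster_size(node_count):
    """Classify a planned node count as 'invalid', 'minimum' or 'recommended'."""
    if node_count < MIN_NODES:
        return "invalid"
    if node_count < RECOMMENDED_NODES:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    for planned in (2, 3, 5):
        print(planned, check_cluster_size(planned))
```

A count classified as "minimum" meets the installation requirement but falls short of the recommendation for resilience.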

To determine whether an individual server is capable of contributing to the cluster, check Hardware requirements and Software requirements. One of the most common reasons for installation failure is the unsuitability of a node in the cluster.

Cluster sizing tool

You can use the hubsize tool to estimate the hardware requirements of your installation environment. This script is located in the hub/hubsize directory of the Gateway Hub download.

The hubsize tool can also be downloaded separately from ITRS Downloads.


The hubsize tool has the following dependencies:

  • PyYAML

You can install all dependencies using the included requirements.txt file:

pip install -r requirements.txt

Define the requirements

The requirements of a Gateway Hub installation vary between use cases. The hubsize tool provides an estimate of the requirements of your installation based on the following criteria:

Parameter                  Description
clusterSize                Number of nodes in the cluster.
numberOfGeneosProbes       Number of connected Netprobes.
shouldStoreHistoricalData  Whether Gateway Hub should store historical data (required for all features except centralised configuration and publishing).
retentionPeriodDays        Number of days that historical data is stored. This must be specified separately for Kafka and MapR data.
replicationFactor          Number of nodes used for replication. This must be specified separately for Kafka and MapR data.

You must specify these parameters in a definition.yml file, which the hubsize tool reads. A human-readable example file is included.
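
Before running the tool, it can help to confirm that every parameter listed above has been supplied. The sketch below checks a definition that has already been loaded into a Python dictionary (for example with PyYAML). The layout is an assumption: it supposes that the per-store values sit under hypothetical `kafka` and `mapr` keys. The included example file is the authority on the actual structure.

```python
# Hypothetical layout: the nesting under "kafka" and "mapr" is an assumption
# made for illustration, not the documented schema of definition.yml.
TOP_LEVEL = ("clusterSize", "numberOfGeneosProbes", "shouldStoreHistoricalData")
PER_STORE = ("retentionPeriodDays", "replicationFactor")
STORES = ("kafka", "mapr")


def find_problems(definition):
    """Return a list of missing parameters; an empty list means none found."""
    problems = []
    for key in TOP_LEVEL:
        if key not in definition:
            problems.append("missing " + key)
    for store in STORES:
        store_cfg = definition.get(store)
        if not isinstance(store_cfg, dict):
            store_cfg = {}  # treat an absent or malformed section as empty
        for key in PER_STORE:
            if key not in store_cfg:
                problems.append("missing {}.{}".format(store, key))
    return problems


if __name__ == "__main__":
    example = {
        "clusterSize": 5,
        "numberOfGeneosProbes": 100,
        "shouldStoreHistoricalData": True,
        "kafka": {"retentionPeriodDays": 7, "replicationFactor": 3},
        "mapr": {"retentionPeriodDays": 30, "replicationFactor": 3},
    }
    print(find_problems(example))
```

The values in the example dictionary are placeholders, not sizing recommendations.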

Run the hubsize tool

To estimate the cluster requirements, run:

./hubsize definition.yml

The hubsize tool has the following command line options:

Option     Description
--help     Print help text to standard output.
--version  Print the tool version to standard output.
--out      Specify a file to write the estimated hardware requirements to.
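
If the estimate is produced as part of a scripted workflow, the invocation can be wrapped in Python. This is a sketch under stated assumptions: the helper names are hypothetical, and the form `--out <file>` (file name passed as the following argument, after the definition file) is assumed rather than confirmed here, so check `./hubsize --help` for the exact syntax.

```python
import subprocess


def build_hubsize_command(definition_path, out_path=None, executable="./hubsize"):
    """Build the argument list for a hubsize run without executing it."""
    command = [executable, definition_path]
    if out_path is not None:
        # Assumption: --out takes the output file name as the next argument.
        command += ["--out", out_path]
    return command


def run_hubsize(definition_path, out_path=None):
    """Run hubsize and raise CalledProcessError if it exits with an error."""
    return subprocess.run(
        build_hubsize_command(definition_path, out_path), check=True
    )
```

For example, `run_hubsize("definition.yml", "estimate.txt")` would, under these assumptions, write the estimated requirements to the hypothetical file estimate.txt.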