Data Schema

Overview

Data schemas give Gateway Hub information about the type of data being published from Gateway.

Each dataview that is published to Gateway Hub requires a data schema definition.

The data schema for a dataview specifies:

Schemas are defined in the Publishing tab of the sampler in the Gateway Setup Editor (GSE).

The Gateway comes packaged with some schemas. See Sampler schema types.

Metrics collected using the client library provide their own schemas automatically, so long as a valid Dynamic Entities mapping is defined.

When a data schema does not exist, you can create user-defined schemas and add them to the sampler. See Create a data schema.

Caution

Data schemas describe dataviews sent from Gateway to Gateway Hub. This should not be confused with the configuration schema that describes the correct XML formatting of Gateway setup files.

Built-in schemas

The Gateway is packaged with built-in schemas for plug-ins with dataviews containing a set of known column names and data types.

Built-in schemas specify pivoting for dataviews whose rows have only one value column, where the data type of that column varies between rows. Pivoting allows Gateway Hub to treat the rows of these dataviews as columns. For an example, see the Hardware Plug-in - Technical Reference.
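The idea of treating rows as columns can be illustrated with a minimal Python sketch. This is not Gateway Hub's implementation; the row names and values are hypothetical and chosen only to show a dataview where each row carries a single value of a different type.

```python
from typing import Dict, List, Tuple

# Hypothetical dataview rows: (row name, single value cell).
# These names and values are invented for illustration only.
Row = Tuple[str, str]


def pivot_rows(rows: List[Row]) -> Dict[str, str]:
    """Turn one-value-per-row data into a single record keyed by row name."""
    record: Dict[str, str] = {}
    for name, value in rows:
        if name in record:
            raise ValueError(f"duplicate row name: {name}")
        record[name] = value
    return record


rows = [
    ("cpuCount", "8"),
    ("memoryTotal", "16384 MB"),
    ("hostName", "server01"),
]
pivoted = pivot_rows(rows)
print(pivoted)
```

After pivoting, each former row name behaves like a column with its own data type, which is why a pivoted schema defines rows as if they were columns.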

If a built-in schema exists for a sampler, the message "This <sampler name> sampler has predefined schema(s)" is displayed in the Publishing tab of the sampler in the GSE. For a list of samplers with built-in schemas, see Sampler schema types.

The GSE also has a command that can be used to view the schemas currently defined for dataviews on a sampler.

If you make any changes to the dataviews in these samplers, for example by adding headlines and columns using the Compute Engine, you must add these additions to the existing schema.

User-defined schemas

Some plug-ins do not come with any data schema definitions. These are plug-ins where the columns and data types are unknown. You must define the data schema for:

Consider the following examples where a data schema definition is required for pre-existing dataviews:

For how to create a schema, see Create a data schema.

Pivoted dataviews

If you add rows to pre-existing pivoted dataviews, you must define them as if they were additional columns. Do not specify the pivot option if there is a built-in schema for the sampler. If you do, and it conflicts with the built-in schema, the built-in schema is discarded in favour of the user-defined schema.

View existing schemas

To view existing schemas currently defined for dataviews on a sampler, open Active Console and select Show Current Schema.

In the GSE, the Show Current Schema command is available in the Publishing tab of the sampler.

When the command is run, a window opens showing one table per dataview, describing the schema defined on the sampler. There is a table for every dataview that has a defined schema, irrespective of whether the dataview is in use. Only dataviews that have a schema definition are shown.

Each table combines information from the built-in schema shipped with the Gateway and any additional information added in the GSE.

A description of the columns shown in each table is below:

Column Name | Description
Component | Describes whether this component is a headline or a column.
Name | Name of the headline or column.
Type | Data type of the headline or column. The data types are: boolean, date, dateTime, float32, float64, int32, int64, and string.
Units | Unit of measure assigned to this headline or column (if present).
Source | Origin of the headline or column and the schema.

Cells in the Source column display one of the following:

  • Base: The built-in schema shipped with the Gateway.
  • Overridden: A built-in schema exists for this headline or column, but has been superseded by the user-defined schema in the Publishing tab of the sampler.
  • Enriched by Compute Engine: Headline or column added to the dataview via the Compute Engine. Schema has been defined in the Publishing tab of the sampler.
  • Defined by User in plugin configuration: Headline or column added to the dataview via a method other than the Compute Engine, for example a Toolkit plug-in. Schema has been defined in the Publishing tab of the sampler.
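The way the Source values relate a built-in schema to user-defined additions can be sketched as follows. This is an illustrative model only, not how the Gateway computes the column: the function name, the dictionaries, and the set of Compute Engine names are all assumptions made for the example.

```python
from typing import Dict, Set


def label_sources(
    base: Dict[str, str],
    user: Dict[str, str],
    compute_engine_names: Set[str],
) -> Dict[str, str]:
    """Label each headline or column with a Source value.

    base and user map a headline/column name to its data type.
    compute_engine_names lists user-defined names that were added
    via the Compute Engine (hypothetical input for this sketch).
    """
    sources: Dict[str, str] = {}
    for name in base:
        # A user definition for a built-in name supersedes the built-in one.
        sources[name] = "Overridden" if name in user else "Base"
    for name in user:
        if name in base:
            continue
        if name in compute_engine_names:
            sources[name] = "Enriched by Compute Engine"
        else:
            sources[name] = "Defined by User in plugin configuration"
    return sources


# Hypothetical schemas, invented for illustration.
base_schema = {"cpuUsage": "float64", "state": "string"}
user_schema = {"cpuUsage": "float32", "avgLoad": "float64", "owner": "string"}
labels = label_sources(base_schema, user_schema, {"avgLoad"})
for name, source in labels.items():
    print(f"{name}: {source}")
```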

Create a data schema

When connected to a running Gateway you should use the Propose Schema command to create data schemas.

When creating and editing XML configuration files without being connected to a Gateway, you should create schemas manually.

The Propose Schema command

Before using the Propose Schema command, please note the following:

You can run the Propose Schema command from either Active Console or the GSE. The command behaves differently depending on which component you use.

When using the Gateway Setup Editor:

When using the Active Console:

Examples

The following are examples of the output of Propose Schema:

Propose a schema in the GSE

When you run the Propose Schema command in the Gateway Setup Editor, it queries the Gateway and all its running instances of the sampler in order to build the schema.

Caution

Proposing a schema is computationally intensive and may impact the performance of a Gateway.

To generate a schema definition using the Propose Schema command in the GSE, follow these steps:

  1. Select the desired sampler in the GSE.
  2. Navigate to the Publishing tab of the sampler.
  3. Select Propose Schema.

Note

If a user-defined schema is already present for the sampler, a dialog opens asking whether you wish to overwrite the schema information.

The generated schema is directly added to the Publishing section. Any existing schema definitions are overwritten.

After using the command, perform the following:

  1. Select Schema > Dataviews > Data.
  2. Review the schema for errors because the data types and pivoting inferred by the Propose Schema command may be incorrect.

    Note

    If the value for Pivot is incorrect you must create the schema manually.
  3. Add any units of measure to the headlines and/or columns.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Propose a schema in Active Console

In Active Console, you can use the Propose Schema command to build a schema for specific dataviews. This does not affect any connected Gateways.

To generate a schema definition using the Propose Schema command in Active Console, follow these steps:

  1. Make sure Gateway Hub is enabled in the GSE.
  2. Right-click a sampler or dataview.
  3. Navigate to Sampler Schema.
  4. Select Propose Schema. The generated XML schema definition for the sampler appears in a new window.
  5. Right-click the window with the generated XML in the Active Console and select Copy All.
  6. Navigate to the GSE, right-click the correct sampler and select Paste Schema.

Warning

No checks are performed when using Paste Schema on a sampler. Any existing schema is overwritten, and any copied schema can be pasted on to any sampler.

After using Paste Schema, perform the following:

  1. Navigate to the Publishing tab of the sampler to view the schema definition.
  2. Select Schema > Dataviews > Data.
  3. Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
  4. Add any units of measure to the headlines and/or columns.
  5. (Optional) Specify pivoting.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Note

Paste Schema is only available when valid XML has been copied to the clipboard.
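Because Paste Schema performs no checks beyond requiring valid XML on the clipboard, it can be useful to confirm that copied text at least parses before pasting. The following is a minimal sketch using Python's standard library. It checks only that the text is well-formed XML, not that it is a correct schema, and the element names in the sample are hypothetical rather than the real Gateway schema format.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets; the element and attribute names are invented.
good = '<schema><dataview name="example"/></schema>'
bad = "<schema><dataview></schema>"

print(is_well_formed_xml(good))
print(is_well_formed_xml(bad))
```

A check like this does not replace reviewing the pasted schema in the Publishing tab, since any copied schema can be pasted on to any sampler.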

How to use Paste Schema to create static variables

The generated XML output of the Propose Schema command in the Active Console can also be used to create sampler schemas as static variables. Schemas saved as static variables can be selected in the Publishing tab of a sampler.

To use the output of the Propose Schema command to produce schemas as static variables, follow these steps:

  1. Right-click on the window with the generated XML in the Active Console and select Copy All.
  2. Navigate to the GSE.
  3. Right-click Static variables > Sampler-schemas and select Paste Schema.

A static variable containing a schema definition for each dataview in the copied sampler is created. The name used for the static variable is the name of the dataview.

After using Paste Schema, perform the following for each static variable:

  1. Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
  2. Add any units of measure to the headlines and/or columns.
  3. (Optional) Specify pivoting.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Note

Paste Schema is only available when valid XML has been copied to the clipboard.

How to define a schema manually

To define a schema manually, perform the following:

  1. Open your Gateway Setup Editor.
  2. Navigate to the Publishing tab of the sampler you want to make a schema for.
  3. In Schema > Dataviews, click Add new. You must provide an entry for each dataview in the sampler that requires a schema definition.
  4. In the Dataview field, enter the name of the dataview.
  5. In the Schema field, choose data.
  6. Click Data.
  7. Add as many new Headlines and Columns entries as you require.
  8. In the Name field, add the name of the headline or column in the dataview.
  9. Under options, choose the type of data represented by the headline or column. If you choose Int32, Int64, Float32 or Float64, select the appropriate Unit of measure.
  10. (Optional) If the dataview is pivoted, tick the box by Pivot.
  11. Close the tab.
  12. Click Validate current document to review any errors.
  13. Click Save current document.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.
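When preparing a schema by hand, it can help to check each headline and column entry before entering it in the GSE. The sketch below models an entry and checks it against the data types and unit names listed in this document. The class and function names are hypothetical, and the rule that a unit on a non-numeric type is a problem is an assumption drawn from the step that asks for a unit only for Int32, Int64, Float32, and Float64.

```python
from dataclasses import dataclass
from typing import List, Optional

DATA_TYPES = {
    "boolean", "date", "dateTime", "float32",
    "float64", "int32", "int64", "string",
}
NUMERIC_TYPES = {"float32", "float64", "int32", "int64"}
UNITS = {
    "percent", "seconds", "milliseconds", "microseconds", "nanoseconds",
    "days", "per second", "megahertz", "bytes", "kibibytes", "mebibytes",
    "gibibytes", "bytes per second", "megabits", "megabits per second",
}


@dataclass
class SchemaField:
    component: str  # "headline" or "column"
    name: str
    data_type: str
    unit: Optional[str] = None


def check_field(field: SchemaField) -> List[str]:
    """Return a list of problems found in one schema entry."""
    problems: List[str] = []
    if field.component not in ("headline", "column"):
        problems.append(f"{field.name}: unknown component {field.component!r}")
    if field.data_type not in DATA_TYPES:
        problems.append(f"{field.name}: unknown data type {field.data_type!r}")
    if field.unit is not None:
        if field.unit not in UNITS:
            problems.append(f"{field.name}: unknown unit {field.unit!r}")
        if field.data_type not in NUMERIC_TYPES:
            # Assumption: units are only meaningful for numeric types.
            problems.append(f"{field.name}: unit set on non-numeric type")
    return problems


ok_field = SchemaField("column", "cpuUsage", "float64", "percent")
bad_field = SchemaField("column", "owner", "text", "percent")
print(check_field(ok_field))
print(check_field(bad_field))
```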

Schema inference

Beginning with version 2.4.x, Gateway Hub can use the schema inference feature to infer data schemas for data published by a Gateway when a built-in or user-defined data schema does not exist. This is useful in cases where you have a large number of toolkits, and creating user-defined schemas for each may take a long time.

Inference produces only a best guess of the appropriate data schema, and the ultimate quality of an inferred data schema is dependent on the quality and consistency of the published data.

Note the following limitations:

In most cases, it is highly recommended that you provide your own data schemas, since this is the best way to ensure data schema accuracy and prevent loss of data due to misconfiguration.

Inference modes

You can configure schema inference to run in one of three modes: Naive, Basic, or Smart. By default, schema inference runs in Smart mode, and this is the recommended setting.

Caution

In all cases, the data schema inferred is only as good as the data that has been observed. If the data structure changes after inference, then you must update the data schema manually or risk dropping further datapoints.

Mode: Naive

Description: Naive inference uses a single datapoint to infer a very basic data schema where all fields are type string.

Advantages:

  • Simplicity means that the generated data schema ensures that data is always ingested, provided that the structure of the data does not change after inference.

  • Using one datapoint ensures that you do not lose data during inference.

Disadvantages:

  • Some metric functionality may be unavailable, since all fields are typed as string.

  • If the data structure changes after inference, then ingestion is impacted.

Mode: Basic

Description: Basic inference uses a single datapoint to infer a more detailed data schema than Naive inference. Fields whose data parses as numeric are assigned the type float64, making them accessible to all metric query functionality. Non-numeric fields are assigned the type string.

Advantages:

  • Improved numerical data handling compared to Naive inference.

  • Using one datapoint ensures that you do not lose data during inference.

Disadvantages:

  • Increased likelihood of errors resulting from using a single datapoint, especially if new fields are added after inference.

Mode: Smart

Description: Smart inference uses a multi-datapoint inference model. This is the default and recommended setting.

You must configure the minimum number of datapoints to use over a defined inference period. Once the inference period is over (measured using sample time, not clock time), if the inference engine has at least the minimum number of datapoints, it performs a smart evaluation of property types. All supported types are inferred, and any numerics are always float64.

When setting the minimum number of datapoints, you should consider that you lose the datapoints used for inference. Additionally, any datapoints currently in use for inference are lost if the normaliser is shut down. This restarts the inference period and requires that datapoints are collected again. The higher the number of datapoints used for inference, the higher the accuracy, but this also increases the number of datapoints that can be lost.

Variations in inference duration are small, as no inference is performed until the full period is complete.

Advantages:

  • Significantly improved inference compared to Naive and Basic methods. Includes increased user control.

  • Can handle new fields added during the inference period. Where any field cannot be inferred confidently, the engine reverts to a Naive inference and assigns the string type.

Disadvantages:

  • Additional configuration required.

  • Datapoints used to infer the schema are lost.

As an example, consider a sequence of four datapoints received over an inference period of, say, 10 minutes. Each inference mode treats the same data differently.

Datapoint | Naive | Basic | Smart (using 3 samples for inference)
123.45 days (string) | 123.45 days (string) | 123.45 (float64) | Used as training data, not stored.
250.56 days (string) | 250.56 days (string) | 250.56 (float64) | Used as training data, not stored.
unavailable (string) | unavailable (string) | ingestion error | Used as training data, not stored.
4.36 days (string) | 4.36 days (string) | 4.36 (float64) | 4.36 days (string)

Missing data can change the inferences made.

Datapoint | Naive | Basic | Smart (using 3 samples for inference)
no data | omitted | omitted | Used as training data, not stored.
no data | omitted | omitted | Used as training data, not stored.
321.54 days (string) | ingestion error | ingestion error | Used as training data, not stored.
4.36 days (string) | ingestion error | ingestion error | 4.36 (float64)
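The type decisions in the two tables above can be approximated with a short Python sketch. This is a simplified model under stated assumptions, not Gateway Hub's inference engine: it only distinguishes float64 from string, it assumes a value counts as numeric when it starts with a number (so "123.45 days" is numeric), and the threshold of 0.3 is a hypothetical value chosen so that the second table's outcome is reproduced.

```python
import re
from typing import List, Optional

# Assumption: a value is numeric if it begins with a number.
_LEADING_NUMBER = re.compile(r"^\s*[-+]?\d+(\.\d+)?")


def looks_numeric(value: Optional[str]) -> bool:
    return value is not None and _LEADING_NUMBER.match(value) is not None


def naive_infer(first: Optional[str]) -> Optional[str]:
    """One datapoint, every present field is a string; None means omitted."""
    return None if first is None else "string"


def basic_infer(first: Optional[str]) -> Optional[str]:
    """One datapoint, numeric-looking fields become float64."""
    if first is None:
        return None
    return "float64" if looks_numeric(first) else "string"


def smart_infer(samples: List[Optional[str]], threshold: float) -> Optional[str]:
    """Several datapoints; fall back to string when not confident."""
    present = [s for s in samples if s is not None]
    if not present:
        return None
    if not all(looks_numeric(s) for s in present):
        return "string"  # a conflicting non-null value forces the fallback
    share = len(present) / len(samples)
    return "float64" if share >= threshold else "string"


first_table = ["123.45 days", "250.56 days", "unavailable"]
second_table = [None, None, "321.54 days"]

for samples in (first_table, second_table):
    print(
        naive_infer(samples[0]),
        basic_infer(samples[0]),
        smart_infer(samples, threshold=0.3),  # hypothetical threshold
    )
```

In this model, a field omitted from a single-datapoint schema (the None results) is what leads to the ingestion errors shown for Naive and Basic in the second table once the field later appears.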

Gateway Hub configuration

You can configure data schema inference in Gateway Hub during installation or using hubctl with your installation descriptor.

For the most up-to-date information about configuration options, see Install Gateway Hub and hubctl tool.

The following configuration options are available:

Option | Description
hub_normaliserd_inference_enabled | Enable or disable data schema inference. Choose from true or false.
hub_normaliserd_inference_mode | Set the inference mode. Choose from Naive, Basic or Smart.
hub_normaliserd_inference_smart_min_samples | Minimum number of samples required before Smart inference can be used. This setting only applies if hub_normaliserd_inference_mode is set to Smart. Smart inference occurs after a duration set by inferenceWaitDurationSeconds. If at that time Gateway Hub has received at least minSamplesForInference samples, then Smart inference is performed. Otherwise, Naive inference is used.
hub_normaliserd_inference_smart_duration_seconds | Duration in seconds to wait before performing Smart inference. This setting only applies if hub_normaliserd_inference_mode is set to Smart.
hub_normaliserd_inference_smart_threshold | Proportion of samples received inside the inference duration that must be of a specific type for a field to be matched to that type, expressed as a fraction (for example, 0.5). For example, if Gateway Hub has received 10 samples by the end of the inference duration, and the threshold is 0.5, then 5 samples must be of type numeric and the remainder null (or effectively null) in order for the associated field to be considered type numeric.
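The following sketch shows how these options relate to each other and works through the threshold arithmetic. It is illustrative only: a Python dictionary stands in for the installation descriptor, whose actual file format is not shown here, and the function names, the 0 to 1 threshold range, and the use of rounding up are assumptions.

```python
import math
from typing import Any, Dict, List

MODES = {"Naive", "Basic", "Smart"}
# Options this document describes as applying only in Smart mode.
SMART_ONLY = (
    "hub_normaliserd_inference_smart_min_samples",
    "hub_normaliserd_inference_smart_duration_seconds",
)


def required_matching_samples(total: int, threshold: float) -> int:
    """Samples that must match a type (assumes rounding up)."""
    return math.ceil(total * threshold)


def check_inference_options(options: Dict[str, Any]) -> List[str]:
    """Return notes about inconsistent option values."""
    notes: List[str] = []
    key = "hub_normaliserd_inference_enabled"
    if key in options and not isinstance(options[key], bool):
        notes.append(f"{key} must be true or false")
    # Smart is the documented default mode.
    mode = options.get("hub_normaliserd_inference_mode", "Smart")
    if mode not in MODES:
        notes.append(f"unknown inference mode: {mode!r}")
    elif mode != "Smart":
        for smart_key in SMART_ONLY:
            if smart_key in options:
                notes.append(f"{smart_key} only applies in Smart mode")
    threshold = options.get("hub_normaliserd_inference_smart_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        # Assumption: the threshold is a fraction between 0 and 1.
        notes.append("threshold should be a fraction between 0 and 1")
    return notes


# Hypothetical values, chosen only to exercise the checks.
smart_options = {
    "hub_normaliserd_inference_enabled": True,
    "hub_normaliserd_inference_mode": "Smart",
    "hub_normaliserd_inference_smart_min_samples": 3,
    "hub_normaliserd_inference_smart_duration_seconds": 600,
    "hub_normaliserd_inference_smart_threshold": 0.5,
}
print(check_inference_options(smart_options))
print(required_matching_samples(10, 0.5))
```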

Gateway configuration

Gateway version 5.7.x or later is required in order to publish dataviews without a schema. If dataviews without a schema are published to a Gateway Hub version that does not include schema inference, then a large number of ingestion errors will be reported.
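The version requirements stated in this document (Gateway 5.7.x or later, and Gateway Hub 2.4.x or later for schema inference) can be expressed as a simple compatibility check. This is a sketch with hypothetical function names, and it assumes version strings are dot-separated with numeric major and minor parts.

```python
from typing import Tuple

MIN_GATEWAY = (5, 7)  # Gateway version that can publish without a schema
MIN_HUB = (2, 4)      # Gateway Hub version that introduced schema inference


def parse_version(text: str) -> Tuple[int, int]:
    """Read the major and minor parts of a version such as '5.7.2'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def can_publish_without_schema(gateway_version: str, hub_version: str) -> bool:
    """True when both components meet the documented minimum versions."""
    return (
        parse_version(gateway_version) >= MIN_GATEWAY
        and parse_version(hub_version) >= MIN_HUB
    )


print(can_publish_without_schema("5.7.0", "2.4.1"))
print(can_publish_without_schema("5.7.0", "2.3.0"))
```

The second call models the case described above, where publishing dataviews without a schema to a Gateway Hub that lacks schema inference produces ingestion errors.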

Gateway will try to publish with a data schema if possible, and will not publish data if it has a data schema with errors.

As a result, the following scenarios are possible:

Data schema parameters

Units of measure used in schemas

Name | Symbol
percent | %
seconds | s
milliseconds | ms
microseconds | μs
nanoseconds | ns
days | d
per second | s-1
megahertz | MHz
bytes | B
kibibytes | KiB
mebibytes | MiB
gibibytes | GiB
bytes per second | B/s
megabits | Mbit
megabits per second | Mbit/s

Sampler schema types

The following table lists the samplers and shows whether each has an entirely built-in schema, a partially built-in schema, or an entirely user-defined schema.

Plugin Type Comments
Gateway-breachPredictor Built-in
Gateway-clientConnectionData Built-in
Gateway-databaseLogging Built-in
Gateway-exportedData Built-in
Gateway-gatewayHubData Built-in
Gateway-gatewayLoad Built-in
Gateway-importedData Built-in
Gateway-includesData Built-in
Gateway-licenceUsage Built-in
Gateway-managedEntitiesData Partial
Gateway-probeData Built-in
Gateway-scheduledCommandData Built-in
Gateway-scheduledCommandsHistoryData Built-in
Gateway-severityCount Built-in
Gateway-severityData Built-in
Gateway-snoozeData Built-in
Gateway-sql User-defined
Gateway-userAssignmentData Built-in
api User-defined
api-streams Built-in
bloomberg-bpipe Built-in
citrix-apps Built-in
citrix-processes Built-in
citrix-sessions Built-in
citrix-summary Built-in
clearvision-status Built-in
combo User-defined
component-versions Built-in
control-m Built-in
cpu Built-in
desktop-pc-monitoring Built-in
deviceio Built-in
disk Built-in
e4jms-bridges Built-in
e4jms-connections Built-in
e4jms-durables Built-in
e4jms-non-durables Built-in
e4jms-queues Built-in
e4jms-routes Built-in
e4jms-server Built-in
e4jms-topics Built-in
e4jms-usersummary Built-in
euem Built-in
extractor User-defined
fidessa Built-in
fidessa-dq User-defined
fix Built-in
fix-analyser2 Partial

Admin dataview schema provided; user must define schema for all other dataviews.

fkm Partial
flm Partial

User must define schema for additional data displayed based on configuration.

ftm Built-in
gl-greffon Built-in
gl-lostorders User-defined
gl-orderbook User-defined
gl-permissions Built-in
gl-router Built-in
gl-slc Partial

User must define schema for additional data displayed based on configuration or SLC log file.

gl-slc-relay Built-in
gl-sle Built-in
gl-sle-tcp Built-in
hardware Built-in
ibmi-job Built-in
ibmi-message Built-in
ibmi-pool Built-in
ibmi-queue Built-in
ibmi-subsystem Built-in
ibmi-system Built-in
informix Built-in
ipc Built-in
ix-ma User-defined
jmx-server User-defined
jmx-threadinfo Built-in
market-data-monitor User-defined
message-tracker Built-in
mibmon User-defined
miss-x Built-in
mq-channel Built-in
mq-qinfo Built-in
mq-queue Built-in
net-ping Built-in
network Built-in
nyxt-papastats Built-in
oracle Built-in
orc Built-in
pats-status Built-in
pats-trading-breaches Built-in
pats-users Built-in
perfmon User-defined
processes Built-in
rest-extractor User-defined
rmc-interface User-defined
sets-slc Built-in
sql-toolkit User-defined
stateTracker User-defined

User must define schema for user defined custom column names.

su Built-in
sybase Built-in
sybase-server Built-in
tcp-links Built-in
tib-rv Built-in
tib-rvpublisher Built-in
tib-rvstream Built-in
toolkit User-defined
top Built-in
trading-technologies Built-in
trapmon Partial

User must define schema for user-defined columns in custom view.

unix-users Built-in
veritas-cluster-server Built-in
web-mon Built-in
win-cluster Built-in
win-services Built-in
winapps Built-in
wmi User-defined
wts-licenses Built-in
wts-processes Built-in
wts-sessions Built-in
wts-summary Built-in
x-broadcast Built-in
x-mcast Built-in
x-multicast Built-in
x-ping Built-in
x-route Built-in
x-services Built-in
x-top Built-in
x-traffic Built-in