Notifications
The Notifications app closes the loop between data collection, signal generation, and alerting. When the condition of your IT estate needs attention, the app conveniently notifies you through its integration with external systems such as Slack.
You can configure notifications in two ways: either for groups of entities or for individual entities. Grouped notifications are enabled if there is at least one grouping in the configuration.
Grouped notification lifecycle
Grouped notifications are formed by bundling Obcerv entities that have common characteristics. For example, if you group by container, a group is created per Kubernetes container, and each group grows or shrinks seamlessly as entities such as pods or volumes match or cease to match the grouping parameters.
Grouped notifications are triggered when at least one entity in the group exceeds the configured warning or critical trigger interval. Once a notification is triggered, reminders are periodically sent following the reminder trigger interval.
The notification is cleared when all entities in the group exceed the clear trigger interval and no additional entities in the group have been triggered.
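The lifecycle above can be pictured as a simple aggregation over the current members of a group. The following is a minimal sketch, not the app's implementation: the `EntityState` values and the `group_state` function are hypothetical names introduced only for illustration, and the sketch assumes that entities which stop matching the grouping have already been dropped from the mapping.

```python
from enum import Enum


class EntityState(Enum):
    IDLE = "idle"            # has not exceeded any trigger interval
    TRIGGERED = "triggered"  # exceeded the warning or critical trigger interval
    CLEARED = "cleared"      # was triggered, then exceeded the clear trigger interval


def group_state(members):
    """Derive a group's notification state from its current members.

    `members` maps an entity name to its EntityState (hypothetical model).
    """
    states = set(members.values())
    if EntityState.TRIGGERED in states:
        return "TRIGGERED"   # at least one entity is still triggered
    if EntityState.CLEARED in states:
        return "CLEARED"     # every previously triggered entity has cleared
    return "IDLE"


# A group of pods: one triggered entity is enough to trigger the group.
pods = {"pod-a": EntityState.TRIGGERED, "pod-b": EntityState.IDLE}
state_while_triggered = group_state(pods)

# Once the only triggered entity clears, the group clears as well.
pods["pod-a"] = EntityState.CLEARED
state_after_clear = group_state(pods)
```

In this model, a second entity becoming triggered before the first one clears keeps the group in the triggered state, which corresponds to the "no additional entities have been triggered" condition.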
Entity notification lifecycle
Entity notifications are triggered when an individual entity exceeds the configured warning or critical trigger interval. Once a notification is sent, reminders are periodically sent following the reminder trigger interval. The notification is cleared when the entity exceeds the clear trigger interval.
Configuration
To create a new configuration, follow these steps:
- From the Web Console, select Notifications.
- Click Add Notification.
- Specify the Name for the new configuration. You can also add a description.
- In the Filter field, set whether to include or exclude entities when evaluating notifications for individual entities or groups.
- In the Group by field, select the attributes or dimensions that entities must have in common to be grouped together. A single notification will be sent for each group.
A group is an implicit filter, so any entities that do not have the group attributes or dimensions will be excluded. For example:
Entity (Illustrative name) | Environment | OS |
---|---|---|
Prod_Linux | Prod | Linux |
Prod_Mac | Prod | MacOS |
Prod_Win | Prod | Windows |
Test_Linux | Test | Linux |
Test_Mac | Test | MacOS |
Test_Win | Test | Windows |
Dev_Linux | Dev | Linux |
Dev_Mac | Dev | MacOS |
Dev_Win | Dev | Windows |
Dev_None | Dev | |
Given the above entities and the grouping configuration OS, the following groups will be created with these entities:
Group | Entities |
---|---|
Linux | Prod_Linux Test_Linux Dev_Linux |
MacOS | Prod_Mac Test_Mac Dev_Mac |
Windows | Prod_Win Test_Win Dev_Win |
Only one notification will be sent per group, and the entity Dev_None will be disregarded since it doesn’t have the group attribute OS.
- Under Targets, select a target type, and then choose an option from the Targets drop-down list. Depending on the target type, you can set more options. For example, if you selected SLACK, you will see the Reply in Slack thread toggle.
Note
If you need to create a new target, go to Notifications > Targets, and then click beside the target type. For more information about the available target types, see Integrations.
- Check the options under the Triggers and Messages section. For more information, see Triggers and Messages.
- Click Save to add the new configuration.
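The grouping behaviour described for the Group by field can be sketched in a few lines. This is only an illustration under stated assumptions: the `ENTITIES` mapping reuses the illustrative entity names from the table in the steps above, and `group_entities` is a hypothetical helper, not part of the app.

```python
from collections import defaultdict

# Illustrative entities; a missing attribute is simply absent from the mapping.
ENTITIES = {
    "Prod_Linux": {"Environment": "Prod", "OS": "Linux"},
    "Prod_Mac": {"Environment": "Prod", "OS": "MacOS"},
    "Prod_Win": {"Environment": "Prod", "OS": "Windows"},
    "Test_Linux": {"Environment": "Test", "OS": "Linux"},
    "Test_Mac": {"Environment": "Test", "OS": "MacOS"},
    "Test_Win": {"Environment": "Test", "OS": "Windows"},
    "Dev_Linux": {"Environment": "Dev", "OS": "Linux"},
    "Dev_Mac": {"Environment": "Dev", "OS": "MacOS"},
    "Dev_Win": {"Environment": "Dev", "OS": "Windows"},
    "Dev_None": {"Environment": "Dev"},
}


def group_entities(entities, group_by):
    """Bundle entity names by the values of the `group_by` attributes.

    Entities missing any of the attributes are skipped, which mirrors the
    implicit-filter behaviour of a grouping.
    """
    groups = defaultdict(list)
    for name, attributes in entities.items():
        if not all(key in attributes for key in group_by):
            continue  # implicit filter: no group attribute, no group
        groups[tuple(attributes[key] for key in group_by)].append(name)
    return dict(groups)


groups = group_entities(ENTITIES, ["OS"])
```

Grouping by `["Environment", "OS"]` instead would yield one group per combination, and Dev_None would still be excluded because it lacks OS.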
Triggers
A notification is triggered and cleared based on an entity’s severity.
Warning / Critical (triggered)
If an entity has had a critical or warning severity for longer than the supplied duration, then it is considered triggered and a notification is sent, either individually for the entity or as part of a group (if groups are configured). Where a severity is rapidly changing between warning and critical states, the warning trigger also considers entities with critical severity.
Cleared
If an entity has been previously triggered and has had no severity for the supplied duration, then it is considered cleared and a notification is sent indicating that the entity is healthy.
Reminders
Reminder notifications apply to triggered entities and groups, and are periodically sent as a reminder that the entity/group is still in a triggered state.
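To make the interplay of the three triggers concrete, here is a minimal sketch for a single entity. It rests on several assumptions that are not taken from the app: severity is sampled once per minute, warning and critical share one trigger duration, a duration counts as met once it has been reached, and the `evaluate` function with its parameter names is hypothetical.

```python
def evaluate(timeline, trigger_after=1, clear_after=1, remind_every=1):
    """Return (minute, event) pairs for a per-minute severity timeline.

    `timeline[i]` is "WARNING", "CRITICAL", or None (no severity) at minute i.
    All durations are expressed in minutes.
    """
    events = []
    triggered = False
    unhealthy_since = None  # start of the current warning/critical run
    healthy_since = None    # start of the current no-severity run
    last_sent = None        # minute of the last triggered/reminder message

    for minute, severity in enumerate(timeline):
        if severity in ("WARNING", "CRITICAL"):
            healthy_since = None
            if unhealthy_since is None:
                unhealthy_since = minute
            if not triggered and minute - unhealthy_since >= trigger_after:
                triggered = True
                last_sent = minute
                events.append((minute, "TRIGGERED"))
            elif triggered and minute - last_sent >= remind_every:
                last_sent = minute
                events.append((minute, "REMINDER"))
        else:
            unhealthy_since = None
            if triggered:
                if healthy_since is None:
                    healthy_since = minute
                if minute - healthy_since >= clear_after:
                    triggered = False
                    healthy_since = None
                    events.append((minute, "CLEARED"))
                elif minute - last_sent >= remind_every:
                    # Still triggered: the clear duration has not been met yet.
                    last_sent = minute
                    events.append((minute, "REMINDER"))
    return events


# Hypothetical timeline: a short blip at minute 1, then a sustained problem.
timeline = [None, "WARNING", None, "WARNING", "WARNING", "CRITICAL", None, None]
events = evaluate(timeline)
```

With this timeline, the blip at minute 1 sends nothing, the run starting at minute 3 triggers at minute 4, reminders follow at minutes 5 and 6, and the entity clears at minute 7.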
Messages
A notification message must be configured for each enabled trigger. Each message consists of the main message body and an optional short title. How the title is used depends on the integration. For example, for Slack notifications, it is used as a header and as the text of push notifications. It is not used for Webhooks.
Messages may contain placeholders of the form ${placeholder} that will be interpolated by the app. The available placeholders can be seen by clicking [icon].
Some placeholders are only available for grouped notifications while others are only available for entity notifications. The supported placeholders can be seen below.
Placeholder | Group | Entity | Description |
---|---|---|---|
${date} | ✔ | ✔ | The current date in UTC (for example, 2011-12-03). |
${time} | ✔ | ✔ | The current time in UTC (for example, 15:14:11). |
${dateTime} | ✔ | ✔ | The current date-time in UTC (for example, 2007-12-03T10:15:30). |
${url} | ✔ | ✔ | The URL to the Notifications app. |
${severity} | ✔ | ✔ | The entity’s severity or the group’s maximum severity. |
${entity} | ✘ | ✔ | The entity object as a JSON object. |
${dimensions} | ✘ | ✔ | The entity dimensions as a JSON object. |
${entity.attribute[<attribute>]} | ✘ | ✔ | The entity attribute value. If the attribute is not found, an empty string is returned. |
${entity.dimension[<dimension>]} | ✘ | ✔ | The entity dimension value. If the dimension key is not found, an empty string is returned. |
${triggeredCount} | ✔ | ✘ | The number of entities in the group that have been triggered. |
${criticalCount} | ✔ | ✘ | The number of entities in the group that have critical severity. |
${okCount} | ✔ | ✘ | The number of entities in the group that are triggered but have no severity at the time the notification is sent. This is relevant for reminder notifications where an entity’s severity has recently been cleared but does not meet the configured cleared trigger criteria yet. |
${warningCount} | ✔ | ✘ | The number of entities in the group that have a warning severity. |
${clearedCount} | ✔ | ✘ | The number of entities in the group that are no longer triggered. |
${group} | ✔ | ✘ | The entity group as a list of key/value pairs (for example, [Kind: Database, Environment: DevOps]). |
${group[<group>]} | ✔ | ✘ | The entity group value. If the group is not found, an empty string is returned. |
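The interpolation of placeholders can be approximated with a small regular-expression substitution. The sketch below is an assumption-laden illustration rather than the app's logic: `render`, `values`, and `lookups` are hypothetical names, and leaving unknown simple placeholders untouched is a choice made here, not documented behaviour. Only the empty-string result for a missing attribute, dimension, or group key follows the table above.

```python
import re

# Matches ${...} and captures the placeholder name.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
# Matches indexed forms such as entity.dimension[pod] or group[Kind].
_INDEXED = re.compile(r"^(entity\.attribute|entity\.dimension|group)\[(.+)\]$")


def render(template, values, lookups):
    """Interpolate ${placeholder} tokens in `template`.

    `values` holds simple placeholders (for example, severity);
    `lookups` maps "entity.attribute", "entity.dimension", or "group"
    to a dictionary of keys and values.
    """
    def replace(match):
        name = match.group(1)
        indexed = _INDEXED.match(name)
        if indexed:
            table, key = indexed.groups()
            # A missing key yields an empty string.
            return str(lookups.get(table, {}).get(key, ""))
        # Assumption: unknown simple placeholders are left as written.
        return str(values.get(name, match.group(0)))

    return _PLACEHOLDER.sub(replace, template)


message = render(
    "${severity}: ${entity.dimension[pod]} on ${entity.dimension[node]} "
    "(owner: ${entity.attribute[owner]})",
    values={"severity": "CRITICAL"},
    lookups={"entity.dimension": {"pod": "web-console-abc", "node": "itrlab"}},
)
```

Because the entity in this example has no `owner` attribute, that placeholder renders as an empty string and the message ends with `(owner: )`.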
Illustrative example
Consider a timeline of an entity’s severity in which the numbers represent minutes, and assume that every trigger is configured with a duration of 1 minute.
Minute(s) | Action |
---|---|
0-4 | No notifications are sent because the entity has no severity. |
4-5 | No notifications are sent because the entity’s severity is not continuously warning or critical for one minute. |
5-6 | No notifications are sent because the entity has no severity. |
7 | A triggered notification is sent because the entity’s severity has been in warning state for at least one minute. |
7-12 | Reminders are sent every minute as the entity is still in warning or critical state. |
13 | A clear notification is sent because the entity has had no severity for at least one minute. |
Integrations
The app integrates with the following third-party systems:
Instrumentation
The app leverages an in-house StatsD client to record metrics regarding its internal state. The following is a complete list of metrics collected by the app:
Metric | Type | Unit | Dimensions | Description |
---|---|---|---|---|
Notifications Queued | gauge | | instance, system, notification_id | The total number of notifications currently queued to be sent. |
Notifications Sent | gauge | | instance, system, notification_id | The total number of notifications sent. |
Notification Count | counter | | instance, system, notification_id | The notification count accrued over regular intervals. |
Notifications Rejected | gauge | | instance, system, notification_id | The total number of notifications that were rejected. This may occur if the queue is already full. |
Notifications Failed | gauge | | instance, system, notification_id | The total number of notifications that were attempted to be sent but failed. |
Notifications Evicted | gauge | | instance, system, notification_id | The total number of notifications evicted from the queue. This may occur if the configuration associated with the notification was removed. |
Notification Failed At | gauge | epoch milliseconds | instance, system, notification_id, target_id | The timestamp of the most recent notification failure expressed in milliseconds since 01 January 1970 UTC. |
Notification Succeeded At | gauge | epoch milliseconds | instance, system, notification_id, target_id | The timestamp of the most recent successful notification expressed in milliseconds since 01 January 1970 UTC. |
Notification Failure Message | attribute | | instance, system, notification_id, target_id | The notification failure message corresponding to the most recent notification failure. |
Notification Queue Size | gauge | | node, namespace, pod, container, notifier | The current notification queue occupancy. |
Notification Queue Capacity | gauge | | node, namespace, pod, container, notifier | The maximum number of notifications that can be queued. |
Response Time | histogram | nanoseconds | node, namespace, pod, container, notifier | The response time of the remote call that sends the notification to an external system. |
Number of Entries | gauge | | node, namespace, pod, container, cache | The total number of entries held in the cache. |
Average Entry Size | gauge | bytes | node, namespace, pod, container, cache | The average cache entry size. |
Average Chunk Size | gauge | bytes | node, namespace, pod, container, cache | The average chunk size. |
Average Entries Per Chunk | gauge | | node, namespace, pod, container, cache | The average number of entries in each chunk. |
Number of Chunks | gauge | | node, namespace, pod, container, cache | The total number of chunks. |
Entity Updates | gauge | | node, namespace, pod, container, cache | The number of entity updates processed. |
Entity Removals | gauge | | node, namespace, pod, container, cache | The number of entity evictions processed. |
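When reading these metrics, the units in the table matter: the timestamp metrics are in epoch milliseconds and Response Time is in nanoseconds. The helpers below are a small sketch for interpreting raw values; the function names are hypothetical, and how the values are obtained from your metrics store is left out.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(millis):
    """Convert a 'Notification Failed At' style value to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def nanos_to_millis(nanos):
    """Convert a 'Response Time' sample from nanoseconds to milliseconds."""
    return nanos / 1_000_000


def queue_utilisation(queue_size, queue_capacity):
    """Fraction of the notification queue in use (0.0 if capacity is zero)."""
    return queue_size / queue_capacity if queue_capacity else 0.0


failed_at = epoch_millis_to_utc(0)           # the epoch itself, in UTC
response_ms = nanos_to_millis(2_500_000)     # 2.5 milliseconds
utilisation = queue_utilisation(50, 200)     # a quarter of the queue is in use
```

A utilisation that stays close to 1.0 suggests the queue is near capacity, which is the situation in which the table says notifications may be rejected.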
Notification storage and retrieval
Notifications are currently recorded in logs and can be retrieved from the Logs screen of the Web Console.
From the Logs screen, set the dimension filter in the From field by inputting the string {container="obcerv-app-notifications-notifier"}|logfmt followed by additional filtering parameters for the information you would like to extract.
Notification examples
All triggered notifications that have been sent:
{container="obcerv-app-notifications-notifier"}|logfmt|state="SENT"|type="TRIGGERED"
All notifications that were attempted to be sent but failed:
{container="obcerv-app-notifications-notifier"}|logfmt|state="FAILED"
All triggered notifications for the configuration called “Obcerv License”:
{container="obcerv-app-notifications-notifier"}|logfmt|state="SENT"|type="TRIGGERED"|notificationName="Obcerv License"
The following filtering tags are available:
Tag | Value(s) |
---|---|
state | QUEUED, SENT, FAILED, EVICTED |
type | TRIGGERED, REMINDER, CLEARED |
notifier | SLACK, WEBHOOK |
targetName | The name of the notification target. |
notificationId | The ID of the notification. |
notificationName | The name of the notification. |
severity | The entity’s severity or the group’s maximum severity. |
message | A message providing additional context regarding the notification state. |
group | The triggered group (for example, [Kind: Database, Environment: DevOps]). |
triggeredCount | The number of entities that have been triggered. |
criticalCount | The number of critical entities. |
warningCount | The number of entities that have a warning severity. |
clearedCount | The number of entities that have been cleared. |
dimensions | The entity dimensions (for example, {pod=web-console-abc, node=itrlab}). |
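If you assemble these filter strings often, a small helper can reduce typing mistakes. The following is a sketch only: `build_log_filter` and `ALLOWED_VALUES` are hypothetical names, the enumerated values are copied from the tag table above, and the helper does not escape quotes inside values.

```python
# Enumerated tag values taken from the table of notification filtering tags.
ALLOWED_VALUES = {
    "state": {"QUEUED", "SENT", "FAILED", "EVICTED"},
    "type": {"TRIGGERED", "REMINDER", "CLEARED"},
    "notifier": {"SLACK", "WEBHOOK"},
}


def build_log_filter(container, **tags):
    """Build a Logs-screen filter string in the form used by the examples.

    Raises ValueError if an enumerated tag is given an unknown value.
    """
    parts = ['{container="%s"}' % container, "logfmt"]
    for tag, value in tags.items():
        allowed = ALLOWED_VALUES.get(tag)
        if allowed is not None and value not in allowed:
            raise ValueError("unsupported value %r for tag %r" % (value, tag))
        parts.append('%s="%s"' % (tag, value))
    return "|".join(parts)


query = build_log_filter(
    "obcerv-app-notifications-notifier",
    state="SENT",
    type="TRIGGERED",
    notificationName="Obcerv License",
)
```

The resulting string matches the third example query above and can be pasted into the From field.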
Audit log storage and retrieval
Audit logs can be retrieved from the Logs screen of the Web Console.
From the Logs screen, set the dimension filter in the From field by inputting the string {container="obcerv-app-notifications"}|logfmt|class="AUDIT" followed by additional filtering parameters for the information you would like to extract.
Audit log examples
All notifications that have been created, updated, or deleted by user djohn:
{container="obcerv-app-notifications"}|logfmt|class="AUDIT"|resource="NOTIFICATION"|user="djohn"
All targets that were deleted by user admin:
{container="obcerv-app-notifications"}|logfmt|class="AUDIT"|resource="TARGET"|action="DELETE"|user="admin"
Notifications or targets that were updated by any user:
{container="obcerv-app-notifications"}|logfmt|class="AUDIT"|action="UPDATE"
The following filtering tags are available:
Tag | Value(s) |
---|---|
class | AUDIT |
action | CREATE, UPDATE, DELETE |
resource | NOTIFICATION, TARGET |
type | SLACK, WEBHOOK, SERVICE_NOW, NONE |
name | Name of the notification or target. |
id | ID of the notification or target. |
message | A human-readable audit message. |
user | User that initiated the change. |