Choosing your ITRS Analytics deployment
This guide helps you choose how to deploy ITRS Analytics based on what matters most to your organization, including where the platform is hosted, who operates it, resiliency, and backup capabilities. It also explains the trade-offs associated with each option. Your choice of deployment model directly affects high availability, operational continuity, and your ability to meet uptime and compliance requirements.
Use the steps below to align your deployment with your organization’s requirements.
Identify your business requirements
Before choosing a deployment model, first define your organization’s requirements. Consider the following:
- Data residency — Where must your data be stored? Is an AWS region acceptable, or must it remain on-premises or within a private cloud?
- Operational ownership — Who will operate the platform? Should ITRS manage the infrastructure and operations, or will your team be responsible for running it?
- Availability and resilience — Do you require production-grade high availability or backup and restore capabilities, or is a simpler setup sufficient for evaluation purposes?
- Kubernetes environment — Do you already have a Kubernetes platform or team in place (for example, EKS, AKS, GKE, OpenShift, or a self-hosted Kubernetes cluster)?
- Compliance requirements — Are there compliance, audit, or data retention requirements that influence where or how the platform must run?
- Cost model preference — Do you prefer a predictable managed bundle (software and hosting) or a software-only model using your own infrastructure?
Choose between SaaS and Self-hosted
Your first decision is whether ITRS manages the platform for you (SaaS) or your organization operates it on your own infrastructure (self-hosted). This choice determines who is responsible for running, maintaining, and securing the platform, as well as where the system is hosted.
| Feature | SaaS | Self-hosted |
|---|---|---|
| Hosting location | AWS Public Cloud, managed by ITRS | On-premises, private cloud, or public cloud via Bring Your Own (Kubernetes) Cluster (BYOC) or Embedded Cluster |
| Platform management | ITRS Cloud Operations teams | Internal Kubernetes DevOps team |
| Data residency | Client’s choice of AWS Region (costs may vary by region) | Client’s full choice of location and environment |
| Security | TLS, mTLS, Equinix Cloud Connect | TLS, mTLS, service mesh |
| High availability | Deployment spans two availability zones within a single region | Requires an HA Kubernetes design: minimum three controller nodes for Embedded Cluster HA, workload replicas distributed across failure domains, and a customer-managed load balancer; see resource and hardware requirements |
| Data backup | Daily automated backups | Daily backups available for BYOC deployments |
| Upgrade management | Planned upgrades, client-approved, no more than two weeks after a release | Client’s responsibility for image mirroring and upgrade execution |
| Application customization | Full identity, role, and access management available with SSO | Full identity, role, and access management available with SSO |
| Cost model | Software costs plus hosting costs (Small, Medium, Large, Extra-Large sizes); optional 2 TB additional storage | Software costs only; infrastructure costs are the client’s responsibility |
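The self-hosted high availability row above can be read as a short pre-flight checklist. The following Python sketch is illustrative only and is not an ITRS tool: the `Node` record, the `ha_design_gaps` function, and the idea of labelling each node with a failure domain are assumptions made for this example. It checks the two conditions the table names that can be counted, namely at least three controller nodes (stated for Embedded Cluster HA) and replicas of each workload spread over more than one failure domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical inventory record for one cluster node."""
    name: str
    failure_domain: str  # for example a rack, hypervisor host, or availability zone
    is_controller: bool


def ha_design_gaps(nodes: list[Node], replica_placement: dict[str, list[str]]) -> list[str]:
    """Return a list of detected gaps; an empty list means none were found.

    replica_placement maps a workload name to the node names hosting its replicas.
    """
    gaps = []

    # Minimum controller count stated in the comparison table.
    controllers = [n for n in nodes if n.is_controller]
    if len(controllers) < 3:
        gaps.append(f"only {len(controllers)} controller node(s); at least 3 expected")

    # Each workload's replicas should land in at least two failure domains.
    domain_of = {n.name: n.failure_domain for n in nodes}
    for workload, hosts in replica_placement.items():
        domains = {domain_of[h] for h in hosts}
        if len(domains) < 2:
            gaps.append(f"{workload}: replicas span fewer than two failure domains")

    return gaps
```

A check like this does not cover the customer-managed load balancer or sizing, which still need to be validated against the resource and hardware requirements.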
When to choose SaaS
Choose SaaS when:
- You want ITRS to manage platform operations, upgrades, and infrastructure health.
- Your team does not have dedicated Kubernetes or cloud operations expertise.
- You require a predictable, fully managed cost model that bundles software and hosting.
- Data residency requirements can be met by an available AWS region.
- You need enterprise-grade high availability across two availability zones without managing the underlying infrastructure yourself.
When to choose Self-hosted
Choose self-hosted when:
- Your organization requires data to remain within a specific on-premises environment or private cloud that is not covered by SaaS regional options.
- You have an existing Kubernetes operations team capable of managing platform lifecycle.
- Your organization’s procurement, security, or compliance policies require full infrastructure ownership.
- You need to integrate ITRS Analytics into an existing internal platform ecosystem (networking, security tooling, storage, observability pipelines).
Note
With self-hosted deployments, your internal teams are responsible for patching, upgrades, backup management, and maintaining infrastructure resiliency. Ensure you have the appropriate platform expertise before selecting this model.
If your requirements point to self-hosted, the next step is to decide how you run Kubernetes.
Key resiliency concepts
When planning your ITRS Analytics deployment, these fundamental concepts work together to define the platform’s operational characteristics.
High availability (HA)
High availability ensures that your observability platform continues to operate without interruption, even if individual components fail. This is achieved by deploying redundant services, load balancers, and failover mechanisms so that if one workload becomes unavailable, another seamlessly takes over.
Key characteristics:
- Both BYOC and Embedded Cluster support full high availability.
- In multi-node deployments, no node should have a round-trip time (RTT) greater than 10 ms to any other node in the Kubernetes cluster.
- The key architectural difference is storage:
  - BYOC typically uses network-attached persistent volumes, so workloads may be rescheduled to surviving nodes after a failure, provided sufficient spare capacity exists.
  - Embedded Cluster uses node-local persistent volumes, so when a node fails, workloads that depend on that storage cannot be rescheduled elsewhere.
- As a result, an Embedded Cluster continues running in a degraded state until the affected node returns, with fewer replicas, reduced capacity, and lower fault tolerance.
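The 10 ms round-trip limit in the list above applies to every pair of nodes, not only to neighbours. A minimal Python sketch of that pairwise check follows. It assumes you have already collected RTT measurements by some other means (for example, with ping between nodes); the function name `rtt_violations` and the input layout are hypothetical and chosen for this example.

```python
from itertools import combinations

MAX_RTT_MS = 10.0  # limit stated in this guide for multi-node deployments


def rtt_violations(nodes, rtt_ms):
    """Return (node_a, node_b, rtt) for every pair that breaks the limit.

    rtt_ms maps frozenset({node_a, node_b}) to the measured RTT in milliseconds.
    A pair with no measurement is reported with rtt=None so it is not silently skipped.
    """
    bad = []
    for a, b in combinations(sorted(nodes), 2):
        rtt = rtt_ms.get(frozenset((a, b)))
        if rtt is None or rtt > MAX_RTT_MS:
            bad.append((a, b, rtt))
    return bad


measurements = {
    frozenset(("n1", "n2")): 0.8,
    frozenset(("n1", "n3")): 12.5,  # exceeds the limit
    frozenset(("n2", "n3")): 3.0,
}
print(rtt_violations(["n1", "n2", "n3"], measurements))
```

Keying the measurements by an unordered pair avoids having to record each direction separately; if your measurements are asymmetric, record the larger value.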
Continuous operations
Continuous operations means that the platform keeps running during localized failures (pod or node outages) within a single cluster.
Important
ITRS Analytics does not provide built-in cross-site or cross-region disaster recovery. For protection against data center or regional failures, you must run multiple independent deployments and implement your own DR strategy (data synchronization, failover, and runbooks).
Deploy Kubernetes in a Self-hosted environment
ITRS Analytics is built on a Kubernetes-native architecture, designed for continuous high availability, scalable deployments, and resilient operations. If you select self-hosted, you must then choose how Kubernetes is provisioned: either by Bring Your Own (Kubernetes) Cluster (BYOC) or by using ITRS’s bundled Kubernetes distribution (Embedded Cluster).
Note
Bring Your Own (Kubernetes) Cluster (BYOC) is the recommended deployment model for production. It typically uses network-attached persistent volumes, which means workloads can be rescheduled to surviving nodes after a failure if sufficient spare capacity exists and the volumes remain accessible over the network.
Embedded Cluster also supports high availability, but it uses node-local persistent volumes. If a node fails, workloads that depend on that storage cannot be rescheduled elsewhere, so the cluster continues in a degraded state until the affected node returns. Choose Embedded Cluster for production environments where Kubernetes expertise is not available.
Designing a resilient ITRS Analytics deployment
- Select BYOC when enterprise high availability, backup and restore, standard security tooling integration, and predictable scaling are required, and when your Kubernetes platform can reschedule workloads onto surviving nodes with sufficient spare capacity.
- Select Embedded Cluster for production environments where Kubernetes expertise is unavailable; note that a failed node causes the cluster to run in a degraded state with fewer replicas and reduced capacity until that node returns, because workloads cannot be rescheduled to surviving nodes.
| Feature | Bring Your Own Cluster (BYOC) | Embedded Cluster |
|---|---|---|
| Platform ownership | Customer-managed Kubernetes (EKS, AKS, GKE, OpenShift, self-hosted) | Kubernetes bundled and managed through ITRS-packaged K0s |
| High availability | Full HA with workload rescheduling across surviving nodes when a node fails, provided sufficient spare capacity exists and network-attached storage remains accessible | Full HA through application-level replication; a node failure causes degraded operation with fewer replicas and reduced capacity until the node returns, because workloads cannot be rescheduled due to node-local storage |
| Storage architecture | Persistent volumes are decoupled through storage classes; supports dynamic expansion | Storage is tied to local node disks; data loss risk if a node fails without HA configured |
| Backup and restore | Supported using platform tooling such as Velero | Infrastructure-level backup should be used; Velero does not support the node-local filesystem storage classes used by Embedded Cluster, so use VM snapshots from the hypervisor or storage-level snapshots from the underlying storage platform |
| Load balancing and networking | Supports native cloud and enterprise load balancers with DNS integration, such as AWS NLB, Azure Load Balancer, GCP Load Balancer, or F5 | No built-in load balancer; customers must supply and manage their own, such as HAProxy, keepalived, F5, AWS NLB, Azure Load Balancer, or GCP Load Balancer. A bundled software load balancer is not included because these solutions depend on environment-specific external network cooperation such as ARP/GARP or BGP, which is commonly restricted or unsupported across cloud and many on-premises networks |
| Security | Kubernetes-native security model integrates cleanly with platform operations and customer-controlled controls such as network policies, admission controllers, and pod security standards | Host-level security controls such as antivirus, EDR agents, SSL/TLS inspection, and host firewalls must be validated and exclusions configured before installation; these controls can block image pulls, container runtime activity, or inter-node communication |
| Operational responsibility | Clearly divided across infrastructure, platform, and application teams; cluster issues resolved at the appropriate layer | Cluster issues must be escalated to ITRS because customers do not have direct access to the bundled K0s Kubernetes layer or its diagnostic tools |
| Maintenance and patching | Integrates with existing customer patching and lifecycle processes for Kubernetes, OS, storage, and networking | Increased coordination risk; patching and upgrades may require downtime and careful change management |
| Disaster recovery | Not built-in; deploy multiple independent ITRS Analytics instances for DR | Not built-in; deploy multiple independent ITRS Analytics instances for DR |
The following scenarios illustrate how the choice between BYOC and Embedded Cluster plays out in practice across key operational areas: load balancing, storage scalability, recovery behavior, security, and team responsibilities.
Ensuring resilient access with load balancers
Scenario: Your organization runs multiple ITRS Analytics ingestion services and UIs that must remain accessible even during traffic spikes.
In a Bring Your Own Cluster environment, especially in cloud-based setups, a load balancer is typically readily available and integrates seamlessly with Kubernetes. It distributes traffic across multiple service replicas and often integrates with DNS services, helping maintain stable URLs and endpoints during scaling events or network changes.
In Embedded Cluster deployments, a load balancer is still required but is not provided as part of the deployment. Customers must supply and manage their own load balancer, which can be hardware-based or software-based. This usually requires additional planning and coordination with the network or infrastructure team.
Scaling storage dynamically with decoupled storage classes
Scenario: Your ClickHouse workload grows steadily from 500 GB to several terabytes of data over time.
In a Bring Your Own (Kubernetes) Cluster (BYOC) environment, storage is decoupled from individual nodes. Persistent volumes remain on network-attached backing storage, so rescheduled pods can mount the same volumes from surviving nodes that can reach the storage network, and extensible storage classes allow volumes to grow seamlessly as data increases.
In Embedded Cluster (EC) deployments, storage is tied to local node disks. If a node becomes unavailable, the workloads depending on that node cannot be rescheduled elsewhere, and the system may operate in a degraded state until the node comes back online.
Understanding degraded operation after node failure
Scenario: A node in your ITRS Analytics deployment fails unexpectedly during peak monitoring hours.
In a Bring Your Own (Kubernetes) Cluster (BYOC) environment, Kubernetes may reschedule affected workloads onto surviving nodes if sufficient spare capacity exists and network-attached persistent volumes remain accessible. Services can remain available with limited disruption, but this depends on the resilience and capacity of the underlying Kubernetes platform.
In an Embedded Cluster (EC) deployment, surviving replicas continue running automatically, but workloads tied to the failed node’s local storage cannot be rescheduled elsewhere. The cluster therefore runs in a degraded state with fewer replicas, reduced capacity, and lower fault tolerance until the node comes back online. If another node fails while the cluster is already degraded, the risk of service disruption or data unavailability increases because there is less remaining redundancy.
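The difference between the two recovery behaviors can be made concrete with a small model. The Python sketch below is a deliberate simplification for illustration, not a description of how Kubernetes or ITRS Analytics schedules workloads: it counts replicas only, treats spare capacity as a number of free replica slots per node, and uses the hypothetical labels `"network"` and `"local"` for the two storage architectures.

```python
def replicas_after_node_failure(placement, failed_node, storage, spare_slots=None):
    """Estimate how many replicas keep running after one node fails.

    placement:   node name -> number of replicas running on that node
    storage:     "network" (network-attached volumes, as is typical for BYOC) or
                 "local" (node-local volumes, as used by Embedded Cluster)
    spare_slots: node name -> free replica slots; only used for "network"
    Returns (running_replicas, degraded).
    """
    total = sum(placement.values())
    displaced = placement.get(failed_node, 0)
    surviving = total - displaced

    if storage == "local":
        # Replicas are tied to the failed node's disks and cannot move.
        rescheduled = 0
    elif storage == "network":
        # Replicas can move, but only into spare capacity on surviving nodes.
        free = sum(s for n, s in (spare_slots or {}).items() if n != failed_node)
        rescheduled = min(displaced, free)
    else:
        raise ValueError(f"unknown storage type: {storage!r}")

    running = surviving + rescheduled
    return running, running < total


placement = {"node-a": 1, "node-b": 1, "node-c": 1}
print(replicas_after_node_failure(placement, "node-b", "local"))
print(replicas_after_node_failure(placement, "node-b", "network", {"node-a": 1}))
```

In this model, the node-local case always ends up degraded after a failure, while the network-attached case is degraded only when surviving nodes lack spare slots, which mirrors the capacity caveat described above.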
Meeting security requirements
Scenario: Your IT security team requires visibility and control over Kubernetes-native security policies.
In a Bring Your Own Cluster environment, customers can apply and manage Kubernetes-native controls such as network policies, admission controllers, and pod security standards as part of standard cluster governance.
In an Embedded Cluster deployment, this distinction matters less, because customers have already accepted a more black-box operating model for the Kubernetes layer running on the provided virtual machines.
Scenario: Your organization secures all servers with antivirus, EDR agents, SSL/TLS inspection, and host firewalls.
For Embedded Cluster, these host-level controls must be validated before installation. Customers should configure exclusions or bypass rules so that security tooling does not block installation steps, image pulls, container runtime activity, service-to-service TLS, or inter-node communication. In BYOC environments, these controls are typically handled through the organization’s existing Kubernetes platform hardening and node management practices.
Streamlined support across teams
Scenario: Your organization has separate teams for infrastructure, platform, and application operations.
In a Bring Your Own Cluster setup, responsibilities are clearly divided: infrastructure teams manage nodes, platform teams administer Kubernetes, and application teams deploy and manage ITRS Analytics. Issues can be addressed at the appropriate layer.
With Embedded Cluster, cluster-level issues must be escalated to ITRS because customers do not have direct access to the bundled K0s Kubernetes layer or the usual platform-level diagnostic tools needed to investigate those problems themselves.
Deployment scenarios
The following sections describe various deployment scenarios, each with specific benefits and trade-offs. Understanding these helps you select the right configuration for your requirements.
Non-HA single or multi-node (BYOC)
This configuration is suitable for proof-of-concept deployments and smaller production environments where high availability is not a strict requirement.
Common use cases:
- SaaS proof-of-concept deployments
- Small SaaS Geneos or Opsview observability deployments
- Development and testing environments
Characteristics:
- Lower infrastructure costs
- Backup and restore available with a 24-hour recovery time objective (RTO)
- Suitable for environments with flexible uptime requirements
- Managed by ITRS cloud operations teams for SaaS deployments
Note
Proof-of-concept deployments come with no guarantee of high availability for stored data due to their exploratory nature.
Non-HA single or multi-node (Embedded Cluster)
This configuration is similar to the Bring Your Own Cluster non-HA configuration, but it is deployed on-premises using Embedded Cluster. This option has additional limitations around data protection.
Common use cases:
- On-premises proof-of-concept deployments
- Small production use cases with relaxed uptime requirements
Important considerations:
- Lower infrastructure costs
- Velero-based backup and restore is not supported for node-local filesystem storage classes; use infrastructure-level backups such as hypervisor VM snapshots or storage-platform snapshots instead
- Risk of complete data loss if a node fails catastrophically
- Requires complete rebuild if storage is lost
Warning
Before using Embedded Cluster in production, plan and validate an alternative backup strategy such as hypervisor VM snapshots or storage-level snapshots. Without a tested backup strategy, a node failure that causes disk loss can result in permanent data loss with no recovery path.
Plan your rollout
Use the following summary to confirm your deployment choice before proceeding to installation.
| Requirement | Recommended model |
|---|---|
| ITRS manages all infrastructure | SaaS |
| Data must stay in a specific AWS region | SaaS |
| Data must stay on-premises or in a private cloud | Self-hosted |
| Full control over Kubernetes platform | Self-hosted (BYOC) |
| No in-house Kubernetes expertise; evaluation or constrained production requirements | Self-hosted (Embedded Cluster) |
| No in-house Kubernetes expertise; production deployment with sufficient replicas on all nodes and a validated external backup strategy | Self-hosted (Embedded Cluster) |
| Production-grade high availability required | SaaS or Self-hosted (BYOC or Embedded Cluster with sufficient replicas and accepted degraded-operation trade-offs) |
| Backup and restore required | SaaS or Self-hosted (BYOC) |
| Security team must enforce Kubernetes-native controls such as network policies, admission controllers, or image and pod-layer scanning | Self-hosted (BYOC) |
| Existing cloud Kubernetes service (EKS, AKS, GKE) | Self-hosted (BYOC) |
| Deployment on VMs or bare metal without Kubernetes | Self-hosted (Embedded Cluster) |
| Strict compliance or audit data retention requirements | SaaS or Self-hosted (BYOC) |
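When several rows of the summary table apply at once, the recommended model is whichever option appears in every matching row. One way to work that out is sketched below in Python. The requirement keys (such as `backup_and_restore`) are shorthand invented for this example, the two "no in-house Kubernetes expertise" rows are merged into one key, and the mapping simply transcribes the table; treat it as a planning aid rather than an official selector.

```python
SAAS = "SaaS"
BYOC = "Self-hosted (BYOC)"
EMBEDDED = "Self-hosted (Embedded Cluster)"

# Hypothetical keys; the values transcribe the summary table above.
RECOMMENDED = {
    "itrs_manages_infrastructure": {SAAS},
    "specific_aws_region": {SAAS},
    "on_prem_or_private_cloud": {BYOC, EMBEDDED},
    "full_kubernetes_control": {BYOC},
    "no_kubernetes_expertise": {EMBEDDED},
    "production_high_availability": {SAAS, BYOC, EMBEDDED},
    "backup_and_restore": {SAAS, BYOC},
    "kubernetes_native_security_controls": {BYOC},
    "existing_cloud_kubernetes_service": {BYOC},
    "vms_or_bare_metal_without_kubernetes": {EMBEDDED},
    "strict_compliance_or_retention": {SAAS, BYOC},
}


def candidate_models(requirements):
    """Return the models recommended by every listed requirement.

    An empty result means the requirements conflict according to the table
    and need to be prioritized before choosing a model.
    """
    candidates = {SAAS, BYOC, EMBEDDED}
    for key in requirements:
        candidates &= RECOMMENDED[key]  # KeyError flags an unknown requirement
    return candidates


print(candidate_models(["on_prem_or_private_cloud", "backup_and_restore"]))
print(candidate_models(["no_kubernetes_expertise", "backup_and_restore"]))
```

The second call returns an empty set: the table recommends Embedded Cluster when Kubernetes expertise is missing, but lists only SaaS and BYOC for backup and restore. An Embedded Cluster deployment in that situation depends on the infrastructure-level backups described earlier.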
To continue planning your deployment, refer to the following resources:
- For backup and restore procedures, see the Backup and restore documentation.
- For infrastructure sizing and resource requirements, see ITRS Analytics Sizer.