- 22 Sep 2021
About Clustering
- Updated on 22 Sep 2021
Overview
A Cluster is a configuration of one or more Decisions Application Servers running on the same database back end. The servers work together to distribute the workload so that no single server is over- or under-utilized.
High Availability
High Availability configurations require removing every single point of failure in the processing chain. This is done by creating a Cluster of at least two Decisions Application Servers. It also requires the customer to configure and maintain a Load Balancer that is capable of routing traffic and running Health Checks on the Application Servers.
High Availability is configured by establishing an Active-Active Cluster, preferably with geographic separation (or different regions or zones in Microsoft's or Amazon's cloud offerings). The Active-Active Cluster provides day-to-day processing benefits by dedicating more computing power to the system, but most importantly, the two Application Servers provide redundancy.
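The Load Balancer's role described above can be sketched as a round-robin selector that skips servers failing their Health Check. This is a minimal illustration, not the configuration of any particular Load Balancer product: the names `AppServer`, `RoundRobinBalancer`, and the `probe` callback are hypothetical, and a real deployment would probe each server over the network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AppServer:
    """One Application Server as the load balancer sees it (hypothetical model)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Rotates across servers and skips any that failed the last health check."""

    def __init__(self, servers: List[AppServer], probe: Callable[[AppServer], bool]):
        self.servers = servers
        self.probe = probe  # assumed callback: returns True when the server responds
        self._next = 0

    def run_health_checks(self) -> None:
        # Record the latest probe result for every server.
        for server in self.servers:
            server.healthy = self.probe(server)

    def pick(self) -> Optional[AppServer]:
        # Try each server once, starting from the rotation pointer.
        count = len(self.servers)
        for offset in range(count):
            candidate = self.servers[(self._next + offset) % count]
            if candidate.healthy:
                self._next = (self._next + offset + 1) % count
                return candidate
        return None  # no healthy server is available


# Usage: simulate "app-2" going down in a two-server Active-Active Cluster.
down = {"app-2"}
balancer = RoundRobinBalancer(
    [AppServer("app-1"), AppServer("app-2")],
    probe=lambda server: server.name not in down,
)
balancer.run_health_checks()
routed = [balancer.pick().name for _ in range(3)]
```

With one server marked unhealthy, every request in `routed` goes to the remaining server, which is the redundancy an Active-Active Cluster is meant to provide.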
Transaction Data and Peer Communication
The Decisions platform relies on a set of services and capabilities that are mostly stateless, such as designing a Rule or a Workflow. The processes that customers build may be very long-running and stateful, long-running and stateless, or short-running and stateless.
All Workflows and Rules have the ability to store data. This Stored Data is immediately written and not at risk during an outage. Uncommitted Data in a Flow or a Rule is the only data that is potentially at risk.
Stateful Workflows can be resumed at their last state, which makes them more resilient when service is interrupted. Short-Running Workflows execute in milliseconds, so the possibility of loss during an outage is minimal.
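The difference between Stored Data and Uncommitted Data can be illustrated with a small checkpointing sketch. It is not how Decisions persists Workflow state internally; `run_workflow`, the `store` dictionary (standing in for the database), and the step functions are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]


def run_workflow(run_id: str, steps: List[Step], store: Dict[str, dict]) -> dict:
    """Run steps in order, saving a checkpoint after each one completes."""
    # Resume from the last saved checkpoint, or start from the beginning.
    checkpoint = store.get(run_id, {"next_step": 0, "data": {}})
    data = dict(checkpoint["data"])
    for index in range(checkpoint["next_step"], len(steps)):
        # Work done inside a step is uncommitted until the save below.
        data = steps[index](data)
        store[run_id] = {"next_step": index + 1, "data": dict(data)}
    return data


# Usage: the second step fails once, simulating an outage mid-execution.
calls: List[str] = []
outage = {"pending": True}


def step_a(data: dict) -> dict:
    calls.append("a")
    return {**data, "a": 1}


def step_b(data: dict) -> dict:
    calls.append("b")
    if outage["pending"]:
        outage["pending"] = False
        raise RuntimeError("simulated outage")
    return {**data, "b": 2}


store: Dict[str, dict] = {}
try:
    run_workflow("run-1", [step_a, step_b], store)
except RuntimeError:
    pass  # the in-flight step is lost, but the checkpoint survives
result = run_workflow("run-1", [step_a, step_b], store)
```

On the second call, only the interrupted step is repeated; the step that had already been checkpointed is not run again.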
The servers in a Cluster communicate with one another to clear Cached Data. The servers do not send large complex messages to one another to maintain state. Instead, the servers let one another know when data has changed and should, therefore, be reloaded from the system of record. This makes Cluster communication very efficient.
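A minimal sketch of this notify-and-reload approach follows. The classes `SystemOfRecord` and `Node` are hypothetical stand-ins for the shared database and an Application Server, and the in-process method call stands in for whatever transport the servers actually use. The point it shows is that a peer message carries only the key that changed, never the data itself.

```python
from typing import Dict, List


class SystemOfRecord:
    """Stand-in for the shared database (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._rows[key] = value

    def load(self, key: str) -> str:
        return self._rows[key]


class Node:
    """Stand-in for one server in the Cluster (hypothetical)."""

    def __init__(self, name: str, store: SystemOfRecord) -> None:
        self.name = name
        self.store = store
        self.cache: Dict[str, str] = {}
        self.peers: List["Node"] = []

    def read(self, key: str) -> str:
        # Serve from cache when possible; otherwise reload from the database.
        if key not in self.cache:
            self.cache[key] = self.store.load(key)
        return self.cache[key]

    def write(self, key: str, value: str) -> None:
        # Persist first, then tell peers only that the key changed.
        self.store.save(key, value)
        self.cache.pop(key, None)
        for peer in self.peers:
            peer.invalidate(key)

    def invalidate(self, key: str) -> None:
        # Drop the stale entry; the next read reloads it.
        self.cache.pop(key, None)


# Usage: node_b holds a cached copy that node_a then changes.
db = SystemOfRecord()
node_a, node_b = Node("a", db), Node("b", db)
node_a.peers, node_b.peers = [node_b], [node_a]
node_a.write("rule-1", "v1")
first_read = node_b.read("rule-1")
node_a.write("rule-1", "v2")
second_read = node_b.read("rule-1")
```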
This approach keeps Cluster performance high, but it also has implications for how Clusters fail over. The easiest way to picture the implication is to imagine a Flow or Rule that is actively executing on a VM (or on a container's CPU resources) and whose in-flight state exists only on that VM (or container) at that moment. If the server experiences an abortive interruption, such as a power outage, that execution of the Flow or Rule engine will be lost. The same is true in a Clustered Environment, but any subsequent executions of the Rule and Flow engine will run on the still-operating server, which minimizes disruption.
When a Flow or Rule's execution is critical, and even this very small chance of interruption is unacceptable, there is a pattern of "Leased Work" and "Work Queues" that can be used for reliable execution of a Flow/Rule and retry attempts.
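The article does not describe how Leased Work and Work Queues are implemented, so the following is only one common interpretation of a lease-based queue, not the Decisions implementation. The names `WorkItem` and `LeasedWorkQueue`, and the `lease_seconds` and `max_attempts` parameters, are assumptions for illustration. A worker takes a time-limited lease on an item; if the worker dies before marking the item complete, the lease expires and another worker can retry it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class WorkItem:
    item_id: str
    payload: str
    attempts: int = 0
    leased_until: float = 0.0
    done: bool = False


class LeasedWorkQueue:
    """In-memory sketch; a real queue would live in shared, durable storage."""

    def __init__(self, lease_seconds: float, max_attempts: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.max_attempts = max_attempts
        self.clock = clock
        self.items: Dict[str, WorkItem] = {}

    def enqueue(self, item_id: str, payload: str) -> None:
        self.items[item_id] = WorkItem(item_id, payload)

    def lease(self) -> Optional[WorkItem]:
        # Hand out the first unfinished item whose lease is free or expired.
        now = self.clock()
        for item in self.items.values():
            if (not item.done and item.leased_until <= now
                    and item.attempts < self.max_attempts):
                item.attempts += 1
                item.leased_until = now + self.lease_seconds
                return item
        return None

    def complete(self, item_id: str) -> None:
        # Only a completed item is never handed out again.
        self.items[item_id].done = True


# Usage with a fake clock: the first worker "crashes" and never completes.
fake_now = [0.0]
queue = LeasedWorkQueue(lease_seconds=30, max_attempts=3, clock=lambda: fake_now[0])
queue.enqueue("job-1", "run critical Flow")
first = queue.lease()           # worker 1 takes the lease, then crashes
blocked = queue.lease()         # lease still held, nothing to hand out
fake_now[0] = 31.0              # lease expires
retry = queue.lease()           # worker 2 picks the item up again
queue.complete(retry.item_id)
```

Because an item can be executed more than once under this design, the work it triggers should be safe to repeat.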
Multi-Tenant and Clusters
In a Multi-Tenant Decisions environment, there is no technical difference in the way that the pieces operate. There is one caveat: Decisions Multi-Tenant allows the Administrator to assign a Tenant Instance to one or more servers in a Cluster without assigning the Tenant to all Nodes. This configuration is uncommon and requires additional configuration on the Load Balancer.
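The extra Load Balancer configuration amounts to tenant-aware routing: a request for a Tenant may only go to Nodes that host that Tenant. The sketch below assumes a hypothetical `assignments` mapping from Tenant name to Node names and a list of currently healthy Nodes; how a real Load Balancer identifies the Tenant (host name, path, header) is outside what the article states.

```python
from typing import Dict, List, Set


def eligible_nodes(tenant: str,
                   assignments: Dict[str, Set[str]],
                   healthy_nodes: List[str]) -> List[str]:
    """Return the healthy Nodes that are assigned to the given Tenant."""
    # A Tenant with no recorded assignment gets no Nodes in this sketch.
    assigned = assignments.get(tenant, set())
    return [node for node in healthy_nodes if node in assigned]


# Usage: "tenant-b" is assigned to only one of the three Nodes.
assignments = {
    "tenant-a": {"node-1", "node-2", "node-3"},
    "tenant-b": {"node-2"},
}
healthy = ["node-1", "node-2", "node-3"]
targets_a = eligible_nodes("tenant-a", assignments, healthy)
targets_b = eligible_nodes("tenant-b", assignments, healthy)
```

A Tenant assigned to a single Node has no redundancy if that Node fails, which is worth weighing before choosing this configuration.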