The background task failover plugin

In the base configuration, Guidewire provides a BackgroundTaskFailoverPlugin plugin implementation that you configure to manage component lease failover. The default implementation class for this plugin is Gosu class DefaultBackgroundTaskFailoverPlugin. This default implementation assumes that no batch process, message destination, or startable plugin requires manual cleanup work after a component lease failover.

At the failover of a component lease, the failover logic postpones the start of the failover process by several minutes. Static variable INITIAL_POSTPONE_TIMEOUT sets the length of the postponement. The postponement handles the case in which the database was unavailable for a relatively long period of time and then came back online. It gives the cluster members some time to recover and renew their leases.

After the failover postponement completes:

  • If the cluster member that owns the lease does not return to the cluster, the automatic failover process continues.
  • If the cluster member that owns the lease does return to the cluster, the automatic failover process fails.
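The postponement and the two outcomes above can be illustrated with a minimal Java sketch. This is not the Guidewire implementation, which is a Gosu class. The class name, the enum, the method signature, and the idea of passing the timeout as a parameter are all hypothetical and exist only for illustration. The actual value of INITIAL_POSTPONE_TIMEOUT is not stated here, so any duration a caller passes is a placeholder.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch only: none of these names are Guidewire APIs.
final class FailoverPostponementSketch {

    // The three situations described in the text.
    enum Decision { POSTPONED, CONTINUE_FAILOVER, FAILOVER_FAILS }

    /**
     * Decides what happens to the automatic failover of one component lease.
     *
     * @param leaseExpiredAt         when the failover of the lease was detected (assumed input)
     * @param now                    the current time
     * @param initialPostponeTimeout stand-in for INITIAL_POSTPONE_TIMEOUT (placeholder value)
     * @param ownerReturned          whether the lease owner has returned to the cluster
     */
    static Decision decide(Instant leaseExpiredAt,
                           Instant now,
                           Duration initialPostponeTimeout,
                           boolean ownerReturned) {
        Instant postponedUntil = leaseExpiredAt.plus(initialPostponeTimeout);

        // Still inside the postponement window: give cluster members time
        // to recover and renew their leases.
        if (now.isBefore(postponedUntil)) {
            return Decision.POSTPONED;
        }

        // Postponement is complete: the outcome depends on the lease owner.
        return ownerReturned ? Decision.FAILOVER_FAILS : Decision.CONTINUE_FAILOVER;
    }
}
```

For example, with a placeholder timeout of five minutes, calling decide one minute after the lease failover yields POSTPONED regardless of the owner's status, while calling it six minutes after yields CONTINUE_FAILOVER if the owner is still absent.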

Active external monitoring

Some clusters use external monitoring and management software to watch the JVM processes of cluster members. Guidewire provides Gosu class ActiveExternalMonitoringBackgroundTaskFailoverPlugin as a template that you can use in such installations. You must implement your own logic for the notifyExternalMonitoringAboutExpiredlease method in this class.

Passive external monitoring

Guidewire provides Gosu class PassiveExternalMonitoringBackgroundTaskFailoverPlugin as a template to use for requesting a cluster member status report from external monitoring and management software. In this case, the external monitoring and management software does not actively watch the JVM processes of cluster members. Instead, it provides cluster member data on request.

Any monitoring and management software that you use for this purpose needs to be able to do the following:

  • Check whether a JVM process on a specified cluster member is alive and, if the process is running, return the process uptime in seconds.
  • Terminate a specified JVM process.
  • Start a new JVM process.

The plugin implementation manages the failover.
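The three capabilities listed above can be pictured as a small contract that the monitoring and management software has to satisfy. The following Java sketch is illustrative only. The interface, its method names, and the in-memory stand-in are hypothetical and are not part of any Guidewire or vendor API. A real integration would call whatever interface your monitoring product exposes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical contract mirroring the three required capabilities.
interface ExternalMonitorSketch {
    // Empty if the JVM process is not alive, otherwise its uptime in seconds.
    OptionalLong uptimeSeconds(String memberId);

    // Terminate the JVM process of the specified cluster member.
    void terminate(String memberId);

    // Start a new JVM process for the specified cluster member.
    void start(String memberId);
}

// In-memory stand-in with a manually advanced clock, for experimentation only.
final class InMemoryMonitorSketch implements ExternalMonitorSketch {
    private final Map<String, Long> startedAtSecond = new HashMap<>();
    private long nowSecond = 0;

    // Simulates the passage of time.
    void advanceClock(long seconds) {
        nowSecond += seconds;
    }

    @Override
    public OptionalLong uptimeSeconds(String memberId) {
        Long started = startedAtSecond.get(memberId);
        return started == null
                ? OptionalLong.empty()
                : OptionalLong.of(nowSecond - started);
    }

    @Override
    public void terminate(String memberId) {
        startedAtSecond.remove(memberId);
    }

    @Override
    public void start(String memberId) {
        startedAtSecond.put(memberId, nowSecond);
    }
}
```

Returning an empty value for a process that is not alive keeps the liveness check and the uptime query in a single call, which matches the first capability in the list. How the plugin acts on that answer is left to the plugin implementation.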