
OMNIbus Probe Peer to Peer failover

In Tivoli Netcool/OMNIbus, it isn't only servers that can be made highly resilient. OMNIbus probes can also be configured to provide hot standby.

Two OMNIbus probes can be configured to act as a peer-to-peer failover pair, in a Master/Slave relationship. Both probes receive a copy of every incoming event, but the Slave forwards events to the ObjectServer only if it determines that the Master has failed.

The probe property ‘Mode’ determines which role the probe assumes on starting. It can be set to either “master” or “slave”.

The Master/Slave relationship relies on a heartbeat passing between the two probes. Two properties, ‘PeerHost’ and ‘PeerPort’, tell each probe where to find its partner and the port on which to communicate with it.
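
As a rough illustration, the pairing rules described so far can be written down as a small Python check. This is only a sketch: the ProbeProps class, the check_pair helper, the host names and port 9999 are hypothetical and not part of OMNIbus. Only the property names Mode, PeerHost and PeerPort come from the product, and the rule that both probes share one PeerPort value is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeProps:
    """Hypothetical holder for the failover settings of one probe."""
    host: str        # host the probe runs on (illustrative, not a probe property)
    mode: str        # value of the Mode property: "master" or "slave"
    peer_host: str   # value of the PeerHost property
    peer_port: int   # value of the PeerPort property


def check_pair(a: ProbeProps, b: ProbeProps) -> List[str]:
    """Return a list of problems found in a proposed failover pair."""
    problems = []
    if {a.mode, b.mode} != {"master", "slave"}:
        problems.append("one probe must be master and the other slave")
    if a.peer_host != b.host or b.peer_host != a.host:
        problems.append("each PeerHost must name the other probe's host")
    # Assumption: both probes are given the same PeerPort value.
    if a.peer_port != b.peer_port:
        problems.append("both probes must use the same PeerPort")
    return problems


master = ProbeProps("hostA", "master", "hostB", 9999)
slave = ProbeProps("hostB", "slave", "hostA", 9999)
print(check_pair(master, slave))  # prints []
```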

The Master sends a heartbeat poll at an interval, in seconds, set by the 'BeatInterval' property. The default is 2 seconds. So that both probes use the same figure, the Slave takes the value of BeatInterval from the Master and ignores the value set in its own properties.

Between heartbeats, the Slave caches every event it receives, and it discards the cached events each time a new heartbeat arrives.
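
The caching behaviour can be pictured with a toy model in Python. This is a sketch of the idea only, not how the probe is implemented: the SlaveCache class and its method names are invented for illustration.

```python
class SlaveCache:
    """Toy model of the Slave's event cache; not OMNIbus code."""

    def __init__(self):
        self.events = []

    def on_event(self, event):
        # While the Master is alive, incoming events are held, not forwarded.
        self.events.append(event)

    def on_heartbeat(self):
        # A heartbeat shows the Master is alive, so the cached copies
        # are no longer needed and are thrown away.
        self.events.clear()

    def on_master_failed(self):
        # Heartbeats have stopped: return everything cached so it can be
        # forwarded to the ObjectServer, and empty the cache.
        pending, self.events = self.events, []
        return pending


cache = SlaveCache()
cache.on_event("e1")
cache.on_heartbeat()             # e1 is discarded
cache.on_event("e2")
cache.on_event("e3")
print(cache.on_master_failed())  # prints ['e2', 'e3']
```

Only the events received since the last heartbeat survive to be forwarded, which is why the size of the cache is bounded by the heartbeat interval.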

Should the Slave fail to receive a heartbeat, the property ‘BeatThreshold’ specifies the extra period that it waits before switching to active mode and forwarding all the events in its cache to the ObjectServer. The default is 1 second. Note that there is also a 1 second timeout when waiting for heartbeats, so there is a maximum delay of (BeatInterval + BeatThreshold + 1) seconds before the Slave forwards its cached events. The Slave continues to forward all events until it receives another heartbeat from the original Master.
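
To make the arithmetic concrete, here is the maximum delay worked out in a few lines of Python. The function name and parameter names are made up for this example; the default values are the ones quoted above.

```python
def max_failover_delay(beat_interval: int = 2, beat_threshold: int = 1,
                       heartbeat_timeout: int = 1) -> int:
    """Worst-case seconds before the Slave forwards its cached events."""
    return beat_interval + beat_threshold + heartbeat_timeout


print(max_failover_delay())      # defaults: prints 4
print(max_failover_delay(5, 2))  # BeatInterval 5, BeatThreshold 2: prints 8
```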

Note that there is no mechanism by which the Slave can know which of the events in its cache have already been forwarded to the ObjectServer by the Master. As a result, some events may be duplicated (typically affecting the Tally and LastOccurrence fields).

It is also possible to switch the mode of a probe between Master and Slave in the rules file. For example, to switch a probe instance to become the master, use the rules file syntax: %Mode = "master"

There is a delay of up to one second before the mode change takes effect, which can result in duplicate events if two probe instances are switching mode at the same time.

Note: Peer-to-peer failover is not supported for all probes. To determine if it is supported by a particular probe, run the command $OMNIHOME/probes/nco_p_probename -dumpprops

If the properties Mode, PeerHost, and PeerPort are listed then peer-to-peer failover is supported.
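
If you have several probes to check, the test can be scripted. The Python sketch below simply looks for the three property names as whole words in whatever text it is given. It makes no assumption about the layout of the -dumpprops output, so a name that happened to appear only inside a description would also count. The function name is invented for this example.

```python
import re

FAILOVER_PROPS = ("Mode", "PeerHost", "PeerPort")


def supports_peer_failover(dumpprops_output: str) -> bool:
    """True if every failover property name appears in the given text."""
    return all(
        re.search(rf"\b{name}\b", dumpprops_output) is not None
        for name in FAILOVER_PROPS
    )
```

Capture the output of the -dumpprops command for a probe and pass it to the function as a string.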
