[Pacemaker] Best way to check if PM is alive

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Dec 14 08:45:52 EST 2010


On Thu, Dec 09, 2010 at 02:58:33PM +0300, Evgeniy Ivanov wrote:
> On Thu, Dec 9, 2010 at 2:32 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> > On Thu, Dec 9, 2010 at 12:14 PM, Evgeniy Ivanov <lolkaantimat at gmail.com> wrote:
> >> Hi,
> >>
> >> What is the best way to check if PM is still alive?
> >
> > "ps axf | grep crmd" is one approach
> That only shows that the crmd process exists; it gives no information
> about its state. In theory it can hang in some internal logic
> (something like an "endless loop"). So we need a way to ask "Hey,
> PM! Are your brains still OK?".

The closest thing is "crmadmin -D" to find the DC, followed by
"crmadmin -S <node>" to check the status of the DC node. Or run
"crmadmin -S <node>" for every node.
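
A minimal sketch of that check as shell functions. It assumes that
"crmadmin -D" prints a line ending in the DC's node name (something like
"Designated Controller is: node1") and that a healthy crmd answers
"crmadmin -S <node>" with a status line containing "(ok)". The exact
output may differ between versions, so verify it on your cluster first.
The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the output formats matched below are assumptions, see above.

# Print the name of the current DC, or nothing if it cannot be determined.
dc_name() {
    # Take the last whitespace-separated field of the matching line.
    crmadmin -D 2>/dev/null | awk '/Designated Controller/ { print $NF; exit }'
}

# Return 0 if the crmd on node $1 answers with an "(ok)" status.
crmd_is_ok() {
    crmadmin -S "$1" 2>/dev/null | grep -q '(ok)'
}

# Return 0 only if a DC was found and its crmd reports "(ok)".
check_pm_alive() {
    dc=$(dc_name)
    [ -n "$dc" ] || return 1
    crmd_is_ok "$dc"
}
```

Something like "check_pm_alive || logger 'crmd on DC not responding'"
could then be run periodically. If crmd really hangs, crmadmin may block
as well, so the caller probably wants to bound the runtime of the check.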



> >> We tried the following approach: there is a softdog timer (max value
> >> is 300s, plus an extra 60s to give PM another chance) that is
> >> initially started and checked by a third party. A clone named
> >> HA_alive fails in monitor (except the first time); the monitor
> >> interval is 200s. HA_alive:start should reset that softdog timer. It
> >> looks like PM sometimes doesn't restart the failed resource within
> >> those 360s for no apparent reason: the system is almost idle.
> >
> > Strange.  Should work. Details?
> It's a dual-node cluster based on openais-0.80.3-26.1 and
> pacemaker-1.0.3-4.1. The solution I've described worked fine on my
> cluster, but regularly failed for no apparent reason on some other
> clusters. The logs (/var/log/messages) say that PM noticed a failure
> in monitor, but it never restarted (no stop and no start) the
> HA_alive resource, so after 360s the system died. I didn't notice
> anything else in the logs...
> I will be able to share some /var/log/messages if I get access to
> the failed clusters.
> -- 
> Evgeniy Ivanov
