<br><br>On Wednesday, December 5, 2012, David Vossel wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
----- Original Message -----<br>
> From: "Kazunori INOUE" <<a>inouekazu@intellilink.co.jp</a>><br>
> To: "The Pacemaker cluster resource manager" <<a>pacemaker@oss.clusterlabs.org</a>><br>
> Sent: Monday, December 3, 2012 11:41:56 PM<br>
> Subject: Re: [Pacemaker] node status does not change even if pacemakerd dies<br>
><br>
> (12.12.03 20:24), Andrew Beekhof wrote:<br>
> > On Mon, Dec 3, 2012 at 8:15 PM, Kazunori INOUE<br>
> > <<a>inouekazu@intellilink.co.jp</a>> wrote:<br>
> >> (12.11.30 23:52), David Vossel wrote:<br>
> >>><br>
> >>> ----- Original Message -----<br>
> >>>><br>
> >>>> From: "Kazunori INOUE" <<a>inouekazu@intellilink.co.jp</a>><br>
> >>>> To: "pacemaker@oss" <<a>pacemaker@oss.clusterlabs.org</a>><br>
> >>>> Sent: Friday, November 30, 2012 2:38:50 AM<br>
> >>>> Subject: [Pacemaker] node status does not change even if<br>
> >>>> pacemakerd dies<br>
> >>>><br>
> >>>> Hi,<br>
> >>>><br>
> >>>> I am testing the latest version.<br>
> >>>> - ClusterLabs/pacemaker 9c13d14640(Nov 27, 2012)<br>
> >>>> - corosync 92e0f9c7bb(Nov 07, 2012)<br>
> >>>> - libqb 30a7871646(Nov 29, 2012)<br>
> >>>><br>
> >>>><br>
> >>>> Although I killed pacemakerd, node status did not change.<br>
> >>>><br>
> >>>> [dev1 ~]$ pkill -9 pacemakerd<br>
> >>>> [dev1 ~]$ crm_mon<br>
> >>>> :<br>
> >>>> Stack: corosync<br>
> >>>> Current DC: dev2 (2472913088) - partition with quorum<br>
> >>>> Version: 1.1.8-9c13d14<br>
> >>>> 2 Nodes configured, unknown expected votes<br>
> >>>> 0 Resources configured.<br>
> >>>><br>
> >>>><br>
> >>>> Online: [ dev1 dev2 ]<br>
> >>>><br>
> >>>> [dev1 ~]$ ps -ef|egrep 'corosync|pacemaker'<br>
> >>>> root 11990 1 1 16:05 ? 00:00:00 corosync<br>
> >>>> 496 12010 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/cib<br>
> >>>> root 12011 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/stonithd<br>
> >>>> root 12012 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/lrmd<br>
> >>>> 496 12013 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/attrd<br>
> >>>> 496 12014 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/pengine<br>
> >>>> 496 12015 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/crmd<br>
> >>>><br>
> >>>><br>
> >>>> We want the node status to change to OFFLINE (stonith-enabled=false)<br>
> >>>> or UNCLEAN (stonith-enabled=true).<br>
> >>>> That is, we want the behavior of this deleted code restored.<br>
> >>>><br>
> >>>> <a href="https://github.com/ClusterLabs/pacemaker/commit/dfdfb6c9087e644cb898143e198b240eb9a928b4" target="_blank">https://github.com/ClusterLabs/pacemaker/commit/dfdfb6c9087e644cb898143e198b240eb9a928b4</a><br>
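For reference, the stonith-enabled switch that decides between those two outcomes is an ordinary cluster property stored in the CIB's crm_config section. A minimal fragment follows; the `id` values shown are only the conventional defaults, and the comment summarizes the behavior requested above, not a guarantee of the implementation:

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- true: a failed/unreachable node should become UNCLEAN and be fenced;
         false: it should simply be marked OFFLINE -->
    <nvpair id="cib-bootstrap-options-stonith-enabled"
            name="stonith-enabled" value="true"/>
  </cluster_property_set>
</crm_config>
```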
> >>><br>
> >>><br>
> >>> How are you launching pacemakerd? The systemd service script<br>
> >>> relaunches pacemakerd on failure and pacemakerd has the ability<br>
> >>> to attach to all the old processes if they are still around as if<br>
> >>> nothing happened.<br>
<br>
Ah yes, that is a problem.<br>
<br>
Having pacemaker still running when the init script says it is down... that is bad. Perhaps we should just make the init script smart enough to verify that all the pacemaker components are actually down once pacemakerd is down.<br>
<br>
Whether or not the failure of pacemakerd is something the cluster should be alerted to, I'm not sure. With the corosync 2.0 stack, pacemakerd really doesn't do anything except launch and relaunch processes. A cluster can be completely functional without a pacemakerd instance running anywhere. If any of the actual pacemaker components on a node fail, the logic that causes that node to get fenced has nothing to do with pacemakerd.</blockquote>
<div><br></div><div>The point about init being confused is valid, but I'd prefer to find a way to get the right info than to fix the problem by causing more downtime.</div><div><br></div><div>Having pacemakerd reattach to the existing services when it gets respawned works for systemd, but we should probably make the LSB status check smarter too.</div>
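<div>A stricter LSB status check along those lines could look something like this. The daemon names come from the ps listing earlier in the thread, but the helper itself (the `pacemaker_status` name, the exit-code mapping, the exact-name process match) is only a sketch, not the project's actual init script:</div>

```shell
#!/bin/sh
# Sketch of a stricter LSB "status" action: report the stack healthy only
# when pacemakerd AND all of its child daemons are present, and flag the
# orphaned-children case (pacemakerd killed, components still running) as
# an error. Daemon names are taken from the ps output in this thread.
DAEMONS="pacemakerd cib stonithd lrmd attrd pengine crmd"

is_running() {
    # Portable check: match the exact command name in the process table.
    ps -e -o comm= | grep -qx "$1"
}

pacemaker_status() {
    up=0 total=0
    for d in $DAEMONS; do
        total=$((total + 1))
        if is_running "$d"; then
            up=$((up + 1))
        fi
    done
    if [ "$up" -eq 0 ]; then
        echo "stopped"; return 3    # LSB: program is not running
    elif [ "$up" -eq "$total" ]; then
        echo "running"; return 0    # LSB: program is running
    else
        echo "partial"; return 1    # some components orphaned
    fi
}

st=$(pacemaker_status) || true
echo "pacemaker status: $st"
```

<div>With a check like this, "status" would return nonzero (with distinct output) when pacemakerd has died but its children linger, instead of claiming the stack is cleanly down.</div>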
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
-- Vossel<br>
<br>
<br>
> > This isn't the case when the plugin is in use, though, but then I'd<br>
> > have also expected most of the processes to die.<br>
> ><br>
> Since the node status would also change as a result,<br>
> that is the behavior we would like.<br>
><br>
> >><br>
> >> ----<br>
> >> $ cat /etc/redhat-release<br>
> >> Red Hat Enterprise Linux Server release 6.3 (Santiago)<br>
> >><br>
> >> $ ./configure --sysconfdir=/etc --localstatedir=/var<br>
> >> --without-cman<br>
> >> --without-heartbeat<br>
> >> -snip-<br>
> >> pacemaker configuration:<br>
> >> Version = 1.1.8 (Build: 9c13d14)<br>
> >> Features = generated-manpages agent-manpages<br>
> >> ascii-docs<br>
> >> publican-docs ncurses libqb-logging libqb-ipc lha-fencing<br>
> >> corosync-native<br>
> >> snmp<br>
> >><br>
> >><br>
> >> $ cat config.log<br>
> >> -snip-<br>
> >> 6000 | #define BUILD_VERSION "9c13d14"<br>
> >> 6001 | /* end confdefs.h. */<br>
> >> 6002 | #include <gio/gio.h><br>
> >> 6003 |<br>
> >> 6004 | int<br>
> >> 6005 | main ()<br>
> >> 6006 | {<br>
> >> 6007 | if (sizeof (GDBusProxy))<br>
> >> 6008 | return 0;<br>
> >> 6009 | ;<br>
> >> 6010 | return 0;<br>
> >> 6011 | }<br>
> >> 6012 configure:32411: result: no<br>
> >> 6013 configure:32417: WARNING: Unable to support systemd/upstart.<br>
> >> You need<br>
> >> to use glib >= 2.26<br>
> >> -snip-<br>
> >> 6286 | #define BUILD_VERSION "9c13d14"<br>
> >> 6287 | #define SUPPORT_UPSTART 0<br>
> >> 6288 | #define SUPPORT_SYSTEMD 0<br>
> >><br>
> >><br>
> >> Best Regards,<br>
> >> Kazunori INOUE<br>
> >><br>
> >><br>
> >>><br>
> >>>> related bugzilla:<br>
> >>>> <a href="http://bugs.clusterlabs.org/show_bug.cgi?id=5064" target="_blank">http://bugs.clusterlabs.org/show_bug.cgi?id=5064</a><br>
> >>>><br>
> >>>><br>
> >>>> Best Regards,<br>
> >>>> Kazunori INOUE<br>
> >>>><br>
> >>>> _______________________________________________<br>
> >>>> Pacemaker mailing list: <a>Pacemaker@oss.clusterlabs.org</a><br>
> >>>> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
> >>>><br>
> >>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> >>>> Getting started:<br>
> >>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> >>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
> >>>><br>
> >>><br>
> >><br>
> >><br>
</blockquote>