[Pacemaker] S_POLICY_ENGINE state continues being maintained

Andrew Beekhof andrew at beekhof.net
Thu May 23 02:58:46 EDT 2013


On 23/05/2013, at 4:44 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:

> Hi,
> 
> I'm using pacemaker-1.1 (c3486a4a8d, the latest devel).
> After fencing triggered by a split-brain fails 11 times, the crmd stays in the S_POLICY_ENGINE state even after I recover from the split-brain.

Well, that's annoying. I'll have a look in the morning.

> 
> 1. Disconnect the network connection
> [dev1 ~]$ crm_mon
> Last updated: Thu May 23 13:16:41 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev1 (3232261525) - partition WITHOUT quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> 
> 
> Node dev2 (3232261523): UNCLEAN (offline)
> Online: [ dev1 ]
> 
> f1      (stonith:external/libvirt.NG):  Started dev2
> f2      (stonith:external/libvirt.NG):  Started dev1
> 
> [dev2 ~]$ crm_mon
> Last updated: Thu May 23 13:16:41 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev2 (3232261523) - partition WITHOUT quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> 
> 
> Node dev1 (3232261525): UNCLEAN (offline)
> Online: [ dev2 ]
> 
> f1      (stonith:external/libvirt.NG):  Started dev2
> f2      (stonith:external/libvirt.NG):  Started dev1
> 
> 
> 2. Wait until fencing has failed 11 times
> [dev1 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
> May 23 13:16:46 dev1 stonith: [24981]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> (snip)
> May 23 13:17:24 dev1 stonith: [25105]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> May 23 13:17:28 dev1 stonith: [25118]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> May 23 13:17:28 dev1 crmd[24868]:   notice: too_many_st_failures: Too many failures to fence dev2 (11), giving up
> 
> [dev2 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
> May 23 13:16:46 dev2 stonith: [7177]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> (snip)
> May 23 13:17:23 dev2 stonith: [7295]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> May 23 13:17:28 dev2 stonith: [7309]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> May 23 13:17:28 dev2 crmd[7107]:   notice: too_many_st_failures: Too many failures to fence dev1 (11), giving up
> 
> 
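For anyone reproducing step 2, the failed attempts per target can be tallied from the log instead of counted by eye. This is only a minimal Python sketch: the regular expression is an assumption derived from the three CRIT lines quoted above, and the helper name `count_fence_failures` is hypothetical, not part of Pacemaker. The sample contains only lines shown in the report, so it does not reproduce the full count of 11.

```python
import re
from collections import Counter

# Pattern assumed from the quoted stonith lines, e.g.
# "... CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1"
FAILED_RESET = re.compile(
    r"CRIT: external_reset_req: .* for host (\S+) failed with rc (\d+)"
)


def count_fence_failures(lines):
    """Return a Counter mapping each fencing target to its failed reset count."""
    counts = Counter()
    for line in lines:
        match = FAILED_RESET.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines copied from the dev1 log excerpt above; the last one is the crmd
# notice, which is not a failed reset and is therefore not counted.
sample = [
    "May 23 13:17:24 dev1 stonith: [25105]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1",
    "May 23 13:17:28 dev1 stonith: [25118]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1",
    "May 23 13:17:28 dev1 crmd[24868]:   notice: too_many_st_failures: Too many failures to fence dev2 (11), giving up",
]

print(count_fence_failures(sample))
```

To run it against a real log, pass an open file object for /var/log/ha-log in place of `sample`.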
> 3. Restore the network connection
> [dev1 ~]$ crm_mon
> Last updated: Thu May 23 13:24:23 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev2 (3232261523) - partition with quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> 
> 
> Online: [ dev1 dev2 ]
> 
> f1      (stonith:external/libvirt.NG):  Started dev2
> f2      (stonith:external/libvirt.NG):  Started dev1
> 
> 
> The S_POLICY_ENGINE state persists even though the member's join appears to have succeeded.
> 
> [13:47:54 root@dev1 ~]$ crmadmin -S dev2
> Status of crmd at dev2: S_POLICY_ENGINE (ok)
> 
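To watch for this condition from a script, the `crmadmin -S` output can be parsed. The following is a rough sketch only: the line format is assumed from the single output line quoted above, the function names are hypothetical, and treating anything other than S_IDLE as "not settled" is a simplifying assumption rather than how Pacemaker itself decides anything.

```python
import re

# Format assumed from the quoted output:
# "Status of crmd at dev2: S_POLICY_ENGINE (ok)"
STATUS_LINE = re.compile(r"Status of crmd at (\S+): (\S+) \((\w+)\)")


def parse_crmd_status(text):
    """Extract node, state and health from crmadmin -S output, or None."""
    match = STATUS_LINE.search(text)
    if match is None:
        return None
    node, state, health = match.groups()
    return {"node": node, "state": state, "health": health}


def is_settled(status):
    """Assumption: the DC counts as settled only when it reports S_IDLE."""
    return status is not None and status["state"] == "S_IDLE"


reported = parse_crmd_status("Status of crmd at dev2: S_POLICY_ENGINE (ok)")
print(reported, is_settled(reported))
```

Polling this over several minutes and seeing the same non-idle state each time would match the behaviour described in this report.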
> 
> Best Regards,
> Kazunori INOUE
> <keeping-S_POLICY_ENGINE.tar.bz2>