[ClusterLabs] PCMK_node_start_state=standby sometimes does not work
Ken Gaillot
kgaillot at redhat.com
Fri Dec 1 18:02:24 EST 2017
On Tue, 2017-11-28 at 09:36 +0000, 井上 和徳 wrote:
> Hi,
>
> Sometimes a node with 'PCMK_node_start_state=standby' will start up
> Online.
>
> [ reproduction scenario ]
> * Set 'PCMK_node_start_state=standby' in /etc/sysconfig/pacemaker.
> * Delete the CIB (/var/lib/pacemaker/cib/*).
> * Start pacemaker at the same time on 2 nodes.
> # for i in rhel74-1 rhel74-3 ; do ssh -f $i systemctl start pacemaker ; done
>
> [ actual result ]
> * crm_mon
> Stack: corosync
> Current DC: rhel74-3 (version 1.1.18-2b07d5c) - partition with quorum
> Last change: Wed Nov 22 06:22:50 2017 by hacluster via crmd on rhel74-3
>
> 2 nodes configured
> 0 resources configured
>
> Node rhel74-3: standby
> Online: [ rhel74-1 ]
>
> * cib.xml
>   <nodes>
>     <node id="3232261507" uname="rhel74-1"/>
>     <node id="3232261509" uname="rhel74-3">
>       <instance_attributes id="nodes-3232261509">
>         <nvpair id="nodes-3232261509-standby" name="standby" value="on"/>
>       </instance_attributes>
>     </node>
>   </nodes>
>
> * pacemaker.log
> Nov 22 06:22:50 [20755] rhel74-1 crmd: (cib_native.c:462 ) warning: cib_native_perform_op_delegate: Call failed: No such device or address
> Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <node id="3232261507">
> Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <instance_attributes id="nodes-3232261507">
> Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
> Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update </instance_attributes>
> Nov 22 06:22:50 [20755] rhel74-1 crmd: ( cib_attrs.c:320 ) info: update_attr_delegate: Update </node>
>
> * I uploaded the crm_report to GitHub (it is too big to attach to this
> email); please take a look.
> https://github.com/inouekazu/pcmk_report/blob/master/pcmk-Wed-22-Nov-2017.tar.bz2
>
>
> I think the cause is the relative timing at which
> <node id="3232261507"> (*1) and
> <instance_attributes id="nodes-3232261507"> (*2) are added.
> *1 <node id="3232261507" uname="rhel74-1"/>
> *2 <instance_attributes id="nodes-3232261507">
>      <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
>
> I hope this will be fixed, but if that is difficult, I have two questions.
> 1) Does this only occur if there is no cib.xml (in other words, there
> is no <node> element)?
I believe so. I think this is the key message:
Nov 22 06:22:50 [20750] rhel74-1 cib: ( callbacks.c:1101 ) warning: cib_process_request: Completed cib_modify operation for section nodes: No such device or address (rc=-6, origin=rhel74-1/crmd/12, version=0.3.0)
PCMK_node_start_state works by setting the "standby" node attribute in
the CIB. However, it does this via a "modify" command that assumes the
<nodes> tag already exists.
If there is no CIB, pacemaker will quickly create one -- but in this
case, the node tries to set the attribute before that's happened.
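To check whether a given startup hit this race, one option is to search the node's log for the warning quoted above. The following is only a rough sketch: the helper name standby_update_failed and the LOG_FILE default are assumptions (the log location depends on how logging is configured), and the same warning could in principle be logged for other failed updates to the nodes section.

```shell
#!/bin/sh
# Sketch: detect the failed "modify" of the nodes section in a pacemaker log.
# Assumption: the log lives at /var/log/pacemaker.log; override LOG_FILE if not.
LOG_FILE="${LOG_FILE:-/var/log/pacemaker.log}"

# Hypothetical helper: succeeds (exit 0) if the log file given as $1 contains
# the cib warning for a nodes-section modify that failed with
# "No such device or address".
standby_update_failed() {
    grep -q 'Completed cib_modify operation for section nodes: No such device or address' "$1" 2>/dev/null
}

if standby_update_failed "$LOG_FILE"; then
    echo "A modify of the nodes section failed; the standby start state may not have been applied."
fi
```

If the message is present, comparing it with the crm_mon output shows whether the node actually came up Online instead of standby.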
Hopefully we can come up with a fix. If you want, you can file a bug
report at bugs.clusterlabs.org, to track the progress.
> 2) Is there any workaround other than "Do not start at the same
> time"?
>
> Best Regards
Before starting pacemaker, if /var/lib/pacemaker/cib is empty, you can
create a skeleton CIB with:
cibadmin --empty > /var/lib/pacemaker/cib/cib.xml
That will include an empty <nodes/> tag, and the modify command should
work when pacemaker starts.
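As a rough sketch, the workaround could be wrapped in a small script that runs on each node before pacemaker is started. The function name seed_cib_if_empty and the CIB_DIR variable are made up for this example; only the cibadmin --empty command and the directory come from the text above.

```shell
#!/bin/sh
# Sketch: create a skeleton CIB only when the CIB directory is empty.
CIB_DIR="${CIB_DIR:-/var/lib/pacemaker/cib}"

# Hypothetical helper: $1 is the CIB directory.
seed_cib_if_empty() {
    dir="$1"
    # Nothing to do if the directory does not exist on this host.
    [ -d "$dir" ] || return 0
    # "ls -A" prints nothing for an empty directory (dotfiles included).
    if [ -z "$(ls -A "$dir")" ]; then
        # Write the skeleton CIB; remove the partial file if cibadmin fails.
        cibadmin --empty > "$dir/cib.xml" || { rm -f "$dir/cib.xml"; return 1; }
        # Assumption: depending on the installation, the file may also need
        # the same owner and permissions as the CIB files pacemaker creates.
    fi
}

seed_cib_if_empty "$CIB_DIR"
```

Because the function leaves a non-empty directory alone, it should be harmless to run before every start, for example just ahead of "systemctl start pacemaker" in the loop from the reproduction scenario.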
--
Ken Gaillot <kgaillot at redhat.com>