[ClusterLabs] PCMK_node_start_state=standby sometimes does not work

Ken Gaillot kgaillot at redhat.com
Fri Dec 1 18:02:24 EST 2017


On Tue, 2017-11-28 at 09:36 +0000, 井上 和徳 wrote:
> Hi,
> 
> Sometimes a node with 'PCMK_node_start_state=standby' will start up
> Online.
> 
> [ reproduction scenario ]
>  * Set 'PCMK_node_start_state=standby' in /etc/sysconfig/pacemaker.
>  * Delete cib (/var/lib/pacemaker/cib/*).
>  * Start pacemaker at the same time on 2 nodes.
>   # for i in rhel74-1 rhel74-3 ; do ssh -f $i systemctl start pacemaker ; done
> 
> [ actual result ]
>  * crm_mon
>   Stack: corosync
>   Current DC: rhel74-3 (version 1.1.18-2b07d5c) - partition with quorum
>   Last change: Wed Nov 22 06:22:50 2017 by hacluster via crmd on rhel74-3
> 
>   2 nodes configured
>   0 resources configured
> 
>   Node rhel74-3: standby
>   Online: [ rhel74-1 ]
> 
>  * cib.xml
>   <nodes>
>     <node id="3232261507" uname="rhel74-1"/>
>     <node id="3232261509" uname="rhel74-3">
>       <instance_attributes id="nodes-3232261509">
>         <nvpair id="nodes-3232261509-standby" name="standby" value="on"/>
>       </instance_attributes>
>     </node>
>   </nodes>
> 
>  * pacemaker.log
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: (cib_native.c:462 ) warning: cib_native_perform_op_delegate: Call failed: No such device or address
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 )    info: update_attr_delegate: Update   <node id="3232261507">
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 )    info: update_attr_delegate: Update     <instance_attributes id="nodes-3232261507">
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 )    info: update_attr_delegate: Update       <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 )    info: update_attr_delegate: Update     </instance_attributes>
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 )    info: update_attr_delegate: Update   </node>
> 
>  * The crm_report was too big to attach to this email, so I uploaded it
> to GitHub:
>    https://github.com/inouekazu/pcmk_report/blob/master/pcmk-Wed-22-Nov-2017.tar.bz2
> 
> 
> I think the cause is the timing with which <node id="3232261507"> (*1)
> and <instance_attributes id="nodes-3232261507"> (*2) are added.
> *1 <node id="3232261507" uname="rhel74-1"/>
> *2 <instance_attributes id="nodes-3232261507">
>      <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
> 
> I hope this can be fixed, but if that is difficult, I have two questions.
> 1) Does this only occur if there is no cib.xml (in other words, there
> is no <node> element)?

I believe so. I think this is the key message:

Nov 22 06:22:50 [20750] rhel74-1        cib: ( callbacks.c:1101  )
warning: cib_process_request:        Completed cib_modify operation for
section nodes: No such device or address (rc=-6,
origin=rhel74-1/crmd/12, version=0.3.0)

PCMK_node_start_state works by setting the "standby" node attribute in
the CIB. However, it does this via a "modify" command that assumes the
<nodes> tag already exists.

If there is no CIB, pacemaker will quickly create one -- but in this
case, the node tries to set the attribute before that's happened.
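
To check whether a node hit this race, one option is to search its log
for the failed cib_modify message quoted above. The following is only a
sketch: the function name is hypothetical, and the log path is passed in
as an argument because its location depends on how logging is
configured.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker): succeeds if the given log
# file contains the failed "modify" of the nodes section quoted above.
standby_update_failed() {
    log_file="$1"
    # Each log entry is a single line in the real log; the wrapping seen
    # in the email is only from the mail client.
    grep -q 'Completed cib_modify operation for section nodes: No such device or address' "$log_file"
}
```

If the function reports a hit and the node came up Online, the standby
attribute can presumably be set by hand afterwards, for example with
"crm_attribute --node rhel74-1 --name standby --update on".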

Hopefully we can come up with a fix. If you want, you can file a bug
report at bugs.clusterlabs.org, to track the progress.

> 2) Is there any workaround other than "Do not start at the same
> time"?
> 
> Best Regards

Before starting pacemaker, if /var/lib/pacemaker/cib is empty, you can
create a skeleton CIB with:

 cibadmin --empty > /var/lib/pacemaker/cib/cib.xml

That will include an empty <nodes/> tag, and the modify command should
work when pacemaker starts.
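
The check and the command above could be wrapped in a small function to
run before starting pacemaker. This is a minimal sketch, not a tested
procedure: the function name and the CIB_DIR and CIBADMIN overrides are
hypothetical (they exist only so it can be tried against a scratch
directory), and the ownership and mode applied at the end are an
assumption about what pacemaker expects for files in that directory.

```shell
#!/bin/sh
# Hypothetical helper: create a skeleton CIB only if the CIB directory
# is empty, so an existing configuration is never overwritten.
seed_cib_if_empty() {
    cib_dir="${CIB_DIR:-/var/lib/pacemaker/cib}"
    cibadmin_cmd="${CIBADMIN:-cibadmin}"

    [ -d "$cib_dir" ] || { echo "no such directory: $cib_dir" >&2; return 1; }

    # Leave the directory alone if it already contains anything.
    if [ -n "$(ls -A "$cib_dir")" ]; then
        return 0
    fi

    # "cibadmin --empty" prints a skeleton CIB that includes <nodes/>.
    if ! "$cibadmin_cmd" --empty > "$cib_dir/cib.xml"; then
        rm -f "$cib_dir/cib.xml"
        return 1
    fi

    # Assumption: CIB files are normally owned by hacluster:haclient
    # with mode 0600; chown is skipped when not root or the user is absent.
    chmod 600 "$cib_dir/cib.xml"
    if [ "$(id -u)" -eq 0 ] && id hacluster >/dev/null 2>&1; then
        chown hacluster:haclient "$cib_dir/cib.xml"
    fi
}
```

It would be used on each node as, for example,
"seed_cib_if_empty && systemctl start pacemaker".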
-- 
Ken Gaillot <kgaillot at redhat.com>



