[ClusterLabs] strange cluster state

Ken Gaillot kgaillot at redhat.com
Wed Oct 18 17:41:52 EDT 2017


On Fri, 2017-09-29 at 15:32 +0200, Václav Mach wrote:
> Hello,
> 
> I am trying to set up a simple 2-node cluster. The setup is done with
> Ansible. The whole project is available on GitHub at
> https://github.com/lager1/cesnet_HA (the README is written in Czech, but
> other parts may be relevant).
> 
> The cluster consists of two servers: r1nren.et.cesnet.cz (r1, r1nren)
> and r2nren.et.cesnet.cz (r2, r2nren). The configuration uses a group for
> the resources, to take advantage of the ordering and colocation that a
> group implies.
> 
> The resources are:
> - ping_gw
> - standby_ip
> - offline_file
> - radiator
> - racoon
> - eduroam_ping
> - mailto
> 
> Resource ping_gw is cloned to run on both nodes.
> All the remaining resources are added to the group.
> 
> When testing cluster behavior I've managed to get the cluster into a
> strange state:
> 
> Node r2nren.et.cesnet.cz: standby
> Online: [ r1nren.et.cesnet.cz ]
> 
> Full list of resources:
> 
>   Clone Set: clone_ping_gw [ping_gw]
>       Started: [ r1nren.et.cesnet.cz ]
>       Stopped: [ r2nren.et.cesnet.cz ]
>   Resource Group: group_eduroam.cz
>       standby_ip	(ocf::heartbeat:IPaddr2):	Started r2nren.et.cesnet.cz
>       offline_file	(systemd:offline_file):	Stopped
>       radiator	(systemd:radiator):	Started r1nren.et.cesnet.cz
>       racoon	(systemd:racoon):	Stopped
>       eduroam_ping	(systemd:eduroam_ping):	Stopped
>       mailto	(ocf::heartbeat:MailTo):	Started r1nren.et.cesnet.cz
> 
> How is this state even possible?

What happened is that the configuration was edited during this time,
and resources that were not originally part of the group were added to the
group. So, a resource may have started on one node, then was added to
the group, which required that it be moved to the other node ... but it
takes time for that to happen, so the status can show the intermediate
state.

If you set record-pending to true in the operation defaults (which will
be the default in Pacemaker 2.0), the status will be able to show (for
example) "Stopping" instead of "Started" when it is in the process of
changing but hasn't finished yet.
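
As a rough sketch of how that could be set (assuming the stock Pacemaker
command-line tools are installed; the pcs line is an alternative for
clusters managed with pcs, and its exact syntax varies between pcs
versions):

```shell
# Set record-pending=true in the operation defaults (the op_defaults
# section of the CIB), so that in-progress operations are recorded and
# can be shown in the status output.
crm_attribute --type op_defaults --name record-pending --update true

# Assumed alternative for pcs-managed clusters (0.9-era syntax):
# pcs resource op defaults record-pending=true
```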

Same for standby -- setting standby is instant, but it takes time to
actually move resources off the node, so status can continue to show
some running for some time. (We have a to-do item to make crm_mon
display "standby (with active resources)" rather than just "standby"
for this case.)
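
If the goal is just to avoid reading an intermediate state while testing,
one option (a sketch, assuming a Pacemaker version whose crm_resource
supports --wait) is to let the cluster settle before checking status:

```shell
# Block until the cluster has no pending actions left.
crm_resource --wait

# Then print the cluster status once and exit.
crm_mon -1
```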

> According to the docs, a node may not run any resources while it is in
> standby state. Also, all the resources should run on the same node and
> should be started in the defined order. The output above does not match
> that.
> 
> I'm not totally sure whether the attached logs were created when this
> problem occurred, but I think they were.
> 
> Thanks for your help.
> 
> Regards,
> Vaclav
-- 
Ken Gaillot <kgaillot at redhat.com>



