[ClusterLabs] host in standby causes havoc

Thu Jun 15 07:23:29 EDT 2023

On 15.06.2023 13:58, Kadlecsik József wrote:
> Hello,
> 
> We had a strange issue here: in a 7-node cluster, one node was put into
> standby mode to test a new iSCSI setting on it. While the machine was being
> configured it was rebooted, and after the reboot iSCSI didn't come up. That
> caused malformed communication with the cluster (atlas5 is the node in
> standby):
> 
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  warning: Unexpected
> result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023

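This sounds like a resource agent problem. You need to investigate why the 
probe returned an error. A probe that reports an error instead of "not 
running" leaves Pacemaker assuming the resource may be active on that node, 
which would explain the "active on 2 nodes" message quoted below.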

> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  notice: If it is not
> possible for ocsi to run on atlas5, see the resource-discovery option for
> location constraints
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  error: Resource ocsi
> is active on 2 nodes (attempting recovery)
> 
> The resource was definitely not active on 2 nodes. And that caused a storm
> of kills of all the virtual machines run as resources.
> 
> How could one prevent such cases from coming up?
> 

Standby does not stop the cluster software from running on the node; it 
simply tells Pacemaker to exclude this node from the candidates for running 
resources. To avoid any unwanted interaction (including from resource agent 
or other software bugs), you could simply stop Pacemaker on that node and 
disable its automatic startup.
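
As a rough illustration of that last suggestion, here is a minimal sketch 
that builds the commands for stopping Pacemaker and disabling its startup at 
boot. It assumes a systemd-based node where the unit is named "pacemaker"; 
the function names and the dry-run default are made up for this example. 
Adding "corosync" to the units is left to you and is not covered here.

```python
import subprocess

# Assumption: systemd is in use and the unit is called "pacemaker".
DEFAULT_UNITS = ("pacemaker",)


def isolation_commands(units=DEFAULT_UNITS):
    """Return the commands that stop each unit, then disable it at boot."""
    commands = [["systemctl", "stop", unit] for unit in units]
    commands += [["systemctl", "disable", unit] for unit in units]
    return commands


def isolate_node(units=DEFAULT_UNITS, dry_run=True):
    """Print the commands (dry_run=True) or run them as root (dry_run=False)."""
    for cmd in isolation_commands(units):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if systemctl fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints what would be run; nothing is changed on the machine.
    isolate_node(dry_run=True)
```

Run it on the node under maintenance, before starting the work. Once the 
node is known to be healthy again, the reverse is "systemctl enable 
pacemaker" followed by "systemctl start pacemaker".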