[ClusterLabs] host in standby causes havoc
Andrei Borzenkov
arvidjaar at gmail.com
Thu Jun 15 07:23:29 EDT 2023
On 15.06.2023 13:58, Kadlecsik József wrote:
> Hello,
>
> We had a strange issue here: 7 node cluster, one node was put into standby
> mode to test a new iscsi setting on it. During configuring the machine it
> was rebooted and after the reboot the iscsi didn't come up. That caused a
> malformed communication (atlas5 is the node in standby) with the cluster:
>
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: warning: Unexpected
> result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023
It sounds like a resource agent problem. You need to investigate why the
probe returned an error.
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: notice: If it is not
> possible for ocsi to run on atlas5, see the resource-discovery option for
> location constraints
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: error: Resource ocsi
> is active on 2 nodes (attempting recovery)
>
> The resource was definitely not active on 2 nodes. And that caused a storm
> of killing all virtual machines as resources.
>
> How could one prevent such cases to come up?
>
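Standby does not stop the cluster stack from running on the node; it
simply tells Pacemaker to exclude the node as a candidate for running
resources. The node remains a cluster member and Pacemaker still probes
resources on it, which is where the failed probe above came from. To
avoid any unwanted interaction (including ones caused by resource agent
or other software bugs), you could simply stop Pacemaker on that node and
disable its auto-startup.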
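
As a rough illustration of "stop pacemaker and disable auto-startup", here
is a minimal sketch. It assumes a systemd-based node where the service unit
is named "pacemaker"; the helper names (isolation_commands, isolate_node)
are hypothetical and only wrap standard systemctl calls. It defaults to a
dry run that just prints the commands.

```python
import subprocess

# Assumption: systemd-based node, Pacemaker unit named "pacemaker".
PACEMAKER_UNIT = "pacemaker"


def isolation_commands(unit=PACEMAKER_UNIT):
    """Return the commands that stop the service and disable its auto-startup."""
    return [
        ["systemctl", "stop", unit],     # stop the running service now
        ["systemctl", "disable", unit],  # do not start it again at boot
    ]


def isolate_node(dry_run=True):
    """Print the commands (dry run) or execute them; executing needs root."""
    for cmd in isolation_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if systemctl fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    isolate_node(dry_run=True)
```

Run this on the node under maintenance before rebooting it, so that it does
not rejoin the cluster on its own. Whether corosync should be stopped and
disabled as well depends on the setup and is not covered by this sketch.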