[ClusterLabs] host in standby causes havoc

Ken Gaillot kgaillot at redhat.com
Thu Jun 15 10:13:15 EDT 2023


On Thu, 2023-06-15 at 12:58 +0200, Kadlecsik József wrote:
> Hello,
> 
> We had a strange issue here: in a 7-node cluster, one node was put into
> standby mode to test a new iSCSI setting on it. While the machine was
> being configured, it was rebooted, and after the reboot iSCSI didn't
> come up. That caused malformed communication (atlas5 is the node in
> standby) with the cluster:
> 
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  warning: Unexpected result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  notice: If it is not possible for ocsi to run on atlas5, see the resource-discovery option for location constraints
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  error: Resource ocsi is active on 2 nodes (attempting recovery)

Newer versions reword this as "might be active". The idea is that if
the probe returns an error, we don't know the state of the resource on
that node. From an HA perspective, we have to assume the worst, that
the resource could be active there.
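
As a rough illustration of that worst-case reasoning (a toy model, not
Pacemaker's actual scheduler code), the sketch below treats any probe
result other than "not running" as possibly active. The function names
and the node name node-a/node-b are hypothetical; the exit code values
are the standard OCF ones.

```python
# Standard OCF resource agent exit codes used in this sketch.
OCF_SUCCESS = 0       # probe found the resource running
OCF_ERR_GENERIC = 1   # probe failed; state of the resource is unknown
OCF_NOT_RUNNING = 7   # probe found the resource cleanly stopped


def possibly_active_nodes(probe_results):
    """Return the nodes where the resource has to be assumed active.

    probe_results maps a node name to the exit code of its probe.
    Only an explicit "not running" result rules a node out.
    """
    return sorted(
        node for node, rc in probe_results.items() if rc != OCF_NOT_RUNNING
    )


def needs_recovery(probe_results, max_instances=1):
    """True if more nodes might be running the resource than allowed."""
    return len(possibly_active_nodes(probe_results)) > max_instances


# Hypothetical probe results: running on node-a, stopped on node-b,
# and an error on the standby node atlas5.
probes = {
    "node-a": OCF_SUCCESS,
    "node-b": OCF_NOT_RUNNING,
    "atlas5": OCF_ERR_GENERIC,
}

print(possibly_active_nodes(probes))
print(needs_recovery(probes))
```

With one real instance plus one failed probe, the model already counts
two possibly active nodes, which is the situation the log message
describes.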

> The resource was definitely not active on 2 nodes. And that caused a
> storm of killing all virtual machines as resources.

The cluster would first try to stop ocsi on that node as well as the
node where it's known to be running. If a stop fails, then the cluster
will try to fence that node.

> How could one prevent such cases from coming up?

It sounds like maybe the agent can't probe or stop in certain
situations. It may be possible to improve the agent. For example, some
agents return an error if key software isn't installed, but for a probe
or stop, that's fine -- if the software isn't installed, it's
definitely not running.
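
A minimal sketch of that decision, written in Python for readability
(real agents are usually shell scripts, and this is not taken from any
particular agent). The function names are hypothetical. The exit code
values are the standard OCF ones, and the probe test assumes a probe is
a "monitor" action with OCF_RESKEY_CRM_meta_interval set to 0.

```python
# Standard OCF resource agent exit codes used in this sketch.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(action, env):
    """A probe is a one-time monitor, i.e. a monitor with interval 0."""
    return (
        action == "monitor"
        and env.get("OCF_RESKEY_CRM_meta_interval", "") == "0"
    )


def result_when_software_missing(action, env):
    """Pick the exit code when the required software is not installed."""
    if is_probe(action, env):
        # Nothing installed, so nothing can be running here.
        return OCF_NOT_RUNNING
    if action == "stop":
        # Already stopped; reporting success avoids a needless fence.
        return OCF_SUCCESS
    # Start, recurring monitor, etc. cannot work without the software.
    return OCF_ERR_INSTALLED


probe_env = {"OCF_RESKEY_CRM_meta_interval": "0"}
print(result_when_software_missing("monitor", probe_env))
print(result_when_software_missing("stop", {}))
print(result_when_software_missing("start", {}))
```

An agent following this pattern would report a clean "not running" for
the probe on a node where the software is missing, rather than an
error.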

> 
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.hu
> PGP key: https://wigner.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
>          H-1525 Budapest 114, POB. 49, Hungary
-- 
Ken Gaillot <kgaillot at redhat.com>


