[ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Dec 17 06:23:42 EST 2020
>>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 17.12.2020 at 09:11 in
message <2129123894.1061.1608192712316 at www>:
> Yes, sorry, I took the same batch by mistake... here are the correct logs.
>
> Yes, xstha1 has a delay of 10s so that I'm giving it precedence; xstha2 has a
> delay of 1s and will be STONITHed earlier.
> During the short time before xstha2 got powered off, I saw it had time to
> bring up the NFS IP (I saw a duplicated IP on xstha1).
> And because the configuration has "order zpool_data_order inf: zpool_data (
> xstha1_san0_IP )", that means xstha2 had imported the zpool for a short time
> before being STONITHed, and this must never happen.
>
> What suggests to me that resources were started on xstha2 (the duplicated IP
> being an effect) are these log portions from xstha2.
> These tell me it could not stop resources on xstha1 (correct, since it
> couldn't contact xstha1):
>
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> zpool_data_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
>
I wonder: did you remove the hostnames from the log messages? Also, are the
clocks in sync? I am wondering how a resource can be flagged "unrunnable" and
be recovered within the same second.
> These tell me xstha2 took control of resources that were actually running
> on xstha1:
>
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move
> xstha1_san0_IP ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667] pengine: info: LogActions: Leave
> xstha2_san0_IP (Started xstha2)
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move
> zpool_data ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667] pengine: info: LogActions: Leave
> xstha1-stonith (Started xstha2)
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Stop
> xstha2-stonith ( xstha1 ) due to node availability
>
> The stonith request is the last entry because xstha2 was killed by xstha1
> before the 10s delay expired, which is what I wanted.
Also note that "Stop xstha2-stonith ( xstha1 ) due to node
availability" is NOT a stonith request; I have the feeling that your cluster
does not use STONITH at all.
Also, the logs are really too incomplete to tell details...
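
To compare what the scheduler planned on each node, the LogAction lines can be
pulled out of a log mechanically. The following is only a minimal Python
sketch: it assumes the line layout seen in the quoted excerpts (re-joined after
mail wrapping), and the regular expression and the name planned_actions are
illustrative, not part of Pacemaker.

```python
import re

# Sample lines copied from the quoted xstha2 log, re-joined after mail wrapping.
SAMPLE = """\
Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move xstha1_san0_IP ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667] pengine: info: LogActions: Leave xstha2_san0_IP (Started xstha2)
Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move zpool_data ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667] pengine: info: LogActions: Leave xstha1-stonith (Started xstha2)
Dec 16 15:08:56 [667] pengine: notice: LogAction: * Stop xstha2-stonith ( xstha1 ) due to node availability
"""

# Assumed layout: "<Mon DD HH:MM:SS> [pid] pengine: <level>: LogAction(s): [* ]<Verb> <resource> ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\[\d+\]\s+pengine:\s+\w+:\s+"
    r"LogActions?:\s+(?:\*\s+)?(?P<verb>\w+)\s+(?P<resource>\S+)"
)


def planned_actions(text):
    """Return (timestamp, verb, resource) for each pengine LogAction(s) line."""
    actions = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            actions.append((match["stamp"], match["verb"], match["resource"]))
    return actions


if __name__ == "__main__":
    for stamp, verb, resource in planned_actions(SAMPLE):
        print(stamp, verb, resource)
```

On the quoted lines this lists only Move, Leave and Stop entries for resources,
all with the same timestamp; whether a fencing request was issued has to be
checked in other log lines that are not shown here.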
>
> Gabriele
>
>
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
> ----------------------------------------------------------------------------------
>
> From: Andrei Borzenkov <arvidjaar at gmail.com>
> To: users at clusterlabs.org
> Date: 17 December 2020 6.38.33 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] delaying start of a resource
>
>
> 16.12.2020 17:56, Gabriele Bulfon wrote:
>> Thanks, here are the logs; they contain information about how it tried to
>> start resources on the nodes.
>
> Both logs are from the same node.
>
>> Keep in mind that node1 was already running the resources, and I simulated a
>> problem by turning down the ha interface.
>>
>
> There is no attempt to start resources in these logs. Logs end with
> stonith request. As this node had delay 10s, it probably was
> successfully eliminated by another node, but there are no logs from
> another node.