[ClusterLabs] Antw: [EXT] delaying start of a resource

Gabriele Bulfon gbulfon at sonicle.com
Thu Dec 17 03:11:52 EST 2020


Yes, sorry, I sent the same file twice by mistake. Here are the correct logs.
 
Yes, xstha1 has a 10s delay so that it gets precedence; xstha2 has a 1s delay and will be STONITHed earlier.
In the short time before xstha2 was powered off, it had time to bring up the NFS IP (I saw a duplicate IP on xstha1).
Because the configuration has "order zpool_data_order inf: zpool_data ( xstha1_san0_IP )", the IP only starts after zpool_data, so xstha2 must have imported the zpool for a short time before being STONITHed, and this must never happen.
 
What suggests to me that resources were started on xstha2 (the duplicate IP being one effect) are these portions of the xstha2 log.
These lines tell me it could not stop the resources on xstha1 (correct, since it couldn't contact xstha1):

Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action zpool_data_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
 
These lines tell me xstha2 took over the resources that were actually running on xstha1:

Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       xstha1_san0_IP     ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha2_san0_IP  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       zpool_data         ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha1-stonith  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       xstha2-stonith     (           xstha1 )   due to node availability
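
To pull the same information out of a longer log, here is a minimal Python sketch that extracts the LogAction/LogActions entries like the ones above. The regular expression and the function names (parse_actions, planned_moves) are my own assumptions based only on the lines quoted here, not a Pacemaker tool or API, and other Pacemaker versions may format these lines differently. Note that these entries are what the policy engine planned; they do not by themselves prove that the actions were executed.

```python
import re

# Assumed line shape, taken only from the pengine lines quoted in this mail:
#   ... LogAction:   * Move   <resource>   ( <from> -> <to> )
#   ... LogActions: Leave   <resource>  (Started <node>)
ACTION_RE = re.compile(
    r"LogActions?:\s+(?:\*\s+)?(?P<action>\w+)\s+(?P<resource>\S+)\s+"
    r"\(\s*(?P<detail>[^)]*?)\s*\)"
)


def parse_actions(lines):
    """Return (action, resource, detail) for every line that matches."""
    actions = []
    for line in lines:
        m = ACTION_RE.search(line)
        if m:
            actions.append((m["action"], m["resource"], m["detail"]))
    return actions


def planned_moves(lines):
    """Return (resource, source_node, target_node) for each planned Move."""
    moves = []
    for action, resource, detail in parse_actions(lines):
        if action == "Move" and "->" in detail:
            src, dst = (part.strip() for part in detail.split("->", 1))
            moves.append((resource, src, dst))
    return moves


# The five lines quoted above, copied verbatim from the xstha2 log.
SAMPLE = [
    "Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       xstha1_san0_IP     ( xstha1 -> xstha2 )",
    "Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha2_san0_IP  (Started xstha2)",
    "Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       zpool_data         ( xstha1 -> xstha2 )",
    "Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha1-stonith  (Started xstha2)",
    "Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       xstha2-stonith     (           xstha1 )   due to node availability",
]

if __name__ == "__main__":
    for resource, src, dst in planned_moves(SAMPLE):
        print(f"{resource}: {src} -> {dst}")
```

Run against the full xstha2 log (for example with open("stonith2.txt") passed as lines), this would list every resource the policy engine intended to move to xstha2.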
 
The stonith request is the last entry in the log because xstha2 was killed by xstha1 before the 10s delay expired, which is what I wanted.
 
Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 




----------------------------------------------------------------------------------

From: Andrei Borzenkov <arvidjaar at gmail.com>
To: users at clusterlabs.org 
Date: 17 December 2020 6.38.33 CET
Subject: Re: [ClusterLabs] Antw: [EXT] delaying start of a resource


16.12.2020 17:56, Gabriele Bulfon wrote:
> Thanks, here are the logs; they contain information about how it tried to start resources on the nodes.

Both logs are from the same node.

> Keep in mind that node1 was already running the resources, and I simulated a problem by bringing down the HA interface.
>  

There is no attempt to start resources in these logs. The logs end with
a stonith request. As this node had a 10s delay, it was probably
successfully eliminated by the other node, but there are no logs from
the other node.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stonith1.txt
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20201217/dc6d32a2/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: stonith2.txt
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20201217/dc6d32a2/attachment-0003.txt>

