[ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

Thu Dec 17 06:23:42 EST 2020

>>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 17.12.2020 at 09:11 in
message <2129123894.1061.1608192712316 at www>:
> Yes, sorry, I took the same batch by mistake... here are the correct logs.
>  
> Yes, xstha1 has a 10s delay, so I'm giving it precedence; xstha2 has a
> 1s delay and will be fenced (STONITH'd) earlier.
> During the short time before xstha2 was powered off, I saw it had time to
> bring up the NFS IP (I saw a duplicate IP on xstha1).
> And because the configuration has
> "order zpool_data_order inf: zpool_data ( xstha1_san0_IP )",
> that means xstha2 had imported the zpool for a short time
> before being fenced, and this must never happen.
>  
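
To make the delay reasoning above concrete, here is a minimal sketch in Python. It is only a simplified model, not Pacemaker code: it assumes each node's fencing fires after a fixed per-target delay and that a fenced node is gone instantly. The function name fencing_race and the dictionary layout are made up for illustration.

```python
def fencing_race(delay_by_target):
    """Simplified model of a two-node fencing race.

    delay_by_target maps a node name to the number of seconds that pass
    before the fencing action against that node fires (hypothetical model).
    Returns (victim, survivors, window_s), where window_s is how long the
    victim stays alive and could still act on its own view of the cluster.
    """
    # The node whose fencing fires first loses the race.
    victim = min(delay_by_target, key=delay_by_target.get)
    window_s = delay_by_target[victim]
    survivors = sorted(n for n in delay_by_target if n != victim)
    return victim, survivors, window_s


# Delays as described in the thread: 10s for xstha1, 1s for xstha2.
delays = {"xstha1": 10.0, "xstha2": 1.0}
victim, survivors, window_s = fencing_race(delays)
print(f"{victim} is fenced first, after {window_s:g}s; survivors: {survivors}")
```

In this model xstha2 loses the race, but it still has a window of about one second in which it can act, which is the window described above.
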
> What suggests to me that resources were started on xstha2 (the duplicate IP
> being one effect) are these portions of the xstha2 logs.
> These tell me it could not stop the resources on xstha1 (correct, since it
> couldn't contact xstha1):
> 
> Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action zpool_data_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
>  

I wonder: Did you remove the hostnames from the log messages? Also, are the
clocks in sync? I'm surprised that a resource is flagged "unrunnable" and
scheduled for recovery within the same second.

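One way to check the hostname question mechanically is to split each log line into fields. The following is only a sketch: the regular expression is derived from the lines quoted in this thread, and the assumption that a hostname, when present, sits between the PID and the daemon name is mine, not something these logs confirm. The names LOG_RE and parse_log_line are hypothetical.

```python
import re

# Field layout inferred from the quoted lines; the optional "host" field
# between "[pid]" and the daemon name is an assumption.
LOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\]\s+"
    r"(?:(?P<host>[^\s:]+)\s+)?"
    r"(?P<daemon>\w[\w-]*):\s+"
    r"(?P<level>\w+):\s+"
    r"(?P<func>\w+):\s+"
    r"(?P<msg>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


SAMPLE = ("Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      "
          "Action xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)")

rec = parse_log_line(SAMPLE)
print(rec["ts"], rec["host"], rec["daemon"], rec["level"], rec["func"])
```

For the quoted line the host field comes back as None, so no hostname is present where this layout would expect one. Grouping the parsed records by their ts field would likewise show which events share the same second.
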
> These tell me xstha2 took control of the resources that were actually running
> on xstha1:
> 
> Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       xstha1_san0_IP     ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha2_san0_IP  (Started xstha2)
> Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       zpool_data         ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha1-stonith  (Started xstha2)
> Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       xstha2-stonith     (           xstha1 )   due to node availability
>  
> The last stonith request is the last one because xstha2 was killed by xstha1
> before the 10s delay expired, which is what I wanted.

Also note that "Stop xstha2-stonith ( xstha1 ) due to node availability" is
NOT a stonith request; I have the feeling that your cluster does not use
STONITH at all.
Also, the logs are really too incomplete to tell the details...
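
To illustrate the point about the "Stop" line, here is a small Python sketch that pulls the planned action, the resource and the nodes out of the LogAction lines quoted above. The pattern is derived only from those quoted lines and the helper name parse_planned_action is made up. The sketch does not try to recognise real fencing requests, because none appear in the quoted logs.

```python
import re

# Matches the "* <Verb> <resource> ( <nodes> ) [reason]" part of a LogAction line.
ACTION_RE = re.compile(
    r"\*\s+(?P<verb>\w+)\s+(?P<rsc>\S+)\s+"
    r"\(\s*(?P<nodes>[^)]*?)\s*\)"
    r"(?:\s+(?P<reason>.*))?$"
)


def parse_planned_action(line):
    """Return verb, resource, node list and reason, or None if no match."""
    m = ACTION_RE.search(line)
    if not m:
        return None
    nodes = [n.strip() for n in m.group("nodes").split("->")]
    return {
        "verb": m.group("verb"),
        "resource": m.group("rsc"),
        "nodes": nodes,
        "reason": m.group("reason"),
    }


LINES = [
    "Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       "
    "xstha1_san0_IP     ( xstha1 -> xstha2 )",
    "Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       "
    "xstha2-stonith     (           xstha1 )   due to node availability",
]

for line in LINES:
    print(parse_planned_action(line))
```

Each parsed entry names a resource and a planned action on it. The "Stop" entry is a stop of the resource xstha2-stonith on xstha1 and says nothing about a node being fenced.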

>  
> Gabriele
>  
> ----------------------------------------------------------------------------------
> 
> From: Andrei Borzenkov <arvidjaar at gmail.com>
> To: users at clusterlabs.org
> Date: 17 December 2020 6:38:33 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] delaying start of a resource
> 
> 
> 16.12.2020 17:56, Gabriele Bulfon wrote:
>> Thanks, here are the logs; they contain information about how it tried to
>> start resources on the nodes.
> 
> Both logs are from the same node.
> 
>> Keep in mind that node1 was already running the resources, and I simulated a
>> problem by taking down the HA interface.
>>  
> 
> There is no attempt to start resources in these logs. The logs end with a
> stonith request. As this node had a 10s delay, it was probably
> successfully eliminated by the other node, but there are no logs from
> the other node.