[ClusterLabs] Antw: [EXT] Re: crm resource stop VirtualDomain ‑ but VirtualDomain shutdown start some minutes later

Wed Feb 16 07:01:36 EST 2022

>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 16.02.2022 at
12:35 in message
<879647182.178210820.1645011316841.JavaMail.zimbra at helmholtz-muenchen.de>:

> 
> ----- On Feb 16, 2022, at 12:52 AM, kgaillot kgaillot at redhat.com wrote:
> 
> 
>>> Any idea ?
>>> What is about that transition 128, which is aborted ?
>> 
>> A transition is the set of actions that need to be taken in response to
>> current conditions. A transition is aborted any time conditions change
>> (here, the target-role being changed in the configuration), so that a
>> new set of actions can be calculated.
>> 
>> Someone once defined a transition as an "action plan", and I'm tempted
>> to use that instead. Plus maybe replace "aborted" with "interrupted",
>> so then we'd have "Action plan interrupted" which is maybe a little
>> more understandable.
>> 
>>> 
>>> Transition 128 is finished:
>>> Feb 15 21:04:26 [15370] ha-idg-2       crmd:   notice:
>>> run_graph:       Transition 128 (Complete=1, Pending=0, Fired=0,
>>> Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-
>>> 3548.bz2): Complete
>>> 
>>> And one second later the shutdown starts. Is that normal that there
>>> is such a big time gap ?
>>>
>> 
>> No, there should be another transition calculated (with a "saving
>> input" message) immediately after the original transition is aborted.
>> What's the timestamp on that?
>> --
> 
> Hi Ken,
> 
> this is what i found:
> 
> Feb 15 20:54:30 [15369] ha-idg-2    pengine:   notice: process_pe_message:   
>    Calculated transition 128, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-3548.bz2
> Feb 15 20:54:30 [15370] ha-idg-2       crmd:     info: do_state_transition:  
>    State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Feb 15 20:54:30 [15370] ha-idg-2       crmd:   notice: do_te_invoke:    
> Processing graph 128 (ref=pe_calc-dc-1644954870-403) derived from 
> /var/lib/pacemaker/pengine/pe-input-3548.bz2
> Feb 15 20:54:30 [15370] ha-idg-2       crmd:   notice: te_rsc_command:  
> Initiating stop operation vm_pathway_stop_0 locally on ha-idg-2 | action 76
> 
> Feb 15 21:04:26 [15369] ha-idg-2    pengine:   notice: process_pe_message:   
>    Calculated transition 129, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-3549.bz2
> Feb 15 21:04:26 [15370] ha-idg-2       crmd:     info: do_state_transition:  
>    State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Feb 15 21:04:26 [15370] ha-idg-2       crmd:   notice: do_te_invoke:    
> Processing graph 129 (ref=pe_calc-dc-1644955466-405) derived from 
> /var/lib/pacemaker/pengine/pe-input-3549.bz2

Bernd,

I guess the syslog/journal of the DC has better logs.
As I see it now, the stop of vm_pathway seems to take about ten minutes (initiated at 20:54:30, transition 128 complete at 21:04:26), and no other action is started before that is done.
I think I once said "Clusters are not for the impatient", i.e. don't start a new action while the previous action has not completed yet.
Maybe more recent versions of pacemaker can "preempt" action plans (transitions), but I don't know...
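
To put a number on the gap, here is a rough sketch in Python that pulls the "Calculated transition" timestamps out of log lines like the ones you pasted. It is only an illustration, not a Pacemaker tool: the function name transition_times and the regular expression are my own, the wrapped log lines are re-joined by hand, and the year 2022 is an assumption taken from the date of this mail (the log timestamps carry no year).

```python
import re
from datetime import datetime

# Log lines copied from the excerpt quoted above, re-joined onto one line each.
LOG = """\
Feb 15 20:54:30 [15369] ha-idg-2    pengine:   notice: process_pe_message: Calculated transition 128, saving inputs in /var/lib/pacemaker/pengine/pe-input-3548.bz2
Feb 15 20:54:30 [15370] ha-idg-2       crmd:   notice: te_rsc_command: Initiating stop operation vm_pathway_stop_0 locally on ha-idg-2 | action 76
Feb 15 21:04:26 [15369] ha-idg-2    pengine:   notice: process_pe_message: Calculated transition 129, saving inputs in /var/lib/pacemaker/pengine/pe-input-3549.bz2
"""

# Hypothetical pattern: syslog-style timestamp, then the transition number.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*Calculated transition (\d+),"
)


def transition_times(text, year=2022):
    """Map transition number -> time it was calculated.

    The year is an assumption; the log lines themselves do not contain one.
    """
    times = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            times[int(match.group(2))] = stamp
    return times


times = transition_times(LOG)
gap = (times[129] - times[128]).total_seconds()
print(f"gap between transitions 128 and 129: {gap:.0f} s")
```

For the excerpt above this gives 596 seconds, which matches the difference between the two pe_calc-dc references (1644954870 and 1644955466).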

Regards,
Ulrich

> 
> Bernd