[ClusterLabs] Antw: Re: two virtual domains start and stop every 15 minutes

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Jul 1 06:32:44 EDT 2019


To me it looks like a broken migration configuration.

>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> schrieb am 19.06.2019
um
18:46 in Nachricht
<1654529492.1465807.1560962767193.JavaMail.zimbra at helmholtz-muenchen.de>:

> ----- On Jun 15, 2019, at 4:30 PM, Bernd Lentes bernd.lentes at helmholtz-muenchen.de
> wrote:
> 
>> ----- On Jun 14, 2019, at 21:20, kgaillot kgaillot at redhat.com wrote:
>> 
>>> On Fri, 2019‑06‑14 at 18:27 +0200, Lentes, Bernd wrote:
>>>> Hi,
>>>> 
>>>> I had that problem once before, but it is still not clear to me what
>>>> really happens.
>>>> I had this problem some days ago:
>>>> I have a 2‑node cluster with several virtual domains as resources. I
>>>> put one node (ha‑idg‑2) into standby, and two running virtual domains
>>>> were migrated to the other node (ha‑idg‑1). The other virtual domains
>>>> were already running on ha‑idg‑1.
>>>> Since then the two virtual domains which migrated (vm_idcc_devel and
>>>> vm_severin) start or stop every 15 minutes on ha‑idg‑1.
>>>> ha‑idg‑2 resides in standby.
>>>> I know that the 15-minute interval is related to the
>>>> "cluster-recheck-interval".
>>>> But why are these two domains started and stopped?
>>>> I looked through the logs a lot, checked the pe-input files, and watched
>>>> some graphs created by crm_simulate with dotty ...
>>>> I always see that the domains are started, stopped 15 minutes later,
>>>> started again 15 minutes after that ...
>>>> but I don't see WHY. I would really like to know that.
>>>> And why are the domains not started by the monitor operation? It should
>>>> recognize that a domain is stopped and start it again. My monitor
>>>> interval is 30 seconds.
>>>> I had two pending errors concerning these domains, failed migrations
>>>> from ha-idg-1 to ha-idg-2 from some time before.
>>>> Could that be the culprit?
>>>> 
>>>> I still have all the logs from that time, if you need information
>>>> just let me know.
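
A saved scheduler input can be replayed offline to see exactly which actions the scheduler derives from it and which scores drive them. A minimal sketch, assuming crm_simulate is installed and the pe-input files are available locally (the file name below is taken from the logs further down):

    # replay a saved scheduler input: -S simulates the transition, -s prints
    # allocation scores, -D saves the action graph in DOT format
    crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-1273.bz2 \
        -D /tmp/pe-input-1273.dot

    # render the action graph (the "graphs created by crm_simulate" mentioned above)
    dot -Tsvg /tmp/pe-input-1273.dot -o /tmp/pe-input-1273.svg
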
>>> 
>>> Yes, the logs and pe-input files would be helpful. It sounds like a bug
>>> in the scheduler. What version of Pacemaker are you running?
>>> 
>> 
>> Hi,
>> 
>> here are the log and some pe‑input files:
>> https://hmgubox.helmholtz‑muenchen.de/d/f28f6961722f472eb649/ 
>> On the 6th of June at 15:41:28 I issued "crm node standby ha-idg-2"; then the
>> trouble began.
>> I'm running pacemaker-1.1.19+20181105.ccd6b5b10-3.10.1.x86_64 on SLES 12 SP4
>> and kernel 4.12.14-95.13.
>> 
> 
> Hi,
> the problem arose again.
> And what caught my attention: when I make a change in the configuration, e.g.
> some slight changes to a resource, the cluster immediately starts or stops the
> domains, depending on their previous state.
> The fence resource is not affected by this start/stop.
> 
> Example (some changes to the stonith resource):
> 
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_process_request:   Forwarding cib_replace operation for section configuration to all (origin=local/crm_shadow/2)
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: __xml_diff_object:     Moved nvpair@id (0 -> 2)
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: __xml_diff_object:     Moved nvpair@name (1 -> 0)
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.6990.1043 2
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.6991.0 6a5f09a19ae7d0a7bae55bddb9d1564f   <=====================   new epoch
> 
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']/nvpair[@id='fence_ha-idg-2-instance_attributes-action']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-0']/nvpair[@id='fence_ha-idg-2-instance_attributes-0-ipaddr']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-1']/nvpair[@id='fence_ha-idg-2-instance_attributes-1-login']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-2']/nvpair[@id='fence_ha-idg-2-instance_attributes-2-passwd']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-3']/nvpair[@id='fence_ha-idg-2-instance_attributes-3-ssl_insecure']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-4']/nvpair[@id='fence_ha-idg-2-instance_attributes-4-delay']
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib:  @epoch=6991, @num_updates=0
> Jun 18 18:07:09 [9577] ha-idg-1        cib:     info: cib_perform_op:   ++ /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']:  <nvpair name="pcmk_off_action" value="Off" id="fence_ha-idg-2-instance_attributes-pcmk_off_action"/>
> 
>  ...
> 
> The cluster reacts immediately:
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_sim                 (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_geneious            (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_idcc_devel          (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_genetrap            (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_mouseidgenes        (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_greensql            (             ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_severin             (             ha-idg-2 )
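
One way to see this coming before it hits the live cluster is to stage the change in a shadow CIB and ask the scheduler what it would do with it. A rough sketch, assuming crm_shadow and crm_simulate are available ("stonith-test" is just a made-up shadow name):

    # create a sandbox copy of the CIB; crm_shadow starts a shell with CIB_shadow
    # set, so the usual tools (crm configure ..., cibadmin) now edit only the copy
    crm_shadow --create stonith-test

    # after making the change in the shadow, dump it and simulate the scheduler's
    # response to it
    crm_shadow --display > /tmp/shadow.xml
    crm_simulate -S -s -x /tmp/shadow.xml

    # push the change to the live CIB only if the simulated action list looks sane
    crm_shadow --commit stonith-test
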
> 
> What else surprises me:
> With the change to the stonith resource a new epoch (6991) was created.
> Afterwards, all start/stop actions on the domains, even those happening hours
> later, relate to this epoch:
> 
>  ...
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.6991.178 2         <========================
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.6991.179 (null)    <========================
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib:  @num_updates=179
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/lrm_resource[@id='vm_sim']/lrm_rsc_op[@id='vm_sim_last_0']:  @operation_key=vm_sim_start_0, @operation=start, @transition-key=33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, @transition-magic=0:0;33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, @call-id=4693, @exec-time=2928
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_process_request:   Completed cib_modify operation for section status: OK (rc=0, origin=ha-idg-2/crmd/1977, version=2.6991.179)
> Jun 18 21:15:07 [9583] ha-idg-1       crmd:     info: match_graph_event:     Action vm_sim_start_0 (33) confirmed on ha-idg-2 (rc=0)
> Jun 18 21:15:07 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating monitor operation vm_sim_monitor_30000 on ha-idg-2 | action 34
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng:     info: update_cib_stonith_devices_v2:    Updating device list from the cib: modify lrm_rsc_op[@id='vm_sim_last_0']
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng:     info: cib_devices_update:    Updating devices to version 2.6991.179
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng:   notice: unpack_config:    On loss of CCM Quorum: Ignore
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng:     info: cib_device_update:     Device fence_ilo_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.6991.179 2
> Jun 18 21:15:07 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.6991.180 (null)
>  ...
> 
> This continues until the next change of the configuration, when a new epoch is
> created.
> Normally these epochs change very frequently; here they remain unchanged for a
> long time.
> Does that mean that the actions, even hours later, still relate to this
> epoch? What is the use of this epoch?
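
For context, the version in these Diff lines is admin_epoch.epoch.num_updates: epoch is bumped on every change to the configuration section, while num_updates counts the status updates applied on top of that epoch and resets to 0 when the next epoch is created (as the "+++ 2.6991.0" and "@epoch=6991, @num_updates=0" lines above show). So operation results recorded hours later still carry epoch 6991 simply because no configuration change happened in between. A quick way to inspect the current counters, assuming cibadmin is available on a cluster node:

    # print the opening <cib> tag of the live CIB, which carries
    # admin_epoch, epoch and num_updates as attributes
    cibadmin --query | grep -m 1 '<cib '
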
> 
> Also interesting:
> Jun 17 16:47:05 [9581] ha-idg-1    pengine:   notice: process_pe_message:    Calculated transition 62, saving inputs in /var/lib/pacemaker/pengine/pe-input-1085.bz2
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:     info: do_state_transition:    State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: do_te_invoke:     Processing graph 62 (ref=pe_calc-dc-1560782825-446) derived from /var/lib/pacemaker/pengine/pe-input-1085.bz2
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_mausdb_stop_0 on ha-idg-2 | action 37
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 39
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 41
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 43
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 45
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 47
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 49
> Jun 17 16:47:05 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 51
>  ...
> Jun 17 16:48:20 [9583] ha-idg-1       crmd:   notice: run_graph:        Transition 62 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1085.bz2): Complete
> Jun 17 16:48:20 [9583] ha-idg-1       crmd:     info: do_log:   Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jun 17 16:48:20 [9583] ha-idg-1       crmd:   notice: do_state_transition:    State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> 
> 15 min later:
>  ...
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_mausdb              (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_sim                 (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_geneious            (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_idcc_devel          (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_genetrap            (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_mouseidgenes        (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_greensql            (             ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1    pengine:   notice: LogAction:         * Restart    vm_severin             (             ha-idg-2 )
>  ...
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_mausdb_stop_0 on ha-idg-2 | action 29
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 32
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 35
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 38
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 41
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 44
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 47
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 50
>  ...
> Despite the fact that the domains are already stopped, they are stopped again !?!
> 
> And immediately after being stopped they are started again:
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:     info: match_graph_event:     Action vm_mausdb_stop_0 (29) confirmed on ha-idg-2 (rc=0)
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating start operation vm_mausdb_start_0 on ha-idg-2 | action 30
> 
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:     info: match_graph_event:     Action vm_idcc_devel_stop_0 (38) confirmed on ha-idg-2 (rc=0)
> Jun 17 17:03:20 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating start operation vm_idcc_devel_start_0 on ha-idg-2 | action 39
> Stopped and immediately started again !?!
> 
> Also interesting:
>  ...
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:     info: crm_timer_popped:  PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: do_state_transition:    State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:     info: do_state_transition:    Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:   notice: unpack_config:    On loss of CCM Quorum: Ignore
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: unpack_status:    Node ha-idg-1 is in standby-mode
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: determine_online_status_fencing:  Node ha-idg-1 is active
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: determine_online_status:   Node ha-idg-1 is standby
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: determine_online_status_fencing:  Node ha-idg-2 is active
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: determine_online_status:   Node ha-idg-2 is online
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed monitor of vm_mausdb on ha-idg-2: not running | rc=7
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_mouseidgenes on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_idcc_devel on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_sim on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_genetrap on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_geneious on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_greensql on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_severin on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: unpack_node_loop:  Node 1084777482 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: unpack_node_loop:  Node 1084777492 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: unpack_node_loop:  Node 1084777482 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:     info: unpack_node_loop:  Node 1084777492 is already processed
>  ...
> Jun 19 14:47:42 [9581] ha-idg-1    pengine:   notice: process_pe_message:    ===============> Calculated transition 250 <===============, saving inputs in /var/lib/pacemaker/pengine/pe-input-1273.bz2
> 
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:     info: do_state_transition:    State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: do_te_invoke:     Processing graph 250 (ref=pe_calc-dc-1560948462-3241) derived from /var/lib/pacemaker/pengine/pe-input-1273.bz2  <===============================
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 32
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 35
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 38
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 41
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 44
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 47
> Jun 19 14:47:42 [9583] ha-idg-1       crmd:   notice: te_rsc_command:   Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 50
>  ...
> The changes are calculated as transition 250, which completes:
>  ...
> Jun 19 14:47:46 [9583] ha-idg-1       crmd:   notice: run_graph:        =========> Transition 250 (Complete=22, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1273.bz2): Complete <===========
> 
> Jun 19 14:47:46 [9583] ha-idg-1       crmd:     info: do_log:   Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jun 19 14:47:46 [9583] ha-idg-1       crmd:   notice: do_state_transition:    State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng:     info: update_cib_stonith_devices_v2:    Updating device list from the cib: modify lrm_rsc_op[@id='vm_mouseidgenes_monitor_30000']
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng:     info: cib_devices_update:    Updating devices to version 2.7007.890
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng:   notice: unpack_config:    On loss of CCM Quorum: Ignore
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng:     info: cib_device_update:     Device fence_ilo_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
> 
> Transition 250 is completed. But
>  ...
> Jun 19 14:47:51 [9577] ha-idg-1        cib:     info: cib_process_ping:  Reporting our current digest to ha-idg-1: ae505b65505c427ab0a45d36717b4135 for 2.7007.890 (0x1ea7670 0)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Forwarding cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] to all (origin=local/crmd/707)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.7007.890 2
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.7007.891 2d0272cf2594d336b179085d63b67c6f
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib:  @num_updates=891
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Completed cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: OK (rc=0, origin=ha-idg-1/crmd/707, version=2.7007.890)
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:     info: delete_resource:   Removing resource vm_mouseidgenes for 4460a5c3-c009-44f6-a01d-52f93e731fda (root) on ha-idg-1
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:     info: notify_deleted:   Notifying 4460a5c3-c009-44f6-a01d-52f93e731fda on ha-idg-1 that vm_mouseidgenes was deleted
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Forwarding cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] to all (origin=local/crmd/708)
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:  warning: qb_ipcs_event_sendv:    new_event_notification (9583-10294-15): Broken pipe (32)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.7007.890 2
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.7007.891 (null)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   -- /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib:  @num_updates=891
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Completed cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: OK (rc=0, origin=ha-idg-1/crmd/708, version=2.7007.891)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Forwarding cib_modify operation for section crm_config to all (origin=local/crmd/710)
> 
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:     info: abort_transition_graph:   ========> Transition 250 aborted <============== by deletion of lrm_resource[@id='vm_mouseidgenes']: Resource state removal | cib=2.7007.891 source=abort_unless_down:344 path=/cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes'] complete=true
> 
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:   notice: do_state_transition:    State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng:     info: update_cib_stonith_devices_v2:    Updating device list from the cib: delete lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng:     info: cib_devices_update:    Updating devices to version 2.7007.891
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: --- 2.7007.891 2
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   Diff: +++ 2.7008.0 (null)
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib:  @epoch=7008, @num_updates=0
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_perform_op:   +  /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-last-lrm-refresh']:  @value=1560949052
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng:   notice: unpack_config:    On loss of CCM Quorum: Ignore
> Jun 19 14:57:32 [9577] ha-idg-1        cib:     info: cib_process_request:   Completed cib_modify operation for section crm_config: OK (rc=0, origin=ha-idg-1/crmd/710, version=2.7008.0)
> Jun 19 14:57:32 [9583] ha-idg-1       crmd:     info: abort_transition_graph:   Transition 250 aborted by cib-bootstrap-options-last-lrm-refresh doing modify last-lrm-refresh=1560949052: Configuration change | cib=2.7008.0 source=te_update_diff_v2:500 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-last-lrm-refresh'] complete=true
> 
> A few minutes later transition 250 is aborted. How can something that is
> already completed be aborted?
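
The unpack_rsc_op_failure warnings above show that failed migrate_to results for these domains are still stored in the resource history and are re-evaluated at every cluster-recheck-interval. One way to test whether that stale failure history is what keeps driving the stop/start cycle is to clear it once the domains are in the desired state; a minimal sketch, using vm_idcc_devel as the example resource:

    # clear the stored failure history and fail counts for one domain on ha-idg-2
    crm_resource --cleanup --resource vm_idcc_devel --node ha-idg-2

    # equivalent with the crm shell
    crm resource cleanup vm_idcc_devel ha-idg-2
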
> 
> 
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz‑muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich 
> Bassler, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt‑IdNr: DE 129521671
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 




