[ClusterLabs] two virtual domains start and stop every 15 minutes
Ken Gaillot
kgaillot at redhat.com
Wed Jul 3 19:25:32 EDT 2019
On Wed, 2019-06-19 at 18:46 +0200, Lentes, Bernd wrote:
> ----- On Jun 15, 2019, at 4:30 PM, Bernd Lentes
> bernd.lentes at helmholtz-muenchen.de wrote:
>
> > ----- On Jun 14, 2019 at 21:20, kgaillot <kgaillot at redhat.com>
> > wrote:
> >
> > > On Fri, 2019-06-14 at 18:27 +0200, Lentes, Bernd wrote:
> > > > Hi,
> > > >
> > > > I had this problem once before, but it's still not clear to me
> > > > what really happens.
> > > > I had this problem some days ago:
> > > > I have a 2-node cluster with several virtual domains as
> > > > resources. I
> > > > put one node (ha-idg-2) into standby, and two running virtual
> > > > domains
> > > > were migrated to the other node (ha-idg-1). The other virtual
> > > > domains
> > > > were already running on ha-idg-1.
> > > > Since then, the two virtual domains which were migrated
> > > > (vm_idcc_devel and vm_severin) start and stop every 15 minutes
> > > > on ha-idg-1. ha-idg-2 remains in standby.
> > > > I know that the 15-minute interval is related to the
> > > > "cluster-recheck-interval".
> > > > But why are these two domains started and stopped?
> > > > I looked around a lot in the logs, checked the pe-input files,
> > > > and watched some graphs created by crm_simulate with dotty ...
> > > > I always see that the domains are started, 15 minutes later
> > > > stopped, and 15 minutes later started again ...
> > > > but I don't see WHY. I would really like to know that.
> > > > And why are the domains not started by the monitor operation?
> > > > It should recognize that a domain is stopped and start it
> > > > again. My monitor interval is 30 seconds.
> > > > I had two pending errors concerning these domains, a failed
> > > > migrate from ha-idg-1 to ha-idg-2, from some time before.
> > > > Could that be the culprit?
It did indeed turn out to be.
The resource history on ha-idg-1 shows the last failed action as a
migrate_to from ha-idg-1 to ha-idg-2, and the last successful action as
a migrate_from from ha-idg-2 to ha-idg-1. That confused pacemaker as to
the current status of the migration.
A full migration is migrate_to on the source node, migrate_from on the
target node, and stop on the source node. When the resource history has
a failed migrate_to on the source, and a stop but no migrate_from on
the target, the migration is considered "dangling" and forces a stop of
the resource on the source, because it's possible the migrate_from
never got a chance to be scheduled.
That is wrong in this situation. The resource is happily running on the
node with the failed migrate_to because it was later moved back
successfully, and the failed migrate_to is no longer relevant.
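For reference, the recorded history can be inspected directly in the
CIB status section. A minimal sketch, using vm_severin as an example
(the XML shown is illustrative, not taken from your cluster):

  # Query the live CIB for one resource's recorded operation history;
  # each lrm_rsc_op entry carries the operation name and its rc-code.
  cibadmin --query --xpath "//lrm_resource[@id='vm_severin']"

  # A failed migrate_to on the source would show up roughly as:
  #   <lrm_rsc_op id="vm_severin_last_failure_0" operation="migrate_to"
  #               rc-code="1" ... />
  # If the target's history then has a stop but no migrate_from, the
  # scheduler treats the migration as dangling, as described above.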
My current plan for a fix is that if a node with a failed migrate_to
has a successful migrate_from or start that's newer, and the target
node of the failed migrate_to has a successful stop, then the migration
should not be considered dangling.
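Until a fix is available, a possible workaround is to clear the stale
failed migrate_to from the resource history so the scheduler stops
drawing the wrong conclusion. A sketch, using resource names from your
logs:

  # Wipe the recorded operation history (including the stale failed
  # migrate_to) for each affected resource; Pacemaker then re-probes
  # the resource's current state.
  crm_resource --cleanup --resource vm_idcc_devel
  crm_resource --cleanup --resource vm_severin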
A couple of side notes on your configuration:
Instead of putting action=off in fence device configurations, you
should use pcmk_reboot_action=off. Pacemaker adds the action parameter
itself when sending the fence command.
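For example, roughly like this (a sketch only; the agent type and the
parameter values are placeholders loosely based on your log excerpt):

  # Configure how Pacemaker's "reboot" request should be mapped
  # (here: power the node off); Pacemaker supplies action itself.
  crm configure primitive fence_ilo_ha-idg-2 stonith:fence_ilo4 \
      params ipaddr=ha-idg-2-ilo login=fenceuser passwd=secret \
      pcmk_reboot_action=off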
When keeping a fence device off its target node, use a finite negative
score rather than -INFINITY. This ensures the node can fence itself as
a last resort.
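For example (the constraint id and the score value are arbitrary):

  # A finite negative score strongly discourages running the fence
  # device on its own target node, but unlike -INFINITY it still
  # lets the node use the device on itself as a last resort.
  crm configure location loc-fence-ha-idg-2 fence_ilo_ha-idg-2 -5000: ha-idg-2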
> > > >
> > > > I still have all the logs from that time, if you need
> > > > information
> > > > just let me know.
> > >
> > > Yes, the logs and pe-input files would be helpful. It sounds
> > > like a bug in the scheduler. What version of pacemaker are you
> > > running?
> > >
> >
> > Hi,
> >
> > here are the log and some pe-input files:
> > https://hmgubox.helmholtz-muenchen.de/d/f28f6961722f472eb649/
> > On June 6th at 15:41:28 I issued "crm node standby ha-idg-2";
> > that's when the trouble began.
> > I'm running pacemaker-1.1.19+20181105.ccd6b5b10-3.10.1.x86_64 on
> > SLES 12 SP4 and kernel 4.12.14-95.13.
> >
>
> Hi,
> the problem arose again.
> What caught my attention: when I make a change to the configuration,
> e.g. some slight change to a resource, the cluster immediately starts
> or stops the domains, depending on their previous state.
> The fence resource is not affected by this start/stop.
>
> Example (some changes to a stonith agent):
>
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_process_request: Forwarding cib_replace operation for section configuration to all (origin=local/crm_shadow/2)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: __xml_diff_object: Moved nvpair@id (0 -> 2)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: __xml_diff_object: Moved nvpair@name (1 -> 0)
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.6990.1043 2
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.6991.0 6a5f09a19ae7d0a7bae55bddb9d1564f <===================== new epoch
>
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']/nvpair[@id='fence_ha-idg-2-instance_attributes-action']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-0']/nvpair[@id='fence_ha-idg-2-instance_attributes-0-ipaddr']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-1']/nvpair[@id='fence_ha-idg-2-instance_attributes-1-login']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-2']/nvpair[@id='fence_ha-idg-2-instance_attributes-2-passwd']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-3']/nvpair[@id='fence_ha-idg-2-instance_attributes-3-ssl_insecure']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes-4']/nvpair[@id='fence_ha-idg-2-instance_attributes-4-delay']
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=6991, @num_updates=0
> Jun 18 18:07:09 [9577] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='fence_ilo_ha-idg-2']/instance_attributes[@id='fence_ha-idg-2-instance_attributes']: <nvpair name="pcmk_off_action" value="Off" id="fence_ha-idg-2-instance_attributes-pcmk_off_action"/>
>
> ...
>
> The cluster reacts immediately:
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_sim ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_geneious ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_idcc_devel ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_genetrap ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_mouseidgenes ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_greensql ( ha-idg-2 )
> Jun 18 18:07:10 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_severin ( ha-idg-2 )
>
> What else surprises me:
> With the change to the stonith agent, a new epoch (6991) was created.
> Afterwards, all start/stop actions of the domains, even those
> happening hours later, relate to this epoch:
>
> ...
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.6991.178 2 <========================
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.6991.179 (null) <========================
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @num_updates=179
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/lrm_resource[@id='vm_sim']/lrm_rsc_op[@id='vm_sim_last_0']: @operation_key=vm_sim_start_0, @operation=start, @transition-key=33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, @transition-magic=0:0;33:178:0:612aaad1-30d6-4a94-978e-fcbece63cb8f, @call-id=4693, @exec-time=2928
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=ha-idg-2/crmd/1977, version=2.6991.179)
> Jun 18 21:15:07 [9583] ha-idg-1 crmd: info: match_graph_event: Action vm_sim_start_0 (33) confirmed on ha-idg-2 (rc=0)
> Jun 18 21:15:07 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating monitor operation vm_sim_monitor_30000 on ha-idg-2 | action 34
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: modify lrm_rsc_op[@id='vm_sim_last_0']
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: cib_devices_update: Updating devices to version 2.6991.179
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 18 21:15:07 [9578] ha-idg-1 stonith-ng: info: cib_device_update: Device fence_ilo_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.6991.179 2
> Jun 18 21:15:07 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.6991.180 (null)
> ...
>
> This continues just until the next change to the configuration, when
> a new epoch is created.
> Normally these epochs change very frequently; here they remain
> unchanged for a long time.
> Does that mean that the actions, even hours later, still relate to
> this epoch? What is the use of this epoch?
>
> Also interesting:
> Jun 17 16:47:05 [9581] ha-idg-1 pengine: notice: process_pe_message: Calculated transition 62, saving inputs in /var/lib/pacemaker/pengine/pe-input-1085.bz2
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: do_te_invoke: Processing graph 62 (ref=pe_calc-dc-1560782825-446) derived from /var/lib/pacemaker/pengine/pe-input-1085.bz2
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_mausdb_stop_0 on ha-idg-2 | action 37
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 39
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 41
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 43
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 45
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 47
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 49
> Jun 17 16:47:05 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 51
> ...
> Jun 17 16:48:20 [9583] ha-idg-1 crmd: notice: run_graph: Transition 62 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1085.bz2): Complete
> Jun 17 16:48:20 [9583] ha-idg-1 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jun 17 16:48:20 [9583] ha-idg-1 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
>
> 15 min later:
> ...
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_mausdb ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_sim ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_geneious ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_idcc_devel ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_genetrap ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_mouseidgenes ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_greensql ( ha-idg-2 )
> Jun 17 17:03:20 [9581] ha-idg-1 pengine: notice: LogAction: * Restart vm_severin ( ha-idg-2 )
> ...
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_mausdb_stop_0 on ha-idg-2 | action 29
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 32
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 35
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 38
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 41
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 44
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 47
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 50
> ...
> Despite the fact that the domains are already stopped, they are
> stopped again!?
>
> And immediately after being stopped they are started again:
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: info: match_graph_event: Action vm_mausdb_stop_0 (29) confirmed on ha-idg-2 (rc=0)
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating start operation vm_mausdb_start_0 on ha-idg-2 | action 30
>
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: info: match_graph_event: Action vm_idcc_devel_stop_0 (38) confirmed on ha-idg-2 (rc=0)
> Jun 17 17:03:20 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating start operation vm_idcc_devel_start_0 on ha-idg-2 | action 39
> Stopped and immediately started again!?
>
> Also interesting:
> ...
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: unpack_status: Node ha-idg-1 is in standby-mode
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: determine_online_status_fencing: Node ha-idg-1 is active
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: determine_online_status: Node ha-idg-1 is standby
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: determine_online_status_fencing: Node ha-idg-2 is active
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: determine_online_status: Node ha-idg-2 is online
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed monitor of vm_mausdb on ha-idg-2: not running | rc=7
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_mouseidgenes on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_idcc_devel on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_sim on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_genetrap on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_geneious on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_greensql on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_severin on ha-idg-2: unknown error | rc=1
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: unpack_node_loop: Node 1084777482 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: unpack_node_loop: Node 1084777492 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: unpack_node_loop: Node 1084777482 is already processed
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: info: unpack_node_loop: Node 1084777492 is already processed
> ...
> Jun 19 14:47:42 [9581] ha-idg-1 pengine: notice: process_pe_message: ===============> Calculated transition 250 <===============, saving inputs in /var/lib/pacemaker/pengine/pe-input-1273.bz2
>
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: do_te_invoke: Processing graph 250 (ref=pe_calc-dc-1560948462-3241) derived from /var/lib/pacemaker/pengine/pe-input-1273.bz2 <===============================
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_sim_stop_0 on ha-idg-2 | action 32
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_geneious_stop_0 on ha-idg-2 | action 35
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_idcc_devel_stop_0 on ha-idg-2 | action 38
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_genetrap_stop_0 on ha-idg-2 | action 41
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_mouseidgenes_stop_0 on ha-idg-2 | action 44
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_greensql_stop_0 on ha-idg-2 | action 47
> Jun 19 14:47:42 [9583] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_severin_stop_0 on ha-idg-2 | action 50
> ...
> The changes are calculated as transition 250, which completes:
> ...
> Jun 19 14:47:46 [9583] ha-idg-1 crmd: notice: run_graph: =========> Transition 250 (Complete=22, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1273.bz2): Complete <===========
>
> Jun 19 14:47:46 [9583] ha-idg-1 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jun 19 14:47:46 [9583] ha-idg-1 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: modify lrm_rsc_op[@id='vm_mouseidgenes_monitor_30000']
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: cib_devices_update: Updating devices to version 2.7007.890
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 19 14:47:46 [9578] ha-idg-1 stonith-ng: info: cib_device_update: Device fence_ilo_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
>
> Transition 250 is completed. But
> ...
> Jun 19 14:47:51 [9577] ha-idg-1 cib: info: cib_process_ping: Reporting our current digest to ha-idg-1: ae505b65505c427ab0a45d36717b4135 for 2.7007.890 (0x1ea7670 0)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] to all (origin=local/crmd/707)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.7007.890 2
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.7007.891 2d0272cf2594d336b179085d63b67c6f
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @num_updates=891
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: OK (rc=0, origin=ha-idg-1/crmd/707, version=2.7007.890)
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: delete_resource: Removing resource vm_mouseidgenes for 4460a5c3-c009-44f6-a01d-52f93e731fda (root) on ha-idg-1
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: notify_deleted: Notifying 4460a5c3-c009-44f6-a01d-52f93e731fda on ha-idg-1 that vm_mouseidgenes was deleted
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes'] to all (origin=local/crmd/708)
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: warning: qb_ipcs_event_sendv: new_event_notification (9583-10294-15): Broken pipe (32)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.7007.890 2
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.7007.891 (null)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @num_updates=891
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='ha-idg-1']//lrm_resource[@id='vm_mouseidgenes']: OK (rc=0, origin=ha-idg-1/crmd/708, version=2.7007.891)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Forwarding cib_modify operation for section crm_config to all (origin=local/crmd/710)
>
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: abort_transition_graph: ========> Transition 250 aborted <============== by deletion of lrm_resource[@id='vm_mouseidgenes']: Resource state removal | cib=2.7007.891 source=abort_unless_down:344 path=/cib/status/node_state[@id='1084777482']/lrm[@id='1084777482']/lrm_resources/lrm_resource[@id='vm_mouseidgenes'] complete=true
>
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: delete lrm_resource[@id='vm_mouseidgenes']
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng: info: cib_devices_update: Updating devices to version 2.7007.891
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.7007.891 2
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.7008.0 (null)
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=7008, @num_updates=0
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_perform_op: + /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-last-lrm-refresh']: @value=1560949052
> Jun 19 14:57:32 [9578] ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 19 14:57:32 [9577] ha-idg-1 cib: info: cib_process_request: Completed cib_modify operation for section crm_config: OK (rc=0, origin=ha-idg-1/crmd/710, version=2.7008.0)
> Jun 19 14:57:32 [9583] ha-idg-1 crmd: info: abort_transition_graph: Transition 250 aborted by cib-bootstrap-options-last-lrm-refresh doing modify last-lrm-refresh=1560949052: Configuration change | cib=2.7008.0 source=te_update_diff_v2:500 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-last-lrm-refresh'] complete=true
>
> A few minutes later transition 250 is aborted. How can something
> that is already completed be aborted?
--
Ken Gaillot <kgaillot at redhat.com>