[ClusterLabs] Antw: Re: Antw: [EXT] clear_failcount operation times out, makes it impossible to use the cluster

Krzysztof Bodora krzysztof.bodora at open-e.com
Thu Jan 5 04:55:22 EST 2023


Hi, we use Pacemaker as part of a larger software stack, and installing from a
new .iso and restoring all the configuration is how we perform an update on a
node; I described it as reinstalling the OS because that is essentially what we
are doing as far as Pacemaker is concerned. So necessarily some software was
different, but it should be unrelated to Pacemaker. The error is no longer
happening after re-creating the cluster; the output of corosync-cfgtool -s
looks like this:

root@swdal1-ISCSI01:~# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
         id      = 10.151.50.42
         status  = ring 0 active with no faults
RING ID 1
         id      = 192.168.120.50
         status  = ring 1 active with no faults

Anyway, we have done this process many times in the past and this is the first
time we have seen this error. Perhaps it's worth looking into whether the way we
restore the configuration after re-installation is "not enough", as you say.
Maybe it's somehow related to the resource failcount that was not restored.
We'll also try to go with the latest version of Pacemaker in the future. For now
we have decided to write this off as a one-off occurrence and see if it happens
again.
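
For reference, a minimal way to inspect and clear such a stale fail count by
hand would be something like the following (a sketch only; the resource and
node names are the ones from this cluster, and the option spellings come from
current Pacemaker documentation, so they may differ slightly on 1.1.12):

root@swdal1-ISCSI01:~# crm_failcount -r ping_resource -N swdal1-ISCSI01 -G   # show the fail count
root@swdal1-ISCSI01:~# crm_failcount -r ping_resource -N swdal1-ISCSI01 -D   # delete it
root@swdal1-ISCSI01:~# crm_resource --cleanup -r ping_resource -N swdal1-ISCSI01

The crm_resource --cleanup call should also remove the failed-operation history
(the ping_resource_last_failure_0 entry) that the stuck clear_failcount action
was trying to delete.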

Thanks for the help so far.
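
P.S. For completeness, the restore step mentioned below is essentially a
cib-push of the saved configuration. Roughly (the file name is just an example,
and the first command is only one way the backup could have been taken):

root@swdal1-ISCSI01:~# pcs cluster cib > /tmp/cib-backup.xml      # before the reinstall
root@swdal1-ISCSI01:~# pcs cluster cib-push /tmp/cib-backup.xml   # after the node has rejoined
root@swdal1-ISCSI01:~# crm_verify --live-check -V                 # sanity-check the running CIB

As far as I understand, fail counts are transient attributes in the status
section, which the cluster maintains itself rather than taking from the pushed
file, so they would not be restored this way.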

On 05.01.2023 at 08:23, Ulrich Windl wrote:
>>>> Ulrich Windl wrote on 05.01.2023 at 08:22 in message <63B67A9F.36B : 161 : 60728>:
>>>>> Krzysztof Bodora <krzysztof.bodora at open-e.com> wrote on 04.01.2023 at 09:51
>> in message <f779bcf3-1936-25ba-c464-44144c4aff43 at open-e.com>:
>>> It's an old installation, the error started appearing when one of the
>>> nodes was disconnected and the OS was re-installed, after which the
>>> pacemaker configuration was restored from a backup (pcs cluster
>>> cib-push) and the node rejoined the cluster. The failcount itself was at
>>> 8 for some time before, though. The configuration looks like this:
>> My guess is that restoring only the Pacemaker configuration was "not enough".
>> What was the reason for re-installing the OS? In the last 30 years I have
>> never had to reinstall an OS, even when support suggested doing so 😉.
>> I suggest repeating and validating all the steps needed to set up a new
>> cluster.
>> What does "corosync-cfgtool -s" say?
> I forgot to ask: was the media used to reinstall the same as the one used to
> install the OS initially, meaning do all the nodes now run the same software
> versions?
>
>> Regards,
>> Ulrich
>>
>>
>>> pcs config
>>>
>>> Cluster Name:
>>> Corosync Nodes:
>>>    10.151.50.43 10.151.50.42
>>> Pacemaker Nodes:
>>>    swdal1-ISCSI01 swdal1-ISCSI02
>>>
>>> Resources:
>>>    Clone: ping_resource-clone
>>>     Resource: ping_resource (class=ocf provider=pacemaker type=ping)
>>>      Attributes: multiplier=1000 dampen=15s attempts=4 timeout=3
>>> host_list="10.151.16.60 10.151.16.50 10.151.17.50 10.151.17.60"
>>>      Operations: start interval=0s timeout=60
>>> (ping_resource-start-interval-0s)
>>>                  stop interval=0s timeout=20
> (ping_resource-stop-interval-0s)
>>>                  monitor interval=5s timeout=15s
>>> (ping_resource-monitor-interval-5s)
>>>    Resource: Pool-1 (class=ocf provider=oe type=zfs)
>>>     Attributes: pool_name=Pool-1 pool_id=9200090953161398950
>>> encryption_password_hash=None
>>>     Meta Attrs: failure-timeout=30 is-managed=True
>>> encryption_password_hash=None
>>>     Operations: start interval=0s timeout=300 (Pool-1-start-interval-0s)
>>>                 stop interval=0s timeout=300 (Pool-1-stop-interval-0s)
>>>                 monitor interval=10 timeout=60
> (Pool-1-monitor-interval-10)
>>>    Resource: Pool-0 (class=ocf provider=oe type=zfs)
>>>     Attributes: pool_name=Pool-0 pool_id=4165732781319344895
>>> encryption_password_hash=None
>>>     Meta Attrs: failure-timeout=30 is-managed=True
>>> encryption_password_hash=None
>>>     Operations: start interval=0s timeout=300 (Pool-0-start-interval-0s)
>>>                 stop interval=0s timeout=300 (Pool-0-stop-interval-0s)
>>>                 monitor interval=10 timeout=60
> (Pool-0-monitor-interval-10)
>>> Stonith Devices:
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>>     Resource: Pool-0
>>>       Enabled on: swdal1-ISCSI01 (score:1)
>>> (id:location-Pool-0-swdal1-ISCSI01-1)
>>>       Constraint: location-Pool-0
>>>         Rule: score=-INFINITY boolean-op=or (id:location-Pool-0-rule)
>>>           Expression: pingd lt 1  (id:location-Pool-0-rule-expr)
>>>           Expression: not_defined pingd (id:location-Pool-0-rule-expr-1)
>>>     Resource: Pool-1
>>>       Enabled on: swdal1-ISCSI01 (score:1)
>>> (id:location-Pool-1-swdal1-ISCSI01-1)
>>>       Constraint: location-Pool-1
>>>         Rule: score=-INFINITY boolean-op=or (id:location-Pool-1-rule)
>>>           Expression: pingd lt 1  (id:location-Pool-1-rule-expr)
>>>           Expression: not_defined pingd (id:location-Pool-1-rule-expr-1)
>>> Ordering Constraints:
>>> Colocation Constraints:
>>>
>>> Resources Defaults:
>>>    resource-stickiness: 100000
>>> Operations Defaults:
>>>    record-pending: true
>>>
>>> Cluster Properties:
>>>    batch-limit: 1
>>>    cluster-infrastructure: corosync
>>>    cluster-recheck-interval: 180
>>>    dc-version: 1.1.12-1.1.12+git+561c4cf
>>>    no-quorum-policy: ignore
>>>    stonith-enabled: false
>>>    stop-orphan-resources: false
>>>
>>> On 02.01.2023 at 13:12, Ulrich Windl wrote:
>>>> Hi!
>>>>
>>>> I wonder: Is this a new installation, or is it a new bug in an old
>>>> installation? For the first case I'd recommend starting with current
>>>> software, and for the second case please describe what changed or what
>>>> triggered the situation.
>>>> Also provide basic configuration data, please.
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>
>>>>>>> Krzysztof Bodora <krzysztof.bodora at open-e.com> wrote on 02.01.2023 at 12:16
>>>> in message <37d86b8d-c59f-5fe3-cba1-41d2c84fcb5b at open-e.com>:
>>>>> Hello Clusterlabs,
>>>>>
>>>>> I'm getting this error in the logs:
>>>>>
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:    error:
>>>>> print_synapse:     [Action    7]: In-flight crm op
>>>>> ping_resource_clear_failcount_0   on swdal1-ISCSI01 (priority: 0,
>>>>> waiting: none)
>>>>>
>>>>> My specifications:
>>>>>
>>>>> OS: Debian 8
>>>>> Pacemaker version: 1.1.12
>>>>> Kernel version: 4.19.190
>>>>>
>>>>> I'd like to know what can cause this error and how to prevent it in the
>>>>> future. I'm also currently unable to update to a newer version of
>>>>> pacemaker.
>>>>>
>>>>> Here is some context for when it happens. It seems that the
>>>>> ping_resource resources are in 'Restart' state:
>>>>>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:0 (Started swdal1-ISCSI01)
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:1 (Started swdal1-ISCSI02)
>>>>>
>>>>> which causes pacemaker to try to clear the failcounts on those resources:
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_state_transition:       State transition S_POLICY_ENGINE ->
>>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>>>>> origin=handle_response ]
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_te_invoke:      Processing graph 11 (ref=pe_calc-dc-1671528262-59)
>>>>> derived from /var/lib/pacemaker/pengine/pe-input-518.bz2
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> te_crm_command:    Executing crm-event (7): clear_failcount on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> handle_failcount_op:       Removing failcount for ping_resource
>>>>> Dec 20 09:24:23 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_request:       Forwarding cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']
>>>>> to master (origin=local/crmd/118)
>>>>> Dec 20 09:24:23 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_request:       Completed cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']:
>>>>> OK (rc=0, origin=swdal1-ISCSI01/crmd/118, version=0.60.0)
>>>>> Dec 20 09:24:28 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_ping:  Reporting our current digest to swdal1-ISCSI01:
>>>>> ccf71244504d3deb02d0da64fa72cedc for 0.60.0 (0x55788a83c4b0 0)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:  warning:
>>>>> action_timer_callback:     Timer popped (timeout=20000, abort_level=0,
>>>>> complete=false)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:    error:
>>>>> print_synapse:     [Action    7]: In-flight crm op
>>>>> ping_resource_clear_failcount_0   on swdal1-ISCSI01 (priority: 0,
>>>>> waiting: none)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> abort_transition_graph:    Transition aborted: Action lost
>>>>> (source=action_timer_callback:772, 0)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> run_graph:         Transition 11 (Complete=1, Pending=0, Fired=0,
>>>>> Skipped=9, Incomplete=2,
>>>>> Source=/var/lib/pacemaker/pengine/pe-input-518.bz2): Stopped
>>>>>
>>>>> Clearing the failcount fails, so the whole transition is aborted. This
>>>>> makes it impossible to do anything in the cluster, for example moving the
>>>>> Pool-0 resource, as it also triggers the clear_failcount operation, which
>>>>> fails and aborts the transition, for example:
>>>>>
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> RecurringOp:        Start recurring monitor (5s) for ping_resource:0 on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> RecurringOp:        Start recurring monitor (5s) for ping_resource:1 on
>>>>> swdal1-ISCSI02
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> RecurringOp:        Start recurring monitor (10s) for Pool-0 on
>>>>> swdal1-ISCSI02
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:0 (Started swdal1-ISCSI01)
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:1 (Started swdal1-ISCSI02)
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> LogActions:        Leave   Pool-1  (Started swdal1-ISCSI01)
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Move    Pool-0  (Started swdal1-ISCSI01 ->
>>>>> swdal1-ISCSI02)
>>>>> Dec 20 09:35:04 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> process_pe_message:        Calculated Transition 19:
>>>>> /var/lib/pacemaker/pengine/pe-input-519.bz2
>>>>> Dec 20 09:35:04 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_state_transition:       State transition S_POLICY_ENGINE ->
>>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>>>>> origin=handle_response ]
>>>>> Dec 20 09:35:04 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_te_invoke:      Processing graph 19 (ref=pe_calc-dc-1671528904-75)
>>>>> derived from /var/lib/pacemaker/pengine/pe-input-519.bz2
>>>>> Dec 20 09:35:04 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> te_crm_command:    Executing crm-event (7): clear_failcount on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:35:04 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> handle_failcount_op:       Removing failcount for ping_resource
>>>>> Dec 20 09:35:04 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_request:       Forwarding cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']
>>>>> to master (origin=local/crmd/134)
>>>>> Dec 20 09:35:04 [57841] swdal1-ISCSI01        cib:    info:
>>>>> cib_process_request:       Completed cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']:
>>>>> OK (rc=0, origin=swdal1-ISCSI01/crmd/134, version=0.61.0)
>>>>> Dec 20 09:35:09 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_ping:  Reporting our current digest to swdal1-ISCSI01:
>>>>> decc3ad1315820648f242167998a5880 for 0.61.0 (0x55788a8408e0 0)
>>>>> Dec 20 09:36:24 [57862] swdal1-ISCSI01       crmd:  warning:
>>>>> action_timer_callback:     Timer popped (timeout=20000, abort_level=0,
>>>>> complete=false)
>>>>> Dec 20 09:36:24 [57862] swdal1-ISCSI01       crmd:    error:
>>>>> print_synapse:     [Action    7]: In-flight crm op
>>>>> ping_resource_clear_failcount_0   on swdal1-ISCSI01 (priority: 0,
>>>>> waiting: none)
>>>>> Dec 20 09:36:24 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> abort_transition_graph:    Transition aborted: Action lost
>>>>> (source=action_timer_callback:772, 0)
>>>>> Dec 20 09:36:24 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> run_graph:         Transition 19 (Complete=1, Pending=0, Fired=0,
>>>>> Skipped=12, Incomplete=2,
>>>>> Source=/var/lib/pacemaker/pengine/pe-input-519.bz2): Stopped
>>>>>
>>>>> As you can see, the 'stop' operation for resource Pool-0 did not even
>>>>> run, as the transition was stopped by the clear_failcount error. This
>>>>> error kept happening until we restarted pacemaker. Here is some more
>>>>> context from one of the times this error occurred:
>>>>>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> process_pe_message:        Input has not changed since last time, not
>>>>> saving to disk
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> unpack_config:     On loss of CCM Quorum: Ignore
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_online_status:   Node swdal1-ISCSI01 is online
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_online_status:   Node swdal1-ISCSI02 is online
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_op_status:       Operation monitor found resource Pool-0
>>>>> active on swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_op_status:       Operation monitor found resource Pool-0
>>>>> active on swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_op_status:       Operation monitor found resource Pool-1
>>>>> active on swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> determine_op_status:       Operation monitor found resource Pool-1
>>>>> active on swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> clone_print:        Clone Set: ping_resource-clone [ping_resource]
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> short_print:            Started: [ swdal1-ISCSI01 swdal1-ISCSI02 ]
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> native_print:      Pool-1  (ocf::oe:zfs):  Started swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> native_print:      Pool-0  (ocf::oe:zfs):  Started swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> get_failcount_full:        ping_resource:0 has failed 8 times on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> common_apply_stickiness:   ping_resource-clone can fail 999992 more
>>>>> times on swdal1-ISCSI01 before being forced off
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> get_failcount_full:        ping_resource:1 has failed 8 times on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> common_apply_stickiness:   ping_resource-clone can fail 999992 more
>>>>> times on swdal1-ISCSI01 before being forced off
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   params:reload   <parameters
> multiplier="1000"
>>>>> dampen="15s" host_list="10.151.17.50 10.151.16.50 10.151.17.60
>>>>> 10.151.16.60" attempts="4" timeout="3"/>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   Parameters to ping_resource:0_start_0 on
>>>>> swdal1-ISCSI01 changed: was 57524cd0b7204dd60c127ba66fb83cd2 vs. now
>>>>> 1a37c0e0391890df8549f5fda647f4d9 (reload:3.0.9)
>>>>> 0:0;14:28:0:a0f1b96e-5089-4dad-9073-8c8feac4ea3a
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> get_failcount_full:        ping_resource:0 has failed 8 times on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   params:reload   <parameters
> multiplier="1000"
>>>>> dampen="15s" host_list="10.151.17.50 10.151.16.50 10.151.17.60
>>>>> 10.151.16.60" attempts="4" timeout="3" CRM_meta_timeout="15000"/>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   Parameters to ping_resource:0_monitor_5000
> on
>>>>> swdal1-ISCSI01 changed: was f3b4adf4d46692f312296263faa50a75 vs. now
>>>>> c0d10fc8996c295dd1213d4ca058c0e7 (reload:3.0.9)
>>>>> 0:0;15:28:0:a0f1b96e-5089-4dad-9073-8c8feac4ea3a
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> get_failcount_full:        ping_resource:0 has failed 8 times on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   params:reload   <parameters
> multiplier="1000"
>>>>> dampen="15s" host_list="10.151.17.50 10.151.16.50 10.151.17.60
>>>>> 10.151.16.60" attempts="4" timeout="3"/>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   Parameters to ping_resource:1_start_0 on
>>>>> swdal1-ISCSI02 changed: was 57524cd0b7204dd60c127ba66fb83cd2 vs. now
>>>>> 1a37c0e0391890df8549f5fda647f4d9 (reload:3.0.9)
>>>>> 0:0;17:7:0:0ea53274-56ef-48f6-9de1-38d635fa2530
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   params:reload   <parameters
> multiplier="1000"
>>>>> dampen="15s" host_list="10.151.17.50 10.151.16.50 10.151.17.60
>>>>> 10.151.16.60" attempts="4" timeout="3" CRM_meta_timeout="15000"/>
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> check_action_definition:   Parameters to ping_resource:1_monitor_5000
> on
>>>>> swdal1-ISCSI02 changed: was f3b4adf4d46692f312296263faa50a75 vs. now
>>>>> c0d10fc8996c295dd1213d4ca058c0e7 (reload:3.0.9)
>>>>> 0:0;18:7:0:0ea53274-56ef-48f6-9de1-38d635fa2530
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> RecurringOp:        Start recurring monitor (5s) for ping_resource:0 on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> RecurringOp:        Start recurring monitor (5s) for ping_resource:1 on
>>>>> swdal1-ISCSI02
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:0 (Started swdal1-ISCSI01)
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> LogActions:        Restart ping_resource:1 (Started swdal1-ISCSI02)
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> LogActions:        Leave   Pool-1  (Started swdal1-ISCSI01)
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:     info:
>>>>> LogActions:        Leave   Pool-0  (Started swdal1-ISCSI01)
>>>>> Dec 20 09:24:23 [57851] swdal1-ISCSI01    pengine:   notice:
>>>>> process_pe_message:        Calculated Transition 11:
>>>>> /var/lib/pacemaker/pengine/pe-input-518.bz2
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_state_transition:       State transition S_POLICY_ENGINE ->
>>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>>>>> origin=handle_response ]
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_te_invoke:      Processing graph 11 (ref=pe_calc-dc-1671528262-59)
>>>>> derived from /var/lib/pacemaker/pengine/pe-input-518.bz2
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> te_crm_command:    Executing crm-event (7): clear_failcount on
>>>>> swdal1-ISCSI01
>>>>> Dec 20 09:24:23 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> handle_failcount_op:       Removing failcount for ping_resource
>>>>> Dec 20 09:24:23 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_request:       Forwarding cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']
>>>>> to master (origin=local/crmd/118)
>>>>> Dec 20 09:24:23 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_request:       Completed cib_delete operation for section
>>>>>
> //node_state[@uname='swdal1-ISCSI01']//lrm_resource[@id='ping_resource']/lrm
>>>>> _rsc_op[@id='ping_resource_last_failure_0']:
>>>>> OK (rc=0, origin=swdal1-ISCSI01/crmd/118, version=0.60.0)
>>>>> Dec 20 09:24:28 [57841] swdal1-ISCSI01        cib:     info:
>>>>> cib_process_ping:  Reporting our current digest to swdal1-ISCSI01:
>>>>> ccf71244504d3deb02d0da64fa72cedc for 0.60.0 (0x55788a83c4b0 0)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:  warning:
>>>>> action_timer_callback:     Timer popped (timeout=20000, abort_level=0,
>>>>> complete=false)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:    error:
>>>>> print_synapse:     [Action    7]: In-flight crm op
>>>>> ping_resource_clear_failcount_0   on swdal1-ISCSI01 (priority: 0,
>>>>> waiting: none)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> abort_transition_graph:    Transition aborted: Action lost
>>>>> (source=action_timer_callback:772, 0)
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:   notice:
>>>>> run_graph:         Transition 11 (Complete=1, Pending=0, Fired=0,
>>>>> Skipped=9, Incomplete=2,
>>>>> Source=/var/lib/pacemaker/pengine/pe-input-518.bz2): Stopped
>>>>> Dec 20 09:25:43 [57862] swdal1-ISCSI01       crmd:     info:
>>>>> do_state_transition:       State transition S_TRANSITION_ENGINE ->
>>>>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=notify_crmd ]
>>>>> I'd appreciate some information about this topic.

