[ClusterLabs] Is 20 seconds to complete redis switchover to be expected?

Strahil Nikolov hunter86_bg at yahoo.com
Wed Apr 1 00:18:52 EDT 2020


On April 1, 2020 6:20:47 AM GMT+03:00, steven prothero <steven at marimo-tech.com> wrote:
> Hello All,
>
>RE:
>https://lists.clusterlabs.org/pipermail/users/2020-March/027041.html
>
>Thank you for taking a look at my issue. I have included the full logs
>in this post, below the summary and configuration.
>Fencing takes 7 seconds (fence_ipmilan, power off), and the Redis
>switchover completes 19 seconds after the crash.
>
>SUMMARY:
>11:05:01 - node1 crash (redis-server killed)
>11:05:03 - Pacemaker knows there is a problem.
>11:05:03 - (Pacemaker notifies Redis of trouble?) LRM operation
>redis_notify_0 (call=30, rc=0, cib-update=0, confirmed=true) ok
>11:05:03 - STONITH starts. (fence_ipmilan power off)
>11:05:10 - STONITH finished.
>11:05:12 - 11:05:16 - Corosync forms new membership.
> ......
>11:05:20 - Redis switched.
>11:05:21 - VIP switched.
>
>So everything works. I am a rookie, new at this, so I don't see any
>trouble in the logs myself, but I am wondering whether those with more
>experience can see why it takes 20 seconds to complete the failover.
>
>I am using Pacemaker 1.1.10 with 2 nodes (fence_ipmilan, power off).
>The Redis database is empty, as these are test systems. 1001 thank-yous
>for your ideas.
>
>My (edited) crm configure show:
>
>node $id="1" example1.com \
>        attributes standby="off"
>node $id="2" example2.com
>primitive fence_example1 stonith:fence_ipmilan \
>        params pcmk_host_list="example1.com" ipaddr="10.24.x.xxx"
>login="user" passwd="password" pcmk_action_limit="-1"
>concurrent-fencing="true" lanplus="true" action="off" \
>        op monitor interval="20s" \
>        meta target-role="Started"
>primitive fence_example2 stonith:fence_ipmilan \
>        params pcmk_host_list="example2.com" ipaddr="10.24.x.yyy"
>login="user" passwd="password" concurrent-fencing="true"
>pcmk_action_limit="-1" lanplus="true" action="off" delay="10" \
>        op monitor interval="20s" \
>        meta target-role="Started"
>primitive redis ocf:heartbeat:redis \
>        params bin="/usr/local/bin/redis-server"
>client_bin="/usr/local/bin/redis-cli" port="6379"
>config="/etc/redis/redis.conf" socket_name="redis.sock"
>pidfile_name="redis.pid" \
>        meta target-role="Started" is-managed="true" \
>    op monitor interval="1s" role="Master" timeout="1s" on-fail="fence"
>primitive res_vip ocf:heartbeat:IPaddr2 \
>        params ip="10.24.x.zzz" nic="bond0.xx" cidr_netmask="22" \
>        op monitor interval="10" \
>        meta target-role="Started"
>ms redis_clone redis \
>        meta notify="true" is-managed="true" ordered="false"
>interleave="true" globally-unique="false" target-role="Started"
>migration-threshold="1"
>colocation redis-first inf: redis_clone:Master res_vip
>order redis-before-vip inf: redis_clone:promote res_vip:start
>property $id="cib-bootstrap-options" \
>        dc-version="1.1.10-42f2063" \
>        cluster-infrastructure="corosync" \
>        stonith-enabled="true" \
>        no-quorum-policy="ignore" \
>        default-resource-stickiness="200" \
>        batch-limit="300" \
>        migration-limit="-1"
>property $id="redis_replication" \
>        redis_REPL_INFO="example1.com"
>rsc_defaults $id="rsc-options" \
>        migration-threshold="1"
>
>
>Here are my (redacted) logs from the start of the crash to the finish,
>about 20 seconds.
>
>
>Apr 01 11:05:03 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_apply_diff operation for section
>status: OK (rc=0, origin=node1.com/crmd/61, version=0.289.24)
>Apr 01 11:05:03 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='fail-count-redis']:
>No such device or address (rc=-6, origin=local/attrd/12,
>version=0.289.24)
>Apr 01 11:05:03 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_apply_diff operation for section
>status: OK (rc=0, origin=node1.com/attrd/29, version=0.289.25)
>Apr 01 11:05:03 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='last-failure-redis']:
>No such device or address (rc=-6, origin=local/attrd/13,
>version=0.289.25)
>Apr 01 11:05:03 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_apply_diff operation for section
>status: OK (rc=0, origin=node1.com/attrd/32, version=0.289.26)
>Apr 01 11:05:03 [3537] node2.com stonith-ng:   notice:
>can_fence_host_with_device:    fence_node1 can fence node1.com:
>static-list
>Apr 01 11:05:03 [3537] node2.com stonith-ng:   notice:
>can_fence_host_with_device:    fence_node2 can not fence node1.com:
>static-list
>Apr 01 11:05:03 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_query from node1.com: OK (0)
>Apr 01 11:05:03 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxx op=redis_notify_0
>Apr 01 11:05:03 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:redis action:notify call_id:30
>Apr 01 11:05:03 [3537] node2.com stonith-ng:   notice:
>can_fence_host_with_device:    fence_node1 can fence node1.com:
>static-list
>Apr 01 11:05:03 [3537] node2.com stonith-ng:   notice:
>can_fence_host_with_device:    fence_node2 can not fence node1.com:
>static-list
>Apr 01 11:05:03 [3537] node2.com stonith-ng:     info:
>stonith_fence_get_devices_cb:  Found 1 matching devices for
>'node1.com'
>Apr 01 11:05:03 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_fence from node1.com: Operation
>now in progress (-115)
>Apr 01 11:05:03 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action reboot for agent
>fence_ipmilan (target=node1.com)
>Apr 01 11:05:04 [4328] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:04 [4328] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:04 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:redis action:notify call_id:30 pid:4303 exit-code:0
>exec-time:559ms queue-time:0ms
>Apr 01 11:05:04 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation redis_notify_0 (call=30, rc=0,
>cib-update=0, confirmed=true) ok
>Apr 01 11:05:10 [3537] node2.com stonith-ng:   notice: log_operation:
>       Operation 'reboot' [4304] (call 2 from crmd.2027) for host
>'node1.com' with device 'fence_node1' returned: 0 (OK)
>Apr 01 11:05:10 [3537] node2.com stonith-ng:     info: log_operation:
>       fence_node1:4304 [ Powering off machine @
>IPMI:10.24.x.yyy...Done ]
>Apr 01 11:05:12 [1773] node2.com corosync notice  [TOTEM ] A processor
>failed, forming new configuration.
>Apr 01 11:05:16 [1773] node2.com corosync notice  [TOTEM ] A new
>membership (10.24.x.zz:26956) was formed. Members left: 1
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>pcmk_cpg_membership:   Left[1.0] cib.1
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>pcmk_cpg_membership:   Left[1.0] crmd.1
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>crm_update_peer_proc:  pcmk_cpg_membership: Node node1.com[1] -
>corosync-cpg is now offline
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_proc:  pcmk_cpg_membership: Node node1.com[1] -
>corosync-cpg is now offline
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>pcmk_cpg_membership:   Member[1.0] cib.2
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>peer_update_callback:  Client node1.com/peer now has status [offline]
>(DC=node1.com)
>Apr 01 11:05:16 [3541] node2.com       crmd:   notice:
>peer_update_callback:  Our peer on the DC is dead
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>pcmk_cpg_membership:   Member[1.0] crmd.2
>Apr 01 11:05:16 [3541] node2.com       crmd:   notice:
>do_state_transition:   State transition S_NOT_DC -> S_ELECTION [
>input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK
>origin=peer_update_callback ]
>Apr 01 11:05:16 [3541] node2.com       crmd:     info: update_dc:
>Unset DC. Was node1.com
>Apr 01 11:05:16 [1773] node2.com corosync notice  [QUORUM] Members[1]:
>2
>Apr 01 11:05:16 [1773] node2.com corosync notice  [MAIN  ] Completed
>service synchronization, ready to provide service.
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>pcmk_quorum_notification:      Membership 26956: quorum retained (1)
>Apr 01 11:05:16 [3541] node2.com       crmd:   notice:
>crm_update_peer_state:         pcmk_quorum_notification: Node
>node1.com[1] - state is now lost (was member)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>peer_update_callback:  node1.com is now lost (was member)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info: do_log:
>FSA: Input I_ELECTION_DC from do_election_check() received in state
>S_ELECTION
>Apr 01 11:05:16 [3541] node2.com       crmd:   notice:
>do_state_transition:   State transition S_ELECTION -> S_INTEGRATION [
>input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
>Apr 01 11:05:16 [3541] node2.com       crmd:     info: do_te_control:
>       Registering TE UUID: xxxxx
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>set_graph_functions:   Setting custom graph functions
>Apr 01 11:05:16 [3534] node2.com pacemakerd:     info:
>pcmk_quorum_notification:      Membership 26956: quorum retained (1)
>Apr 01 11:05:16 [3534] node2.com pacemakerd:   notice:
>crm_update_peer_state:         pcmk_quorum_notification: Node
>node1.com[1] - state is now lost (was member)
>Apr 01 11:05:16 [3540] node2.com    pengine:     info: crm_client_new:
>       Connecting 0x123 for uid=111 gid=115 pid=3541 id=xxx
>Apr 01 11:05:16 [3537] node2.com stonith-ng:     info:
>pcmk_cpg_membership:   Left[1.0] stonith-ng.1
>Apr 01 11:05:16 [3537] node2.com stonith-ng:     info:
>crm_update_peer_proc:  pcmk_cpg_membership: Node node1.com[1] -
>corosync-cpg is now offline
>Apr 01 11:05:16 [3537] node2.com stonith-ng:     info:
>pcmk_cpg_membership:   Member[1.0] stonith-ng.2
>Apr 01 11:05:16 [3537] node2.com stonith-ng:   notice: remote_op_done:
>       Operation reboot of node1.com by node2.com for
>crmd.2027 at node1.com.d87c55: OK
>Apr 01 11:05:16 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_notify reply from node2.com: OK
>(0)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info: do_dc_takeover:
>       Taking over DC status for this partition
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_readwrite:         We are now in R/W mode
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_master operation for section
>'all': OK (rc=0, origin=local/crmd/15, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section cib:
>OK (rc=0, origin=local/crmd/16, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/configuration/crm_config//cluster_property_set//nvpair[@name='dc-version']:
>OK (rc=0, origin=local/crmd/17, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>crm_config: OK (rc=0, origin=local/crmd/18, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/configuration/crm_config//cluster_property_set//nvpair[@name='cluster-infrastructure']:
>OK (rc=0, origin=local/crmd/19, version=0.289.26)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>join_make_offer:       Making join offers based on membership 26956
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>join_make_offer:       join-1: Sending offer to node2.com
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_join:  join_make_offer: Node node2.com[2] - join-1
>phase 0 -> 1
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>do_dc_join_offer_all:  join-1: Waiting on 1 outstanding join acks
>Apr 01 11:05:16 [3541] node2.com       crmd:   notice:
>tengine_stonith_notify:        Peer node1.com was terminated (reboot)
>by node2.com for node1.com: OK (ref=xxxx) by client crmd.2027
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_proc:  send_stonith_update: Node node1.com[1] - all
>processes are now offline
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>peer_update_callback:  Client node1.com/peer now has status [offline]
>(DC=true)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_expected:      send_stonith_update: Node node1.com[1]
>- expected state is now down
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>erase_status_tag:      Deleting xpath:
>//node_state[@uname='node1.com']/lrm
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>erase_status_tag:      Deleting xpath:
>//node_state[@uname='node1.com']/transient_attributes
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>tengine_stonith_notify:        External fencing operation from
>crmd.2027 fenced node1.com
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>abort_transition_graph:        tengine_stonith_notify:205 - Triggered
>transition abort (complete=1) : External Fencing Operation
>Apr 01 11:05:16 [3541] node2.com       crmd:     info: update_dc:
>Set DC to node2.com (3.0.7)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>crm_config: OK (rc=0, origin=local/crmd/20, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>crm_config: OK (rc=0, origin=local/crmd/21, version=0.289.26)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/22, version=0.289.27)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_delete operation for section
>//node_state[@uname='node1.com']/lrm: OK (rc=0, origin=local/crmd/23,
>version=0.289.28)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_delete operation for section
>//node_state[@uname='node1.com']/transient_attributes: OK (rc=0,
>origin=local/crmd/24, version=0.289.29)
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>'all': OK (rc=0, origin=local/crmd/25, version=0.289.29)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>cib_fencing_updated:   Fencing update 22 for node1.com: complete
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_join:  do_dc_join_filter_offer: Node node2.com[2] -
>join-1 phase 1 -> 2
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_expected:      do_dc_join_filter_offer: Node
>node2.com[2] - expected state is now member
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>do_state_transition:   State transition S_INTEGRATION ->
>S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL
>origin=check_join_state ]
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crmd_join_phase_log:   join-1: node1.com=none
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crmd_join_phase_log:   join-1: node2.com=integrated
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>do_dc_join_finalize:   join-1: Syncing our CIB to the rest of the
>cluster
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_sync operation for section 'all':
>OK (rc=0, origin=local/crmd/26, version=0.289.29)
>Apr 01 11:05:16 [3541] node2.com       crmd:     info:
>crm_update_peer_join:  finalize_join_for: Node node2.com[2] - join-1
>phase 2 -> 3
>Apr 01 11:05:16 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>nodes: OK (rc=0, origin=local/crmd/27, version=0.289.29)
>Apr 01 11:05:17 [4431] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/hacluster
>Apr 01 11:05:17 [4431] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:17 [3541] node2.com       crmd:     info:
>services_os_action_execute:    Managed redis_meta-data_0 process 4408
>exited with rc=0
>Apr 01 11:05:17 [3541] node2.com       crmd:     info:
>crm_update_peer_join:  do_dc_join_ack: Node node2.com[2] - join-1
>phase 3 -> 4
>Apr 01 11:05:17 [3541] node2.com       crmd:     info: do_dc_join_ack:
>       join-1: Updating node state to member for node2.com
>Apr 01 11:05:17 [3541] node2.com       crmd:     info:
>erase_status_tag:      Deleting xpath:
>//node_state[@uname='node2.com']/lrm
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_delete operation for section
>//node_state[@uname='node2.com']/lrm: OK (rc=0, origin=local/crmd/28,
>version=0.289.30)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/29, version=0.289.31)
>Apr 01 11:05:17 [3541] node2.com       crmd:     info:
>do_state_transition:   State transition S_FINALIZE_JOIN ->
>S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL
>origin=check_join_state ]
>Apr 01 11:05:17 [3541] node2.com       crmd:     info:
>abort_transition_graph:        do_te_invoke:151 - Triggered transition
>abort (complete=1) : Peer Cancelled
>Apr 01 11:05:17 [3539] node2.com      attrd:   notice:
>attrd_local_callback:  Sending full refresh (origin=crmd)
>Apr 01 11:05:17 [3539] node2.com      attrd:   notice:
>attrd_trigger_update:  Sending flush op to all hosts for: master-redis
>(2)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>nodes: OK (rc=0, origin=local/crmd/30, version=0.289.31)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/31, version=0.289.32)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section cib:
>OK (rc=0, origin=local/crmd/32, version=0.289.33)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>'all': OK (rc=0, origin=local/crmd/33, version=0.289.33)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/attrd/14, version=0.289.33)
>Apr 01 11:05:17 [3539] node2.com      attrd:   notice:
>attrd_trigger_update:  Sending flush op to all hosts for:
>probe_complete (true)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/attrd/15, version=0.289.33)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='probe_complete']:
>OK (rc=0, origin=local/attrd/16, version=0.289.33)
>Apr 01 11:05:17 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/attrd/17, version=0.289.33)
>Apr 01 11:05:18 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>'all': OK (rc=0, origin=local/crmd/34, version=0.289.33)
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice: unpack_config:
>       On loss of CCM Quorum: Ignore
>Apr 01 11:05:18 [3540] node2.com    pengine:     info:
>determine_online_status_fencing:       Node node2.com is active
>Apr 01 11:05:18 [3540] node2.com    pengine:     info:
>determine_online_status:       Node node2.com is online
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: native_print:
>res_vip (ocf::heartbeat:IPaddr2):       Stopped
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: clone_print:
>Master/Slave Set: redis_clone [redis]
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: short_print:
>    Slaves: [ node2.com ]
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: short_print:
>    Stopped: [ node1.com ]
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: native_print:
>fence_node1       (stonith:fence_ipmilan):        Stopped
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: native_print:
>fence_node2       (stonith:fence_ipmilan):        Stopped
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: native_color:
>redis:1: Rolling back scores from res_vip
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: native_color:
>Resource redis:1 cannot run anywhere
>Apr 01 11:05:18 [3540] node2.com    pengine:     info:
>rsc_merge_weights:     redis_clone: Breaking dependency loop at
>redis_clone
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: master_color:
>Promoting redis:0 (Slave node2.com)
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: master_color:
>redis_clone: Promoted 1 instances of a possible 1 to master
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (10s) for res_vip on node2.com
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (1s) for redis:0 on node2.com
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (1s) for redis:0 on node2.com
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (20s) for fence_node1 on node2.com
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (20s) for fence_node2 on node2.com
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice: LogActions:
>Start   res_vip (node2.com)
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice: LogActions:
>Promote redis:0 (Slave -> Master node2.com)
>Apr 01 11:05:18 [3540] node2.com    pengine:     info: LogActions:
>Leave   redis:1 (Stopped)
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice: LogActions:
>Start   fence_node1       (node2.com)
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice: LogActions:
>Start   fence_node2       (node2.com)
>Apr 01 11:05:18 [3540] node2.com    pengine:   notice:
>process_pe_message:    Calculated Transition 0:
>/var/lib/pacemaker/pengine/pe-input-396.bz2
>Apr 01 11:05:18 [3541] node2.com       crmd:     info:
>do_state_transition:   State transition S_POLICY_ENGINE ->
>S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>origin=handle_response ]
>Apr 01 11:05:18 [3541] node2.com       crmd:     info: do_te_invoke:
>Processing graph 0 (ref=pe_calc-dc-1585718-14) derived from
>/var/lib/pacemaker/pengine/pe-input-396.bz2
>Apr 01 11:05:18 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 34: start fence_node1_start_0 on node2.com
>(local)
>Apr 01 11:05:18 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxxxx op=fence_node1_start_0
>Apr 01 11:05:18 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:fence_node1 action:start call_id:33
>Apr 01 11:05:18 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 36: start fence_node2_start_0 on node2.com
>(local)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action metadata for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:18 [3537] node2.com stonith-ng:   notice:
>stonith_device_register:       Device 'fence_node1' already existed in
>device list (2 active devices)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_device_register from lrmd.3538: OK
>(0)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_execute from lrmd.3538: Operation
>now in progress (-115)
>Apr 01 11:05:18 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxx op=fence_node2_start_0
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action monitor for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:18 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:fence_node2 action:start call_id:35
>Apr 01 11:05:18 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 47: notify redis_pre_notify_promote_0 on
>node2.com (local)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action metadata for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:18 [3537] node2.com stonith-ng:   notice:
>stonith_device_register:       Device 'fence_node2' already existed in
>device list (2 active devices)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_device_register from lrmd.3538: OK
>(0)
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_execute from lrmd.3538: Operation
>now in progress (-115)
>Apr 01 11:05:18 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxxx op=redis_notify_0
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action monitor for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:18 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:redis action:notify call_id:37
>Apr 01 11:05:18 [3537] node2.com stonith-ng:     info: log_operation:
>       fence_node2:4468 [ Getting status of IPMI:10.24.x.aa...Done ]
>Apr 01 11:05:18 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:fence_node2 action:start call_id:35  exit-code:0
>exec-time:258ms queue-time:0ms
>Apr 01 11:05:18 [3541] node2.com       crmd:     info:
>stonith_action_create:         Initiating action metadata for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:18 [4490] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:18 [4490] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:18 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:redis action:notify call_id:37 pid:4469 exit-code:0
>exec-time:556ms queue-time:0ms
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info: log_operation:
>       fence_node1:4465 [ Getting status of IPMI:10.24.x.yyy...Done ]
>Apr 01 11:05:19 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:fence_node1 action:start call_id:33  exit-code:0
>exec-time:1113ms queue-time:0ms
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation fence_node2_start_0 (call=35,
>rc=0, cib-update=35, confirmed=true) ok
>Apr 01 11:05:19 [3541] node2.com       crmd:     info:
>match_graph_event:     Action redis_notify_0 (47) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation redis_notify_0 (call=37, rc=0,
>cib-update=0, confirmed=true) ok
>Apr 01 11:05:19 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/35, version=0.289.34)
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation fence_node1_start_0 (call=33,
>rc=0, cib-update=36, confirmed=true) ok
>Apr 01 11:05:19 [3541] node2.com       crmd:     info:
>match_graph_event:     Action fence_node2_start_0 (36) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 37: monitor fence_node2_monitor_20000 on
>node2.com (local)
>Apr 01 11:05:19 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=aaaa op=fence_node2_monitor_20000
>Apr 01 11:05:19 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/36, version=0.289.35)
>Apr 01 11:05:19 [3541] node2.com       crmd:     info:
>match_graph_event:     Action fence_node1_start_0 (34) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 8: promote redis_promote_0 on node2.com
>(local)
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_execute from lrmd.3538: Operation
>now in progress (-115)
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action monitor for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:19 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=aaaaa op=redis_promote_0
>Apr 01 11:05:19 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:redis action:promote call_id:44
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 35: monitor fence_node1_monitor_20000 on
>node2.com (local)
>Apr 01 11:05:19 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=yyyy op=fence_node1_monitor_20000
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info:
>stonith_command:       Processed st_execute from lrmd.3538: Operation
>now in progress (-115)
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info:
>stonith_action_create:         Initiating action monitor for agent
>fence_ipmilan (target=(null))
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info: log_operation:
>       fence_node2:4522 [ Getting status of IPMI:10.24.x.aa...Done ]
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation fence_node2_monitor_20000
>(call=42, rc=0, cib-update=37, confirmed=false) ok
>Apr 01 11:05:19 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/37, version=0.289.36)
>Apr 01 11:05:19 [3541] node2.com       crmd:     info:
>match_graph_event:     Action fence_node2_monitor_20000 (37) confirmed
>on node2.com (rc=0)
>Apr 01 11:05:19 [3537] node2.com stonith-ng:     info: log_operation:
>       fence_node1:4525 [ Getting status of IPMI:10.24.x.yyy...Done ]
>Apr 01 11:05:19 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation fence_node1_monitor_20000
>(call=46, rc=0, cib-update=38, confirmed=false) ok
>Apr 01 11:05:19 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/38, version=0.289.37)
>Apr 01 11:05:19 [3541] node2.com       crmd:     info:
>match_graph_event:     Action fence_node1_monitor_20000 (35) confirmed
>on node2.com (rc=0)
>Apr 01 11:05:19 [4549] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:19 [4549] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:20 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting 0x7fee530 for uid=0 gid=0 pid=4587 id=hhhhhh
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>crm_client_destroy:    Destroying 0 events
>Apr 01 11:05:20 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting 0x7f3e0 for uid=0 gid=0 pid=4601 id=ggggg
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>crm_client_destroy:    Destroying 0 events
>Apr 01 11:05:20 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting 0x7ooee530 for uid=0 gid=0 pid=4602 id=uuuu
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/configuration/crm_config//cluster_property_set[@id='redis_replication']//nvpair[@name='redis_REPL_INFO']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.289.37)
>Apr 01 11:05:20 [3536] node2.com        cib:   notice: cib:diff:
>Diff: --- 0.289.37
>Apr 01 11:05:20 [3536] node2.com        cib:   notice: cib:diff:
>Diff: +++ 0.290.1 xxxx
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>crm_config: OK (rc=0, origin=local/crm_attribute/4, version=0.290.1)
>Apr 01 11:05:20 [3541] node2.com       crmd:     info:
>abort_transition_graph:        te_update_diff:126 - Triggered
>transition abort (complete=0, node=, tag=diff, id=(null), magic=NA,
>cib=0.290.1) : Non-status change
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>crm_config: OK (rc=0, origin=local/crmd/39, version=0.290.1)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>crm_client_destroy:    Destroying 0 events
>Apr 01 11:05:20 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:redis action:promote call_id:44 pid:4523 exit-code:0
>exec-time:868ms queue-time:0ms
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation redis_promote_0 (call=44, rc=0,
>cib-update=40, confirmed=true) ok
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>write_cib_contents:    Archived previous version as
>/var/lib/pacemaker/cib/cib-8.raw
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/40, version=0.290.2)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>write_cib_contents:    Wrote version 0.290.0 of the CIB to disk
>(digest:  809718ebfe91e)
>Apr 01 11:05:20 [3541] node2.com       crmd:     info:
>match_graph_event:     Action redis_promote_0 (8) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 48: notify redis_post_notify_promote_0 on
>node2.com (local)
>Apr 01 11:05:20 [3536] node2.com        cib:     info: retrieveCib:
>Reading cluster configuration from: /var/lib/pacemaker/cib/cib.7E6gmO
>(digest: /var/lib/pacemaker/cib/cib.8hDyp5)
>Apr 01 11:05:20 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxx op=redis_notify_0
>Apr 01 11:05:20 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:redis action:notify call_id:51
>Apr 01 11:05:20 [4623] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:20 [4623] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:20 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting 0xxx40 for uid=0 gid=0 pid=4665 id=xxxx
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.290.2)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.290.2)
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>crm_client_destroy:    Destroying 0 events
>Apr 01 11:05:20 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:redis action:notify call_id:51 pid:4604 exit-code:0
>exec-time:682ms queue-time:0ms
>Apr 01 11:05:20 [3541] node2.com       crmd:     info:
>match_graph_event:     Action redis_notify_0 (48) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation redis_notify_0 (call=51, rc=0,
>cib-update=0, confirmed=true) ok
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice: run_graph:
>Transition 0 (Complete=13, Pending=0, Fired=0, Skipped=3,
>Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-396.bz2):
>Stopped
>Apr 01 11:05:20 [3541] node2.com       crmd:     info:
>do_state_transition:   State transition S_TRANSITION_ENGINE ->
>S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
>origin=notify_crmd ]
>Apr 01 11:05:20 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>'all': OK (rc=0, origin=local/crmd/41, version=0.290.2)
>Apr 01 11:05:20 [3540] node2.com    pengine:   notice: unpack_config:
>       On loss of CCM Quorum: Ignore
>Apr 01 11:05:20 [3540] node2.com    pengine:     info:
>determine_online_status_fencing:       Node node2.com is active
>Apr 01 11:05:20 [3540] node2.com    pengine:     info:
>determine_online_status:       Node node2.com is online
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: native_print:
>res_vip (ocf::heartbeat:IPaddr2):       Stopped
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: clone_print:
>Master/Slave Set: redis_clone [redis]
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: short_print:
>    Masters: [ node2.com ]
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: short_print:
>    Stopped: [ node1.com ]
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: native_print:
>fence_node1       (stonith:fence_ipmilan):        Started node2.com
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: native_print:
>fence_node2       (stonith:fence_ipmilan):        Started node2.com
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: native_color:
>redis:1: Rolling back scores from res_vip
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: native_color:
>Resource redis:1 cannot run anywhere
>Apr 01 11:05:20 [3540] node2.com    pengine:     info:
>rsc_merge_weights:     redis_clone: Breaking dependency loop at
>redis_clone
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: master_color:
>Promoting redis:0 (Master node2.com)
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: master_color:
>redis_clone: Promoted 1 instances of a possible 1 to master
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (10s) for res_vip on node2.com
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (1s) for redis:0 on node2.com
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: RecurringOp:
>Start recurring monitor (1s) for redis:0 on node2.com
>Apr 01 11:05:20 [3540] node2.com    pengine:   notice: LogActions:
>Start   res_vip (node2.com)
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: LogActions:
>Leave   redis:0 (Master node2.com)
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: LogActions:
>Leave   redis:1 (Stopped)
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: LogActions:
>Leave   fence_node1       (Started node2.com)
>Apr 01 11:05:20 [3540] node2.com    pengine:     info: LogActions:
>Leave   fence_node2       (Started node2.com)
>Apr 01 11:05:20 [3540] node2.com    pengine:   notice:
>process_pe_message:    Calculated Transition 1:
>/var/lib/pacemaker/pengine/pe-input-397.bz2
>Apr 01 11:05:20 [3541] node2.com       crmd:     info:
>do_state_transition:   State transition S_POLICY_ENGINE ->
>S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>origin=handle_response ]
>Apr 01 11:05:20 [3541] node2.com       crmd:     info: do_te_invoke:
>Processing graph 1 (ref=pe_calc-dc-1586720-24) derived from
>/var/lib/pacemaker/pengine/pe-input-397.bz2
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 6: start res_vip_start_0 on node2.com (local)
>Apr 01 11:05:20 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxxx op=res_vip_start_0
>Apr 01 11:05:20 [3538] node2.com       lrmd:     info: log_execute:
>executing - rsc:res_vip action:start call_id:54
>Apr 01 11:05:20 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 12: monitor redis_monitor_1000 on node2.com
>(local)
>Apr 01 11:05:20 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxxxxx op=redis_monitor_1000
>IPaddr2(res_vip)[4666]: 2020/04/01_11:05:21 INFO: Adding IPv4 address
>10.24.x.y/22 with broadcast address 10.24.x.255 to device bond0.x
>IPaddr2(res_vip)[4666]: 2020/04/01_11:05:21 INFO: Bringing device
>bond0.x up
>Apr 01 11:05:21 [4721] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:21 [4721] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>IPaddr2(res_vip)[4666]: 2020/04/01_11:05:21 INFO:
>/usr/lib/heartbeat/send_arp -i 200 -r 5 -p
>/var/run/resource-agents/send_arp-10.24.x.aa bond0.2 10.24.x.aa auto
>not_used not_used
>Apr 01 11:05:21 [3538] node2.com       lrmd:   notice:
>operation_finished:    res_vip_start_0:4666:stderr [
>2020/04/01_11:05:21 INFO: Adding IPv4 address 10.24.x.aa/22 with
>broadcast address 10.24.x.z to device bond0.2 ]
>Apr 01 11:05:21 [3538] node2.com       lrmd:   notice:
>operation_finished:    res_vip_start_0:4666:stderr [
>2020/04/01_11:05:21 INFO: Bringing device bond0.2 up ]
>Apr 01 11:05:21 [3538] node2.com       lrmd:   notice:
>operation_finished:    res_vip_start_0:4666:stderr [
>2020/04/01_11:05:21 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p
>/var/run/resource-agents/send_arp-10.24.x.aa bond0.2 10.24.x.aa auto
>not_used not_used ]
>Apr 01 11:05:21 [3538] node2.com       lrmd:     info: log_finished:
>finished - rsc:res_vip action:start call_id:54 pid:4666 exit-code:0
>exec-time:482ms queue-time:0ms
>Apr 01 11:05:21 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting xx6b40 for uid=0 gid=0 pid=4779 id=xx
>Apr 01 11:05:21 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.290.2)
>Apr 01 11:05:21 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.290.2)
>Apr 01 11:05:21 [3536] node2.com        cib:     info:
>crm_client_destroy:    Destroying 0 events
>Apr 01 11:05:21 [3541] node2.com       crmd:     info:
>services_os_action_execute:    Managed IPaddr2_meta-data_0 process
>4750 exited with rc=0
>Apr 01 11:05:21 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation res_vip_start_0 (call=54, rc=0,
>cib-update=42, confirmed=true) ok
>Apr 01 11:05:21 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation redis_monitor_1000 (call=56,
>rc=8, cib-update=43, confirmed=false) master
>Apr 01 11:05:21 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/42, version=0.290.3)
>Apr 01 11:05:21 [3541] node2.com       crmd:     info:
>match_graph_event:     Action res_vip_start_0 (6) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:21 [3541] node2.com       crmd:   notice: te_rsc_command:
>       Initiating action 7: monitor res_vip_monitor_10000 on node2.com
>(local)
>Apr 01 11:05:21 [3541] node2.com       crmd:     info: do_lrm_rsc_op:
>       Performing key=xxx op=res_vip_monitor_10000
>Apr 01 11:05:21 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/43, version=0.290.4)
>Apr 01 11:05:21 [3541] node2.com       crmd:     info:
>match_graph_event:     Action redis_monitor_1000 (12) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:22 [3541] node2.com       crmd:   notice:
>process_lrm_event:     LRM operation res_vip_monitor_10000 (call=60,
>rc=0, cib-update=44, confirmed=false) ok
>Apr 01 11:05:22 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_modify operation for section
>status: OK (rc=0, origin=local/crmd/44, version=0.290.5)
>Apr 01 11:05:22 [3541] node2.com       crmd:     info:
>match_graph_event:     Action res_vip_monitor_10000 (7) confirmed on
>node2.com (rc=0)
>Apr 01 11:05:22 [3541] node2.com       crmd:   notice: run_graph:
>Transition 1 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>Source=/var/lib/pacemaker/pengine/pe-input-397.bz2): Complete
>Apr 01 11:05:22 [3541] node2.com       crmd:     info: do_log:
>FSA: Input I_TE_SUCCESS from notify_crmd() received in state
>S_TRANSITION_ENGINE
>Apr 01 11:05:22 [3541] node2.com       crmd:   notice:
>do_state_transition:   State transition S_TRANSITION_ENGINE -> S_IDLE
>[ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>Apr 01 11:05:23 [4840] node2.com pacemakerd:     info: crm_log_init:
>Changed active directory to /var/lib/heartbeat/cores/root
>Apr 01 11:05:23 [4840] node2.com pacemakerd:     info:
>crm_xml_cleanup:       Cleaning up memory from libxml2
>Apr 01 11:05:23 [3536] node2.com        cib:     info: crm_client_new:
>       Connecting xxb0 for uid=0 gid=0 pid=4878 id=xxxx
>Apr 01 11:05:23 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.290.5)
>Apr 01 11:05:23 [3536] node2.com        cib:     info:
>cib_process_request:   Completed cib_query operation for section
>//cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='master-redis']:
>OK (rc=0, origin=local/crm_attribute/3, version=0.290.5)
>
>
>redis log
>
>3765:M 01 Apr 11:05:20.126 # Connection with master lost.
>3765:M 01 Apr 11:05:20.126 * Caching the disconnected master state.
>3765:M 01 Apr 11:05:20.126 * Discarding previously cached master state.
>3765:M 01 Apr 11:05:20.126 * MASTER MODE enabled (user request from
>'id=11 addr=/var/run/redis/redis.sock:0 fd=8 name= age=0 idle=0
>flags=U db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0
>omem=0 events=r cmd=slaveof')

Hi Steven,

To me, this looks OK.
The only thing that worries me is the redis monitoring interval:
op monitor interval="1s" 

I think it's too short.
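
A 1-second Master monitor with a 1-second timeout leaves the agent very
little headroom. As a sketch only (the 5s/20s values below are
illustrative, not a tested recommendation for this cluster), the
operation could be relaxed in the same crm shell syntax:

    op monitor interval="5s" role="Master" timeout="20s" on-fail="fence"

Per the summary above, detection already happened within about 2 seconds
of the crash, so the monitor interval is not where the 20 seconds are
being spent.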

Actually, the slowest part here is the power fencing.
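
If you want to see how much of those 7 seconds is the IPMI controller
itself, the agent can be timed outside Pacemaker with the same
parameters used in the fence_example1 primitive. Treat this as a sketch:
option names differ between fence-agents versions, and 'status' is used
instead of 'off' so nothing actually gets powered down:

    time fence_ipmilan -a 10.24.x.xxx -l user -p password -P -o status

In the log, the actual power-off ran from 11:05:03 to 11:05:10, which is
not unusual for IPMI power control.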

What are the values for 'token' and 'consensus' in the corosync.conf?
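
For reference, both settings live in the totem section of
/etc/corosync/corosync.conf. A minimal sketch with illustrative values
(corosync's defaults are token=1000 ms and consensus=1.2*token):

totem {
        version: 2
        token: 3000
        consensus: 3600
        # other totem settings unchanged
}

The gap between the power-off finishing at 11:05:10 and the new
membership forming at 11:05:16 is roughly what token plus consensus
timeouts in the few-second range would produce, which is why those two
values matter here.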

Best Regards,
Strahil Nikolov

