[ClusterLabs] Failover When Host is Up, Out of Order Logs

Corey Moullas cmoullas at emak.tech
Tue Jan 31 12:44:30 EST 2017


I have been getting extremely strange behavior from a Corosync/Pacemaker install on OVH Public Cloud servers.

After hours of Googling, I thought I would try posting here to see if somebody knows what to do.

I see this in my logs very frequently:
Jan 31 10:31:36 [581] fusion01-2 corosync warning [MAIN  ] Corosync main process was not scheduled for 24334.5645 ms (threshold is 2400.0000 ms). Consider token timeout increase.
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [TOTEM ] A processor failed, forming new configuration.

I have increased token time to 10s and this still occurs regularly even though both hosts are always up.

There are also times when the floating IP script is fired, but corosync does not seem aware. When you run crm status it will show the ip being bound the fusion01-1 when in fact the script fired and moved it to fusion01-2.

Finally, the timing of the logs seems very odd. This is what the end of my corosync log file looks like. Notice the times appear out of order in the logs. I'm ripping my hair out with these issues. Anybody have a clue what may be going on here?

Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section status: OK (rc=0, origin=fusion01-1/crmd/245, version=0.81.123)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: --- 0.81.123 2
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: +++ 0.81.124 (null)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        -- /cib/status/node_state[@id='1']/lrm[@id='1']
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib:  @num_updates=124
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_delete operation for section //node_state[@uname='fusion01-1']/lrm: OK (rc=0, origin=fusion01-1/crmd/246, version=0.81.124)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: --- 0.81.124 2
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: +++ 0.81.125 (null)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib:  @num_updates=125
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib/status/node_state[@id='1']:  @crm-debug-origin=do_lrm_query_internal
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++ /cib/status/node_state[@id='1']:  <lrm id="1"/>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                     <lrm_resources>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                       <lrm_resource id="FloatIP" type="floatip-ocf" class="ocf" provider="ovh">
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                         <lrm_rsc_op id="FloatIP_last_0" operation_key="FloatIP_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="5:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" transition-magic="0:0;5:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" on_node="fusion01-1" call-id="17" rc-code="0" op-status="0" interval="0" last-run="1485859189" last-rc-change="1485859189" e
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                       </lrm_resource>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                       <lrm_resource id="FS" type="FSSofia" class="lsb">
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                         <lrm_rsc_op id="FS_last_failure_0" operation_key="FS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="4:1:7:1fe20aa3-b305-4282-99a3-b1f8190d3c2c" transition-magic="0:0;4:1:7:1fe20aa3-b305-4282-99a3-b1f8190d3c2c" on_node="fusion01-1" call-id="9" rc-code="0" op-status="0" interval="0" last-run="1485760597" last-rc-change="1485760597" ex
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                         <lrm_rsc_op id="FS_last_0" operation_key="FS_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="7:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" transition-magic="0:0;7:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" on_node="fusion01-1" call-id="18" rc-code="0" op-status="0" interval="0" last-run="1485859191" last-rc-change="1485859191" exec-time="
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                         <lrm_rsc_op id="FS_monitor_1000" operation_key="FS_monitor_1000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="1:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" transition-magic="0:0;1:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1" on_node="fusion01-1" call-id="19" rc-code="0" op-status="0" interval="1000" last-rc-change="1485859191" exec-time="20" qu
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                       </lrm_resource>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                     </lrm_resources>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        ++                                   </lrm>
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section status: OK (rc=0, origin=fusion01-1/crmd/247, version=0.81.125)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section nodes: OK (rc=0, origin=fusion01-1/crmd/250, version=0.81.125)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: --- 0.81.125 2
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: +++ 0.81.126 (null)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib:  @num_updates=126
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib/status/node_state[@id='2']:  @crm-debug-origin=do_state_transition
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib/status/node_state[@id='1']:  @crm-debug-origin=do_state_transition
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section status: OK (rc=0, origin=fusion01-1/crmd/251, version=0.81.126)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section cib: OK (rc=0, origin=fusion01-1/crmd/252, version=0.81.126)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section status: OK (rc=0, origin=fusion01-1/attrd/16, version=0.81.126)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: --- 0.81.126 2
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        Diff: +++ 0.81.127 (null)
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_perform_op:        +  /cib:  @num_updates=127
Jan 31 10:31:47 [21062] fusion01-2        cib:     info: cib_process_request:               Completed cib_modify operation for section status: OK (rc=0, origin=fusion01-1/attrd/17, version=0.81.127)
Jan 31 10:31:52 [21062] fusion01-2        cib:     info: cib_process_ping:     Reporting our current digest to fusion01-1: 8f48b8c10ce54828f87b27bea1b50d20 for 0.81.127 (0x7ff1485dadc0 0)
Jan 31 10:32:16 [21067] fusion01-2       crmd:     info: throttle_send_command:   New throttle mode: 0000 (was 0010)
Jan 31 10:31:36 [581] fusion01-2 corosync warning [MAIN  ] Corosync main process was not scheduled for 24334.5645 ms (threshold is 2400.0000 ms). Consider token timeout increase.
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [TOTEM ] A processor failed, forming new configuration.
Jan 31 10:31:36 [581] fusion01-2 corosync warning [MAIN  ] Corosync main process was not scheduled for 9222.6709 ms (threshold is 2400.0000 ms). Consider token timeout increase.
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [TOTEM ] A new membership (192.168.128.21:6120) was formed. Members left: 1
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [TOTEM ] Failed to receive the leave message. failed: 1
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [QUORUM] Members[1]: 2
Jan 31 10:31:36 [581] fusion01-2 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jan 31 10:31:46 [581] fusion01-2 corosync notice  [TOTEM ] A new membership (192.168.128.20:6128) was formed. Members joined: 1
Jan 31 10:31:46 [581] fusion01-2 corosync notice  [QUORUM] Members[2]: 1 2
Jan 31 10:31:46 [581] fusion01-2 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.

Cheers,
Corey Moullas

Network Administrator
EMAK|TECH

T: (514)-4000-226 x101
E: cmoullas at emak.tech<mailto:cmoullas at emak.tech>

IMPORTANT: For all support-related requests, please email support at emak.tech<mailto:support at emak.tech> / Pour toute assistance technique, veuillez envoyer un courriel à support at emak.tech<mailto:support at emak.tech>.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170131/0a7754a6/attachment-0002.html>


More information about the Users mailing list