[Pacemaker] Corosync over DHCP IP

Sun Feb 10 07:24:07 EST 2013

Hi guys,

I've got a tricky issue with Corosync and Pacemaker running on a DHCP-assigned
IP address with unicast (udpu) transport. Corosync crashes periodically.

Packages are from the CentOS 6 repos:
corosync-1.4.1-7.el6_3.1.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64

*Logs*

Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cfg_connection_destroy:     Connection destroyed
Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch:       AIS connection failed
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cpg_connection_destroy:     Connection destroyed
Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch:       AIS connection failed
Feb 10 07:56:25 [5251] host1       crmd:     info: crmd_ais_destroy: connection closed
Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch:       AIS connection failed
Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5246] host1        cib:    error: cib_ais_destroy:    AIS connection terminated
Feb 10 07:56:25 [5249] host1      attrd:     crit: attrd_ais_destroy:  Lost connection to OpenAIS service!
Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker:     Shuting down Pacemaker
Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch:       AIS connection failed
Feb 10 07:56:25 [5249] host1      attrd:   notice: main:       Exiting...
Feb 10 07:56:25 [5247] host1 stonith-ng:    error: stonith_peer_ais_destroy:   AIS connection terminated
Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: stop_child: Stopping crmd: Sent -15 to process 5251
Feb 10 07:56:25 [5249] host1      attrd:    error: attrd_cib_connection_destroy:       Connection to the CIB terminated...
Feb 10 07:56:25 [5251] host1       crmd:     info: crm_signal_dispatch:    Invoking handler for signal 15: Terminated
Feb 10 07:56:25 [5251] host1       crmd:   notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Feb 10 07:56:25 [5251] host1       crmd:     info: do_shutdown_req:  Sending shutdown request to host2
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit:  Child process stonith-ng exited (pid=5247, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message:   IPC Channel to 5249 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message:   IPC Channel to 5246 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message:   IPC Channel to 5247 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit:  Child process cib exited (pid=5246, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit:  Child process attrd exited (pid=5249, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5251] host1       crmd:    error: send_ais_text:  Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition:    State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
Feb 10 07:56:27 [5251] host1       crmd:    error: do_recover: Action A_RECOVER (0000000001000000) not supported
Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition:    State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Feb 10 07:56:27 [5251] host1       crmd:     info: do_shutdown:  Disconnecting STONITH...
Feb 10 07:56:27 [5251] host1       crmd:     info: tengine_stonith_connection_destroy:         Fencing daemon disconnected
Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104]  cancelled
Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
Feb 10 07:56:27 [5251] host1       crmd:     info: do_lrm_control: Disconnected from the LRM
Feb 10 07:56:27 [5251] host1       crmd:   notice: terminate_ais_connection:   Disconnecting from AIS
Feb 10 07:56:27 [5251] host1       crmd:     info: do_ha_control:  Disconnected from OpenAIS
Feb 10 07:56:27 [5251] host1       crmd:     info: do_cib_control: Disconnecting CIB
Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message:   IPC Channel to 5246 is not connected
Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message:   IPC Channel to 5246 is not connected
Feb 10 07:56:27 [5251] host1       crmd:    error: cib_native_perform_op_delegate:     Sending message to CIB service FAILED
Feb 10 07:56:27 [5251] host1       crmd:     info: crmd_cib_connection_destroy:        Connection to the CIB terminated...
Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit:    Performing A_EXIT_0 - gracefully exiting the CRMd
Feb 10 07:56:27 [5251] host1       crmd:    error: do_exit:    Could not recover from internal error
Feb 10 07:56:27 [5251] host1       crmd:     info: free_mem:   Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Feb 10 07:56:27 [5251] host1       crmd:     info: crm_xml_cleanup:  Cleaning up memory from libxml2
Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit:    [crmd] stopped (2)
Feb 10 07:56:27 [5242] host1 pacemakerd:    error: pcmk_child_exit:  Child process crmd exited (pid=5251, rc=2)
Feb 10 07:56:27 [5242] host1 pacemakerd:  warning: send_ipc_message:   IPC Channel to 5251 is not connected
Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping pengine: Sent -15 to process 5250
Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit:  Child process pengine exited (pid=5250, rc=0)
Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping lrmd: Sent -15 to process 5248
Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit:  Child process lrmd exited (pid=5248, rc=0)
Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker:     Shutdown complete
Feb 10 07:56:27 [5242] host1 pacemakerd:     info: main:       Exiting pacemakerd
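
In the excerpt above, Corosync reports the network interface going down at
07:56:22, and pacemakerd starts shutting down at 07:56:25, three seconds
later. To pull just those events out of a longer log, a minimal Python sketch
could look like the following. It assumes one record per line (the raw log
file, not a mail-wrapped copy) and matches on message fragments copied from
the log. The event names, the find_events helper and the default year 2013
are hypothetical; syslog timestamps carry no year, so the caller supplies it.

```python
import re
from datetime import datetime

# Assumes one syslog record per line, starting with "Mon DD HH:MM:SS".
TS_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+(.*)$")

# Message fragments copied from the log above; the event names are hypothetical.
MARKERS = {
    "interface_down": "The network interface is down.",
    "interface_up": "is now up.",
    "pacemaker_shutdown": "Shuting down Pacemaker",  # spelled as pacemakerd logs it
}


def find_events(lines, year=2013):
    """Return (event_name, datetime) pairs for marker lines, in log order.

    Syslog timestamps carry no year, so the caller supplies one.
    """
    events = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # not a timestamped record
        stamp, message = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for name, needle in MARKERS.items():
            if needle in message:
                events.append((name, when))
    return events


sample = [
    "Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor",
    "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
    "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
    "Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker: Shuting down Pacemaker",
]

for name, when in find_events(sample):
    print(name, when.time())
# interface_down 07:56:22
# interface_up 07:56:24
# pacemaker_shutdown 07:56:25
```

Running it over the full log shows whether each crash is preceded by an
interface down/up pair like the one above.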

*corosync.conf:*

compatibility: whitetank

totem {
        version: 2
        secauth: off
        nodeid: 104
        interface {
                member {
                        memberaddr: 172.17.0.104
                }
                member {
                        memberaddr: 172.17.0.105
                }
                ringnumber: 0
                bindnetaddr: 172.17.0.0
                mcastport: 5426
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        debug: on
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
service {
       # Load the Pacemaker Cluster Resource Manager
       ver:       1
       name:      pacemaker
}

aisexec {
       user:   root
       group:  root
}
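
Note that the logging section above sets debug twice (on, then off); which
value corosync 1.4 actually applies is not confirmed here. Repeated keys like
this are easy to miss by eye, so a minimal Python sketch for spotting them
follows. It assumes a simplified reading of the corosync.conf layout (blocks
in braces, "key: value" options, "#" comments), and find_duplicate_keys is a
hypothetical helper, not part of corosync.

```python
def find_duplicate_keys(text):
    """Return (section_path, key, values) for keys set more than once in one block.

    Simplified reading of the corosync.conf layout: "name {" opens a block,
    "}" closes it, "key: value" sets an option, "#" starts a comment.
    Each block instance is checked separately, so the two "member { }"
    blocks in totem/interface are not reported as duplicates.
    """
    stack = [("<top>", {})]  # (section path, key -> list of values)
    finished = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            parent = stack[-1][0]
            path = name if parent == "<top>" else f"{parent}/{name}"
            stack.append((path, {}))
        elif line == "}" and len(stack) > 1:
            finished.append(stack.pop())
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1][1].setdefault(key.strip(), []).append(value.strip())
    finished.extend(stack)  # top level plus any unclosed blocks
    return [
        (path, key, values)
        for path, keys in finished
        for key, values in keys.items()
        if len(values) > 1
    ]


logging_block = """
logging {
        to_syslog: yes
        debug: on
        logfile: /var/log/cluster/corosync.log
        debug: off
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
"""

print(find_duplicate_keys(logging_block))
# [('logging', 'debug', ['on', 'off'])]
```

The debug key inside logger_subsys is a separate block, so it is not counted
against the two debug lines in logging itself.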

Thank you!

-- 
Viacheslav Biriukov
BR
http://biriukov.me