[Pacemaker] Corosync over DHCP IP

Mon Feb 11 05:24:39 EST 2013

This is a VM in OpenStack, so we can't use a static IP.
Right now I'm investigating why the interface goes down.
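
As a starting point for that investigation, here is a minimal sketch that pulls the TOTEM interface up/down events out of a corosync log, so their timestamps can be compared with whatever the DHCP client logged around the same time. The function name interface_events is just an illustration, and the pattern assumes unwrapped log lines in the same format as the ones quoted below (e.g. read from /var/log/cluster/corosync.log, the logfile set in my corosync.conf).

```python
import re
from collections import namedtuple

# One TOTEM interface event parsed from a corosync log line.
IfaceEvent = namedtuple("IfaceEvent", "timestamp state address")

# Assumed line format, taken from the log excerpt in this thread:
#   Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
#   Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
_PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) corosync \[TOTEM *\] "
    r"The network interface (?:\[(?P<addr>[^\]]+)\] )?"
    r"is (?:now )?(?P<state>up|down)\."
)


def interface_events(lines):
    """Return the interface up/down events found in an iterable of log lines."""
    events = []
    for line in lines:
        match = _PATTERN.match(line.strip())
        if match:
            events.append(
                IfaceEvent(match.group("ts"), match.group("state"), match.group("addr"))
            )
    return events


if __name__ == "__main__":
    sample = [
        "Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.",
        "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
        "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
    ]
    for event in interface_events(sample):
        # The "down" message carries no address, so print a dash in that case.
        print(event.timestamp, event.state, event.address or "-")
```

To run it against the real log, pass an open file object instead of the sample list, for example interface_events(open("/var/log/cluster/corosync.log")).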

Thank you!

2013/2/11 Viacheslav Biriukov <v.v.biriukov at gmail.com>

>
>
>
> 2013/2/11 Dan Frincu <df.cluster at gmail.com>
>
>> Hi,
>>
>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
>> <v.v.biriukov at gmail.com> wrote:
>> > Hi guys,
>> >
>> > Got a tricky issue with Corosync and Pacemaker over a DHCP-assigned IP
>> > address using unicast. Corosync crashes periodically.
>> >
>> > Packages are from centos 6 repos:
>> > corosync-1.4.1-7.el6_3.1.x86_64
>> > corosynclib-1.4.1-7.el6_3.1.x86_64
>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
>> > pacemaker-libs-1.1.7-6.el6.x86_64
>> > pacemaker-cli-1.1.7-6.el6.x86_64
>> > pacemaker-1.1.7-6.el6.x86_64
>> >
>> >
>> > Logs
>> >
>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
>>
>> This ^^^ is your problem. Corosync doesn't like it, see
>>
>> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>>
>> Normally DHCP shouldn't take the interface down. Also, since changing
>> the network configuration in corosync means restarting it, why not go
>> with static IPs?
>>
>> HTH,
>> Dan
>>
>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cfg_connection_destroy: Connection destroyed
>> > Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cpg_connection_destroy: Connection destroyed
>> > Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5251] host1       crmd:     info: crmd_ais_destroy: connection closed
>> > Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5246] host1        cib:    error: cib_ais_destroy: AIS connection terminated
>> > Feb 10 07:56:25 [5249] host1      attrd:     crit: attrd_ais_destroy: Lost connection to OpenAIS service!
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5249] host1      attrd:   notice: main: Exiting...
>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: stonith_peer_ais_destroy: AIS connection terminated
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: stop_child: Stopping crmd: Sent -15 to process 5251
>> > Feb 10 07:56:25 [5249] host1      attrd:    error: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> > Feb 10 07:56:25 [5251] host1       crmd:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > Feb 10 07:56:25 [5251] host1       crmd:   notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
>> > Feb 10 07:56:25 [5251] host1       crmd:     info: do_shutdown_req: Sending shutdown request to host2
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5249 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5247 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_recover: Action A_RECOVER (0000000001000000) not supported
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_shutdown: Disconnecting STONITH...
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104]  cancelled
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_lrm_control: Disconnected from the LRM
>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: terminate_ais_connection: Disconnecting from AIS
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_ha_control: Disconnected from OpenAIS
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_cib_control: Disconnecting CIB
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_exit:    Could not recover from internal error
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: free_mem:   Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit:    [crmd] stopped (2)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5251 is not connected
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping pengine: Sent -15 to process 5250
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping lrmd: Sent -15 to process 5248
>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker: Shutdown complete
>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: main:       Exiting pacemakerd
>> >
>> >
>> > corosync.conf:
>> >
>> > compatibility: whitetank
>> >
>> > totem {
>> >         version: 2
>> >         secauth: off
>> >         nodeid: 104
>> >         interface {
>> >                 member {
>> >                         memberaddr: 172.17.0.104
>> >                 }
>> >                 member {
>> >                         memberaddr: 172.17.0.105
>> >                 }
>> >                 ringnumber: 0
>> >                 bindnetaddr: 172.17.0.0
>> >                 mcastport: 5426
>> >                 ttl: 1
>> >         }
>> >         transport: udpu
>> > }
>> >
>> > logging {
>> >         fileline: off
>> >         to_logfile: yes
>> >         to_syslog: yes
>> >         debug: on
>> >         logfile: /var/log/cluster/corosync.log
>> >         debug: off
>> >         timestamp: on
>> >         logger_subsys {
>> >                 subsys: AMF
>> >                 debug: off
>> >         }
>> > }
>> > service {
>> >        # Load the Pacemaker Cluster Resource Manager
>> >        ver:       1
>> >        name:      pacemaker
>> > }
>> >
>> > aisexec {
>> >        user:   root
>> >        group:  root
>> > }
>> >
>> >
>> >
>> > Thank you!
>> >
>> > --
>> > Viacheslav Biriukov
>> > BR
>> > http://biriukov.me
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>
>>
>>
>> --
>> Dan Frincu
>> CCNA, RHCE
>>
>>

-- 
Viacheslav Biriukov
BR
http://biriukov.me