[Pacemaker] Corosync over DHCP IP

Mon Feb 11 06:21:44 EST 2013

We need something like a VIP for our MySQL servers (for example), with
automatic migration when something goes wrong. If you have a better
solution, please suggest it.
As for dynamic IP addresses: that is not a concern for us. After a boot
(which does not happen every day) we reconfigure the cluster using
maintenance mode in Pacemaker.
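
To make that post-boot step less error-prone, something like the following
could check whether the address DHCP handed out still matches what
corosync.conf lists, before leaving maintenance mode. This is only a rough
sketch: the helper names (member_addrs, lease_matches_config) and the SAMPLE
text are made up for illustration, and it only looks at memberaddr lines.

```python
import ipaddress
import re


def member_addrs(conf_text):
    """Return the memberaddr values found in corosync.conf-style text."""
    return re.findall(r"^\s*memberaddr:\s*(\S+)", conf_text, flags=re.MULTILINE)


def lease_matches_config(current_ip, conf_text):
    """True if the address we currently hold is one corosync.conf expects."""
    ipaddress.ip_address(current_ip)  # raises ValueError on malformed input
    return current_ip in member_addrs(conf_text)


# Hypothetical sample, trimmed from the interface section quoted below.
SAMPLE = """
totem {
        interface {
                member {
                        memberaddr: 172.17.0.104
                }
                member {
                        memberaddr: 172.17.0.105
                }
                bindnetaddr: 172.17.0.0
        }
        transport: udpu
}
"""

if __name__ == "__main__":
    print(lease_matches_config("172.17.0.104", SAMPLE))
    print(lease_matches_config("172.17.0.200", SAMPLE))
```

If the check fails, the member list would need updating (and corosync
restarting) before the node rejoins.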

2013/2/11 Andrew Beekhof <andrew at beekhof.net>

> On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov
> <v.v.biriukov at gmail.com> wrote:
> > It is a VM in OpenStack, so we can't use a static IP.
> > Right now I am investigating why the interface goes down.
>
> Even if you solve that, dynamic IP addresses are fundamentally
> incompatible with cluster software.
> You're effectively trying to create a cluster out of nodes which
> change their name every time they boot.
>
> >
> > Thank you!
> >
> >
> > 2013/2/11 Viacheslav Biriukov <v.v.biriukov at gmail.com>
> >>
> >>
> >>
> >>
> >> 2013/2/11 Dan Frincu <df.cluster at gmail.com>
> >>>
> >>> Hi,
> >>>
> >>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
> >>> <v.v.biriukov at gmail.com> wrote:
> >>> > Hi guys,
> >>> >
> >>> > I've got a tricky issue with Corosync and Pacemaker over a
> >>> > DHCP-assigned IP address using unicast. Corosync crashes
> >>> > periodically.
> >>> >
> >>> > Packages are from the CentOS 6 repos:
> >>> > corosync-1.4.1-7.el6_3.1.x86_64
> >>> > corosynclib-1.4.1-7.el6_3.1.x86_64
> >>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> >>> > pacemaker-libs-1.1.7-6.el6.x86_64
> >>> > pacemaker-cli-1.1.7-6.el6.x86_64
> >>> > pacemaker-1.1.7-6.el6.x86_64
> >>> >
> >>> >
> >>> > Logs
> >>> >
> >>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
> >>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
> >>>
> >>> This ^^^ is your problem. Corosync doesn't like it, see
> >>>
> >>>
> >>> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
> >>>
> >>> Normally DHCP shouldn't take the interface down. Also, since changing
> >>> the network configuration in corosync means restarting it, why not go
> >>> with static IPs?
> >>>
> >>> HTH,
> >>> Dan
> >>>
> >>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cfg_connection_destroy: Connection destroyed
> >>> > Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5251] host1       crmd:    error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: cpg_connection_destroy: Connection destroyed
> >>> > Feb 10 07:56:25 [5246] host1        cib:    error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5251] host1       crmd:     info: crmd_ais_destroy: connection closed
> >>> > Feb 10 07:56:25 [5249] host1      attrd:    error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5246] host1        cib:    error: cib_ais_destroy: AIS connection terminated
> >>> > Feb 10 07:56:25 [5249] host1      attrd:     crit: attrd_ais_destroy: Lost connection to OpenAIS service!
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5249] host1      attrd:   notice: main: Exiting...
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng:    error: stonith_peer_ais_destroy: AIS connection terminated
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:   notice: stop_child: Stopping crmd: Sent -15 to process 5251
> >>> > Feb 10 07:56:25 [5249] host1      attrd:    error: attrd_cib_connection_destroy:       Connection to the CIB terminated...
> >>> > Feb 10 07:56:25 [5251] host1       crmd:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> > Feb 10 07:56:25 [5251] host1       crmd:   notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> >>> > Feb 10 07:56:25 [5251] host1       crmd:     info: do_shutdown_req: Sending shutdown request to host2
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5249 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5247 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
> >>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_recover: Action A_RECOVER (0000000001000000) not supported
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_log:     FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> >>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_shutdown: Disconnecting STONITH...
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: tengine_stonith_connection_destroy:         Fencing daemon disconnected
> >>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104]  cancelled
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_lrm_control: Disconnected from the LRM
> >>> > Feb 10 07:56:27 [5251] host1       crmd:   notice: terminate_ais_connection: Disconnecting from AIS
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_ha_control: Disconnected from OpenAIS
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_cib_control: Disconnecting CIB
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: cib_native_perform_op_delegate:     Sending message to CIB service FAILED
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: crmd_cib_connection_destroy:        Connection to the CIB terminated...
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: verify_stopped: Resource P_SESSION_IP was active at shutdown.  You may ignore this error if it is unmanaged.
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>> > Feb 10 07:56:27 [5251] host1       crmd:    error: do_exit:    Could not recover from internal error
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> >>> > Feb 10 07:56:27 [5251] host1       crmd:     info: do_exit:    [crmd] stopped (2)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:  warning: send_ipc_message: IPC Channel to 5251 is not connected
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping pengine: Sent -15 to process 5250
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: stop_child: Stopping lrmd: Sent -15 to process 5248
> >>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:   notice: pcmk_shutdown_worker: Shutdown complete
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd:     info: main: Exiting pacemakerd
> >>> >
> >>> >
> >>> > corosync.conf:
> >>> >
> >>> > compatibility: whitetank
> >>> >
> >>> > totem {
> >>> >         version: 2
> >>> >         secauth: off
> >>> >         nodeid: 104
> >>> >         interface {
> >>> >                 member {
> >>> >                         memberaddr: 172.17.0.104
> >>> >                 }
> >>> >                 member {
> >>> >                         memberaddr: 172.17.0.105
> >>> >                 }
> >>> >                 ringnumber: 0
> >>> >                 bindnetaddr: 172.17.0.0
> >>> >                 mcastport: 5426
> >>> >                 ttl: 1
> >>> >         }
> >>> >         transport: udpu
> >>> > }
> >>> >
> >>> > logging {
> >>> >         fileline: off
> >>> >         to_logfile: yes
> >>> >         to_syslog: yes
> >>> >         debug: on
> >>> >         logfile: /var/log/cluster/corosync.log
> >>> >         debug: off
> >>> >         timestamp: on
> >>> >         logger_subsys {
> >>> >                 subsys: AMF
> >>> >                 debug: off
> >>> >         }
> >>> > }
> >>> > service {
> >>> >        # Load the Pacemaker Cluster Resource Manager
> >>> >        ver:       1
> >>> >        name:      pacemaker
> >>> > }
> >>> >
> >>> > aisexec {
> >>> >        user:   root
> >>> >        group:  root
> >>> > }
> >>> >
> >>> >
> >>> >
> >>> > Thank you!
> >>> >
> >>> > --
> >>> > Viacheslav Biriukov
> >>> > BR
> >>> > http://biriukov.me
> >>> >
> >>> > _______________________________________________
> >>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>> >
> >>> > Project Home: http://www.clusterlabs.org
> >>> > Getting started:
> >>> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> > Bugs: http://bugs.clusterlabs.org
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Dan Frincu
> >>> CCNA, RHCE
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Viacheslav Biriukov
> >> BR
> >> http://biriukov.me
> >
> >
> >
> >
> > --
> > Viacheslav Biriukov
> > BR
> > http://biriukov.me

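For anyone who wants to see how often this happens, the TOTEM event Dan
pointed at can be pulled out of corosync.log with a few lines of Python.
A rough sketch, assuming the log line format shown in the quoted output
above; the function name interface_down_events is hypothetical.

```python
import re

# Matches the corosync line reported in the quoted log, e.g.
# "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down."
DOWN = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) corosync \[TOTEM \] "
    r"The network interface is down\."
)


def interface_down_events(lines):
    """Return the timestamps of every 'network interface is down' event."""
    events = []
    for line in lines:
        match = DOWN.match(line)
        if match:
            events.append(match.group("ts"))
    return events


# Lines copied from the log quoted in this thread.
SAMPLE_LOG = [
    "Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
    "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
]

if __name__ == "__main__":
    for ts in interface_down_events(SAMPLE_LOG):
        print(ts)
```

Comparing those timestamps with the DHCP client's renewal times should
show whether the lease renewal is what takes the interface down.
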
-- 
Viacheslav Biriukov
BR
http://biriukov.me