<div dir="ltr"><div style>We need something like a VIP for our MySQL servers (for example), with automatic migration when something goes wrong. If you have a better solution, please suggest it.</div><div style>As for dynamic IP addresses: that is not important for us. After a boot (which does not happen every day) we reconfigure the cluster using maintenance mode in Pacemaker. </div>
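<div style><br></div>
<div style>A minimal sketch of the kind of check that could run after a boot, before the cluster leaves maintenance mode: it tests whether the address the node currently holds is still listed as a <code>memberaddr</code> in corosync.conf. The sample config text and the function names are assumptions made up for this example; only the <code>memberaddr</code> key and the two addresses come from the config quoted in this thread.</div>
<pre>
```python
import ipaddress
import re

# Hypothetical sample in corosync 1.x syntax; the addresses mirror the quoted thread.
SAMPLE_CONF = """
totem {
    version: 2
    transport: udpu
    interface {
        member {
            memberaddr: 172.17.0.104
        }
        member {
            memberaddr: 172.17.0.105
        }
        ringnumber: 0
        bindnetaddr: 172.17.0.0
        mcastport: 5426
    }
}
"""


def member_addresses(conf_text):
    """Return every memberaddr value found in the config text."""
    pattern = r"^\s*memberaddr:\s*(\S+)\s*$"
    found = re.findall(pattern, conf_text, re.MULTILINE)
    return [ipaddress.ip_address(addr) for addr in found]


def lease_matches_config(conf_text, current_ip):
    """True if the node's current address is still listed as a cluster member."""
    return ipaddress.ip_address(current_ip) in member_addresses(conf_text)


if __name__ == "__main__":
    # 172.17.0.104 is listed; 172.17.0.200 stands in for a changed lease.
    for ip in ("172.17.0.104", "172.17.0.200"):
        status = "ok" if lease_matches_config(SAMPLE_CONF, ip) else "stale config"
        print(ip, status)
```
</pre>
<div style>A mismatch would be the cue to update the member list and restart corosync while the cluster is still in maintenance mode, since changing corosync's network configuration means restarting it.</div>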
<div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/2/11 Andrew Beekhof <span dir="ltr"><<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov<br>
<<a href="mailto:v.v.biriukov@gmail.com">v.v.biriukov@gmail.com</a>> wrote:<br>
> It is a VM in OpenStack, so we can't use a static IP.<br>
> Right now we are investigating why the interface goes down.<br>
<br>
</div>Even if you solve that, dynamic IP addresses are fundamentally<br>
incompatible with cluster software.<br>
You're effectively trying to create a cluster out of nodes which<br>
change their name every time they boot.<br>
<div class="HOEnZb"><div class="h5"><br>
><br>
> Thank you!<br>
><br>
><br>
> 2013/2/11 Viacheslav Biriukov <<a href="mailto:v.v.biriukov@gmail.com">v.v.biriukov@gmail.com</a>><br>
>><br>
>><br>
>><br>
>><br>
>> 2013/2/11 Dan Frincu <<a href="mailto:df.cluster@gmail.com">df.cluster@gmail.com</a>><br>
>>><br>
>>> Hi,<br>
>>><br>
>>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov<br>
>>> <<a href="mailto:v.v.biriukov@gmail.com">v.v.biriukov@gmail.com</a>> wrote:<br>
>>> > Hi guys,<br>
>>> ><br>
>>> > Got a tricky issue with Corosync and Pacemaker over a DHCP IP address<br>
>>> > using<br>
>>> > unicast. Corosync crashes periodically.<br>
>>> ><br>
>>> > Packages are from centos 6 repos:<br>
>>> > corosync-1.4.1-7.el6_3.1.x86_64<br>
>>> > corosynclib-1.4.1-7.el6_3.1.x86_64<br>
>>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64<br>
>>> > pacemaker-libs-1.1.7-6.el6.x86_64<br>
>>> > pacemaker-cli-1.1.7-6.el6.x86_64<br>
>>> > pacemaker-1.1.7-6.el6.x86_64<br>
>>> ><br>
>>> ><br>
>>> > Logs<br>
>>> ><br>
>>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor<br>
>>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new<br>
>>> > configuration.<br>
>>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.<br>
>>><br>
>>> This ^^^ is your problem. Corosync doesn't like it, see<br>
>>><br>
>>> <a href="https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface" target="_blank">https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface</a><br>
>>><br>
>>> Normally DHCP shouldn't take the interface down. Also, since changing<br>
>>> the network configuration in corosync means restarting it, why not go<br>
>>> with static IPs?<br>
>>><br>
>>> HTH,<br>
>>> Dan<br>
>>><br>
>>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104]<br>
>>> > is<br>
>>> > now up.<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error:<br>
>>> > cfg_connection_destroy:<br>
>>> > Connection destroyed<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch:<br>
>>> > Receiving message body failed: (2) Library error: Resource temporarily<br>
>>> > unavailable (11)<br>
>>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch:<br>
>>> > Receiving message body failed: (2) Library error: Resource temporarily<br>
>>> > unavailable (11)<br>
>>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch:<br>
>>> > Receiving message body failed: (2) Library error: Resource temporarily<br>
>>> > unavailable (11)<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch:<br>
>>> > AIS<br>
>>> > connection failed<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error:<br>
>>> > cpg_connection_destroy:<br>
>>> > Connection destroyed<br>
>>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch:<br>
>>> > AIS<br>
>>> > connection failed<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy:<br>
>>> > connection closed<br>
>>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch:<br>
>>> > AIS<br>
>>> > connection failed<br>
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch:<br>
>>> > Receiving message body failed: (2) Library error: Resource temporarily<br>
>>> > unavailable (11)<br>
>>> > Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy:<br>
>>> > AIS<br>
>>> > connection terminated<br>
>>> > Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy:<br>
>>> > Lost<br>
>>> > connection to OpenAIS service!<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice:<br>
>>> > pcmk_shutdown_worker:<br>
>>> > Shuting down Pacemaker<br>
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch:<br>
>>> > AIS<br>
>>> > connection failed<br>
>>> > Feb 10 07:56:25 [5249] host1 attrd: notice: main:<br>
>>> > Exiting...<br>
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error:<br>
>>> > stonith_peer_ais_destroy:<br>
>>> > AIS connection terminated<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child:<br>
>>> > Stopping crmd: Sent -15 to process 5251<br>
>>> > Feb 10 07:56:25 [5249] host1 attrd: error:<br>
>>> > attrd_cib_connection_destroy: Connection to the CIB terminated...<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch:<br>
>>> > Invoking handler for signal 15: Terminated<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown:<br>
>>> > Requesting shutdown, upper limit is 1200000ms<br>
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req:<br>
>>> > Sending shutdown request to host2<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process stonith-ng exited (pid=5247, rc=1)<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5249 is not connected<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5246 is not connected<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5247 is not connected<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process cib exited (pid=5246, rc=1)<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process attrd exited (pid=5249, rc=1)<br>
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text:<br>
>>> > Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection<br>
>>> > timed<br>
>>> > out (110)<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA:<br>
>>> > Input<br>
>>> > I_ERROR from do_shutdown_req() received in state S_NOT_DC<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition:<br>
>>> > State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR<br>
>>> > cause=C_FSA_INTERNAL<br>
>>> > origin=do_shutdown_req ]<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_recover:<br>
>>> > Action A_RECOVER (0000000001000000) not supported<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA:<br>
>>> > Input<br>
>>> > I_TERMINATE from do_recover() received in state S_RECOVERY<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition:<br>
>>> > State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE<br>
>>> > cause=C_FSA_INTERNAL origin=do_recover ]<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown:<br>
>>> > Disconnecting STONITH...<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info:<br>
>>> > tengine_stonith_connection_destroy: Fencing daemon disconnected<br>
>>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation<br>
>>> > monitor[25]<br>
>>> > on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its<br>
>>> > parameters:<br>
>>> > CRM_meta_name=[monitor] crm_feature_set=[3.0.6]<br>
>>> > CRM_meta_timeout=[20000]<br>
>>> > CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped:<br>
>>> > Resource P_SESSION_IP was active at shutdown. You may ignore this<br>
>>> > error if<br>
>>> > it is unmanaged.<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control:<br>
>>> > Disconnected from the LRM<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice:<br>
>>> > terminate_ais_connection:<br>
>>> > Disconnecting from AIS<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control:<br>
>>> > Disconnected from OpenAIS<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control:<br>
>>> > Disconnecting CIB<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5246 is not connected<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5246 is not connected<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error:<br>
>>> > cib_native_perform_op_delegate: Sending message to CIB service<br>
>>> > FAILED<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info:<br>
>>> > crmd_cib_connection_destroy: Connection to the CIB terminated...<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped:<br>
>>> > Resource P_SESSION_IP was active at shutdown. You may ignore this<br>
>>> > error if<br>
>>> > it is unmanaged.<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit:<br>
>>> > Performing<br>
>>> > A_EXIT_0 - gracefully exiting the CRMd<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could<br>
>>> > not<br>
>>> > recover from internal error<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping<br>
>>> > I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup:<br>
>>> > Cleaning up memory from libxml2<br>
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd]<br>
>>> > stopped (2)<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process crmd exited (pid=5251, rc=2)<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message:<br>
>>> > IPC<br>
>>> > Channel to 5251 is not connected<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child:<br>
>>> > Stopping pengine: Sent -15 to process 5250<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process pengine exited (pid=5250, rc=0)<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child:<br>
>>> > Stopping lrmd: Sent -15 to process 5248<br>
>>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit:<br>
>>> > Child<br>
>>> > process lrmd exited (pid=5248, rc=0)<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:<br>
>>> > Sending message via cpg FAILED: (rc=9) Bad handle<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice:<br>
>>> > pcmk_shutdown_worker:<br>
>>> > Shutdown complete<br>
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting<br>
>>> > pacemakerd<br>
>>> ><br>
>>> ><br>
>>> > corosync.conf:<br>
>>> ><br>
>>> > compatibility: whitetank<br>
>>> ><br>
>>> > totem {<br>
>>> > version: 2<br>
>>> > secauth: off<br>
>>> > nodeid: 104<br>
>>> > interface {<br>
>>> > member {<br>
>>> > memberaddr: 172.17.0.104<br>
>>> > }<br>
>>> > member {<br>
>>> > memberaddr: 172.17.0.105<br>
>>> > }<br>
>>> > ringnumber: 0<br>
>>> > bindnetaddr: 172.17.0.0<br>
>>> > mcastport: 5426<br>
>>> > ttl: 1<br>
>>> > }<br>
>>> > transport: udpu<br>
>>> > }<br>
>>> ><br>
>>> > logging {<br>
>>> > fileline: off<br>
>>> > to_logfile: yes<br>
>>> > to_syslog: yes<br>
>>> > debug: on<br>
>>> > logfile: /var/log/cluster/corosync.log<br>
>>> > debug: off<br>
>>> > timestamp: on<br>
>>> > logger_subsys {<br>
>>> > subsys: AMF<br>
>>> > debug: off<br>
>>> > }<br>
>>> > }<br>
>>> > service {<br>
>>> > # Load the Pacemaker Cluster Resource Manager<br>
>>> > ver: 1<br>
>>> > name: pacemaker<br>
>>> > }<br>
>>> ><br>
>>> > aisexec {<br>
>>> > user: root<br>
>>> > group: root<br>
>>> > }<br>
>>> ><br>
>>> ><br>
>>> ><br>
>>> > Thank you!<br>
>>> ><br>
>>> > --<br>
>>> > Viacheslav Biriukov<br>
>>> > BR<br>
>>> > <a href="http://biriukov.me" target="_blank">http://biriukov.me</a><br>
>>> ><br>
>>> > _______________________________________________<br>
>>> > Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
>>> > <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
>>> ><br>
>>> > Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>> > Getting started:<br>
>>> > <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>> > Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>> ><br>
>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Dan Frincu<br>
>>> CCNA, RHCE<br>
>>><br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Viacheslav Biriukov<br>BR<br><div><a href="http://biriukov.me" target="_blank">http://biriukov.me</a></div></div>
</div>