[Pacemaker] Pacemaker not failing over correctly (DRDB/Heartbeat/Pacemaker/Mysql) on Centos 5.5

Brian Cavanagh brian at designedtoscale.com
Thu Feb 3 17:44:30 EST 2011


Yeah, I tried that one, but it hasn't worked out for me. It complains that
the format of the return value is not valid.

If I understand this right, that init script is incompatible with the drbd
script:
Feb 03 17:11:36 mdb3 lrmd: [3047]: ERROR: (raexecocf.c:execra:178) execl
failed for /usr/lib/ocf/resource.d//linbit/drbd: Exec format error
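
As far as I know, "Exec format error" is what execl() reports when the
kernel does not recognize the file as an executable, for example when a
script does not begin with "#!". A minimal sketch of a check, assuming that
is the cause here (the function name check_ra_script is hypothetical, and it
only looks at the first two bytes, so it would not catch a binary built for
the wrong architecture):

```shell
#!/bin/sh
# Hypothetical helper: report whether a resource agent file starts with "#!".
# execl() fails with ENOEXEC ("Exec format error") when the file is neither
# a recognized binary nor a script with a shebang on its first line.
check_ra_script() {
    ra_path=$1
    if [ ! -f "$ra_path" ]; then
        echo "missing"
        return 2
    fi
    # Read only the first two bytes of the file.
    first_two=$(head -c 2 "$ra_path")
    if [ "$first_two" = "#!" ]; then
        echo "ok"
        return 0
    fi
    echo "no-shebang"
    return 1
}
```

It could be called on each node as
check_ra_script /usr/lib/ocf/resource.d/linbit/drbd
with the path taken from the log line above.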


Thanks. Any additional comments would be great.

Some more details:

tail -n 30 /var/log/heartbeat.log
Feb 03 17:11:35 mdb3 lrmd: [2901]: info: rsc:ip2arp:10: stop
Feb 03 17:11:35 mdb3 crmd: [2904]: info: do_lrm_rsc_op: Performing
key=62:1:0:be346eea-a04e-43ab-955e-313256315b8b op=drbd_mysql:1_notify_0 )
Feb 03 17:11:35 mdb3 lrmd: [2901]: info: rsc:drbd_mysql:1:11: notify
Feb 03 17:11:35 mdb3 lrmd: [3030]: ERROR: (raexecocf.c:execra:178) execl
failed for /usr/lib/ocf/resource.d//linbit/drbd: Exec format error
Feb 03 17:11:35 mdb3 lrmd: [2901]: WARN: mapped the invalid return code
254.
Feb 03 17:11:35 mdb3 crmd: [2904]: info: process_lrm_event: LRM operation
drbd_mysql:1_notify_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
SendArp[3028]:    2011/02/03_17:11:35 INFO: SendArp for
192.168.162.12/eth0:0
released
SendArp[3029]:    2011/02/03_17:11:35 INFO: SendArp for
97.107.136.62/eth0:2
released
Feb 03 17:11:35 mdb3 crmd: [2904]: info: process_lrm_event: LRM operation
ip1arp_stop_0 (call=9, rc=0, cib-update=18, confirmed=true) ok
Feb 03 17:11:35 mdb3 crmd: [2904]: info: process_lrm_event: LRM operation
ip2arp_stop_0 (call=10, rc=0, cib-update=19, confirmed=true) ok
Feb 03 17:11:35 mdb3 attrd: [2903]: info: attrd_perform_update: Sent
update 11: probe_complete=true
Feb 03 17:11:35 mdb3 attrd: [2903]: info: attrd_trigger_update: Sending
flush op to all hosts for: probe_complete (true)
Feb 03 17:11:35 mdb3 attrd: [2903]: info: attrd_perform_update: Sent
update 14: probe_complete=true
Feb 03 17:11:36 mdb3 attrd: [2903]: info: attrd_ha_callback: flush message
from mdb4
Feb 03 17:11:36 mdb3 attrd: [2903]: info: find_hash_entry: Creating hash
entry for fail-count-drbd_mysql:0
Feb 03 17:11:36 mdb3 crmd: [2904]: info: do_lrm_rsc_op: Performing
key=2:1:0:be346eea-a04e-43ab-955e-313256315b8b op=drbd_mysql:1_stop_0 )
Feb 03 17:11:36 mdb3 lrmd: [2901]: info: rsc:drbd_mysql:1:12: stop
Feb 03 17:11:36 mdb3 lrmd: [3047]: ERROR: (raexecocf.c:execra:178) execl
failed for /usr/lib/ocf/resource.d//linbit/drbd: Exec format error
Feb 03 17:11:36 mdb3 lrmd: [2901]: WARN: mapped the invalid return code
254.
Feb 03 17:11:36 mdb3 crmd: [2904]: info: process_lrm_event: LRM operation
drbd_mysql:1_stop_0 (call=12, rc=1, cib-update=20, confirmed=true) unknown
error
Feb 03 17:11:36 mdb3 attrd: [2903]: info: attrd_ha_callback: flush message
from mdb4
Feb 03 17:11:36 mdb3 attrd: [2903]: info: find_hash_entry: Creating hash
entry for last-failure-drbd_mysql:0
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_ha_callback: Update
relayed from mdb4
Feb 03 17:11:38 mdb3 attrd: [2903]: info: find_hash_entry: Creating hash
entry for fail-count-drbd_mysql:1
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-drbd_mysql:1 (INFINITY)
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_perform_update: Sent
update 19: fail-count-drbd_mysql:1=INFINITY
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_ha_callback: Update
relayed from mdb4
Feb 03 17:11:38 mdb3 attrd: [2903]: info: find_hash_entry: Creating hash
entry for last-failure-drbd_mysql:1
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-drbd_mysql:1 (1296771097)
Feb 03 17:11:38 mdb3 attrd: [2903]: info: attrd_perform_update: Sent
update 22: last-failure-drbd_mysql:1=1296771097



crm_status

Online: [ mdb4 mdb3 ]

Master/Slave Set: ms_drbd_mysql
     drbd_mysql:0    (ocf::linbit:drbd):     Slave mdb4 (unmanaged) FAILED
     drbd_mysql:1    (ocf::linbit:drbd):     Slave mdb3 (unmanaged) FAILED

Failed actions:
    drbd_mysql:0_monitor_0 (node=mdb4, call=2, rc=1, status=complete):
unknown error
    drbd_mysql:0_stop_0 (node=mdb4, call=12, rc=1, status=complete):
unknown error
    drbd_mysql:1_monitor_0 (node=mdb3, call=2, rc=1, status=complete):
unknown error
    drbd_mysql:1_stop_0 (node=mdb3, call=12, rc=1, status=complete):
unknown error






On 2/3/11 5:19 PM, "Lars Ellenberg" <lars.ellenberg at linbit.com> wrote:

>On Fri, Jan 28, 2011 at 02:22:22PM -0500, Brian Cavanagh wrote:
>> Hi, 
>> 
>> I am having this issue where it appears that everything is working
>> correctly, but when I simulate failure the failover fails to work
>> correctly.
>> the Migrate command works fine, I can transfer the service, and the
>> error I get when a node is put into standby or a server goes down is
>> 
>> Any help would be greatly appreciated
>
>Someone in #linux-ha just pointed us to this thread,
>so I thought it should not go unanswered.
>
>You are using Pacemaker version
>
>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>
>and your DRBD RA complains about
>
>> Jan 28 12:20:37 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:65: stop
>> Jan 28 12:20:37 mdb3 drbd[9631]: ERROR: you really should enable notify when using this RA
>
>And now you are complaining about: but I _do_ have notify enabled.
>
>Well, sure you do.
>
>But since "stop" is a special action, Pacemaker decided to treat its
>environment a little bit too specially. The sanity check in the DRBD RA,
>which should have prevented you from starting it without notify
>enabled, now fails only on stop, because there the environment was
>suddenly different than expected.
>
>This has since been fixed. I believe it was fixed in Pacemaker (and
>should be released with 1.1.5), and it was also coded more robustly in
>the DRBD RA.
>
>So you can just upgrade your DRBD (which should provide you with the
>updated resource agent as well), or, if you prefer, just grab the
>resource agent script itself as a drop-in replacement.
>
>http://git.drbd.org/?p=drbd-8.3.git;a=blob_plain;f=scripts/drbd.ocf
>
>
>-- 
>: Lars Ellenberg
>: LINBIT | Your Way to High Availability
>: DRBD/HA support and consulting http://www.linbit.com
>
>DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>

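The drop-in replacement suggested in the quoted reply could be done roughly
as follows. This is only a sketch under assumptions: install_ra is a
hypothetical helper name, the target path is the one from the lrmd log, and
the shebang test is just a guard against installing a file that execl()
would reject again.

```shell
#!/bin/sh
# Hypothetical helper: install a downloaded agent file over the existing one,
# refusing anything that does not start with "#!" and keeping a backup.
install_ra() {
    new_file=$1
    target=$2
    if [ "$(head -c 2 "$new_file" 2>/dev/null)" != "#!" ]; then
        echo "refusing: $new_file does not start with #!" >&2
        return 1
    fi
    # Keep the previous agent so the change can be rolled back.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    # Resource agents must be executable for lrmd to run them.
    cp "$new_file" "$target" && chmod 755 "$target"
}
```

A possible use on each node, with /tmp/drbd.ocf as an arbitrary download
location:

curl -o /tmp/drbd.ocf 'http://git.drbd.org/?p=drbd-8.3.git;a=blob_plain;f=scripts/drbd.ocf'
install_ra /tmp/drbd.ocf /usr/lib/ocf/resource.d/linbit/drbd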