[Pacemaker] Master won't get promoted

Charles Richard chachi.richard at gmail.com
Thu Sep 29 08:30:55 EDT 2011


Here it is attached.

I also see the following two errors in the node 2 logs. I assume they mean
the real problem is that node 1 is not getting demoted, but I'm not sure why:

Error 1:
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Exit code 11
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Command output:
Sep 28 19:53:20 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stdout)
Sep 28 19:53:22 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config

Error 2:
Sep 28 19:53:27 staging2 kernel: d-con mysqld: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Sep 28 19:53:27 staging2 kernel: d-con mysqld: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Sep 28 19:53:27 staging2 kernel: d-con mysqld: meta connection shut down by peer.

Also, failover works fine if I reboot either machine: the outdated machine
comes back up as secondary.  The errors above only appear when I pull the
network cable from the primary.  Is this the scenario a STONITH device is
supposed to protect against, potentially by rebooting the primary?
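
For reference, the DRBD User's Guide describes resource-level fencing for
Pacemaker clusters, which is separate from a STONITH device.  A minimal
sketch of that configuration, added to the existing mysqld resource, is
below.  It assumes DRBD 8.3/8.4 syntax and that the crm-fence-peer.sh and
crm-unfence-peer.sh handler scripts are installed under /usr/lib/drbd;
neither assumption is confirmed for this setup.

    resource mysqld {
      disk {
        on-io-error detach;
        # Assumption: resource-level fencing only, no STONITH device
        fencing resource-only;
      }
      handlers {
        # Assumption: script paths as shipped with the stock DRBD packages
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # protocol, startup, and the two "on" sections stay as they are
    }

With a STONITH device configured in Pacemaker, the guide's variant is
"fencing resource-and-stonith".  Either way, this setting only constrains
which node DRBD allows to be promoted; it does not by itself reboot a node.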

Feels like I'm getting so close to getting this working!

Thanks!
Charles

On Thu, Sep 29, 2011 at 4:15 AM, Andrew Beekhof <andrew at beekhof.net> wrote:

> Could you attach  /var/lib/pengine/pe-input-3802.bz2 from staging1?
> That would tell us why.
>
> On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard
> <chachi.richard at gmail.com> wrote:
> > Hi,
> >
> > I'm making some headway finally with my pacemaker install but now that
> > crm_mon doesn't return errors any more and crm_verify is clear, I'm
> > having a problem where my master won't get promoted.  Not sure what to
> > do with this one, any suggestions?  Here's the log snippet and config
> > files:
> >
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke: Query 106: Requesting the current CIB: S_POLICY_ENGINE
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke_callback: Invoking the PE: query=106, ref=pe_calc-dc-1317020772-95, seq=2564, quorate=1
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Startup probes: enabled
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_domains: Unpacking domains
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging1.dev.applepeak.com is online
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging2.dev.applepeak.com is online
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: group_print:  Resource Group: mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: fs_mysql#011(ocf::heartbeat:Filesystem):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: ip_mysql#011(ocf::heartbeat:IPaddr2):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: mysqld#011(lsb:mysqld):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: short_print: Stopped: [ drbd_mysql:0 drbd_mysql:1 ]
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: fs_mysql: Rolling back scores from ip_mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: ip_mysql: Rolling back scores from mysqld
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource fs_mysql#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource ip_mysql#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource mysqld#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:0#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:1#011(Stopped)
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: unpack_graph: Unpacked transition 72: 0 actions in 0 synapses
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_te_invoke: Processing graph 72 (ref=pe_calc-dc-1317020772-95) derived from /var/lib/pengine/pe-input-3802.bz2
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: run_graph: ====================================================
> > Sep 26 04:06:12 staging1 crmd: [1686]: notice: run_graph: Transition 72 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-3802.bz2): Complete
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: te_graph_trigger: Transition 72 is now complete
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: notify_crmd: Transition 72 status: done - <null>
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Starting PEngine Recheck Timer
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: process_pe_message: Transition 72: PEngine Input stored in: /var/lib/pengine/pe-input-3802.bz2
> > Sep 26 04:15:09 staging1 cib: [1682]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min
> >
> > My drbd config file:
> >
> > resource mysqld {
> >   protocol C;
> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
> >   disk { on-io-error detach; }
> >
> >   on staging1 {
> >     device /dev/drbd0;
> >     disk /dev/vg_staging1/lv_data;
> >     meta-disk internal;
> >     address 10.10.20.1:7788;
> >   }
> >
> >   on staging2 {
> >     device /dev/drbd0;
> >     disk /dev/vg_staging2/lv_data;
> >     meta-disk internal;
> >     address 10.10.20.2:7788;
> >   }
> > }
> >
> > corosync.conf:
> >
> > compatibility: whitetank
> >
> > aisexec {
> >   user: root
> >   group: root
> > }
> >
> > totem {
> >         version: 2
> >         secauth: off
> >         threads: 0
> >         interface {
> >                 ringnumber: 0
> >                 bindnetaddr: 10.10.10.0
> >                 mcastaddr: 226.94.1.1
> >                 mcastport: 5405
> >         }
> > }
> >
> > logging {
> >         fileline: off
> >         to_stderr: no
> >         to_logfile: no
> >         to_syslog: yes
> >         logfile: /var/log/cluster/corosync.log
> >         debug: off
> >         timestamp: on
> >         logger_subsys {
> >                 subsys: AMF
> >                 debug: off
> >         }
> > }
> >
> > amf {
> >         mode: disabled
> > }
> >
> > service {
> > #Load Pacemaker
> > name: pacemaker
> > ver: 0
> > use_mgmtd: yes
> > }
> >
> > And my crm config:
> >
> > node staging1.dev.applepeak.com
> > node staging2.dev.applepeak.com
> > primitive drbd_mysql ocf:linbit:drbd \
> >         params drbd_resource="mysqld" \
> >         op monitor interval="15s" \
> >         op start interval="0" timeout="240s" \
> >         op stop interval="0" timeout="100s"
> > primitive fs_mysql ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/opt/data/mysql/data/mysql" fstype="ext4" \
> >         op start interval="0" timeout="60s" \
> >         op stop interval="0" timeout="60s"
> > primitive ip_mysql ocf:heartbeat:IPaddr2 \
> >         params ip="10.10.10.31" nic="eth0"
> > primitive mysqld lsb:mysqld
> > group mysql fs_mysql ip_mysql mysqld
> > ms ms_drbd_mysql drbd_mysql \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> > order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         last-lrm-refresh="1316961847" \
> >         stop-all-resources="true" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> > Thanks,
> > Charles
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-warn-3802.bz2
Type: application/x-bzip2
Size: 2467 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110929/bccdc54c/attachment-0003.bz2>

