[Pacemaker] failover problem with pacemaker & drbd

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Aug 14 11:48:11 EDT 2009


Hi,

On Fri, Aug 14, 2009 at 04:14:40PM +0100, Dave Whitehouse wrote:
> I can't see any stonith primitives in your config. I would guess that
> the remaining node is detecting the loss of its peer and initiating a
> stonith action. Since there is nothing to stonith the lost peer with,
> there will be no confirmation to the remaining node that the failed node
> is dead. Until this confirmation is received, I'm pretty sure that the
> failover won't happen.

True, unless stonith-enabled is set to false, which is the case
here. Still, the only way to figure out what's going on is to
look at the logs. Also, the CRM won't start resources on nodes
where their score is -infinity.
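
To illustrate the -infinity point with the two location constraints on
grp_credit from the configuration quoted below, here is a rough sketch
in Python. It is not how the policy engine is implemented: math.inf
stands in for the CRM's INFINITY, the helper names are made up for this
example, and stickiness, colocation and the constraint on
res_icobol_credit are left out. The one rule it relies on is that
-infinity combined with any other score stays -infinity.

```python
import math

INF = math.inf  # stand-in for the CRM's INFINITY score (an assumption of this sketch)


def add_scores(a, b):
    """Combine two scores; -INFINITY wins over everything, even +INFINITY."""
    if a == -INF or b == -INF:
        return -INF
    if a == INF or b == INF:
        return INF
    return a + b


# Location constraints on grp_credit, copied from the quoted configuration:
# (constraint id, node, score)
GROUP_CONSTRAINTS = [
    ("cli-prefer-grp_credit", "host2.localdomain", INF),
    ("cli-standby-grp_credit", "host1.localdomain", -INF),
]


def node_scores(constraints, nodes):
    """Sum the location scores per node, starting from 0."""
    scores = {node: 0 for node in nodes}
    for _constraint_id, node, score in constraints:
        scores[node] = add_scores(scores[node], score)
    return scores


def eligible_nodes(scores, online):
    """A resource may only run on online nodes whose score is above -INFINITY."""
    return [node for node in online if scores[node] > -INF]


nodes = ["host1.localdomain", "host2.localdomain"]
scores = node_scores(GROUP_CONSTRAINTS, nodes)

print(eligible_nodes(scores, online=nodes))                  # both nodes up
print(eligible_nodes(scores, online=["host1.localdomain"]))  # host2 is down
```

With both nodes up, only host2.localdomain is eligible. With host2
down, the list is empty, so under these assumptions the group has
nowhere to go. Constraints whose ids start with "cli-" are normally
left behind by a migrate from the GUI or the command line, and the
scores the cluster actually computed should be visible with something
like "ptest -sL".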

Thanks,

Dejan

> From: Gerry kernan [mailto:gerry.kernan at infinityit.ie] 
> Sent: 13 August 2009 14:50
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] failover problem with pacemaker & drbd
> 
>  
> 
> Hi
> 
> I have set up two servers so that I can replicate a filesystem between
> them using DRBD. I configured drbd, filesystem, IP address and pingd
> resources, and I also have an LSB resource to start icobol.
> 
> I can stop and start the resource group and migrate it between servers
> using the Pacemaker GUI. But if I power down one of the servers or take
> it off the network, the resource group doesn't fail over to the other
> node.
> 
> Hopefully someone can point out where I have made a mistake or not
> configured something.
> 
> The Pacemaker config, drbd.conf and openais.conf are below.
> 
>  
> 
>  
> 
> node host1.localdomain
> node host2.localdomain \
>         attributes standby="false"
> primitive res_drbd_credit heartbeat:drbddisk \
>         operations $id="res_drbd_credit-operations" \
>         op monitor interval="15" timeout="15" start-delay="15" \
>         params 1="credit" \
>         meta $id="res_drbd_credit-meta_attributes"
> primitive res_filesystem_credit ocf:heartbeat:Filesystem \
>         meta $id="res_filesystem_credit-meta_attributes" \
>         operations $id="res_filesystem_credit-operations" \
>         op monitor interval="20" timeout="40" start-delay="10" \
>         params device="/dev/drbd0" directory="/credit" fstype="ext3"
> primitive res_icobol_credit lsb:icobol \
>         meta is-managed="true" \
>         operations $id="res_icobol_credit-operations" \
>         op monitor interval="15" timeout="15" start-delay="15"
> primitive res_ip_credit ocf:heartbeat:IPaddr2 \
>         meta $id="res_ip_credit-meta_attributes" \
>         operations $id="res_ip_credit-operations" \
>         op monitor interval="10s" timeout="20s" start-delay="5s" \
>         params ip="192.168.200.1" cidr_netmask="255.255.255.0"
> primitive res_pingd ocf:pacemaker:pingd \
>         operations $id="res_pingd-operations" \
>         op monitor interval="10" timeout="20" start-delay="1m" \
>         params host_list="192.168.200.7"
> group grp_credit res_drbd_credit res_filesystem_credit res_ip_credit res_icobol_credit res_pingd \
>         meta target-role="started"
> location cli-prefer-grp_credit grp_credit \
>         rule $id="cli-prefer-rule-grp_credit" inf: #uname eq host2.localdomain
> location cli-prefer-res_icobol_credit res_icobol_credit \
>         rule $id="cli-prefer-rule-res_icobol_credit" inf: #uname eq host1.localdomain
> location cli-standby-grp_credit grp_credit \
>         rule $id="cli-standby-rule-grp_credit" -inf: #uname eq host1.localdomain
> colocation loc_grp_credit inf: res_filesystem_credit res_drbd_credit
> colocation loc_icobol inf: res_icobol_credit res_ip_credit
> colocation loc_ip inf: res_ip_credit res_filesystem_credit
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         last-lrm-refresh="1250158583" \
>         node-health-red="0" \
>         stonith-enabled="false" \
>         default-resource-stickiness="200" \
>         no-quorum-policy="ignore" \
>         stonith-action="poweroff"
> 
>  
> 
>  
> 
>  
> 
> [root at host1 ~]# cat /etc/drbd.conf
> #
> # please have a a look at the example configuration file in
> # /usr/share/doc/packages/drbd/drbd.conf
> #
> global {
>         usage-count yes;
> }
> common {
>         protocol C;
> }
> resource credit {
>         device /dev/drbd0;
>         meta-disk internal;
>         disk /dev/cciss/c0d0p5;
>         on host1.localdomain {
>                 address 10.100.100.1:7789;
>         }
>         on host2.localdomain {
>                 address 10.100.100.2:7789;
>         }
>         handlers {
>                 split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         }
> }
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> # Please read the openais.conf.5 manual page
> 
> aisexec {
>         # Run as root - this is necessary to be able to manage resources with Pacemaker
>         user:   root
>         group:  root
> }
> 
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         ver:       0
>         name:      pacemaker
>         use_mgmtd: yes
>         use_logd:  yes
> }
> 
> totem {
>         version: 2
> 
>         # How long before declaring a token lost (ms)
>         token:          5000
> 
>         # How many token retransmits before forming a new configuration
>         token_retransmits_before_loss_const: 10
> 
>         # How long to wait for join messages in the membership protocol (ms)
>         join:           1000
> 
>         # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>         consensus:      2500
> 
>         # Turn off the virtual synchrony filter
>         vsftype:        none
> 
>         # Number of messages that may be sent by one processor on receipt of the token
>         max_messages:   20
> 
>         # Stagger sending the node join messages by 1..send_join ms
>         send_join: 45
> 
>         # Limit generated nodeids to 31-bits (positive signed integers)
>         clear_node_high_bit: yes
> 
>         # Disable encryption
>         secauth:        on
> 
>         # How many threads to use for encryption/decryption
>         threads:        0
> 
>         # Optionally assign a fixed node id (integer)
>         # nodeid:         1234
>         rrp_mode:    active
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 192.168.200.0
>                 mcastaddr: 239.0.0.42
>                 mcastport: 5405
>         }
>         interface {
>                 ringnumber: 1
>                 bindnetaddr: 10.100.100.0
>                 mcastaddr: 239.0.0.43
>                 mcastport: 5405
>         }
> }
> 
> logging {
>         debug: on
>         fileline: off
>         to_syslog: yes
>         to_stderr: off
>         syslog_facility: daemon
>         timestamp: on
> }
> 
> amf {
>         mode: disabled
> }
> 
>  
> 
>  
> 
>  
> 
>  
> 
> Best regards,
> 
>  
> 
> Gerry kernan
> 
> Infinity Integration technology
> 
> Suite 17 The mall Beacon Court
> 
> Sandyford 
> 
> Dublin 18
> 
>  
> 
> www.infinityit.ie
> 
>  
> 
> P. +35312930090
> 
> F. +35312930137
>  


