[Pacemaker] DRBD Split Brain after each reboot

Mon Dec 21 15:55:37 EST 2009

On Monday, 21 December 2009 21:22:44, andschais at gmail.com wrote:
> Hi all,
>
> I'm having trouble with a two-node Pacemaker+DRBD cluster. I have been
> trying to solve it for about a week, and I really need help!
> If I pull the power cord, failover works great: resources migrate to the
> secondary node and move back to the primary when I power it on again.
> But when I restart the primary node with "shutdown -r now", I always
> end up with a split brain. That's not all: if I configure just a few
> resources (for example: virtual IP, DRBD, Apache and PostgreSQL), split
> brain does not occur, but as soon as I configure 8 or 9 resources
> (especially when one of those resources is JBoss AS), I always get split
> brain.
> Can someone give me some hints?
>
> My systems are:
>
> OS: Debian Lenny 2.6.26-2-686
> Corosync 1.1.2
> DRBD 8.3.6
>
> And my configuration files are:
>
> /etc/corosync/corosync.conf
>
> # Please read the openais.conf.5 manual page
> totem {
>         version: 2
>         # How long before declaring a token lost (ms)
>         token: 3000
>         # How many token retransmits before forming a new configuration
>         token_retransmits_before_loss_const: 10
>         # How long to wait for join messages in the membership protocol (ms)
>         join: 60
>         # How long to wait for consensus to be achieved before starting a
>         # new round of membership configuration (ms)
>         consensus: 1500
>         # Turn off the virtual synchrony filter
>         vsftype: none
>         # Number of messages that may be sent by one processor on receipt
>         # of the token
>         max_messages: 20
>         # Limit generated nodeids to 31-bits (positive signed integers)
>         clear_node_high_bit: yes
>         # Enable authentication and encryption of cluster traffic
>         secauth: on
>         # How many threads to use for encryption/decryption
>         threads: 0
>         # Optionally assign a fixed node id (integer)
>         # nodeid: 1234
>         # This specifies the mode of redundant ring, which may be none,
>         # active, or passive.
>         rrp_mode: passive
>         interface {
>                 # The following values need to be set based on your
>                 # environment
>                 ringnumber: 0
>                 bindnetaddr: 172.16.1.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5405
>         }
>         interface {
>                 # The following values need to be set based on your
>                 # environment
>                 ringnumber: 1
>                 bindnetaddr: 10.186.68.0
>                 mcastaddr: 226.94.2.1
>                 mcastport: 5405
>         }
> }
> amf {
>         mode: disabled
> }
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         ver:       0
>         name:      pacemaker
> }
> aisexec {
>         user:   root
>         group:  root
> }
> logging {
>     to_stderr: yes
>     debug: on
>     timestamp: on
>     to_file: yes
>     logfile: /var/log/corosync.log
>     to_syslog: no
>     syslog_facility: daemon
> }
>
>
> /etc/drbd.conf
>
> global {
>     usage-count yes;
> }
> common {
>     syncer { rate 33M; }
> }
> resource r0 {
>     protocol C;
>     handlers {
>        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>        outdate-peer "/usr/lib/drbd/outdate-peer.sh";
>        split-brain "/usr/lib/drbd/notify-split-brain.sh root@localhost";
>     }
>     startup {
>         degr-wfc-timeout 30;
>         wfc-timeout 30;
>     }
>     disk {
>         fencing resource-only;
>         on-io-error   detach;
>     }
>     net {
>         after-sb-0pri disconnect;
>         after-sb-1pri disconnect;
>         after-sb-2pri disconnect;
>         rr-conflict disconnect;
>     }
>
>     on primary {
>         device     /dev/drbd0;
>         disk       /dev/vg00/drbd;
>         address    172.16.1.1:7788;
>         meta-disk  internal;
>     }
>     on secondary {
>         device     /dev/drbd0;
>         disk       /dev/vg00/drbd;
>         address    172.16.1.2:7788;
>         meta-disk  internal;
>     }
> }
>
>
> and my crm config
(...)

Have you tried a SIMPLE configuration first? Just the two nodes and the
master/slave DRBD resource, with nothing else on top?
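
As a rough sketch only, such a stripped-down test setup could look something
like this in the crm shell. It assumes the ocf:linbit:drbd resource agent is
installed; the names drbd_r0 and ms_drbd_r0 are placeholders chosen for
illustration, and r0 is the DRBD resource from the quoted drbd.conf. The
monitor intervals are arbitrary example values.

```
# Hypothetical minimal configuration: only the DRBD master/slave resource.
# drbd_r0 and ms_drbd_r0 are illustrative names, not taken from the
# original configuration.
primitive drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s" role="Master" \
        op monitor interval="30s" role="Slave"
ms ms_drbd_r0 drbd_r0 \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
```

With only this resource loaded, reboot the primary with "shutdown -r now"
and watch /proc/drbd and crm_mon on the surviving node. If split brain no
longer occurs, add the other resources back one at a time to find the one
that triggers it.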

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch at multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München, HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42