[ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 20 05:10:07 EDT 2021


>>> Ian Diddams via Users <users at clusterlabs.org> wrote on 20.10.2021 at 10:54 in
message <527856924.6994967.1634720079997 at mail.yahoo.com>:
> I've been testing an implementation of an HA MySQL cluster for a few months 
> now. I came to this project with no prior knowledge of what was 
> concerned/needed and have learned organically via various online how-tos 
> and web sites, which in many cases were slightly out of date or missing large 
> chunks of pertinent information.  That's not a criticism at all of those still 
> helpful aids, but more an indication of how there are huge holes in my 
> knowledge...
> 
> So with that background ...
> 
> The cluster consists of 2 CentOS 7 servers (estrela and rafeiro) running:
> DRBD90
> corosync 2.4.5
> pacemaker 0.9.169
> On the whole it's all running fine, with some squeaks that we are hoping are 
> down to underlying SAN issues.
> 
>  However...
> earlier this week we had some split-brain issues - some of which seem to 
> have fixed themselves, others not.  What we did notice was that whilst the 
> split-brain was being reported the overall cluster remained up (of course?) 

So you drive without safety-belt and airbag (read: fencing)?

> in that the VIP remained up, and the MySQL instance remained available via 
> the VIP on port 3306. The underlying concern being of course that had a 
> "flip" occurred from previous master to the previous slave, the new master's 
> DRBD device (mounted on /var/lib/mysql) may well be out of sync and thus 
> contain "old" data.
> 
> So - system logs recently show this
> 
> ESTRELA
> Oct 18th
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node

You said you have a SAN and you are using DRBD? Why?

> 
> Oct 19th
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> 
> 
> RAFEIRO
> Oct 18
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
> 
> Oct 19
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> 
> 
> 
> So on the 18th the split-brain issue was detected but (automatically?) 
> fixed.
> But on the 19th it wasn't...
> 
> Any ideas how to investigate why it worked on the 18th and not the 19th?  I 
> am presuming the DRBD config is set up to automatically fix stuff but maybe 
> we just got lucky and it isn't?  (I've googled automatic fixes but I am afraid 
> I can't follow what I'm being told/reading :-(  )

I wondered where the cluster is in those logs.
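
To line up the DRBD messages with whatever the cluster logged at the same
time, it may help to first pull the split-brain events out of the syslog.
A minimal sketch, assuming each message sits on one line in the format quoted
above (the mail client wrapped them here); the names SPLIT_BRAIN and classify
are made up for illustration:

```python
import re

# Pattern for the DRBD kernel messages quoted in this thread, e.g.
# "Oct 18 04:04:28 <host> kernel: [...] drbd mysql01/0 drbd0: Split-Brain detected, ..."
# Assumption: one message per line, classic syslog timestamp at the start.
SPLIT_BRAIN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*?"
    r"drbd\s+(?P<resource>\S+)\s+\S+:\s+Split-Brain detected(?P<rest>.*)$"
)


def classify(line):
    """Return (stamp, host, resource, outcome) for a split-brain line, else None."""
    m = SPLIT_BRAIN.match(line)
    if m is None:
        return None
    rest = m.group("rest")
    if "automatically solved" in rest:
        outcome = "auto-resolved"
    elif "unresolved" in rest:
        outcome = "unresolved"
    else:
        outcome = "unknown"
    return m.group("stamp"), m.group("host"), m.group("resource"), outcome
```

Feeding every syslog line through classify() and keeping the non-None results
gives the timestamps to search for in the corosync and pacemaker logs on both
nodes.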

> 
> drbd config below
> ta
> ian
> 
> ==================
> ESTRELA
> resource mysql01 {
>  protocol C;
>  meta-disk internal;
>  device /dev/drbd0;
>  disk   /dev/vg_mysql/lv_mysql;
>  handlers {
>   split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>  }
>  net {
>   allow-two-primaries no;
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>   rr-conflict disconnect;
>  }
>  disk {
>   on-io-error detach;
>  }
>  syncer {
>   verify-alg sha1;
>  }
>  on estrela {
>   address  10.108.248.165:7789;
>  }
>  on rafeiro {
>   address  10.108.248.166:7789;
>  }
> }
> 
> 
> 
> RAFEIRO
> resource mysql01 {
>  protocol C;
>  meta-disk internal;
>  device /dev/drbd0;
>  disk   /dev/vg_mysql/lv_mysql;
>  handlers {
>   split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>  }
>  net {
>   allow-two-primaries no;
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>   rr-conflict disconnect;
>  }
>  disk {
>   on-io-error detach;
>  }
>  syncer {
>   verify-alg sha1;
>  }
>  on estrela {
>   address  10.108.248.165:7789;
>  }
>  on rafeiro {
>   address  10.108.248.166:7789;
>  }
> }
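
As far as I read the drbd.conf documentation, the three after-sb-* settings
above are chosen by the number of Primaries at the moment the nodes reconnect.
The following is only a simplified model of that decision for exactly this
configuration (discard-zero-changes / discard-secondary / disconnect), not
DRBD's actual implementation; the function name and the "wrote" flags are
assumptions for illustration, and the case where neither node wrote anything
is simply treated as unresolved here:

```python
def sync_source(roles, wrote):
    """Model which node's data survives, or None for 'disconnect, fix manually'.

    roles: dict node name -> "Primary" or "Secondary" at reconnect time
    wrote: dict node name -> True if that node changed data while disconnected
    """
    primaries = [node for node, role in roles.items() if role == "Primary"]

    if len(primaries) == 0:
        # after-sb-0pri discard-zero-changes: only automatic if exactly
        # one node wrote; that node becomes the sync source.
        writers = [node for node in roles if wrote[node]]
        return writers[0] if len(writers) == 1 else None

    if len(primaries) == 1:
        # after-sb-1pri discard-secondary: the Secondary's changes are dropped.
        return primaries[0]

    # after-sb-2pri disconnect: never resolved automatically.
    return None


# The situation logged on Oct 18 ("1 primaries", estrela syncs from its peer):
print(sync_source({"estrela": "Secondary", "rafeiro": "Primary"},
                  {"estrela": True, "rafeiro": True}))
```

The Oct 19 message does not state the number of Primaries. Under this model,
both "no Primary, but both nodes wrote" and "two Primaries" would end in a
dropped connection, so the node roles at 03:45:43 are the thing to check.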




