[ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

Ian Diddams didds3 at yahoo.co.uk
Wed Oct 20 04:54:39 EDT 2021


I've been testing an implementation of an HA MySQL cluster for a few months now. I came to this project with no prior knowledge of what was concerned/needed and have learned organically via various online how-tos and web sites, which in many cases were slightly out-of-date or missing large chunks of pertinent information. That's not a criticism at all of those still helpful aids, but more an indication of how there are huge holes in my knowledge.

So with that background ...

The cluster consists of 2 CentOS 7 servers (estrela and rafeiro) running:
DRBD90
corosync 2.4.5
pacemaker 0.9.169
On the whole it's all running fine, with some squeaks that we are hoping are down to underlying SAN issues.

However...
Earlier this week we had some split-brain issues, some of which seem to have fixed themselves, others not. What we did notice was that whilst the split-brain was being reported the overall cluster remained up (of course?), in that the VIP remained up and the MySQL instance remained available via the VIP on port 3306. The underlying concern being, of course, that had a "flip" occurred from the previous master to the previous slave, the new master's DRBD device (mounted on /var/lib/mysql) may well be out of sync and thus contain "old" data.

So - system logs recently show this

ESTRELA
Oct 18th
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node

Oct 19th
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: Split-Brain detected but unresolved, dropping connection!


RAFEIRO
Oct 18
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node

Oct 19
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: Split-Brain detected but unresolved, dropping connection!
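
To line the two nights up side by side, the split-brain lines could be pulled out of each node's syslog with something like the rough Python sketch below. The regular expression is only an assumption based on the sample lines quoted above (it is not a documented DRBD log format), and the names `LINE_RE` and `summarise` are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern guessed from the kernel lines quoted in this mail; an assumption,
# not an official DRBD log format.
LINE_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d+) (?P<time>[\d:]+) (?P<host>\S+) kernel: "
    r"\[\s*[\d.]+\] drbd (?P<resource>\S+) \S+: Split-Brain detected(?P<rest>.*)$"
)


def summarise(lines):
    """Group split-brain events by date, then host; duplicate lines collapse."""
    events = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a split-brain line
        rest = m.group("rest")
        if "automatically solved" in rest:
            outcome = "auto-resolved"
        elif "unresolved" in rest:
            outcome = "unresolved"
        else:
            outcome = "unknown"
        events[m.group("date")][m.group("host")] = outcome
    return {date: dict(hosts) for date, hosts in events.items()}


SAMPLE = [
    "Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: "
    "Split-Brain detected, 1 primaries, automatically solved. Sync from peer node",
    "Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: "
    "Split-Brain detected but unresolved, dropping connection!",
]

for date, hosts in summarise(SAMPLE).items():
    print(date, hosts)
```

Feeding it the full log from both nodes should make it easier to see whether other nights also had events, and which node was told to sync from which.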



So on the 18th the split-brain issue was detected but (automatically?) fixed.
But on the 19th it wasn't...

Any ideas how to investigate why it worked on the 18th and not the 19th? I am presuming the DRBD config is set up to automatically fix stuff, but maybe we just got lucky and it isn't? (I've googled automatic fixes but I am afraid I can't follow what I'm being told/reading :-( )

drbd config below
ta
ian

==================
ESTRELA
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_mysql/lv_mysql;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on estrela {
  address  10.108.248.165:7789;
 }
 on rafeiro {
  address  10.108.248.166:7789;
 }
}
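
As a way of thinking about the three after-sb-* lines in the net section, here is a rough Python sketch of how they might map onto an outcome. It is a simplified reading of the policy names and not DRBD's real decision logic; the function `expected_outcome` and its arguments are hypothetical, and the number of primaries and of nodes that wrote data while disconnected would have to be worked out from the logs.

```python
# Policies copied from the net section of the mysql01 resource.
POLICIES = {
    0: "discard-zero-changes",  # after-sb-0pri
    1: "discard-secondary",     # after-sb-1pri
    2: "disconnect",            # after-sb-2pri
}


def expected_outcome(primaries, nodes_with_changes):
    """Rough guess at what happens after a split-brain under these policies.

    primaries: number of nodes in the Primary role when it was detected.
    nodes_with_changes: number of nodes that wrote data while disconnected.
    Simplified assumption only; the real behaviour is decided by DRBD itself.
    """
    policy = POLICIES[primaries]
    if policy == "disconnect":
        return "unresolved: connection dropped, manual recovery needed"
    if policy == "discard-secondary":
        return "auto: the secondary discards its changes and syncs from the primary"
    if policy == "discard-zero-changes":
        if nodes_with_changes <= 1:
            return "auto: a node with no changes takes the other node's data"
        return "unresolved: both nodes changed, manual recovery needed"
    raise ValueError(f"unhandled policy: {policy}")


print(expected_outcome(1, 2))
print(expected_outcome(2, 2))
```

If this reading is right, the "1 primaries, automatically solved" message on the 18th would fit after-sb-1pri discard-secondary, while an unresolved case would point at either two primaries or no primaries with changes on both sides. That is a guess to be checked, not a diagnosis.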



RAFEIRO
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_mysql/lv_mysql;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on estrela {
  address  10.108.248.165:7789;
 }
 on rafeiro {
  address  10.108.248.166:7789;
 }
}
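
Both copies of the resource definition are meant to be identical. A quick way to confirm that, rather than comparing by eye, is a small Python sketch like the one below using the standard difflib module. The helper names `normalise` and `config_diff` are made up for illustration, and the config text would have to be read from each node's own file.

```python
import difflib


def normalise(config_text):
    """Collapse whitespace and drop blank lines so only real differences show."""
    return [" ".join(line.split()) for line in config_text.splitlines() if line.strip()]


def config_diff(a_text, b_text, a_name="estrela", b_name="rafeiro"):
    """Return unified-diff lines between two configs (empty list if they match)."""
    return list(
        difflib.unified_diff(
            normalise(a_text),
            normalise(b_text),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )


# Tiny illustrative fragments, not the full resource files.
estrela_cfg = "resource mysql01 {\n protocol C;\n}\n"
rafeiro_cfg = "resource mysql01 {\n  protocol C;\n}\n"

diff = config_diff(estrela_cfg, rafeiro_cfg)
print("configs match" if not diff else "\n".join(diff))
```

An empty diff would at least rule out the two nodes applying different after-sb-* policies.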





