[ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 20 05:10:07 EDT 2021
>>> Ian Diddams via Users <users at clusterlabs.org> wrote on 20.10.2021 at 10:54 in
message <527856924.6994967.1634720079997 at mail.yahoo.com>:
> I've been testing an implementation of an HA MySQL cluster for a few months
> now. I came to this project with no prior knowledge of what was
> concerned/needed and have learned organically via various online how-tos
> and web sites, which in many cases were slightly out of date or missing large
> chunks of pertinent information. That's not a criticism at all of those still
> helpful aids, but more an indication of how there are huge holes in my
> knowledge.
>
> So with that background ...
>
> The cluster consists of 2 CentOS 7 servers (estrela and rafeiro) running
> DRBD90
> corosync 2.4.5
> pacemaker 0.9.169
> On the whole it's all running fine, with some squeaks that we are hoping are
> down to underlying SAN issues.
>
> However...
> earlier this week we had some split-brain issues, some of which seem to
> have fixed themselves, others not. What we did notice was that whilst the
> split-brain was being reported the overall cluster remained up (of course?)
So you drive without a safety belt and airbag (read: fencing)?
> in that the VIP remained up, and the MySQL instance remained available via
> the VIP on port 3306. The underlying concern being of course that had a
> "flip" occurred from the previous master to the previous slave, the new master's
> DRBD device (mounted on /var/lib/mysql) may well be out of sync and thus
> contain "old" data.
>
> So - system logs recently show this
>
> ESTRELA
> Oct 18th
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from
> peer node
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from
> peer node
You said you have a SAN and you are using DRBD? Why?
>
> Oct 19th
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0
> drbd0: Split-Brain detected but unresolved, dropping connection!
>
>
> RAFEIRO
> Oct 18
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from
> this node
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from
> this node
>
> Oct 19
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0
> drbd0: Split-Brain detected but unresolved, dropping connection!
>
>
>
> So on the 18th the split-brain issue was detected but (automatically?)
> fixed.
> But on the 19th it wasn't...
>
> Any ideas how to investigate why it worked on the 18th and not the 19th? I
> am presuming the DRBD config is set up to automatically fix stuff but maybe
> we just got lucky and it isn't? (I've googled automatic fixes but I am afraid
> I can't follow what I'm being told/reading :-( )
I wondered where the cluster is in those logs.
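
As a starting point for that investigation, the split-brain messages can be pulled out of the system log and classified by outcome. The following is only a minimal sketch, not an established tool: the regular expressions are written against the kernel lines quoted above (with each wrapped message joined back onto one line), and the names SPLIT_BRAIN_RE, classify and scan are made up for this example.

```python
import re

# Matches syslog-style DRBD kernel lines like the ones quoted in this thread.
# Assumption: each message is on a single line, as in the original log file.
SPLIT_BRAIN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*?"
    r"\bdrbd\s+(?P<resource>\S+).*?Split-Brain detected(?P<rest>.*)$"
)
PRIMARIES_RE = re.compile(r"(\d+) primaries")


def classify(rest):
    """Label an event from the text that follows 'Split-Brain detected'."""
    if "automatically solved" in rest:
        return "auto-resolved"
    if "unresolved" in rest:
        return "unresolved"
    return "unknown"


def scan(lines):
    """Return one dict per distinct split-brain message, in input order."""
    events, seen = [], set()
    for line in lines:
        line = line.strip()
        match = SPLIT_BRAIN_RE.match(line)
        if not match or line in seen:
            continue  # not a split-brain line, or an exact repeat
        seen.add(line)
        primaries = PRIMARIES_RE.search(match["rest"])
        events.append({
            "stamp": match["stamp"],
            "host": match["host"],
            "resource": match["resource"],
            "outcome": classify(match["rest"]),
            # The "unresolved" message quoted here carries no primary count.
            "primaries": int(primaries.group(1)) if primaries else None,
        })
    return events
```

Feeding it the lines of the system log from both nodes (for example an open file object) gives a short list of events to line up against the Pacemaker and corosync messages from the same minutes, which is where the cluster's view of the incident would show up.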
>
> drbd config below
> ta
> ian
>
> ==================
> ESTRELA
> resource mysql01 {
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd0;
>   disk /dev/vg_mysql/lv_mysql;
>   handlers {
>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>   }
>   net {
>     allow-two-primaries no;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>     rr-conflict disconnect;
>   }
>   disk {
>     on-io-error detach;
>   }
>   syncer {
>     verify-alg sha1;
>   }
>   on estrela {
>     address 10.108.248.165:7789;
>   }
>   on rafeiro {
>     address 10.108.248.166:7789;
>   }
> }
>
>
>
> RAFEIRO
> resource mysql01 {
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd0;
>   disk /dev/vg_mysql/lv_mysql;
>   handlers {
>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>   }
>   net {
>     allow-two-primaries no;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>     rr-conflict disconnect;
>   }
>   disk {
>     on-io-error detach;
>   }
>   syncer {
>     verify-alg sha1;
>   }
>   on estrela {
>     address 10.108.248.165:7789;
>   }
>   on rafeiro {
>     address 10.108.248.166:7789;
>   }
> }
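
The three after-sb-* settings in the net section above are what decide whether a split brain is repaired automatically, so it may help to spell out what they imply. The sketch below is a deliberately simplified model of those policies as they are commonly documented, not DRBD's actual logic (which works on generation identifiers and bitmaps); the function name and its arguments are hypothetical.

```python
def after_sb_outcome(primaries, changed_here, changed_peer):
    """Rough model of the quoted policies.

    primaries     -- number of nodes in the Primary role when the split
                     brain is detected (0, 1 or 2)
    changed_here  -- True if this node wrote data while disconnected
    changed_peer  -- True if the peer wrote data while disconnected
    """
    if primaries == 0:
        # after-sb-0pri discard-zero-changes: only resolvable if at least
        # one node made no changes at all; otherwise nothing is discarded.
        if changed_here and changed_peer:
            return "disconnect"
        return "auto-resolve"
    if primaries == 1:
        # after-sb-1pri discard-secondary: the Secondary is the victim and
        # resyncs from the Primary.
        return "auto-resolve"
    if primaries == 2:
        # after-sb-2pri disconnect: never resolved automatically.
        return "disconnect"
    raise ValueError("primaries must be 0, 1 or 2")
```

Read against the logs: the message from the 18th reports "1 primaries", which fits the discard-secondary case. The message from the 19th does not state a primary count, so under this model it could have been either the zero-primary case with changes on both sides or the two-primary case; the log excerpt alone does not say which.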