[ClusterLabs] DRBD Split brain

Антон Сацкий satskiy.a at gmail.com
Tue Dec 12 08:30:05 EST 2017


Hi list,
I need your help.
I have 2 servers running Pacemaker, Corosync and DRBD.

[root@voipserver ~]# pcs config
Cluster Name: ClusterKrusher
Corosync Nodes:
 voipserver.primary voipserver.backup
Pacemaker Nodes:
 voipserver.backup voipserver.primary

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.20.11.10
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
 Master: WebDataClone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: WebData (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=r0
   Operations: demote interval=0s timeout=90 (WebData-demote-interval-0s)
               monitor interval=60s (WebData-monitor-interval-60s)
               promote interval=0s timeout=90 (WebData-promote-interval-0s)
               start interval=0s timeout=240 (WebData-start-interval-0s)
               stop interval=0s timeout=100 (WebData-stop-interval-0s)
 Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/replica fstype=ext3
  Operations: monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
              start interval=0s timeout=60 (WebFS-start-interval-0s)
              stop interval=0s timeout=60 (WebFS-stop-interval-0s)
 Resource: Asterisk (class=lsb type=asterisk)
  Operations: monitor interval=15 timeout=15 (Asterisk-monitor-interval-15)
              start interval=0s timeout=15 (Asterisk-start-interval-0s)
              stop interval=0s timeout=15 (Asterisk-stop-interval-0s)
 Resource: MYSQL (class=lsb type=mysql)
  Operations: monitor interval=15 timeout=15 (MYSQL-monitor-interval-15)
              start interval=0s timeout=15 (MYSQL-start-interval-0s)
              stop interval=0s timeout=15 (MYSQL-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote WebDataClone then start WebFS (kind:Mandatory)
  start WebFS then start MYSQL (kind:Mandatory)
  start ClusterIP then start Asterisk (kind:Mandatory)
Colocation Constraints:
  WebFS with WebDataClone (score:INFINITY) (with-rsc-role:Master)
  MYSQL with WebFS (score:INFINITY)
  Asterisk with ClusterIP (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ClusterKrusher
 dc-version: 1.1.16-12.el7_4.2-94ff4df
 have-watchdog: false
 stonith-enabled: false

Quorum:
  Options:
===================


After some time I got this in the logs:
[root@voipserver ~]# cat /var/log/messages | grep drbd
Dec 12 14:08:52 voipserver kernel: block drbd1: role( Secondary -> Primary )
Dec 12 14:08:52 voipserver Filesystem(WebFS)[64935]: INFO: Running start for /dev/drbd1 on /replica
Dec 12 14:08:52 voipserver kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem
Dec 12 14:08:53 voipserver kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts: (null)
Dec 12 14:18:13 voipserver Filesystem(WebFS)[3134]: INFO: Running stop for /dev/drbd1 on /replica
Dec 12 14:18:17 voipserver Filesystem(WebFS)[3319]: INFO: Running start for /dev/drbd1 on /replica
Dec 12 14:18:17 voipserver kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem
Dec 12 14:18:17 voipserver kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts: (null)
Dec 12 14:44:07 voipserver Filesystem(WebFS)[11669]: INFO: Running stop for /dev/drbd1 on /replica
Dec 12 14:44:07 voipserver kernel: block drbd1: role( Primary -> Secondary )
Dec 12 14:44:07 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
Dec 12 14:44:08 voipserver kernel: block drbd1: disk( UpToDate -> Failed )
Dec 12 14:44:08 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
Dec 12 14:44:08 voipserver kernel: block drbd1: disk( Failed -> Diskless )
Dec 12 14:44:08 voipserver kernel: drbd r0: Terminating drbd_w_r0
Dec 12 14:44:19 voipserver kernel: drbd: loading out-of-tree module taints kernel.
Dec 12 14:44:19 voipserver kernel: drbd: module verification failed: signature and/or required key missing - tainting kernel
Dec 12 14:44:19 voipserver systemd-modules-load: Inserted module 'drbd'
Dec 12 14:44:19 voipserver kernel: drbd: initialized. Version: 8.4.10-1 (api:1/proto:86-101)
Dec 12 14:44:19 voipserver kernel: drbd: GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
Dec 12 14:44:19 voipserver kernel: drbd: registered as block device major 147
Dec 12 14:45:02 voipserver Filesystem(WebFS)[1400]: WARNING: Couldn't find device [/dev/drbd1]. Expected /dev/??? to exist
Dec 12 14:45:03 voipserver kernel: drbd r0: Starting worker thread (from drbdsetup-84 [1524])
Dec 12 14:45:03 voipserver kernel: block drbd1: disk( Diskless -> Attaching )
Dec 12 14:45:03 voipserver kernel: drbd r0: Method to ensure write ordering: flush
Dec 12 14:45:03 voipserver kernel: block drbd1: max BIO size = 524288
Dec 12 14:45:03 voipserver kernel: block drbd1: drbd_bm_resize called with capacity == 419153344
Dec 12 14:45:03 voipserver kernel: block drbd1: resync bitmap: bits=52394168 words=818659 pages=1599
Dec 12 14:45:03 voipserver kernel: block drbd1: size = 200 GB (209576672 KB)
Dec 12 14:45:03 voipserver kernel: block drbd1: recounting of set bits took additional 1 jiffies
Dec 12 14:45:03 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
Dec 12 14:45:03 voipserver kernel: block drbd1: disk( Attaching -> UpToDate )
Dec 12 14:45:03 voipserver kernel: block drbd1: attached to UUIDs FBA12F26BE1DEE73:EE5942173C75DE98:1BF4DECFE20D51E2:1BF3DECFE20D51E3
Dec 12 14:45:03 voipserver kernel: drbd r0: conn( StandAlone -> Unconnected )
Dec 12 14:45:03 voipserver kernel: drbd r0: Starting receiver thread (from drbd_w_r0 [1525])
Dec 12 14:45:03 voipserver kernel: drbd r0: receiver (re)started
Dec 12 14:45:03 voipserver kernel: drbd r0: conn( Unconnected -> WFConnection )
Dec 12 14:45:03 voipserver kernel: drbd r0: Handshake successful: Agreed network protocol version 101
Dec 12 14:45:03 voipserver kernel: drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
Dec 12 14:45:03 voipserver kernel: drbd r0: conn( WFConnection -> WFReportParams )
Dec 12 14:45:03 voipserver kernel: drbd r0: Starting ack_recv thread (from drbd_r_r0 [1534])
Dec 12 14:45:03 voipserver kernel: block drbd1: drbd_sync_handshake:
Dec 12 14:45:03 voipserver kernel: block drbd1: self FBA12F26BE1DEE72:EE5942173C75DE98:1BF4DECFE20D51E2:1BF3DECFE20D51E3 bits:888 flags:0
Dec 12 14:45:03 voipserver kernel: block drbd1: peer 93BB6F0A5075345D:EE5942173C75DE99:1BF4DECFE20D51E3:1BF3DECFE20D51E3 bits:38004 flags:2
Dec 12 14:45:03 voipserver kernel: block drbd1: uuid_compare()=100 by rule 90
Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
Dec 12 14:45:03 voipserver kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Dec 12 14:45:03 voipserver kernel: drbd r0: conn( WFReportParams -> Disconnecting )
Dec 12 14:45:03 voipserver kernel: drbd r0: error receiving ReportState, e: -5 l: 0!
Dec 12 14:45:03 voipserver kernel: drbd r0: ack_receiver terminated
Dec 12 14:45:03 voipserver kernel: drbd r0: Terminating drbd_a_r0
Dec 12 14:45:03 voipserver kernel: drbd r0: Connection closed
Dec 12 14:45:03 voipserver kernel: drbd r0: conn( Disconnecting -> StandAlone )
Dec 12 14:45:03 voipserver kernel: drbd r0: receiver terminated
Dec 12 14:45:03 voipserver kernel: drbd r0: Terminating drbd_r_r0
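
Most of that output is routine attach and connect noise. To show only the
lines that matter for the split brain, a narrower filter can be used. This is
only a minimal sketch: the function name drbd_sb_lines is made up for
illustration, and it assumes kernel messages land in /var/log/messages as
they do on this host.

```shell
# Hypothetical helper (the name is made up for illustration).
# Prints only the DRBD lines that describe the split-brain decision:
# the UUID comparison, the detection message, and connection state changes.
# Assumes syslog is written to /var/log/messages unless a file is given.
drbd_sb_lines() {
    grep -E 'drbd.*(uuid_compare|Split-Brain|conn\()' "${1:-/var/log/messages}"
}
```

Calling drbd_sb_lines with no argument reads /var/log/messages; a different
log file can be passed as the first argument.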



So now I need to decide on the best way to configure split-brain recovery.
Example config files would be appreciated.

Primary
[root@voipserver ~]# drbd-overview
NOTE: drbd-overview will be deprecated soon.
Please consider using drbdtop.

 1:r0/0  WFConnection Primary/Unknown UpToDate/DUnknown /replica ext3 197G 720M 186G 1%

Secondary

[root@voipserver ~]# drbd-overview
NOTE: drbd-overview will be deprecated soon.
Please consider using drbdtop.

 1:r0/0  StandAlone Secondary/Unknown UpToDate/DUnknown
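
The two outputs differ in connection state: one node sits in WFConnection and
the other in StandAlone, and a StandAlone node does not try to reconnect by
itself. A minimal sketch for checking that on each node follows. The function
name is_standalone is hypothetical; drbdadm cstate is the DRBD 8.4 command
that prints the connection state, and r0 is the resource from my config.

```shell
# Hypothetical helper (the name is made up for illustration).
# Succeeds (exit 0) when the given connection state is StandAlone,
# i.e. the node has dropped the connection and is not retrying.
is_standalone() {
    [ "$1" = "StandAlone" ]
}

# Usage on a node (needs root and the DRBD utilities installed):
#   is_standalone "$(drbdadm cstate r0)" && echo "r0 is StandAlone"
```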


THANKS

-- 
Best regards
Antony
tel.   +380669197533
tel2. +380636564340
Paypal http://paypal.me/Satskiy
satskiy.a at gmail.com