[Pacemaker] Split Brain on DRBD Dual Primary

Wed Jan 7 22:25:34 EST 2015

> On 12 Nov 2014, at 5:16 pm, Ho, Alamsyah - ACE Life Indonesia <Alamsyah.Ho at acegroup.com> wrote:
> 
> Hi All,
>  
> In the October archives, I saw the issue reported by Felix Zachlod at http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022653.html, and the same thing is now happening to me on a dual-primary DRBD setup.
>  
> My OS is RHEL 6.6 and the software versions I am using are:
> pacemaker-1.1.12-4.el6.x86_64
> corosync-1.4.7-1.el6.x86_64
> cman-3.0.12.1-68.el6.x86_64

Since you're using cman, let it start the dlm and gfs control daemons (dlm_controld and gfs_controld).
Do not create them as resources in pacemaker.

> drbd84-utils-8.9.1-1.el6.elrepo.x86_64
> kmod-drbd84-8.4.5-2.el6.elrepo.x86_64
> gfs2-utils-3.0.12.1-68.el6.x86_64
>  
> First, I will describe my existing resources. I have three: drbd, dlm (for gfs2), and HomeFS.
>  
> Master: HomeDataClone
>   Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true interval=0s
>   Resource: HomeData (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=homedata
>    Operations: start interval=0s timeout=240 (HomeData-start-timeout-240)
>                promote interval=0s (HomeData-promote-interval-0s)
>                demote interval=0s timeout=90 (HomeData-demote-timeout-90)
>                stop interval=0s timeout=100 (HomeData-stop-timeout-100)
>                monitor interval=60s (HomeData-monitor-interval-60s)
> Clone: HomeFS-clone
>   Meta Attrs: start-delay=30s target-role=Stopped
>   Resource: HomeFS (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/drbd/by-res/homedata directory=/home fstype=gfs2
>    Operations: start interval=0s timeout=60 (HomeFS-start-timeout-60)
>                stop interval=0s timeout=60 (HomeFS-stop-timeout-60)
>                monitor interval=20 timeout=40 (HomeFS-monitor-interval-20)
> Clone: dlm-clone
>   Meta Attrs: clone-max=2 clone-node-max=1 start-delay=0s
>   Resource: dlm (class=ocf provider=pacemaker type=controld)
>    Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
>                stop interval=0s timeout=100 (dlm-stop-timeout-100)
>                monitor interval=60s (dlm-monitor-interval-60s)
>  
>  
> But when I start the cluster under normal conditions, DRBD ends up in split brain on both nodes. From the log I can see it is the same case as Felix's: pacemaker promotes drbd to primary while it is still waiting for the connection handshake with its peer.
>  
> Nov 12 11:37:32 node002 kernel: block drbd1: disk( Attaching -> UpToDate )
> Nov 12 11:37:32 node002 kernel: block drbd1: attached to UUIDs C9630089EC3B58CC:0000000000000000:B4653C665EBC0DBB:B4643C665EBC0DBA
> Nov 12 11:37:32 node002 kernel: drbd homedata: conn( StandAlone -> Unconnected )
> Nov 12 11:37:32 node002 kernel: drbd homedata: Starting receiver thread (from drbd_w_homedata [22531])
> Nov 12 11:37:32 node002 kernel: drbd homedata: receiver (re)started
> Nov 12 11:37:32 node002 kernel: drbd homedata: conn( Unconnected -> WFConnection )
> Nov 12 11:37:32 node002 attrd[22340]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (1000)
> Nov 12 11:37:32 node002 attrd[22340]:   notice: attrd_perform_update: Sent update 17: master-HomeData=1000
> Nov 12 11:37:32 node002 crmd[22342]:   notice: process_lrm_event: Operation HomeData_start_0: ok (node=node002, call=18, rc=0, cib-update=13, confirmed=true)
> Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=19, rc=0, cib-update=0, confirmed=true)
> Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=20, rc=0, cib-update=0, confirmed=true)
> Nov 12 11:37:33 node002 kernel: block drbd1: role( Secondary -> Primary )
> Nov 12 11:37:33 node002 kernel: block drbd1: new current UUID 58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA
> Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation HomeData_promote_0: ok (node=node002, call=21, rc=0, cib-update=14, confirmed=true)
> Nov 12 11:37:33 node002 attrd[22340]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (10000)
> Nov 12 11:37:33 node002 attrd[22340]:   notice: attrd_perform_update: Sent update 23: master-HomeData=10000
> Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=22, rc=0, cib-update=0, confirmed=true)
> Nov 12 11:37:33 node002 kernel: drbd homedata: Handshake successful: Agreed network protocol version 101
> Nov 12 11:37:33 node002 kernel: drbd homedata: Agreed to support TRIM on protocol level
> Nov 12 11:37:33 node002 kernel: drbd homedata: Peer authenticated using 20 bytes HMAC
> Nov 12 11:37:33 node002 kernel: drbd homedata: conn( WFConnection -> WFReportParams )
> Nov 12 11:37:33 node002 kernel: drbd homedata: Starting asender thread (from drbd_r_homedata [22543])
> Nov 12 11:37:33 node002 kernel: block drbd1: drbd_sync_handshake:
> Nov 12 11:37:33 node002 kernel: block drbd1: self 58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA bits:0 flags:0
> Nov 12 11:37:33 node002 kernel: block drbd1: peer 0FAA8E4B66817421:C9630089EC3B58CD:B4653C665EBC0DBA:B4643C665EBC0DBA bits:0 flags:0
> Nov 12 11:37:33 node002 kernel: block drbd1: uuid_compare()=100 by rule 90
> Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
> Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
> Nov 12 11:37:33 node002 kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
> Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
> Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
> Nov 12 11:37:33 node002 kernel: drbd homedata: conn( WFReportParams -> Disconnecting )
> Nov 12 11:37:33 node002 kernel: drbd homedata: error receiving ReportState, e: -5 l: 0!
> Nov 12 11:37:33 node002 kernel: drbd homedata: asender terminated
> Nov 12 11:37:33 node002 kernel: drbd homedata: Terminating drbd_a_homedata
> Nov 12 11:37:33 node002 kernel: drbd homedata: Connection closed
> Nov 12 11:37:33 node002 kernel: drbd homedata: conn( Disconnecting -> StandAlone )
> Nov 12 11:37:33 node002 kernel: drbd homedata: receiver terminated
> Nov 12 11:37:33 node002 kernel: drbd homedata: Terminating drbd_r_homedata
>  
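The ordering visible in the log above (local promotion first, handshake only afterwards) can be checked mechanically. What follows is a minimal Python sketch, not something from the original post: the function name `promoted_before_handshake` and the two marker strings are illustrative choices, with the markers copied from the kernel messages quoted in this thread. The second sample list uses lines from the successful start quoted further down.

```python
# Marker substrings copied from the kernel messages quoted in this thread.
PROMOTED = "role( Secondary -> Primary )"
HANDSHAKE = "Handshake successful"


def promoted_before_handshake(lines):
    """Return True if the local promotion is logged before the handshake,
    False if the handshake comes first, and None if either event is missing."""
    promote_at = None
    handshake_at = None
    for index, line in enumerate(lines):
        if promote_at is None and PROMOTED in line:
            promote_at = index
        if handshake_at is None and HANDSHAKE in line:
            handshake_at = index
    if promote_at is None or handshake_at is None:
        return None
    return promote_at < handshake_at


# Excerpt from the failed start (11:37): promotion precedes the handshake.
failed_start = [
    "Nov 12 11:37:32 node002 kernel: drbd homedata: conn( Unconnected -> WFConnection )",
    "Nov 12 11:37:33 node002 kernel: block drbd1: role( Secondary -> Primary )",
    "Nov 12 11:37:33 node002 kernel: drbd homedata: Handshake successful: Agreed network protocol version 101",
]

# Excerpt from the clean start (12:38): handshake precedes the promotion.
clean_start = [
    "Nov 12 12:38:11 node002 kernel: drbd homedata: Handshake successful: Agreed network protocol version 101",
    "Nov 12 12:38:13 node002 kernel: block drbd1: role( Secondary -> Primary )",
]

print(promoted_before_handshake(failed_start))  # True
print(promoted_before_handshake(clean_start))   # False
```

A check like this only confirms the race in the logs; it does not by itself say how to prevent it.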
> But if I disable the other two resources and leave only HomeDataClone enabled at cluster startup, the drbd device connects and both nodes are promoted to primary. Here is the log:
>  
> Nov 12 12:38:11 node002 kernel: drbd homedata: Starting worker thread (from drbdsetup-84 [26752])
> Nov 12 12:38:11 node002 kernel: block drbd1: disk( Diskless -> Attaching )
> Nov 12 12:38:11 node002 kernel: drbd homedata: Method to ensure write ordering: flush
> Nov 12 12:38:11 node002 kernel: block drbd1: max BIO size = 1048576
> Nov 12 12:38:11 node002 kernel: block drbd1: drbd_bm_resize called with capacity == 314563128
> Nov 12 12:38:11 node002 kernel: block drbd1: resync bitmap: bits=39320391 words=614382 pages=1200
> Nov 12 12:38:11 node002 kernel: block drbd1: size = 150 GB (157281564 KB)
> Nov 12 12:38:11 node002 kernel: block drbd1: recounting of set bits took additional 7 jiffies
> Nov 12 12:38:11 node002 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Nov 12 12:38:11 node002 kernel: block drbd1: disk( Attaching -> UpToDate )
> Nov 12 12:38:11 node002 kernel: block drbd1: attached to UUIDs 01FA7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421
> Nov 12 12:38:11 node002 kernel: drbd homedata: conn( StandAlone -> Unconnected )
> Nov 12 12:38:11 node002 kernel: drbd homedata: Starting receiver thread (from drbd_w_homedata [26753])
> Nov 12 12:38:11 node002 kernel: drbd homedata: receiver (re)started
> Nov 12 12:38:11 node002 kernel: drbd homedata: conn( Unconnected -> WFConnection )
> Nov 12 12:38:11 node002 attrd[26577]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (1000)
> Nov 12 12:38:11 node002 attrd[26577]:   notice: attrd_perform_update: Sent update 17: master-HomeData=1000
> Nov 12 12:38:11 node002 crmd[26579]:   notice: process_lrm_event: Operation HomeData_start_0: ok (node=node002, call=18, rc=0, cib-update=12, confirmed=true)
> Nov 12 12:38:11 node002 crmd[26579]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=19, rc=0, cib-update=0, confirmed=true)
> Nov 12 12:38:11 node002 kernel: drbd homedata: Handshake successful: Agreed network protocol version 101
> Nov 12 12:38:11 node002 kernel: drbd homedata: Agreed to support TRIM on protocol level
> Nov 12 12:38:11 node002 kernel: drbd homedata: Peer authenticated using 20 bytes HMAC
> Nov 12 12:38:11 node002 kernel: drbd homedata: conn( WFConnection -> WFReportParams )
> Nov 12 12:38:11 node002 kernel: drbd homedata: Starting asender thread (from drbd_r_homedata [26764])
> Nov 12 12:38:11 node002 kernel: block drbd1: drbd_sync_handshake:
> Nov 12 12:38:11 node002 kernel: block drbd1: self 01FA7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421 bits:0 flags:0
> Nov 12 12:38:11 node002 kernel: block drbd1: peer 4499ABF2AAE91DF2:01FA7FA3D219A8B5:0FAB8E4B66817421:0FAA8E4B66817421 bits:0 flags:0
> Nov 12 12:38:11 node002 kernel: block drbd1: uuid_compare()=-1 by rule 50
> Nov 12 12:38:11 node002 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
> Nov 12 12:38:11 node002 kernel: block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Nov 12 12:38:11 node002 kernel: block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Nov 12 12:38:11 node002 kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
> Nov 12 12:38:11 node002 kernel: block drbd1: updated sync uuid 01FB7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421
> Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
> Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
> Nov 12 12:38:11 node002 kernel: block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
> Nov 12 12:38:11 node002 kernel: block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> Nov 12 12:38:11 node002 kernel: block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Nov 12 12:38:11 node002 kernel: block drbd1: updated UUIDs 4499ABF2AAE91DF2:0000000000000000:01FB7FA3D219A8B4:01FA7FA3D219A8B5
> Nov 12 12:38:11 node002 kernel: block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
> Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> Nov 12 12:38:11 node002 crm-unfence-peer.sh[26804]: invoked for homedata
> Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
> Nov 12 12:38:12 node002 crmd[26579]:   notice: process_lrm_event: Operation dlm_stop_0: ok (node=node002, call=17, rc=0, cib-update=13, confirmed=true)
> Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=20, rc=0, cib-update=0, confirmed=true)
> Nov 12 12:38:13 node002 kernel: block drbd1: peer( Secondary -> Primary )
> Nov 12 12:38:13 node002 kernel: block drbd1: role( Secondary -> Primary )
> Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation HomeData_promote_0: ok (node=node002, call=21, rc=0, cib-update=14, confirmed=true)
> Nov 12 12:38:13 node002 attrd[26577]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (10000)
> Nov 12 12:38:13 node002 attrd[26577]:   notice: attrd_perform_update: Sent update 23: master-HomeData=10000
> Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=22, rc=0, cib-update=0, confirmed=true)
>  
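The two `uuid_compare()` results in the logs above (100 by rule 90 in the failed start, -1 by rule 50 in the clean one) can be reproduced from the quoted `self` and `peer` lines. The sketch below is a simplified illustration and rests on several assumptions about the DRBD 8.4 comparison: the four colon-separated fields are the current, bitmap and two history UUIDs; the lowest bit is ignored; rule 50 matches when our current UUID equals the peer's bitmap UUID; rule 90 matches when both bitmap UUIDs are equal and non-zero. Every other rule is skipped, so this is not a substitute for the kernel logic. All function names are hypothetical.

```python
# Assumption: the lowest bit of each UUID is a flag and is ignored on compare.
MASK = 0xFFFFFFFFFFFFFFFE


def parse_uuids(field):
    """Split 'CURRENT:BITMAP:HISTORY1:HISTORY2' into a list of integers."""
    return [int(part, 16) for part in field.split(":")]


def compare_rules_50_and_90(self_field, peer_field):
    """Apply only rules 50 and 90 as described in the lead-in.
    Returns (result, rule) like the log line, or None if neither matches."""
    self_current, self_bitmap = parse_uuids(self_field)[:2]
    peer_bitmap = parse_uuids(peer_field)[1]

    # Rule 50 (assumed): our current UUID is the peer's bitmap UUID,
    # so the peer has newer data and we become the sync target.
    if (self_current & MASK) == (peer_bitmap & MASK):
        return (-1, 50)

    # Rule 90 (assumed): both sides diverged from the same earlier
    # generation, which DRBD reports as split brain.
    if (self_bitmap & MASK) == (peer_bitmap & MASK) and (self_bitmap & MASK) != 0:
        return (100, 90)

    return None


# UUID sets copied from the failed start (11:37:33).
failed = compare_rules_50_and_90(
    "58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA",
    "0FAA8E4B66817421:C9630089EC3B58CD:B4653C665EBC0DBA:B4643C665EBC0DBA",
)

# UUID sets copied from the clean start (12:38:11).
clean = compare_rules_50_and_90(
    "01FA7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421",
    "4499ABF2AAE91DF2:01FA7FA3D219A8B5:0FAB8E4B66817421:0FAA8E4B66817421",
)

print(failed)  # (100, 90)
print(clean)   # (-1, 50)
```

In the failed start, the local node logged "new current UUID" on promotion before it had seen its peer, and the peer appears to have done the same. Both sides then carry different current UUIDs on top of the same bitmap UUID, which would explain why no sync direction could be chosen automatically.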
> Based on the two runs above, I tried adding an order constraint to start HomeDataClone and then dlm-clone, while keeping HomeFS-clone disabled. With that, drbd connects and becomes primary on both nodes, but dlm-clone fails to start after HomeDataClone and is left stopped.
>  
> Master/Slave Set: HomeDataClone [HomeData]
>      Masters: [ node001 node002 ]
> Clone Set: HomeFS-clone [HomeFS]
>      Stopped: [ node001 node002 ]
> Clone Set: dlm-clone [dlm]
>      Stopped: [ node001 node002 ]
>  
> Failed actions:
>     dlm_start_0 on node001 'not configured' (6): call=21, status=complete, last-rc-change='Wed Nov 12 12:46:42 2014', queued=0ms, exec=59ms
>     dlm_start_0 on node002 'not configured' (6): call=21, status=complete, last-rc-change='Wed Nov 12 12:46:40 2014', queued=0ms, exec=69ms
>  
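The `'not configured' (6)` in the failed actions above is a standard OCF resource agent exit code. As a small aid for reading such status output, here is a hedged Python sketch that decodes those lines. The code table follows the OCF return codes as documented for Pacemaker 1.1; the regular expression and the function name `decode_failed_action` are illustrative, written only against the two lines quoted here.

```python
import re

# OCF resource agent exit codes as documented for Pacemaker 1.1.
OCF_EXIT_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Matches e.g.: dlm_start_0 on node001 'not configured' (6)
FAILED_ACTION = re.compile(
    r"(?P<op>\S+) on (?P<node>\S+) '(?P<text>[^']+)' \((?P<rc>\d+)\)"
)


def decode_failed_action(line):
    """Return (operation, node, OCF code name) for one failed-action line,
    or None if the line does not look like one."""
    match = FAILED_ACTION.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return (match.group("op"), match.group("node"),
            OCF_EXIT_CODES.get(rc, "unknown"))


sample = ("    dlm_start_0 on node001 'not configured' (6): call=21, "
          "status=complete, last-rc-change='Wed Nov 12 12:46:42 2014', "
          "queued=0ms, exec=59ms")

print(decode_failed_action(sample))
# ('dlm_start_0', 'node001', 'OCF_ERR_CONFIGURED')
```

The code only names the error class. The reason the controld agent reported it on these nodes is not visible in the quoted output.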
>  
> Please help me answer these questions:
> 1.       Why does the pacemaker controld resource fail when I add an order constraint to start the drbd resource first and controld after it?
> 2.       Is there a setting to delay the drbd promotion? For example, start the drbd service, then wait before promoting the resource to primary.
>  
> Right now I am stuck: I cannot get the cluster to start and manage all the resources without errors. My temporary workaround is to start the drbd service manually before using pcs to start the cluster. This is of course not best practice, so I would appreciate any advice or feedback on how to fix it.
>  
>  
> Thanks in advance for any hints or advice.
> 