[ClusterLabs] DRBD split brain after Cluster node recovery
emmanuel segura
emi2fast at gmail.com
Wed Jul 12 05:46:32 EDT 2017
You need to configure cluster fencing and the DRBD fencing handler; that
way, the cluster can recover without manual intervention.
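For example, the DRBD side usually looks roughly like this (just a sketch
against your "storage" resource, using the crm-fence-peer scripts shipped
with drbd-utils; not tested against your setup):

resource storage {
    ...
    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ...
}

With "fencing resource-and-stonith" DRBD suspends I/O and calls the
fence-peer handler when it loses the peer; the handler puts a constraint
into the Pacemaker CIB so only the surviving node can stay Primary, and
after-resync-target removes it once the rejoining node has resynced. Your
logs show DRBD 9 (protocol 112); I believe the fencing option sits in the
net section there and the scripts are the crm-fence-peer.9.sh /
crm-unfence-peer.9.sh variants, so check the drbd.conf man page for your
version. Pacemaker-level stonith (your fence_vbox) must of course be
working as well.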
2017-07-12 11:33 GMT+02:00 ArekW <arkaduis at gmail.com>:
> Hi,
> Can it be fixed so that DRBD does not enter split brain after cluster
> node recovery? After a few tests I saw DRBD recover, but in most
> cases (9 out of 10) it did not sync.
>
> 1. When a node is put into standby and then unstandby, everything
> works OK. DRBD syncs and goes to primary mode.
>
> 2. When a node is hard powered off, stonith brings it back up and
> eventually the node comes online, but DRBD is in StandAlone state
> on the recovered node. I can only sync it manually, and that
> requires stopping the cluster.
>
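[Note: manual split-brain recovery normally does not require stopping the
whole cluster, only demoting DRBD on the node whose changes you are
prepared to discard (here presumably the freshly fenced node). Roughly, on
that node:

    drbdadm disconnect storage
    drbdadm secondary storage
    drbdadm connect --discard-my-data storage

and on the survivor "drbdadm connect storage" if it went StandAlone. Since
Pacemaker runs DRBD dual-primary here, you would first put the victim node
in standby (or ban the clone there) so it can be demoted. With the fencing
handler configured, the split brain should not occur in the first place.]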
> Logs:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Handshake to
> peer 1 successful: Agreed network protocol version 112
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Feature flags
> enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Starting
> ack_recv thread (from drbd_r_storage [28960])
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Preparing cluster-wide
> state change 2237079084 (0->1 499/145)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: State change
> 2237079084: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Committing cluster-wide
> state change 2237079084 (1ms)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Connecting -> Connected ) peer( Unknown -> Secondary )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: current_size:
> 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> c_size: 14679544 u_size: 0 d_size: 14679544 max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> la_size: 14679544 my_usize: 0 my_max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size:
> 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> calling drbd_determine_dev_size()
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size:
> 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> drbd_sync_handshake:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: self
> 342BE98297943C35:441536064CEDDC92:69D98E1FCC2BB44C:E04101C6FF76D1CC
> bits:15450 flags:120
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: peer
> A8908796A7CCFF6E:CE6B672F4EDA6E78:69D98E1FCC2BB44C:E04101C6FF76D1CC
> bits:32768 flags:2
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> uuid_compare()=-100 by rule 100
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm initial-split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm initial-split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: Split-Brain
> detected but unresolved, dropping connection!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Connected -> Disconnecting ) peer( Secondary -> Unknown )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: error
> receiving P_STATE, e: -5 l: 0!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: ack_receiver
> terminated
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating
> ack_recv thread
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Connection closed
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Disconnecting -> StandAlone )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating
> receiver thread
>
>
> Config:
> resource storage {
>     protocol C;
>     meta-disk internal;
>     device /dev/drbd1;
>     syncer {
>         verify-alg sha1;
>     }
>     net {
>         allow-two-primaries;
>     }
>     on nfsnode1 {
>         disk /dev/storage/drbd;
>         address 10.0.2.15:7789;
>     }
>     on nfsnode2 {
>         disk /dev/storage/drbd;
>         address 10.0.2.4:7789;
>     }
> }
>
> pcs resource show StorageFS-clone
>  Clone: StorageFS-clone
>   Resource: StorageFS (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/drbd1 directory=/mnt/drbd fstype=gfs2
>    Operations: start interval=0s timeout=60 (StorageFS-start-interval-0s)
>                stop interval=0s timeout=60 (StorageFS-stop-interval-0s)
>                monitor interval=20 timeout=40 (StorageFS-monitor-interval-20)
>
> Full list of resources:
>
>  Master/Slave Set: StorageClone [Storage]
>      Masters: [ nfsnode1 nfsnode2 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0 (ocf::heartbeat:IPaddr2): Started nfsnode2
>      ClusterIP:1 (ocf::heartbeat:IPaddr2): Started nfsnode1
>  Clone Set: StorageFS-clone [StorageFS]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: WebSite-clone [WebSite]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: nfs-group-clone [nfs-group]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: ping-clone [ping]
>      Started: [ nfsnode1 nfsnode2 ]
>  vbox-fencing (stonith:fence_vbox): Started nfsnode2
>
--
.~.
/V\
// \\
/( )\
^`~'^