[ClusterLabs] DRBD sync stalled at 100% ?
Eric Robinson
eric.robinson at psmnv.com
Sat Jun 27 17:31:42 EDT 2020
Thanks for the feedback. I was hoping for a non-downtime solution. No way to do that?
________________________________
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: Saturday, June 27, 2020 2:40:38 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; Eric Robinson <eric.robinson at psmnv.com>
Subject: Re: [ClusterLabs] DRBD sync stalled at 100% ?
I've seen this on a test setup after multiple network disruptions.
I managed to fix it by stopping DRBD on all nodes and starting it again.
I guess you could schedule downtime and try that approach.
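
Before scheduling downtime, it may be worth confirming that the resync is really stuck and not just about to finish. The sketch below is one possible way to do that, not an official DRBD tool: it parses text in the DRBD 9 `drbdadm status` layout shown in the original report and lists resources whose peer is still in SyncSource or SyncTarget while reporting done:100.00. The function name `find_stalled_resyncs` and the sample text are illustrative assumptions; a single snapshot only shows the symptom, so the check would need to be repeated over time to call it a stall.

```python
def find_stalled_resyncs(status_text):
    """Return (resource, replication_state) pairs whose resync reports done >= 100."""
    stalled = []
    resource = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines start a new resource block: "<name> role:<role>".
            resource = line.split()[0]
            continue
        # Indented lines carry key:value fields for the local disk or a peer.
        fields = dict(tok.split(":", 1) for tok in line.split() if ":" in tok)
        repl = fields.get("replication")
        done = fields.get("done")
        if repl in ("SyncSource", "SyncTarget") and done is not None:
            if float(done) >= 100.0:
                stalled.append((resource, repl))
    return stalled


# Sample modelled on the status output quoted in this thread (hypothetical indentation).
SAMPLE = """\
ha01_mysql role:Primary
  disk:UpToDate
  001db03b role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:100.00

ha02_mysql role:Secondary
  disk:UpToDate
  001db03b role:Primary
    peer-disk:UpToDate
"""

print(find_stalled_resyncs(SAMPLE))  # [('ha01_mysql', 'SyncSource')]
```

In practice the text would come from running `drbdadm status` on each node, for example through `subprocess.run(["drbdadm", "status"], capture_output=True, text=True)`, and a resource that stays in this list across several polls is a candidate for the restart described above.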
Best Regards,
Strahil Nikolov
On 27 June 2020 16:36:10 GMT+03:00, Eric Robinson <eric.robinson at psmnv.com> wrote:
>I'm not seeing anything on Google about this. Two DRBD nodes lost
>communication with each other, and then reconnected and started sync.
>But then it got to 100% and is just stalled there.
>
>The nodes are 001db03a, 001db03b.
>
>On 001db03a:
>
>[root at 001db03a ~]# drbdadm status
>ha01_mysql role:Primary
> disk:UpToDate
> 001db03b role:Secondary
> replication:SyncSource peer-disk:Inconsistent done:100.00
>
>ha02_mysql role:Secondary
> disk:UpToDate
> 001db03b role:Primary
> peer-disk:UpToDate
>
>On 001db03b:
>
>[root at 001db03b ~]# drbdadm status
>ha01_mysql role:Secondary
> disk:Inconsistent
> 001db03a role:Primary
> replication:SyncTarget peer-disk:UpToDate done:100.00
>
>ha02_mysql role:Primary
> disk:UpToDate
> 001db03a role:Secondary
> peer-disk:UpToDate
>
>
>On 001db03a, here are the DRBD messages from the onset of the problem
>until now.
>
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Primary -> Unknown )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 2946943372 (1->-1 0/0)
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 2946943372 (6ms)
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Secondary -> Unknown )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0: new current
>UUID: D07A3D4B2F99832D weak: FFFFFFFFFFFFFFFD
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Connection
>closed
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:34:33 001db03a pengine[1474]: notice: * Start
>p_drbd0:1 ( 001db03b )
>Jun 26 22:34:33 001db03a crmd[1475]: notice: Initiating notify
>operation p_drbd0_pre_notify_start_0 locally on 001db03a
>Jun 26 22:34:33 001db03a crmd[1475]: notice: Result of notify
>operation for p_drbd0 on 001db03a: 0 (ok)
>Jun 26 22:34:33 001db03a crmd[1475]: notice: Initiating start
>operation p_drbd0_start_0 on 001db03b
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 3920461435
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 3920461435 (primary_nodes=1)
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Outdated )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>492F8D33A72A8E08:0000000000000000:659DC04F5C85B6E4:8254EEA2EC50AD7C
>bits:0 flags:120
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:492F8D33A72A8E09:659DC04F5C85B6E4:51A00A23ED88187A
>bits:1 flags:120
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=-2 by rule 50
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: send
>bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm before-resync-target
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Outdated -> Inconsistent )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>repl( WFBitMapT -> SyncTarget )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>Began resync as SyncTarget (will sync 4 KB [1 bits set]).
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Initiating notify
>operation p_drbd0_post_notify_start_0 locally on 001db03a
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Initiating notify
>operation p_drbd0_post_notify_start_0 on 001db03b
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Transition aborted by
>status-2-master-p_drbd0 doing create master-p_drbd0=10000: Transient
>attribute change
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Result of notify
>operation for p_drbd0 on 001db03a: 0 (ok)
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Initiating monitor
>operation p_drbd0_monitor_60000 on 001db03b
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>Resync done (total 1 sec; paused 0 sec; 4 K/sec)
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>updated UUIDs
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Inconsistent -> UpToDate )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>repl( SyncTarget -> Established )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm after-resync-target
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
>Jun 26 22:34:35 001db03a crmd[1475]: notice: Transition aborted by
>status-2-master-p_drbd0 doing modify master-p_drbd0=1000: Transient
>attribute change
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha01_mys [2110])
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Preparing
>remote state change 3458191960
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Committing
>remote state change 3458191960 (primary_nodes=2)
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Secondary )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>drbd_sync_handshake:
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: self
>D07A3D4B2F99832D:50AE57670FCB98C3:7DDDDEEEEEA477C4:B75C5B6B7AAFBB6A
>bits:22 flags:120
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: peer
>50AE57670FCB98C2:0000000000000000:7DDDDEEEEEA477C4:D2AAA82A5FF6EE84
>bits:0 flags:20
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>uuid_compare()=2 by rule 70
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( DUnknown -> Consistent ) repl( Off -> WFBitMapS )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: send
>bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( Consistent -> Outdated )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>helper command: /sbin/drbdadm before-resync-source
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>helper command: /sbin/drbdadm before-resync-source exit code 0 (0x0)
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>Began resync as SyncSource (will sync 212 KB [53 bits set]).
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: sock was
>shut down by peer
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> BrokenPipe ) peer( Primary -> Unknown )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: meta
>connection shut down by peer.
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 2546365252 (1->-1 0/0)
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 2546365252 (9ms)
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>BrokenPipe -> Unconnected )
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 1109150886
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 1109150886 (primary_nodes=1)
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=0 by rule 38
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Primary -> Unknown )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: sock was
>shut down by peer
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 3067178175 (1->-1 0/0)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 3067178175 (8ms)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 2747304939
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 2747304939 (primary_nodes=1)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=0 by rule 38
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
>
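
In the log excerpt above, ha02_mysql logs "Began resync as SyncTarget" followed by "Resync done" within the same second, while ha01_mysql logs "Began resync as SyncSource" at 22:34:35 and no "Resync done" appears afterwards. The following sketch shows one way to pull that pattern out of a longer log automatically. It assumes unwrapped kernel log lines (as in /var/log/messages, not the mail-wrapped form above) and assumes that a finished resync always logs "Resync done" on this node, which has not been verified for this DRBD version. The name `unfinished_resyncs` is hypothetical.

```python
import re

# Match the resource name and resync role in "Began resync as ..." lines,
# and the resource name in "Resync done" lines.
BEGAN = re.compile(r"drbd (\S+?)/\d+ \S+ \S+: Began resync as (\w+)")
DONE = re.compile(r"drbd (\S+?)/\d+ \S+ \S+: Resync done")


def unfinished_resyncs(log_text):
    """Map each resource to its resync role if a resync began but no 'Resync done' followed."""
    pending = {}
    for line in log_text.splitlines():
        began = BEGAN.search(line)
        if began:
            pending[began.group(1)] = began.group(2)
            continue
        done = DONE.search(line)
        if done:
            pending.pop(done.group(1), None)
    return pending


# Three lines taken from the log in this thread, with the mail line-wrapping undone.
LOG = "\n".join([
    "Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: "
    "Began resync as SyncTarget (will sync 4 KB [1 bits set]).",
    "Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: "
    "Resync done (total 1 sec; paused 0 sec; 4 K/sec)",
    "Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: "
    "Began resync as SyncSource (will sync 212 KB [53 bits set]).",
])

print(unfinished_resyncs(LOG))  # {'ha01_mysql': 'SyncSource'}
```

The result matches the `drbdadm status` output at the top of the report, where only ha01_mysql is still shown as SyncSource.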
>Disclaimer : This email and any files transmitted with it are
>confidential and intended solely for intended recipients. If you are
>not the named addressee you should not disseminate, distribute, copy or
>alter this email. Any views or opinions presented in this email are
>solely those of the author and might not represent those of Physician
>Select Management. Warning: Although Physician Select Management has
>taken reasonable precautions to ensure no viruses are present in this
>email, the company cannot accept responsibility for any loss or damage
>arising from the use of this email or attachments.