[ClusterLabs] DRBD sync stalled at 100% ?

Strahil Nikolov hunter86_bg at yahoo.com
Sat Jun 27 15:40:38 EDT 2020


I've seen this on a test setup after multiple network disruptions.
I managed to fix it by stopping DRBD on all nodes and then starting it again.

If you can afford some downtime, you could try that approach.
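
Before scheduling downtime, it may help to confirm the stall from a script rather than by eye. The following is only a minimal sketch: it assumes the text layout of the `drbdadm status` output quoted below, and the function name `find_stalled_resyncs` is made up for illustration. It flags any peer that reports done:100.00 while replication is still SyncSource or SyncTarget. A single snapshot can also catch a healthy resync in its last moment, so treat a hit as a stall only if it persists across repeated checks.

```python
# Sketch only: parses text shaped like the quoted `drbdadm status` output.
# find_stalled_resyncs is a hypothetical helper, not part of DRBD.

RESYNC_STATES = {"SyncSource", "SyncTarget"}


def find_stalled_resyncs(status_text):
    """Return (resource, peer, replication) for peers at done:100.00
    that are still in a resync replication state."""
    stalled = []
    resource = peer = None
    for raw in status_text.splitlines():
        # Tolerate the ">" prefix that email quoting adds to each line.
        line = raw[1:] if raw.startswith(">") else raw
        tokens = line.split()
        if not tokens:
            continue
        indented = line[0].isspace()
        # key:value tokens such as "replication:SyncSource" or "done:100.00"
        attrs = dict(t.split(":", 1) for t in tokens if ":" in t)
        if ":" not in tokens[0]:
            # A bare leading name is a resource (unindented) or a peer (indented).
            if indented:
                peer = tokens[0]
            else:
                resource, peer = tokens[0], None
        if attrs.get("replication") in RESYNC_STATES and "done" in attrs:
            if float(attrs["done"]) >= 100.0:
                stalled.append((resource, peer, attrs["replication"]))
    return stalled


SAMPLE = """\
ha01_mysql role:Primary
  disk:UpToDate
  001db03b role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:100.00

ha02_mysql role:Secondary
  disk:UpToDate
  001db03b role:Primary
    peer-disk:UpToDate
"""

if __name__ == "__main__":
    for res, peer_name, state in find_stalled_resyncs(SAMPLE):
        print(f"{res}: {state} towards {peer_name} is at 100% but not finished")
```

The sample text is the status from 001db03a copied from the quoted message; on a live node the same function could be fed the captured output of `drbdadm status`.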


Best Regards,
Strahil Nikolov



On 27 June 2020 at 16:36:10 GMT+03:00, Eric Robinson <eric.robinson at psmnv.com> wrote:
>I'm not seeing anything on Google about this. Two DRBD nodes lost
>communication with each other, and then reconnected and started sync.
>But then it got to 100% and is just stalled there.
>
>The nodes are 001db03a, 001db03b.
>
>On 001db03a:
>
>[root at 001db03a ~]# drbdadm status
>ha01_mysql role:Primary
>  disk:UpToDate
>  001db03b role:Secondary
>    replication:SyncSource peer-disk:Inconsistent done:100.00
>
>ha02_mysql role:Secondary
>  disk:UpToDate
>  001db03b role:Primary
>    peer-disk:UpToDate
>
>On 001db03b:
>
>[root at 001db03b ~]# drbdadm status
>ha01_mysql role:Secondary
>  disk:Inconsistent
>  001db03a role:Primary
>    replication:SyncTarget peer-disk:UpToDate done:100.00
>
>ha02_mysql role:Primary
>  disk:UpToDate
>  001db03a role:Secondary
>    peer-disk:UpToDate
>
>
>On 001db03a, here are the DRBD messages from the onset of the problem
>until now.
>
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Primary -> Unknown )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 2946943372 (1->-1 0/0)
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 2946943372 (6ms)
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Secondary -> Unknown )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0: new current
>UUID: D07A3D4B2F99832D weak: FFFFFFFFFFFFFFFD
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Connection
>closed
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:34:33 001db03a pengine[1474]:  notice:  * Start     
>p_drbd0:1        (                 001db03b )
>Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating notify
>operation p_drbd0_pre_notify_start_0 locally on 001db03a
>Jun 26 22:34:33 001db03a crmd[1475]:  notice: Result of notify
>operation for p_drbd0 on 001db03a: 0 (ok)
>Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating start
>operation p_drbd0_start_0 on 001db03b
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 3920461435
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 3920461435 (primary_nodes=1)
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Outdated )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>492F8D33A72A8E08:0000000000000000:659DC04F5C85B6E4:8254EEA2EC50AD7C
>bits:0 flags:120
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:492F8D33A72A8E09:659DC04F5C85B6E4:51A00A23ED88187A
>bits:1 flags:120
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=-2 by rule 50
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
>Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: send
>bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm before-resync-target
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Outdated -> Inconsistent )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>repl( WFBitMapT -> SyncTarget )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>Began resync as SyncTarget (will sync 4 KB [1 bits set]).
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating notify
>operation p_drbd0_post_notify_start_0 locally on 001db03a
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating notify
>operation p_drbd0_post_notify_start_0 on 001db03b
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Transition aborted by
>status-2-master-p_drbd0 doing create master-p_drbd0=10000: Transient
>attribute change
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Result of notify
>operation for p_drbd0 on 001db03a: 0 (ok)
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating monitor
>operation p_drbd0_monitor_60000 on 001db03b
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>Resync done (total 1 sec; paused 0 sec; 4 K/sec)
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>updated UUIDs
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Inconsistent -> UpToDate )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>repl( SyncTarget -> Established )
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm after-resync-target
>Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
>Jun 26 22:34:35 001db03a crmd[1475]:  notice: Transition aborted by
>status-2-master-p_drbd0 doing modify master-p_drbd0=1000: Transient
>attribute change
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha01_mys [2110])
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Preparing
>remote state change 3458191960
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Committing
>remote state change 3458191960 (primary_nodes=2)
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Secondary )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>drbd_sync_handshake:
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: self
>D07A3D4B2F99832D:50AE57670FCB98C3:7DDDDEEEEEA477C4:B75C5B6B7AAFBB6A
>bits:22 flags:120
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: peer
>50AE57670FCB98C2:0000000000000000:7DDDDEEEEEA477C4:D2AAA82A5FF6EE84
>bits:0 flags:20
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>uuid_compare()=2 by rule 70
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( DUnknown -> Consistent ) repl( Off -> WFBitMapS )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: send
>bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( Consistent -> Outdated )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83;
>compression: 100.0%
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>helper command: /sbin/drbdadm before-resync-source
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>helper command: /sbin/drbdadm before-resync-source exit code 0 (0x0)
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
>Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b:
>Began resync as SyncSource (will sync 212 KB [53 bits set]).
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: sock was
>shut down by peer
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> BrokenPipe ) peer( Primary -> Unknown )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: meta
>connection shut down by peer.
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 2546365252 (1->-1 0/0)
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 2546365252 (9ms)
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>BrokenPipe -> Unconnected )
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 1109150886
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 1109150886 (primary_nodes=1)
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=0 by rule 38
>Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did
>not arrive in time.
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connected -> NetworkFailure ) peer( Primary -> Unknown )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>UpToDate -> Consistent )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver
>terminated
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: Terminating
>ack_recv thread
>Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: sock was
>shut down by peer
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Preparing
>cluster-wide state change 3067178175 (1->-1 0/0)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Committing
>cluster-wide state change 3067178175 (8ms)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1: disk(
>Consistent -> UpToDate )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Connection
>closed
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>NetworkFailure -> Unconnected )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Restarting
>receiver thread
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Unconnected -> Connecting )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to
>peer 0 successful: Agreed network protocol version 113
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Feature
>flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
>WRITE_ZEROES.
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Starting
>ack_recv thread (from drbd_r_ha02_mys [2116])
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Preparing
>remote state change 2747304939
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Committing
>remote state change 2747304939 (primary_nodes=1)
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn(
>Connecting -> Connected ) peer( Unknown -> Primary )
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>drbd_sync_handshake:
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self
>5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer
>5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
>bits:0 flags:120
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>uuid_compare()=0 by rule 38
>Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b:
>pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
>

