[ClusterLabs] Re: Trying to Understand crm-fence-peer.sh

Bryan K. Walton bwalton+1546953805 at leepfrog.com
Wed Jan 16 10:36:25 EST 2019


On Wed, Jan 16, 2019 at 04:07:36PM +0100, Ulrich Windl wrote:
> Hi!
> 
> I guess we need more logs; especially some events from storage2 before fencing
> is triggered.
> 
> Regards,
> Ulrich

Here are the rest of the logs, from the time I issued the reboot command
through the end of the fencing attempt.

Thanks,
Bryan
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:52 storage2 kernel: block drbd1: peer( Primary -> Secondary ) 
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: Adding inet address 10.40.2.101/32 with broadcast address 10.40.1.255 to device enp179s0f0
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: Adding inet address 10.40.1.101/32 with broadcast address 10.40.2.255 to device enp179s0f1
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: Bringing device enp179s0f0 up
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: Bringing device enp179s0f1 up
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.40.2.101 enp179s0f0 10.40.2.101 auto not_used not_used
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.40.1.101 enp179s0f1 10.40.1.101 auto not_used not_used
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of start operation for iscsiMillipedeIP on storage2: 0 (ok)
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of start operation for iscsiCentipedeIP on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 kernel: drbd r0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) 
Jan 11 08:49:53 storage2 kernel: drbd r0: ack_receiver terminated
Jan 11 08:49:53 storage2 kernel: drbd r0: Terminating drbd_a_r0
Jan 11 08:49:53 storage2 kernel: drbd r0: Connection closed
Jan 11 08:49:53 storage2 kernel: drbd r0: conn( TearDown -> Unconnected ) 
Jan 11 08:49:53 storage2 kernel: drbd r0: receiver terminated
Jan 11 08:49:53 storage2 kernel: drbd r0: Restarting receiver thread
Jan 11 08:49:53 storage2 kernel: drbd r0: receiver (re)started
Jan 11 08:49:53 storage2 kernel: drbd r0: conn( Unconnected -> WFConnection ) 
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm fence-peer r0
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: DRBD_CONF=/etc/drbd.conf DRBD_DONT_WARN_ON_VERSION_MISMATCH=1 DRBD_MINOR=1 DRBD_PEER=storage1 DRBD_PEERS=storage1 DRBD_PEER_ADDRESS=192.168.0.2 DRBD_PEER_AF=ipv4 DRBD_RESOURCE=r0 UP_TO_DATE_NODES='' /usr/lib/drbd/crm-fence-peer.sh
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: INFO peer is reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-r0-StorageClusterClone'
Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 4 (0x400)
Jan 11 08:49:53 storage2 kernel: drbd r0: fence-peer helper returned 4 (peer was fenced)
Jan 11 08:49:53 storage2 kernel: drbd r0: pdsk( DUnknown -> Outdated ) 
Jan 11 08:49:53 storage2 kernel: block drbd1: role( Secondary -> Primary ) 
Jan 11 08:49:53 storage2 kernel: block drbd1: new current UUID 8193109A1958EDC1:6E65E262290A59E6:0525636210B40C9E:0524636210B40C9F
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of promote operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Result of notify operation for StorageCluster on storage2: 0 (ok)
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Our peer on the DC (storage1) is dead
Jan 11 08:49:54 storage2 crmd[13173]:  notice: State transition S_NOT_DC -> S_ELECTION
Jan 11 08:49:54 storage2 crmd[13173]:  notice: State transition S_ELECTION -> S_INTEGRATION
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Removing all storage1 attributes for peer loss
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Lost attribute writer storage1
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Purged 1 peer with id=1 and/or uname=storage1 from the membership cache
Jan 11 08:49:54 storage2 stonith-ng[13169]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 stonith-ng[13169]:  notice: Purged 1 peer with id=1 and/or uname=storage1 from the membership cache
Jan 11 08:49:54 storage2 cib[13168]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 cib[13168]:  notice: Purged 1 peer with id=1 and/or uname=storage1 from the membership cache
Jan 11 08:49:54 storage2 corosync[13141]: [TOTEM ] A new membership (192.168.0.3:1280) was formed. Members left: 1
Jan 11 08:49:54 storage2 corosync[13141]: [QUORUM] Members[1]: 2
Jan 11 08:49:54 storage2 corosync[13141]: [MAIN  ] Completed service synchronization, ready to provide service.
Jan 11 08:49:54 storage2 pacemakerd[13167]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 crmd[13173]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 crmd[13173]: warning: No reason to expect node 1 to be down
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Stonith/shutdown of storage1 not matched
Jan 11 08:49:54 storage2 ntpd[10808]: Listen normally on 12 enp179s0f0 10.40.2.101 UDP 123
Jan 11 08:49:54 storage2 ntpd[10808]: Listen normally on 13 enp179s0f1 10.40.1.101 UDP 123
Jan 11 08:49:55 storage2 pengine[13172]:  notice:  * Start      StorageClusterFS     ( storage2 )
Jan 11 08:49:55 storage2 pengine[13172]:  notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-2933.bz2
Jan 11 08:49:55 storage2 crmd[13173]:  notice: Initiating start operation StorageClusterFS_start_0 locally on storage2
Jan 11 08:49:55 storage2 Filesystem(StorageClusterFS)[15660]: INFO: Running start for /dev/drbd1 on /mnt/storage
Jan 11 08:49:55 storage2 kernel: XFS (drbd1): Mounting V5 Filesystem
Jan 11 08:49:55 storage2 kernel: XFS (drbd1): Ending clean mount
Jan 11 08:49:55 storage2 crmd[13173]:  notice: Result of start operation for StorageClusterFS on storage2: 0 (ok)
Jan 11 08:49:55 storage2 crmd[13173]:  notice: Initiating monitor operation StorageClusterFS_monitor_20000 locally on storage2
Jan 11 08:49:55 storage2 crmd[13173]:  notice: Transition 0 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2933.bz2): Complete
Jan 11 08:49:55 storage2 crmd[13173]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jan 11 08:49:57 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: ARPING 10.40.2.101 from 10.40.2.101 enp179s0f0#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Jan 11 08:49:57 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: ARPING 10.40.1.101 from 10.40.1.101 enp179s0f1#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
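
For reference, the "helper command: /sbin/drbdadm fence-peer r0" lines above
come from DRBD's fencing hook. With DRBD 8.4 the pieces that produce exactly
this call are normally wired up in drbd.conf roughly like the stock example
below (resource name r0 taken from the log; this is the general shape, not
necessarily our exact config):

    resource r0 {
      disk {
        fencing resource-only;   # ask the cluster to outdate the peer's disk
      }
      handlers {
        # run by the kernel when the peer becomes unreachable
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        # removes the fencing constraint again after a successful resync
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }

The exit code 4 ("peer was fenced") reported by the kernel is the handler
telling DRBD the peer has been safely outdated, which is why pdsk goes
DUnknown -> Outdated and the promote on storage2 is allowed to proceed.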
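
The constraint named in the crm-fence-peer.sh INFO line,
'drbd-fence-by-handler-r0-StorageClusterClone', is a temporary location rule
the script writes into the CIB to ban the Master role from every node except
the one holding UpToDate data (storage2 here). Its general shape is roughly
the following (rule/expression ids omitted; shown as an illustration, not a
dump from our CIB):

    <rsc_location rsc="StorageClusterClone" id="drbd-fence-by-handler-r0-StorageClusterClone">
      <rule role="Master" score="-INFINITY" id="...">
        <expression attribute="#uname" operation="ne" value="storage2" id="..."/>
      </rule>
    </rsc_location>

While it is in place you can see it with something like
"cibadmin -Q | grep drbd-fence"; after the peer comes back and resyncs, the
after-resync-target handler (crm-unfence-peer.sh) is what removes it again.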

