[Pacemaker] fence-peer helper broken, returned 1

Patrick Zwahlen paz at navixia.com
Mon Feb 22 05:23:37 EST 2010


Hi,

I have a simple active/passive Corosync/DRBD/XFS/NFS cluster (see
http://thread.gmane.org/gmane.linux.highavailability.pacemaker/4672 for
config details).

I have made some more failover tests, and I see some errors in one case
that I would like to share.

Initial situation:
Node 1 is DRBD master, XFS mounted, NFS started
Node 2 is DRBD slave, Pacemaker DC

If I power off node 2, I get the following logs (filtered on
drbd/fencing):

Feb 22 11:03:20 tnfsa kernel: block drbd0: PingAck did not arrive in
time.
Feb 22 11:03:20 tnfsa kernel: block drbd0: peer( Secondary -> Unknown )
conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) 
Feb 22 11:03:20 tnfsa kernel: block drbd0: asender terminated
Feb 22 11:03:20 tnfsa kernel: block drbd0: Terminating asender thread
Feb 22 11:03:20 tnfsa kernel: block drbd0: short read expecting header
on sock: r=-512
Feb 22 11:03:20 tnfsa kernel: block drbd0: Creating new current UUID
Feb 22 11:03:20 tnfsa kernel: block drbd0: Connection closed
Feb 22 11:03:20 tnfsa kernel: block drbd0: helper command: /sbin/drbdadm
fence-peer minor-0
Feb 22 11:03:20 tnfsa crm-fence-peer.sh[11711]: invoked for nfs
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: Call cib_create failed
(-41): Remote node did not respond
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: <null>
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: WARNING could not place
the constraint!
Feb 22 11:03:50 tnfsa kernel: block drbd0: helper command: /sbin/drbdadm
fence-peer minor-0 exit code 1 (0x100)
Feb 22 11:03:50 tnfsa kernel: block drbd0: fence-peer helper broken,
returned 1
Feb 22 11:03:50 tnfsa kernel: block drbd0: Considering state change from
bad state. Error would be: 'Refusing to be Primary while peer is not
outdated'
Feb 22 11:03:50 tnfsa kernel: block drbd0:  old = { cs:NetworkFailure
ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0:  new = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: conn( NetworkFailure ->
Unconnected ) 
Feb 22 11:03:50 tnfsa kernel: block drbd0: receiver terminated
Feb 22 11:03:50 tnfsa kernel: block drbd0: Restarting receiver thread
Feb 22 11:03:50 tnfsa kernel: block drbd0: receiver (re)started
Feb 22 11:03:50 tnfsa kernel: block drbd0: Considering state change from
bad state. Error would be: 'Refusing to be Primary while peer is not
outdated'
Feb 22 11:03:50 tnfsa kernel: block drbd0:  old = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0:  new = { cs:WFConnection
ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: conn( Unconnected ->
WFConnection )
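
One detail worth pulling out of the excerpt above is that the helper blocks for 30 seconds (11:03:20 to 11:03:50) before giving up. The following is only a minimal sketch in Python, not something from my setup, that measures this from the log. The three lines in LOG are copied from the excerpt with the mail wrapping undone; the function names and the year default are assumptions made for illustration.

```python
import re
from datetime import datetime

# Three lines taken from the log excerpt, re-joined where the mail client wrapped them.
LOG = """\
Feb 22 11:03:20 tnfsa crm-fence-peer.sh[11711]: invoked for nfs
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: Call cib_create failed (-41): Remote node did not respond
Feb 22 11:03:50 tnfsa kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
"""

# Classic syslog timestamp at the start of a line, e.g. "Feb 22 11:03:20".
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def stamp(line, year=2010):
    """Parse the syslog timestamp of one line (syslog omits the year, so it is assumed)."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("no syslog timestamp in: " + line)
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def helper_duration(log):
    """Seconds between the helper being invoked and the kernel logging its exit code."""
    start = end = None
    for line in log.splitlines():
        if "crm-fence-peer.sh" in line and "invoked for" in line:
            start = stamp(line)
        elif "fence-peer" in line and "exit code" in line:
            end = stamp(line)
    if start is None or end is None:
        raise ValueError("log does not contain both helper start and exit lines")
    return (end - start).total_seconds()


print(helper_duration(LOG))
```

With the lines above this reports 30 seconds, i.e. the CIB call waits that long for an answer that never arrives from the powered-off DC.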

If I later check the config, I can see that the fencing 'location'
constraint isn't there. I am not sure it is a big deal, but wanted to
share and have your insight. This only happens when I power off the
current DC, and a constraint has to be placed by the surviving node.
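
For anyone who wants to repeat the check for the missing constraint, here is a rough sketch in Python that looks for fencing constraints in the constraints section of the CIB. It assumes the handler names its constraint with an id starting with "drbd-fence-by-handler" (please verify against your copy of crm-fence-peer.sh). The SAMPLE XML, the resource names "ms-drbd" and "nfs-group", and the other ids are hypothetical and only illustrate the shape of the data.

```python
import xml.etree.ElementTree as ET

# Assumption: id prefix used by the fence-peer handler; check your script version.
ID_PREFIX = "drbd-fence-by-handler"


def fencing_constraints(constraints_xml, prefix=ID_PREFIX):
    """Return the ids of all rsc_location constraints whose id starts with prefix."""
    root = ET.fromstring(constraints_xml)
    return [
        element.get("id")
        for element in root.iter("rsc_location")
        if element.get("id", "").startswith(prefix)
    ]


# Hypothetical constraints section, for illustration only.
SAMPLE = """<constraints>
  <rsc_location id="drbd-fence-by-handler-ms-drbd" rsc="ms-drbd">
    <rule id="drbd-fence-by-handler-rule-ms-drbd" role="Master" score="-INFINITY">
      <expression id="drbd-fence-by-handler-expr-ms-drbd"
                  attribute="#uname" operation="ne" value="tnfsa"/>
    </rule>
  </rsc_location>
  <rsc_location id="prefer-tnfsa" rsc="nfs-group" node="tnfsa" score="100"/>
</constraints>"""

print(fencing_constraints(SAMPLE))
```

On a live node the input could come from the output of `cibadmin -Q -o constraints`. An empty list after the test would match what I see here, namely that the constraint was never placed.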

Thanks a ton, - Patrick -
