[Pacemaker] Pacemaker very often STONITHs other node

Mon Nov 25 17:43:19 UTC 2013

On 25.11.2013 at 18:25, Digimer wrote:
> I'd like to see the full logs, starting from a little before the issue
> started.
>

Here are the logs from Nov 17 through Nov 24 (they are too large for my
pastebin):

Node A - https://www.dropbox.com/sh/dj08fbckj9zo104/Ew1QpdRq9A/A.log
Node B - https://www.dropbox.com/sh/dj08fbckj9zo104/p9ldlBkGkG/B.log

> It looks, though, like a stop was called and failed for whatever reason,
> so the node was fenced. That would mean congestion, as you suggested,
> is not the likely cause.
>
> Out of curiosity though; what bonding mode are you using? My testing
> showed that only mode=1 was reliable. Since I tested, corosync added
> support for mode=0 and mode=2, but I've not re-tested them. When I was
> doing my bonding tests, I found all other modes to break communications
> in some manner of use or failure/recovery testing.
>
>

I use 802.3ad (LACP), which is mode 4:

auto bond0
iface bond0 inet static
         slaves eth4 eth5
         bond-mode 802.3ad
         bond-lacp_rate fast
         bond-miimon 100
         bond-downdelay 200
         bond-updelay 200
         address 10.0.0.1
         netmask 255.255.255.0
         broadcast 10.0.0.255

Do you think this could be the cause, that is, an unsuitable bonding mode
leading to communication problems between the nodes?
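
If mode 1 (active-backup) is the safer choice, I assume the stanza would
change to something like the following. This is only an untested sketch:
the lacp_rate line is dropped because it applies to 802.3ad only, and
picking eth4 as the primary slave is just my assumption.

auto bond0
iface bond0 inet static
         slaves eth4 eth5
         bond-mode active-backup
         bond-primary eth4
         bond-miimon 100
         bond-downdelay 200
         bond-updelay 200
         address 10.0.0.1
         netmask 255.255.255.0
         broadcast 10.0.0.255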

Thank you once more!

-- 
Michał Margula, alchemyx at uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]