[Pacemaker] Need explanation for start stonith behaviour

Andreas Mock andreas.mock at web.de
Tue May 28 07:44:45 EDT 2013


Hi all,

I have a two-node cluster on a RHEL clone (6.4, cman, pacemaker)
and I'm facing a startup behaviour I can't explain, so I hope
you can enlighten me.

- 2 nodes: N1 N2
- both nodes up
- everything is fine

Shutdown sequence (exact commands sketched below):
- service pacemaker stop on N2
- all resources get migrated => OK
- all pacemaker- and corosync-related processes seem to be
  shut down correctly
- now service pacemaker stop on N1
- all resources seem to be stopped correctly
- all cluster stack processes seem to be stopped correctly
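For reference, this is roughly what I run (a minimal sketch; the
process check at the end is just my own ad-hoc way of verifying,
not taken from any documentation):

    # on N2: stop the cluster stack first
    service pacemaker stop

    # on N1: then stop the stack on the remaining node
    service pacemaker stop

    # on both nodes: check that nothing of the stack is left running
    ps -e | egrep 'pacemaker|corosync|crmd|cib|stonith|lrmd|pengine|attrd'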

Scenario 1:
Start with the node that was stopped last (see the sketch after this list).
- service pacemaker start on N1
- the cluster stack gets started; there is a wait at
  "joining fence domain"
- after a timeout the node comes up
- resources get started on that node
- now service pacemaker start on N2
- the cluster stack comes up
- resources are started as requested by the config
=> everything seems OK and straightforward
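In shell terms scenario 1 looks roughly like this (a sketch; crm_mon -1
is simply what I use to check resource placement):

    # scenario 1: start with the node that was stopped last
    # on N1:
    service pacemaker start     # waits a while at "joining fence domain"
    crm_mon -1                  # resources come up on N1

    # then on N2:
    service pacemaker start
    crm_mon -1                  # resources placed as requested by the config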

Scenario 2:
Don't start with the node that was shut down last, but with
the node that was stopped first, therefore:
- service pacemaker start on N2
- the cluster stack comes up seemingly the same way
  as in scenario 1, with a little wait at "joining fence domain".

- And now the difference: node N1 gets stonithed, which seems
  OK to me, as N2 wants to make sure that it is the one and only
  node in the cluster. (Is this interpretation right?)
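For completeness, this is roughly how the fence event can be spotted
(a sketch; the log locations are the RHEL 6 defaults on my machines
and may differ elsewhere):

    # scenario 2: start with the node that was stopped first
    # on N2:
    service pacemaker start     # again a short wait at "joining fence domain"

    # afterwards the fencing of N1 shows up in the logs on N2:
    grep -i stonith /var/log/messages | tail
    grep -i fence /var/log/cluster/corosync.log | tail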

Why is a stonith triggered in one scenario but not in the other?
Insights are really appreciated. Is some knowledge about the last
cluster state made persistent? And is it correct that node N2 is
not stonithed in scenario 1?
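In case it matters for the answer: this is how I would query the
properties I suspect could be involved (a sketch; I'm not sure these
are actually the relevant knobs here):

    # cluster-wide properties that might influence fencing at startup
    crm_attribute --type crm_config --name stonith-enabled --query
    crm_attribute --type crm_config --name startup-fencing --query
    crm_attribute --type crm_config --name no-quorum-policy --query

    # dump the status section of the CIB to see what, if anything,
    # of the last cluster state is kept around
    cibadmin -Q -o status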

Thank you in advance.

Best regards
Andreas Mock
