[Pacemaker] can't reset a fail node with external/ssh stonith

Tue Nov 10 12:41:25 EST 2009

Hello,

I am using Pacemaker 1.0.6 Corosync on Debian Lenny. I have created ssh 
automatic login on both nodes. When I stop postgresql resource on psql01 
node, there is no attempt to reset the fail node from psql02 node.

The syslogs seem to be fine and there are no errors:
Nov 11 01:40:03 psql02 stonithd: [1742]: info: st-ssh:1 stonith resource 
started
Nov 11 01:25:30 psql01 stonithd: [1751]: info: st-ssh:0 stonith resource 
started

When I do "stonith -t external/ssh -p "psql01 psql02" -T reset psql02, it 
will reset the psql02 node.

What do I miss?   Thanks in advance.

Regards,
Ronny

My configuration as follows:
node psql01
node psql02
primitive failover-ip ocf:heartbeat:IPaddr \
        params ip="10.100.53.100" cidr_netmask="32" \
        op monitor interval="5s" \
        meta migration-threshold="1"
primitive failover-psql lsb:postgresql-8.3 \
        op monitor interval="30s" \
        op start interval="0" timeout="45s" \
        meta migration-threshold="1"
primitive st-ssh stonith:external/ssh \
        params hostlist="psql01 psql02"
clone fencing st-ssh
colocation psql-with-failover-ip inf: failover-psql failover-ip
order psql-after-failover-ip inf: failover-ip failover-psql
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        default-resource-stickiness="0" \
        stonith-action="reboot"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

crm_mon:
============
Last updated: Wed Nov 11 01:44:27 2009
Stack: openais
Current DC: psql01 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ psql01 psql02 ]

 Clone Set: fencing
     Started: [ psql01 psql02 ]
failover-psql   (lsb:postgresql-8.3):   Started psql02
failover-ip     (ocf::heartbeat:IPaddr):        Started psql02

Migration summary:
* Node psql02:
* Node psql01:
   failover-psql: migration-threshold=1 fail-count=1

Failed actions:
    failover-psql_monitor_30000 (node=psql01, call=9, rc=7, 
status=complete): not running

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: syslog-psql01.txt
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091111/84892ce1/attachment-0004.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: syslog-psql02.txt
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091111/84892ce1/attachment-0005.txt>