[Pacemaker] Fwd: Problem with monitor

Юлия Школьникова shkolnikova_yuli at mail.ru
Fri Nov 23 00:03:59 EST 2012

Hello again. Why has nobody answered me? I really need your help!

-------- Forwarded message --------
From: Юлия Школьникова <shkolnikova_yuli at mail.ru>
To: pacemaker at oss.clusterlabs.org
Date: Mon 19 Nov 2012 16:37:21
Subject: [Pacemaker] Problem with monitor

I am configuring a master/slave cluster for postgresql 9.1 based on corosync and pacemaker.
I am following this presentation: http://schedule2012.rmll.info/IMG/pdf/postgresql-9-0-ha.pdf.
I took the resource agent (pgsql-ms) for master/slave postgresql from here: https://github.com/roidelapluie/puppet-cluster.
My nodes are node1 and node2.
My pacemaker configuration:
node node1
node node2
primitive DBIP ocf:heartbeat:IPaddr2 \
params nic="eth0" ip="" cidr_netmask="22" \
op monitor interval="30s" \
meta target-role="Started" is-managed="true"
primitive pgsql ocf:inuits:pgsql-ms \
op monitor interval="5s" role="Master" \
op monitor interval="10s" role="Slave" 
primitive ping ocf:pacemaker:ping \
params host_list="" \
op monitor interval="10s" timeout="10s" \
op start interval="0" timeout="45s"
ms pgsql-ms pgsql \
params pgsqlconfig="/var/lib/pgsql/9.1/data/postgresql.conf" lsb_script="/etc/init.d/postgresql-9.1" pgsqlrecovery="/var/lib/pgsql/9.1/data/recovery.conf" \
meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="true"
clone clone-ping ping \
meta globally-unique="false"
location connected PSQL \
rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
colocation ip_psql inf: PSQL pgsql-ms:Master
property $id="cib-bootstrap-options" \
dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
default-resource-stickiness="INFINITY"
rsc_defaults $id="rsc_defaults-options" \
migration-threshold="INFINITY" \
failure-timeout="10"

Then I tested my cluster:
1) If I switch off the master, the slave becomes the new master, as expected. This works fine and can be repeated many times.
2) But if I stop postgresql (to simulate a postgresql failure) with the command "service postgresql-9.1 stop", the following occurs:
Initially node1 is the master and node2 is the slave.
On node1 I run "service postgresql-9.1 stop", and node2 becomes the master.
Then on node2 I run "service postgresql-9.1 stop", and node1 becomes the master again.
At this point monitoring of my resource on node1 stops, and the following entry appears in the log:

node1 crmd[1362]: info: process_lrm_event: LRM operation pgsql:0_monitor_10000 (call=33, status=1, cib-update=0, confirmed=true) Cancelled
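
Entries like the one above can be picked out of a larger log with a small filter. The following is only a sketch: the function name cancelled_monitors is hypothetical, the pattern is derived solely from the single log line quoted here, and the log file path in the usage comment is an assumption that depends on the syslog setup.

```shell
#!/bin/sh
# Print every crmd log line that reports a cancelled recurring monitor.
# Reads log text on stdin, so it does not depend on any log file location.
# The pattern is modelled on the one log entry quoted in this message.
cancelled_monitors() {
    grep -E 'process_lrm_event: LRM operation [^ ]+_monitor_[0-9]+ .*Cancelled'
}

# Usage (the path is an assumption; adjust it to the local syslog setup):
#   cancelled_monitors < /var/log/messages
```

Comparing the timestamps of the matching lines with the moment of the second failover may help show which transition cancels the monitor.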

Now if I run "service postgresql-9.1 stop" on node1, pacemaker doesn't notice that postgresql has stopped, so it neither tries to restart it nor promotes node2 to master.
If I run "crm resource reprobe", the monitor action starts working again.
I cannot understand why the monitor operation stops working. Please help me.
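
The state of the cluster at the moment the monitor stops could be captured with a few standard commands. This is a minimal sketch, not a confirmed fix: the helper names run, inspect_monitor_state and recover_monitor are hypothetical, the commands are printed rather than executed unless DRY_RUN=0 is set, and whether a cleanup resumes the monitor in this setup is an assumption. Only "crm resource reprobe" is known from the description above to bring the monitor back.

```shell
#!/bin/sh
# Dry-run by default: commands are printed with a "+ " prefix, not executed.
# Set DRY_RUN=0 to execute them on a cluster node (assumption: run as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show what pacemaker currently believes about the resources.
inspect_monitor_state() {
    node="$1"                                      # node to query, e.g. node1
    run crm_mon -1 -f                              # one-shot status including fail counts
    run crm resource failcount pgsql show "$node"  # fail count of the pgsql primitive
}

# Clear recorded failures, then re-detect the resource state.
recover_monitor() {
    run crm resource cleanup pgsql-ms   # assumption: cleanup may help; untested here
    run crm resource reprobe            # the command reported above to resume the monitor
}
```

For example, calling inspect_monitor_state node1 right after the monitor is cancelled, and again after recover_monitor, would show whether a leftover fail count on node1 coincides with the missing monitor.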

Shkolnikova Yulia. 


