[Pacemaker] why pacemaker does not control the resources

Andrey Groshev greenx at yandex.ru
Thu Nov 7 20:49:17 UTC 2013


Hi, all!
I need help. I do not understand why this has stopped working.
This configuration works on another cluster, but that one is on corosync 1.

The setup is a PostgreSQL cluster with master/slave replication.
It uses the classic config from the wiki.
I build the cluster and start it, and it works.
Next I kill postgres on the master with signal 6 (SIGABRT), to simulate running out of disk space:

# pkill -6 postgres
# ps axuww|grep postgres
root      9032  0.0  0.1 103236   860 pts/0    S+   00:37   0:00 grep postgres 

PostgreSQL dies, but crm_mon shows that the master is still running.

Last updated: Fri Nov  8 00:42:08 2013
Last change: Fri Nov  8 00:37:05 2013 via crm_attribute on dev-cluster2-node4
Stack: corosync
Current DC: dev-cluster2-node4 (172793107) - partition with quorum
Version: 1.1.10-1.el6-368c726
3 Nodes configured
7 Resources configured


Node dev-cluster2-node2 (172793105): online
        pingCheck       (ocf::pacemaker:ping):  Started
        pgsql   (ocf::heartbeat:pgsql): Started
Node dev-cluster2-node3 (172793106): online
        pingCheck       (ocf::pacemaker:ping):  Started
        pgsql   (ocf::heartbeat:pgsql): Started
Node dev-cluster2-node4 (172793107): online
        pgsql   (ocf::heartbeat:pgsql): Master
        pingCheck       (ocf::pacemaker:ping):  Started
        VirtualIP       (ocf::heartbeat:IPaddr2):       Started

Node Attributes:
* Node dev-cluster2-node2:
    + default_ping_set                  : 100
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async  
* Node dev-cluster2-node3:
    + default_ping_set                  : 100
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async  
* Node dev-cluster2-node4:
    + default_ping_set                  : 100
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST    
    + pgsql-master-baseline             : 0000000002000078
    + pgsql-status                      : PRI

Migration summary:
* Node dev-cluster2-node4: 
* Node dev-cluster2-node2: 
* Node dev-cluster2-node3: 

Tickets:
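
The mismatch shown above (no postgres process on the master, while crm_mon still reports pgsql as Master) could be checked with a small script. This is only a sketch: the function names master_node_of and postgres_running are made up for illustration, and it assumes crm_mon output grouped by node, as in the listing above.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Pacemaker.

# Print the node on which resource $1 is reported in the Master role.
# Reads crm_mon text grouped by node (as in the listing above) on stdin.
master_node_of() {
    awk -v rsc="$1" '
        # Remember the current node, e.g. "Node dev-cluster2-node4 (172793107): online"
        $1 == "Node" && $2 != "Attributes:" { node = $2 }
        # A resource line such as "pgsql (ocf::heartbeat:pgsql): Master"
        $1 == rsc && $NF == "Master" { print node }
    '
}

# Succeed if at least one process named exactly "postgres" exists locally.
postgres_running() {
    pgrep -x postgres >/dev/null 2>&1
}

# Possible use on the node that should be master (assumes crm_mon is in PATH):
#   m=$(crm_mon -1 -n | master_node_of pgsql)
#   postgres_running || echo "crm_mon says Master is on $m, but postgres is not running here"
```

If the message is printed on the node that crm_mon names as Master, the monitor operation for pgsql has not noticed (or not reported) the failure.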

CONFIG:
node $id="172793105" dev-cluster2-node2. \
        attributes pgsql-data-status="STREAMING|ASYNC" standby="false"
node $id="172793106" dev-cluster2-node3. \
        attributes pgsql-data-status="STREAMING|ASYNC" standby="false"
node $id="172793107" dev-cluster2-node4. \
        attributes pgsql-data-status="LATEST"
primitive VirtualIP ocf:heartbeat:IPaddr2 \
        params ip="10.76.157.194" \
        op start interval="0" timeout="60s" on-fail="stop" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0" timeout="60s" on-fail="block"
primitive pgsql ocf:heartbeat:pgsql \
        params pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" tmpdir="/tmp/pg" start_opt="-p 5432" logfile="/var/lib/pgsql/9.1//pgstartup.log" rep_mode="async" node_list=" dev-cluster2-node2. dev-cluster2-node3. dev-cluster2-node4. " restore_command="gzip -cd /var/backup/pitr/dev-cluster2-master#5432/xlog/%f.gz > %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="10.76.157.194" \
        op start interval="0" timeout="60s" on-fail="restart" \
        op monitor interval="5s" timeout="61s" on-fail="restart" \
        op monitor interval="1s" role="Master" timeout="62s" on-fail="restart" \
        op promote interval="0" timeout="63s" on-fail="restart" \
        op demote interval="0" timeout="64s" on-fail="stop" \
        op stop interval="0" timeout="65s" on-fail="block" \
        op notify interval="0" timeout="66s"
primitive pingCheck ocf:pacemaker:ping \
        params name="default_ping_set" host_list="10.76.156.1" multiplier="100" \
        op start interval="0" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0" timeout="60s" on-fail="ignore"
ms msPostgresql pgsql \
        meta master-max="1" master-node-max="1" clone-node-max="1" notify="true" target-role="Master" clone-max="3"
clone clnPingCheck pingCheck \
        meta clone-max="3"
location l0_DontRunPgIfNotPingGW msPostgresql \
        rule $id="l0_DontRunPgIfNotPingGW-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
colocation r0_StartPgIfPingGW inf: msPostgresql clnPingCheck
colocation r1_MastersGroup inf: VirtualIP msPostgresql:Master
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote VirtualIP:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote VirtualIP:stop symmetrical=false
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-1.el6-368c726" \
        cluster-infrastructure="corosync" \
        stonith-enabled="false" \
        no-quorum-policy="stop"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY" \
        migration-threshold="1"




Please tell me where to look: why does Pacemaker not react to the failure?



