[Pacemaker] why pacemaker does not control the resources
Andrew Beekhof
andrew at beekhof.net
Sun Nov 10 18:41:36 EST 2013
On 8 Nov 2013, at 7:49 am, Andrey Groshev <greenx at yandex.ru> wrote:
> Hi, PPL!
> I need help. I do not understand why this has stopped working.
> This configuration works on another cluster, but that one runs corosync 1.
>
> So... a PostgreSQL cluster with master/slave.
> Classic config, as in the wiki.
> I built the cluster and started it, and it works.
> Next I kill postgres on the master with signal 6 (SIGABRT), as if it had run out of disk space.
>
> # pkill -6 postgres
> # ps axuww|grep postgres
> root 9032 0.0 0.1 103236 860 pts/0 S+ 00:37 0:00 grep postgres
>
> PostgreSQL dies, but crm_mon shows that the master is still running.
>
> Last updated: Fri Nov 8 00:42:08 2013
> Last change: Fri Nov 8 00:37:05 2013 via crm_attribute on dev-cluster2-node4
> Stack: corosync
> Current DC: dev-cluster2-node4 (172793107) - partition with quorum
> Version: 1.1.10-1.el6-368c726
> 3 Nodes configured
> 7 Resources configured
>
>
> Node dev-cluster2-node2 (172793105): online
>     pingCheck (ocf::pacemaker:ping): Started
>     pgsql (ocf::heartbeat:pgsql): Started
> Node dev-cluster2-node3 (172793106): online
>     pingCheck (ocf::pacemaker:ping): Started
>     pgsql (ocf::heartbeat:pgsql): Started
> Node dev-cluster2-node4 (172793107): online
>     pgsql (ocf::heartbeat:pgsql): Master
>     pingCheck (ocf::pacemaker:ping): Started
>     VirtualIP (ocf::heartbeat:IPaddr2): Started
>
> Node Attributes:
> * Node dev-cluster2-node2:
>     + default_ping_set : 100
>     + master-pgsql : -INFINITY
>     + pgsql-data-status : STREAMING|ASYNC
>     + pgsql-status : HS:async
> * Node dev-cluster2-node3:
>     + default_ping_set : 100
>     + master-pgsql : -INFINITY
>     + pgsql-data-status : STREAMING|ASYNC
>     + pgsql-status : HS:async
> * Node dev-cluster2-node4:
>     + default_ping_set : 100
>     + master-pgsql : 1000
>     + pgsql-data-status : LATEST
>     + pgsql-master-baseline : 0000000002000078
>     + pgsql-status : PRI
>
> Migration summary:
> * Node dev-cluster2-node4:
> * Node dev-cluster2-node2:
> * Node dev-cluster2-node3:
>
> Tickets:
>
> CONFIG:
> node $id="172793105" dev-cluster2-node2. \
>     attributes pgsql-data-status="STREAMING|ASYNC" standby="false"
> node $id="172793106" dev-cluster2-node3. \
>     attributes pgsql-data-status="STREAMING|ASYNC" standby="false"
> node $id="172793107" dev-cluster2-node4. \
>     attributes pgsql-data-status="LATEST"
> primitive VirtualIP ocf:heartbeat:IPaddr2 \
>     params ip="10.76.157.194" \
>     op start interval="0" timeout="60s" on-fail="stop" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0" timeout="60s" on-fail="block"
> primitive pgsql ocf:heartbeat:pgsql \
>     params pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" tmpdir="/tmp/pg" start_opt="-p 5432" logfile="/var/lib/pgsql/9.1//pgstartup.log" rep_mode="async" node_list=" dev-cluster2-node2. dev-cluster2-node3. dev-cluster2-node4. " restore_command="gzip -cd /var/backup/pitr/dev-cluster2-master#5432/xlog/%f.gz > %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="10.76.157.194" \
>     op start interval="0" timeout="60s" on-fail="restart" \
>     op monitor interval="5s" timeout="61s" on-fail="restart" \
>     op monitor interval="1s" role="Master" timeout="62s" on-fail="restart" \
>     op promote interval="0" timeout="63s" on-fail="restart" \
>     op demote interval="0" timeout="64s" on-fail="stop" \
>     op stop interval="0" timeout="65s" on-fail="block" \
>     op notify interval="0" timeout="66s"
> primitive pingCheck ocf:pacemaker:ping \
>     params name="default_ping_set" host_list="10.76.156.1" multiplier="100" \
>     op start interval="0" timeout="60s" on-fail="restart" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0" timeout="60s" on-fail="ignore"
> ms msPostgresql pgsql \
>     meta master-max="1" master-node-max="1" clone-node-max="1" notify="true" target-role="Master" clone-max="3"
> clone clnPingCheck pingCheck \
>     meta clone-max="3"
> location l0_DontRunPgIfNotPingGW msPostgresql \
>     rule $id="l0_DontRunPgIfNotPingGW-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
> colocation r0_StartPgIfPingGW inf: msPostgresql clnPingCheck
> colocation r1_MastersGroup inf: VirtualIP msPostgresql:Master
> order rsc_order-1 0: clnPingCheck msPostgresql
> order rsc_order-2 0: msPostgresql:promote VirtualIP:start symmetrical=false
> order rsc_order-3 0: msPostgresql:demote VirtualIP:stop symmetrical=false
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-1.el6-368c726" \
>     cluster-infrastructure="corosync" \
>     stonith-enabled="false" \
>     no-quorum-policy="stop"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="INFINITY" \
>     migration-threshold="1"
>
>
>
>
> Tell me where to look - why does pacemaker not work?
You might want to follow some of the steps at:
http://blog.clusterlabs.org/blog/2013/debugging-pacemaker/
under the heading "Resource-level failures".
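When chasing a resource-level failure, the numeric return codes recorded for each operation are easier to read with their OCF names next to them. A minimal sketch, assuming a POSIX shell; the helper name ocf_rc_name is hypothetical and not part of Pacemaker, while the number-to-name mapping is the standard OCF resource agent one:

```shell
# Translate an OCF resource agent return code into its symbolic name.
# ocf_rc_name is a hypothetical helper for illustration only.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "UNKNOWN" ;;   # anything outside the codes listed above
    esac
}
```

For a master/slave resource like msPostgresql, a healthy master's recurring monitor is expected to return 8 and a healthy slave's 0. A 7 on the node that is supposed to hold the master would mean the agent saw PostgreSQL as stopped.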
'crm_mon -o' might be a good source of information too.
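crm_mon normally runs as a continuously refreshing display, so for a report that can be pasted into a reply a one-shot run is handier. A rough sketch, assuming the -1 (one-shot), -o (operation history) and -f (fail counts) options of crm_mon 1.1.x; the function name cluster_snapshot and the CRM_MON override variable are hypothetical:

```shell
# Print a single crm_mon report that includes the per-resource operation
# history and the fail counts, then exit.
# CRM_MON is a hypothetical override for pointing at a different binary.
cluster_snapshot() {
    "${CRM_MON:-crm_mon}" -1 -o -f
}
```

Running cluster_snapshot on a cluster node right after the pkill should show whether the 1s Master monitor for pgsql ran at all and what it returned.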