[Pacemaker] Postgres RA won't start

Amar Prasovic amar at linux.org.ba
Tue Oct 11 10:10:24 EDT 2011


Hello everyone,

I tried to configure the Postgres RA (ocf:heartbeat:pgsql) and ran into
some problems.

I configured several resources in my cluster, with pgsql set to start
last, after DRBD, Filesystem, IPaddr2 and nginx.

Here is how it looks in crm configure:

crm(live)configure# show
node webnode01 \
        attributes standby="off"
node webnode02 \
        attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.10.80" cidr_netmask="32" \
        op monitor interval="30s"
primitive drbd_res ocf:linbit:drbd \
        params drbd_resource="yorxs" \
        op monitor interval="60s" \
        op start interval="0s" timeout="240s" \
        op stop interval="0s" timeout="100s"
primitive fs_res ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/srv" fstype="ext4" \
        op start interval="0s" timeout="60s" \
        op stop interval="0s" timeout="60s" \
        op monitor interval="60s" timeout="40s"
primitive nginx_res ocf:heartbeat:nginx \
        params configfile="/etc/nginx/nginx.conf" httpd="/usr/local/sbin/nginx" status10url="http:/127.0.0.1" \
        op monitor interval="10s" timeout="30s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive postgres_res ocf:heartbeat:pgsql \
        params psql="/bin/psql" pgdata="/var/lib/postgres/8.4/main" logfile="/var/log/postgres/postgres.log" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s" timeout="30s"
group cluster_1 fs_res ClusterIP nginx_res postgres_res
ms drbd_cluster drbd_res \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer_webnode01 cluster_1 50: webnode01
location prefer_webnode01_drbd drbd_cluster 50: webnode01
colocation cluster_1_on_drbd inf: cluster_1 drbd_cluster:Master
order cluster_1_after_drbd inf: drbd_cluster:promote cluster_1:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1318326771"

However, when I run this config, everything except pgsql starts without
problems. For pgsql, I get the following error.

In crm_mon:
Online: [ webnode02 webnode01 ]

 Master/Slave Set: drbd_cluster
     Masters: [ webnode01 ]
     Slaves: [ webnode02 ]
 Resource Group: cluster_1
     fs_res     (ocf::heartbeat:Filesystem):    Started webnode01
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started webnode01
     nginx_res  (ocf::heartbeat:nginx):    Started webnode01
     postgres_res       (ocf::heartbeat:pgsql): Stopped

Failed actions:
    postgres_res_start_0 (node=webnode01, call=84, rc=5, status=complete): not installed
    postgres_res_start_0 (node=webnode02, call=66, rc=5, status=complete): not installed
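
For reference, the rc values reported here are standard OCF resource
agent exit codes. The following is only a minimal shell sketch for
decoding them; the function name ocf_rc_name is made up for illustration
and is not part of Pacemaker or the resource-agents package.

```shell
# Hypothetical helper: translate an OCF exit code into its standard name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "other" ;;
    esac
}

# rc=5 is the code reported for postgres_res_start_0 on both nodes.
ocf_rc_name 5
```

So rc=5 ("not installed") is the agent itself reporting that something
it needs is missing on the node. It does not by itself say which file or
program that is.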

In /var/log/syslog:
webnode01 log # cat syslog |grep postgres_res
Oct 11 11:39:34 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=6:93:7:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_monitor_0 )
Oct 11 11:39:34 webnode01 lrmd: [914]: info: rsc:postgres_res:27: probe
Oct 11 11:39:34 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_monitor_0 (call=27, rc=7, cib-update=36, confirmed=true) not running
Oct 11 11:39:50 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=39:96:0:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_start_0 )
Oct 11 11:39:50 webnode01 lrmd: [914]: info: rsc:postgres_res:39: start
Oct 11 11:39:50 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_start_0 (call=39, rc=5, cib-update=47, confirmed=true) not installed
Oct 11 11:39:50 webnode01 attrd: [918]: info: find_hash_entry: Creating hash entry for fail-count-postgres_res
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-postgres_res (INFINITY)
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_perform_update: Sent update 63: fail-count-postgres_res=INFINITY
Oct 11 11:39:50 webnode01 attrd: [918]: info: find_hash_entry: Creating hash entry for last-failure-postgres_res
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-postgres_res (1318325990)
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_perform_update: Sent update 66: last-failure-postgres_res=1318325990
Oct 11 11:39:50 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=4:97:0:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_stop_0 )
Oct 11 11:39:50 webnode01 lrmd: [914]: info: rsc:postgres_res:40: stop
Oct 11 11:39:50 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_stop_0 (call=40, rc=0, cib-update=49, confirmed=true) ok
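
The syslog lines give the return code of the start operation but not the
reason behind it. One common way to get more detail is to run the
resource agent by hand with the same parameters and read what it prints.
The sketch below is only an illustration: run_ra is a hypothetical
wrapper, and both the agent path
/usr/lib/ocf/resource.d/heartbeat/pgsql and the OCF_ROOT default
/usr/lib/ocf are assumed conventional locations that may differ on a
given system.

```shell
# Hypothetical wrapper: run one action of an OCF resource agent by hand.
# Usage: run_ra AGENT ACTION [name=value ...]
# Each name=value pair is exported as OCF_RESKEY_name, which is how
# resource parameters are passed to OCF agents.
run_ra() {
    agent=$1
    action=$2
    shift 2
    rc=0
    (
        # Assumed default; override by setting OCF_ROOT beforehand.
        OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}
        export OCF_ROOT
        for kv in "$@"; do
            export "OCF_RESKEY_${kv%%=*}=${kv#*=}"
        done
        "$agent" "$action"
    ) || rc=$?
    echo "rc=$rc"
    return "$rc"
}

# Example call, run as root on the node where the start failed
# (parameter values copied from the postgres_res primitive):
# run_ra /usr/lib/ocf/resource.d/heartbeat/pgsql validate-all \
#     psql=/bin/psql pgdata=/var/lib/postgres/8.4/main \
#     logfile=/var/log/postgres/postgres.log
```

The subshell keeps the exported OCF_RESKEY_* variables out of the
calling shell. Using validate-all instead of start avoids actually
launching the database while checking the parameters.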

Additional info:

/etc/postgresql, /etc/postgresql-common and /var/lib/postgresql are symlinks
on both nodes. The actual directories are on the shared DRBD disk.
Postgres starts without any problems with the init script on both nodes.
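
Since "not installed" suggests a missing file or program, one quick
check is whether the paths given in the postgres_res params exist on the
node that currently holds the DRBD mount. The sketch below is only an
illustration under that assumption; report_missing is a made-up helper
name. Note that the params use /var/lib/postgres and /var/log/postgres,
while the symlinks listed above are named postgresql; whether that
difference is the cause is not established here. The pgsql agent also
has a pgctl parameter that this config leaves at its default, so that
default location may be worth checking as well.

```shell
# Hypothetical helper: print every given path that does not exist.
# Returns 0 if all paths exist, 1 if at least one is missing.
report_missing() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Paths copied from the postgres_res params (logfile reduced to its
# directory). "|| true" keeps the script going if something is missing.
report_missing /bin/psql /var/lib/postgres/8.4/main /var/log/postgres || true
```

Running this on both nodes, with the filesystem mounted on the active
one, shows which of the configured paths do not resolve.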

Thanks a lot in advance for any advice.

-- 
Amar Prasovic
Gaißacher Straße 17
D - 81371 München