[ClusterLabs] Postgres never promoted

Alexandre alxgomz at gmail.com
Mon Feb 23 08:25:07 UTC 2015


Hi Andrew,

Thanks for the answer.
I didn't use the wiki link you posted as I prefer streaming replication
(and usually I have no problem with it). The page I used is
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
I have tried today's latest git and cleanup the resource to trigger a new
promotion retry.

crm resource cleanup ms_pgsql; tail -f /var/log/messages

On node 1 where I would expect promotion the logs I get are:

Cleaning up pri_pgsql:0 on pp-obm-sgbd.upond.fr
Cleaning up pri_pgsql:0 on pp-obm-sgbd2.upond.fr
Cleaning up pri_pgsql:1 on pp-obm-sgbd.upond.fr
Cleaning up pri_pgsql:1 on pp-obm-sgbd2.upond.fr
Waiting for 4 replies from the CRMd.... OK

Feb 22 05:28:01 pp-obm-sgbd auditd[974]: Audit daemon rotating log files
Feb 23 09:13:30 pp-obm-sgbd crmd[21348]:   notice: crm_add_logfile:
Additional logging available in /var/log/pacemaker.log
Feb 23 09:13:30 pp-obm-sgbd stonith-ng[23067]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Feb 23 09:13:30 pp-obm-sgbd crmd[23071]:   notice: do_lrm_invoke: Not
creating resource for a delete event: (null)
Feb 23 09:13:30 pp-obm-sgbd stonith-ng[23067]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed, requesting a full refresh (-207)
Feb 23 09:13:31 pp-obm-sgbd pacemakerd[21363]:   notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: Don't check
/var/lib/pgsql/9.1/data/ during probe
Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: Master does not
exist.
Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: My data status=.
Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO:
pp-obm-sgbd.upond.fr xlog location : 000000002D000078
Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO:
pp-obm-sgbd2.upond.fr xlog location : 000000002D000000
Feb 23 09:13:31 pp-obm-sgbd crmd[23071]:   notice: process_lrm_event:
Operation pri_pgsql_monitor_0: ok (node=pp-obm-sgbd.upond.fr, call=97,
rc=0, cib-update=94, confirmed=true)

On the second node I get the logs bellow:

Feb 23 09:13:30 pp-obm-sgbd2 stonith-ng[19622]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Promote
pri_pgsql:1#011(Slave -> Master pp-obm-sgbd.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_vip#011(pp-obm-sgbd.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 301: /var/lib/pacemaker/pengine/pe-input-53.bz2
Feb 23 09:13:30 pp-obm-sgbd2 attrd[19624]:   notice: attrd_cs_dispatch:
Update relayed from pp-obm-sgbd.upond.fr
Feb 23 09:13:30 pp-obm-sgbd2 attrd[19624]:   notice: attrd_cs_dispatch:
Update relayed from pp-obm-sgbd.upond.fr
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: do_lrm_invoke: Not
creating resource for a delete event: (null)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_pgsql:1#011(pp-obm-sgbd.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 302: /var/lib/pacemaker/pengine/pe-input-54.bz2
Feb 23 09:13:30 pp-obm-sgbd2 stonith-ng[19622]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed, requesting a full refresh (-207)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_pgsql:0#011(pp-obm-sgbd.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_pgsql:1#011(pp-obm-sgbd2.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 303: /var/lib/pacemaker/pengine/pe-input-55.bz2
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_pgsql:0#011(pp-obm-sgbd.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_pgsql:1#011(pp-obm-sgbd2.upond.fr)
Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 5: monitor pri_pgsql:0_monitor_0 on pp-obm-sgbd.upond.fr
Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 7: monitor pri_pgsql:1_monitor_0 on pp-obm-sgbd2.upond.fr
(local)
Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 304: /var/lib/pacemaker/pengine/pe-input-55.bz2
Feb 23 09:13:31 pp-obm-sgbd2 pacemakerd[13234]:   notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: INFO: Don't check
/var/lib/pgsql/9.1/data/ during probe
Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: INFO: Master does not
exist.
Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: WARNING: My data is
out-of-date. status=DISCONNECT
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: process_lrm_event:
Operation pri_pgsql_monitor_0: ok (node=pp-obm-sgbd2.upond.fr, call=95,
rc=0, cib-update=488, confirmed=true)
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
7 (pri_pgsql:1_monitor_0) on pp-obm-sgbd2.upond.fr failed (target: 7 vs.
rc: 0): Error
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: abort_transition_graph:
Transition aborted by pri_pgsql_monitor_0 'create' on (null): Event failed
(magic=0:0;7:304:7:ec2cd9b1-9c7e-46cd-b043-604d6399a62e, cib=2.47.2,
source=match_graph_event:344, 0)
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
7 (pri_pgsql:1_monitor_0) on pp-obm-sgbd2.upond.fr failed (target: 7 vs.
rc: 0): Error
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 6: probe_complete probe_complete-pp-obm-sgbd2.upond.fr on
pp-obm-sgbd2.upond.fr (local) - no waiting
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
5 (pri_pgsql:0_monitor_0) on pp-obm-sgbd.upond.fr failed (target: 7 vs. rc:
0): Error
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
5 (pri_pgsql:0_monitor_0) on pp-obm-sgbd.upond.fr failed (target: 7 vs. rc:
0): Error
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 4: probe_complete probe_complete-pp-obm-sgbd.upond.fr on
pp-obm-sgbd.upond.fr - no waiting
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: run_graph: Transition
304 (Complete=5, Pending=0, Fired=0, Skipped=5, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-55.bz2): Stopped
Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Promote
pri_pgsql:1#011(Slave -> Master pp-obm-sgbd.upond.fr)
Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_vip#011(pp-obm-sgbd.upond.fr)
Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 8: monitor pri_pgsql_monitor_20000 on
pp-obm-sgbd2.upond.fr (local)
Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 305: /var/lib/pacemaker/pengine/pe-input-56.bz2
Feb 23 09:13:31 pp-obm-sgbd2 pacemakerd[13359]:   notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Feb 23 09:13:32 pp-obm-sgbd2 pgsql(pri_pgsql)[13348]: INFO: Master does not
exist.
Feb 23 09:13:32 pp-obm-sgbd2 pgsql(pri_pgsql)[13348]: WARNING: My data is
out-of-date. status=DISCONNECT
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: process_lrm_event:
Operation pri_pgsql_monitor_20000: ok (node=pp-obm-sgbd2.upond.fr, call=96,
rc=0, cib-update=490, confirmed=false)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:  warning: run_graph: Transition
305 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=6,
Source=/var/lib/pacemaker/pengine/pe-input-56.bz2): Terminated
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:  warning: te_graph_trigger:
Transition failed: terminated
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_graph: Graph 305
with 7 actions: batch-limit=7 jobs, network-delay=0ms
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action    8]: Completed rsc op pri_pgsql_monitor_20000           on
pp-obm-sgbd2.upond.fr (priority: 0, waiting: none)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   12]: Pending rsc op pri_pgsql_monitor_15000             on
pp-obm-sgbd.upond.fr (priority: 0, waiting:  11)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   11]: Pending rsc op pri_pgsql_promote_0                 on
pp-obm-sgbd.upond.fr (priority: 0, waiting:  17)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A
(priority: 1000000, waiting:  11)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   17]: Pending pseudo op ms_pgsql_promote_0               on N/A
(priority: 0, waiting:  21)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   22]: Pending rsc op pri_vip_monitor_10000               on
pp-obm-sgbd.upond.fr (priority: 0, waiting:  21)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   21]: Pending rsc op pri_vip_start_0                     on
pp-obm-sgbd.upond.fr (priority: 0, waiting:  18)
Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]

So, so far noluck. I am really lost.
I it normal to have the crmd log line stating: "Pending pseudo op
ms_pgsql_promoted_0              on N/A (priority: 1000000, waiting:  11)"?
What does it mean?
Am I right thinking it means the cluster doesn't know where to promote the
MS resource? Or is it just normal because it appears only on the node that
should not be promoted?

Should you need any frurther informations just ask, i'll be happy to
provide.

In other words: I am lost. Thanks for any heads up.


2015-02-22 20:43 GMT+01:00 Andrew Beekhof <andrew at beekhof.net>:

>
> > On 21 Feb 2015, at 6:59 am, Alexandre <alxgomz at gmail.com> wrote:
> >
> > Hi list,
> >
> > I am facing a very strange issue.
> > I have setup a postgresql cluster (with streaming repl).
> > The replication works ok when started manually but the RA seems to never
> promote any host where the resource is started.
> >
> > I am running pacemaker 1.12 on centos 6.6 (and I added crmsh from an
> opensuse repo, as I am used to it)
>
> It might be worth grabbing the very latest pgsql agent from
> https://github.com/ClusterLabs/resource-agents
> You've also compared your setup to the one described at
> http://clusterlabs.org/wiki/PostgresHowto ?
>
> >
> > my config is bellow:
> > node pp-obm-sgbd.upond.fr
> > node pp-obm-sgbd2.upond.fr \
> >     attributes pri_pgsql-data-status=DISCONNECT
> > primitive pri_obm-locator lsb:obm-locator \
> >     params \
> >     op start interval=0s timeout=60s \
> >     op stop interval=0s timeout=60s \
> >     op monitor interval=10s timeout=20s
> > primitive pri_pgsql pgsql \
> >     params pgctl="/usr/pgsql-9.1/bin/pg_ctl"
> psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data/" node_list="
> pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr" repuser=replication
> rep_mode=sync restart_on_promote=true restore_command="cp
> /var/lib/pgsql/replication/%f %p" primary_conninfo_opt="keepalives_idle=60
> keepalives_interval=5 keepalives_count=5" master_ip=193.50.151.200 \
> >     op start interval=0 on-fail=restart timeout=120s \
> >     op monitor interval=20s on-fail=restart timeout=60s \
> >     op monitor interval=15s on-fail=restart role=Master timeout=60s \
> >     op promote interval=0 on-fail=restart timeout=120s \
> >     op demote interval=0 on-fail=stop timeout=120s \
> >     op notify interval=0s timeout=60s \
> >     op stop interval=0 on-fail=block timeout=120s
> > primitive pri_vip IPaddr2 \
> >     params ip=193.50.151.200 nic=eth1 cidr_netmask=32 \
> >     op start interval=0s timeout=60s \
> >     op monitor interval=10s timeout=60s \
> >     op stop interval=0s timeout=60s
> > ms ms_pgsql pri_pgsql \
> >     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> > colocation clc_vip-ms_pgsql inf: pri_vip:Started ms_pgsql:Master
> > order ord_dm_pgsql-vip 0: ms_pgsql:demote pri_vip:stop
> > order ord_pm_pgsql-vip 0: ms_pgsql:promote pri_vip:start
> symmetrical=false
> > property cib-bootstrap-options: \
> >     dc-version=1.1.11-97629de \
> >     cluster-infrastructure=cman \
> >     last-lrm-refresh=1424459378 \
> >     no-quorum-policy=ignore \
> >     stonith-enabled=false \
> >     maintenance-mode=false
> > rsc_defaults rsc_defaults-options: \
> >     resource-stickiness=1000 \
> >     migration-threshold=5
> >
> > crm_mon shows both hosts as slaves and none is never promoted ever:
> >
> > Master/Slave Set: ms_pgsql [pri_pgsql]
> >      Slaves: [ pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr ]
> > Node Attributes:
> > * Node pp-obm-sgbd.upond.fr:
> >     + master-pri_pgsql                  : 1000
> >     + pri_pgsql-status                  : HS:alone
> >     + pri_pgsql-xlog-loc                : 000000002D000078
> > * Node pp-obm-sgbd2.upond.fr:
> >     + master-pri_pgsql                  : -INFINITY
> >     + pri_pgsql-data-status             : DISCONNECT
> >     + pri_pgsql-status                  : HS:alone
> >     + pri_pgsql-xlog-loc                : 000000002D000000
> >
> > on the host I am expecting promotion I see when doing cleanups:
> > Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: Master does
> not exist.
> > Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: My data
> status=.
> >
> > And on the other node I see the following logs that sounds interrseting:
> > Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A
> (priority: 1000000, waiting:  11)
> > Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   17]: Pending pseudo op ms_pgsql_promote_0               on N/A
> (priority: 0, waiting:  21)
> >
> > the N/A part seems to tell me the cluster don't know where to promote
> the resource but I can't understand why.
> >
> > bellow are my constraint rules:
> >
> > pcs constraint show
> > Location Constraints:
> > Ordering Constraints:
> >   demote ms_pgsql then stop pri_vip (score:0)
> >   promote ms_pgsql then start pri_vip (score:0) (non-symmetrical)
> > Colocation Constraints:
> >   pri_vip with ms_pgsql (score:INFINITY) (rsc-role:Started)
> (with-rsc-role:Master)
> >
> > I am now out of ideas so any help is very much appreciated.
> >
> > Regards.
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20150223/466b6206/attachment-0002.html>


More information about the Users mailing list