[ClusterLabs] Postgres never promoted

Alexandre alxgomz at gmail.com
Thu Feb 26 07:07:34 EST 2015


I got no further response regarding this topic.
If I am asking in the wrong place, just let me know.
In any case, I think the questions below could be answered here:

Is it normal to have the crmd log line stating: "Pending pseudo op
ms_pgsql_promoted_0              on N/A (priority: 1000000, waiting:  11)"?
What does it mean?

Am I right in thinking it means the cluster doesn't know where to promote
the MS resource? Or is it just normal, because it appears only on the node
that should not be promoted?
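To show how I am reading that synapse line (this is my understanding, so
please correct me if it is wrong): a pseudo op is an internal marker in the
transition graph rather than an operation on a real node, so "N/A" in the
node column may be expected there, and "waiting:" lists the action IDs the
op is blocked on. A small sketch pulling the fields apart:

```shell
# One crmd print_synapse line, copied from the logs in this thread:
line='[Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A (priority: 1000000, waiting:  11)'

# "waiting:" lists the action IDs this op depends on -- here action 11,
# which (per the surrounding log lines) is the actual pri_pgsql_promote_0
# operation on the master candidate.
waiting=$(printf '%s\n' "$line" | sed -n 's/.*waiting: *\([0-9][0-9 ]*\))/\1/p')
echo "blocked on action(s): $waiting"
```

So, if I read it right, action 18 is only pending because action 11, the
real promote on the candidate node, never runs.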

Thank you

2015-02-23 9:25 GMT+01:00 Alexandre <alxgomz at gmail.com>:

> Hi Andrew,
>
> Thanks for the answer.
> I didn't use the wiki link you posted as I prefer streaming replication
> (and usually I have no problem with it). The page I used is
> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
> I have tried today's latest git and cleaned up the resource to trigger a
> new promotion attempt.
>
> crm resource cleanup ms_pgsql; tail -f /var/log/messages
>
> On node 1, where I would expect promotion, the logs I get are:
>
> Cleaning up pri_pgsql:0 on pp-obm-sgbd.upond.fr
> Cleaning up pri_pgsql:0 on pp-obm-sgbd2.upond.fr
> Cleaning up pri_pgsql:1 on pp-obm-sgbd.upond.fr
> Cleaning up pri_pgsql:1 on pp-obm-sgbd2.upond.fr
> Waiting for 4 replies from the CRMd.... OK
>
> Feb 22 05:28:01 pp-obm-sgbd auditd[974]: Audit daemon rotating log files
> Feb 23 09:13:30 pp-obm-sgbd crmd[21348]:   notice: crm_add_logfile:
> Additional logging available in /var/log/pacemaker.log
> Feb 23 09:13:30 pp-obm-sgbd stonith-ng[23067]:   notice:
> update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
> update diff failed (-206)
> Feb 23 09:13:30 pp-obm-sgbd crmd[23071]:   notice: do_lrm_invoke: Not
> creating resource for a delete event: (null)
> Feb 23 09:13:30 pp-obm-sgbd stonith-ng[23067]:   notice:
> update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
> update diff failed, requesting a full refresh (-207)
> Feb 23 09:13:31 pp-obm-sgbd pacemakerd[21363]:   notice: crm_add_logfile:
> Additional logging available in /var/log/cluster/corosync.log
> Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: Don't check
> /var/lib/pgsql/9.1/data/ during probe
> Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: Master does not
> exist.
> Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO: My data status=.
> Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO:
> pp-obm-sgbd.upond.fr xlog location : 000000002D000078
> Feb 23 09:13:31 pp-obm-sgbd pgsql(pri_pgsql)[21351]: INFO:
> pp-obm-sgbd2.upond.fr xlog location : 000000002D000000
> Feb 23 09:13:31 pp-obm-sgbd crmd[23071]:   notice: process_lrm_event:
> Operation pri_pgsql_monitor_0: ok (node=pp-obm-sgbd.upond.fr, call=97,
> rc=0, cib-update=94, confirmed=true)
>
> On the second node I get the logs below:
>
> Feb 23 09:13:30 pp-obm-sgbd2 stonith-ng[19622]:   notice:
> update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
> update diff failed (-206)
> Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: do_state_transition:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Promote
> pri_pgsql:1#011(Slave -> Master pp-obm-sgbd.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_vip#011(pp-obm-sgbd.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
> Calculated Transition 301: /var/lib/pacemaker/pengine/pe-input-53.bz2
> Feb 23 09:13:30 pp-obm-sgbd2 attrd[19624]:   notice: attrd_cs_dispatch:
> Update relayed from pp-obm-sgbd.upond.fr
> Feb 23 09:13:30 pp-obm-sgbd2 attrd[19624]:   notice: attrd_cs_dispatch:
> Update relayed from pp-obm-sgbd.upond.fr
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: do_lrm_invoke: Not
> creating resource for a delete event: (null)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_pgsql:1#011(pp-obm-sgbd.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
> Calculated Transition 302: /var/lib/pacemaker/pengine/pe-input-54.bz2
> Feb 23 09:13:30 pp-obm-sgbd2 stonith-ng[19622]:   notice:
> update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
> update diff failed, requesting a full refresh (-207)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_pgsql:0#011(pp-obm-sgbd.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_pgsql:1#011(pp-obm-sgbd2.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
> Calculated Transition 303: /var/lib/pacemaker/pengine/pe-input-55.bz2
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_pgsql:0#011(pp-obm-sgbd.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_pgsql:1#011(pp-obm-sgbd2.upond.fr)
> Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
> Initiating action 5: monitor pri_pgsql:0_monitor_0 on pp-obm-sgbd.upond.fr
> Feb 23 09:13:30 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
> Initiating action 7: monitor pri_pgsql:1_monitor_0 on
> pp-obm-sgbd2.upond.fr (local)
> Feb 23 09:13:30 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
> Calculated Transition 304: /var/lib/pacemaker/pengine/pe-input-55.bz2
> Feb 23 09:13:31 pp-obm-sgbd2 pacemakerd[13234]:   notice: crm_add_logfile:
> Additional logging available in /var/log/cluster/corosync.log
> Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: INFO: Don't check
> /var/lib/pgsql/9.1/data/ during probe
> Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: INFO: Master does
> not exist.
> Feb 23 09:13:31 pp-obm-sgbd2 pgsql(pri_pgsql)[13223]: WARNING: My data is
> out-of-date. status=DISCONNECT
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: process_lrm_event:
> Operation pri_pgsql_monitor_0: ok (node=pp-obm-sgbd2.upond.fr, call=95,
> rc=0, cib-update=488, confirmed=true)
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
> 7 (pri_pgsql:1_monitor_0) on pp-obm-sgbd2.upond.fr failed (target: 7 vs.
> rc: 0): Error
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice:
> abort_transition_graph: Transition aborted by pri_pgsql_monitor_0 'create'
> on (null): Event failed
> (magic=0:0;7:304:7:ec2cd9b1-9c7e-46cd-b043-604d6399a62e, cib=2.47.2,
> source=match_graph_event:344, 0)
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
> 7 (pri_pgsql:1_monitor_0) on pp-obm-sgbd2.upond.fr failed (target: 7 vs.
> rc: 0): Error
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
> Initiating action 6: probe_complete probe_complete-pp-obm-sgbd2.upond.fr
> on pp-obm-sgbd2.upond.fr (local) - no waiting
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
> 5 (pri_pgsql:0_monitor_0) on pp-obm-sgbd.upond.fr failed (target: 7 vs.
> rc: 0): Error
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:  warning: status_from_rc: Action
> 5 (pri_pgsql:0_monitor_0) on pp-obm-sgbd.upond.fr failed (target: 7 vs.
> rc: 0): Error
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
> Initiating action 4: probe_complete probe_complete-pp-obm-sgbd.upond.fr
> on pp-obm-sgbd.upond.fr - no waiting
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: run_graph: Transition
> 304 (Complete=5, Pending=0, Fired=0, Skipped=5, Incomplete=1,
> Source=/var/lib/pacemaker/pengine/pe-input-55.bz2): Stopped
> Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Promote
> pri_pgsql:1#011(Slave -> Master pp-obm-sgbd.upond.fr)
> Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
> pri_vip#011(pp-obm-sgbd.upond.fr)
> Feb 23 09:13:31 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
> Initiating action 8: monitor pri_pgsql_monitor_20000 on
> pp-obm-sgbd2.upond.fr (local)
> Feb 23 09:13:31 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
> Calculated Transition 305: /var/lib/pacemaker/pengine/pe-input-56.bz2
> Feb 23 09:13:31 pp-obm-sgbd2 pacemakerd[13359]:   notice: crm_add_logfile:
> Additional logging available in /var/log/cluster/corosync.log
> Feb 23 09:13:32 pp-obm-sgbd2 pgsql(pri_pgsql)[13348]: INFO: Master does
> not exist.
> Feb 23 09:13:32 pp-obm-sgbd2 pgsql(pri_pgsql)[13348]: WARNING: My data is
> out-of-date. status=DISCONNECT
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: process_lrm_event:
> Operation pri_pgsql_monitor_20000: ok (node=pp-obm-sgbd2.upond.fr,
> call=96, rc=0, cib-update=490, confirmed=false)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:  warning: run_graph: Transition
> 305 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=6,
> Source=/var/lib/pacemaker/pengine/pe-input-56.bz2): Terminated
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:  warning: te_graph_trigger:
> Transition failed: terminated
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_graph: Graph 305
> with 7 actions: batch-limit=7 jobs, network-delay=0ms
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action    8]: Completed rsc op pri_pgsql_monitor_20000           on
> pp-obm-sgbd2.upond.fr (priority: 0, waiting: none)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   12]: Pending rsc op pri_pgsql_monitor_15000             on
> pp-obm-sgbd.upond.fr (priority: 0, waiting:  11)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   11]: Pending rsc op pri_pgsql_promote_0                 on
> pp-obm-sgbd.upond.fr (priority: 0, waiting:  17)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A
> (priority: 1000000, waiting:  11)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   17]: Pending pseudo op ms_pgsql_promote_0               on N/A
> (priority: 0, waiting:  21)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   22]: Pending rsc op pri_vip_monitor_10000               on
> pp-obm-sgbd.upond.fr (priority: 0, waiting:  21)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
> [Action   21]: Pending rsc op pri_vip_start_0                     on
> pp-obm-sgbd.upond.fr (priority: 0, waiting:  18)
> Feb 23 09:13:32 pp-obm-sgbd2 crmd[19626]:   notice: do_state_transition:
> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> So, so far no luck. I am really lost.
> Is it normal to have the crmd log line stating: "Pending pseudo op
> ms_pgsql_promoted_0              on N/A (priority: 1000000, waiting:  11)"?
> What does it mean?
> Am I right in thinking it means the cluster doesn't know where to promote
> the MS resource? Or is it just normal, because it appears only on the node
> that should not be promoted?
>
> Should you need any further information, just ask; I'll be happy to
> provide it.
>
> In other words: I am lost. Thanks for any heads up.
>
>
> 2015-02-22 20:43 GMT+01:00 Andrew Beekhof <andrew at beekhof.net>:
>
>>
>> > On 21 Feb 2015, at 6:59 am, Alexandre <alxgomz at gmail.com> wrote:
>> >
>> > Hi list,
>> >
>> > I am facing a very strange issue.
>> > I have setup a postgresql cluster (with streaming repl).
>> > The replication works ok when started manually but the RA seems to
>> never promote any host where the resource is started.
>> >
>> > I am running pacemaker 1.12 on centos 6.6 (and I added crmsh from an
>> opensuse repo, as I am used to it)
>>
>> It might be worth grabbing the very latest pgsql agent from
>> https://github.com/ClusterLabs/resource-agents
>> You've also compared your setup to the one described at
>> http://clusterlabs.org/wiki/PostgresHowto ?
>>
>> >
>> > my config is below:
>> > node pp-obm-sgbd.upond.fr
>> > node pp-obm-sgbd2.upond.fr \
>> >     attributes pri_pgsql-data-status=DISCONNECT
>> > primitive pri_obm-locator lsb:obm-locator \
>> >     params \
>> >     op start interval=0s timeout=60s \
>> >     op stop interval=0s timeout=60s \
>> >     op monitor interval=10s timeout=20s
>> > primitive pri_pgsql pgsql \
>> >     params pgctl="/usr/pgsql-9.1/bin/pg_ctl"
>> psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data/" node_list="
>> pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr" repuser=replication
>> rep_mode=sync restart_on_promote=true restore_command="cp
>> /var/lib/pgsql/replication/%f %p" primary_conninfo_opt="keepalives_idle=60
>> keepalives_interval=5 keepalives_count=5" master_ip=193.50.151.200 \
>> >     op start interval=0 on-fail=restart timeout=120s \
>> >     op monitor interval=20s on-fail=restart timeout=60s \
>> >     op monitor interval=15s on-fail=restart role=Master timeout=60s \
>> >     op promote interval=0 on-fail=restart timeout=120s \
>> >     op demote interval=0 on-fail=stop timeout=120s \
>> >     op notify interval=0s timeout=60s \
>> >     op stop interval=0 on-fail=block timeout=120s
>> > primitive pri_vip IPaddr2 \
>> >     params ip=193.50.151.200 nic=eth1 cidr_netmask=32 \
>> >     op start interval=0s timeout=60s \
>> >     op monitor interval=10s timeout=60s \
>> >     op stop interval=0s timeout=60s
>> > ms ms_pgsql pri_pgsql \
>> >     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>> > colocation clc_vip-ms_pgsql inf: pri_vip:Started ms_pgsql:Master
>> > order ord_dm_pgsql-vip 0: ms_pgsql:demote pri_vip:stop
>> > order ord_pm_pgsql-vip 0: ms_pgsql:promote pri_vip:start
>> symmetrical=false
>> > property cib-bootstrap-options: \
>> >     dc-version=1.1.11-97629de \
>> >     cluster-infrastructure=cman \
>> >     last-lrm-refresh=1424459378 \
>> >     no-quorum-policy=ignore \
>> >     stonith-enabled=false \
>> >     maintenance-mode=false
>> > rsc_defaults rsc_defaults-options: \
>> >     resource-stickiness=1000 \
>> >     migration-threshold=5
>> >
>> > crm_mon shows both hosts as slaves and neither is ever promoted:
>> >
>> > Master/Slave Set: ms_pgsql [pri_pgsql]
>> >      Slaves: [ pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr ]
>> > Node Attributes:
>> > * Node pp-obm-sgbd.upond.fr:
>> >     + master-pri_pgsql                  : 1000
>> >     + pri_pgsql-status                  : HS:alone
>> >     + pri_pgsql-xlog-loc                : 000000002D000078
>> > * Node pp-obm-sgbd2.upond.fr:
>> >     + master-pri_pgsql                  : -INFINITY
>> >     + pri_pgsql-data-status             : DISCONNECT
>> >     + pri_pgsql-status                  : HS:alone
>> >     + pri_pgsql-xlog-loc                : 000000002D000000
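As a sanity check on those attributes (my assumption being that the pgsql
RA picks the promotion candidate with the newest xlog location, and that
these equal-width hex values compare correctly as plain strings):

```shell
# xlog locations from the crm_mon node attributes above
loc_node1=000000002D000078   # pp-obm-sgbd.upond.fr
loc_node2=000000002D000000   # pp-obm-sgbd2.upond.fr

# Same-width hex strings compare correctly lexicographically
if [ "$loc_node1" \> "$loc_node2" ]; then
    echo "pp-obm-sgbd.upond.fr is ahead and should be the promotion candidate"
fi
```

As far as I can tell, that matches master-pri_pgsql being 1000 on
pp-obm-sgbd.upond.fr and -INFINITY on the other node, so the scores look
fine; the promote operation itself just never fires.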
>> >
>> > on the host where I am expecting promotion, I see the following when
>> doing cleanups:
>> > Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: Master does
>> not exist.
>> > Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: My data
>> status=.
>> >
>> > And on the other node I see the following logs that sound interesting:
>> > Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
>> [Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A
>> (priority: 1000000, waiting:  11)
>> > Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
>> [Action   17]: Pending pseudo op ms_pgsql_promote_0               on N/A
>> (priority: 0, waiting:  21)
>> >
>> > the N/A part seems to tell me the cluster doesn't know where to promote
>> the resource, but I can't understand why.
>> >
>> > below are my constraint rules:
>> >
>> > pcs constraint show
>> > Location Constraints:
>> > Ordering Constraints:
>> >   demote ms_pgsql then stop pri_vip (score:0)
>> >   promote ms_pgsql then start pri_vip (score:0) (non-symmetrical)
>> > Colocation Constraints:
>> >   pri_vip with ms_pgsql (score:INFINITY) (rsc-role:Started)
>> (with-rsc-role:Master)
>> >
>> > I am now out of ideas so any help is very much appreciated.
>> >
>> > Regards.
>> >
>> > _______________________________________________
>> > Users mailing list: Users at clusterlabs.org
>> > http://clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>
>
