[ClusterLabs] Postgres never promoted

Mon Mar 2 09:32:14 UTC 2015

Hi Takatoshi,

Thank you for your reply.
I did read the page you mentioned. Actually, this is the page I use to
set up such clusters (except that I don't use only one VIP).
I can get the initial sync working by following this page. The problem is
that, once the cluster is up (one master PRI and one slave STREAMING|SYNC),
I put the master node in standby in order to test failover.
I would then expect the former slave to be promoted, but this never seems to
happen.
When I issue crm node standby on the first master, I get the logs
below on the slave that should be promoted:

==> /var/lib/pgsql/9.1/data/pg_log/postgresql-Mon.log <==
FATAL:  replication terminated by primary server (translated; the original
log is in French: "réplication terminée par le serveur primaire")

==> /var/log/messages <==
Mar  2 10:04:37 pp-obm-sgbd2 crmd[19626]:   notice: run_graph: Transition
995 (Complete=4, Pending=0, Fired=0, Skipped=9, Incomplete=2,
Source=/var/lib/pacemaker/pengine/pe-input-76.bz2): Stopped
Mar  2 10:04:37 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  2 10:04:37 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: *Promote
pri_pgsql:0#011(Slave -> Master pp-obm-sgbd2.upond.fr)*
Mar  2 10:04:37 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: *Stop
pri_pgsql:1#011(pp-obm-sgbd.upond.fr)*
Mar  2 10:04:37 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Move
pri_vip#011(Started pp-obm-sgbd.upond.fr -> pp-obm-sgbd2.upond.fr)
Mar  2 10:04:37 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 20: stop pri_vip_stop_0 on pp-obm-sgbd.upond.fr
Mar  2 10:04:37 pp-obm-sgbd2 crmd[19626]:   notice: te_rsc_command:
Initiating action 11: stop pri_pgsql_stop_0 on pp-obm-sgbd.upond.fr
Mar  2 10:04:37 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 996: /var/lib/pacemaker/pengine/pe-input-77.bz2
Mar  2 10:04:37 pp-obm-sgbd2 crmd[19626]:   notice: abort_transition_graph:
Transition aborted by deletion of
nvpair[@id='status-pp-obm-sgbd.upond.fr-pri_pgsql-xlog-loc']: Transient
attribute change (cib=2.61.6, source=te_update_diff:391,
path=/cib/status/node_state[@id='pp-obm-sgbd.upond.fr
']/transient_attributes[@id='pp-obm-sgbd.upond.fr
']/instance_attributes[@id='status-pp-obm-sgbd.upond.fr']/nvpair[@id='status-pp-obm-sgbd.upond.fr-pri_pgsql-xlog-loc'],
0)
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: run_graph: Transition
996 (Complete=4, Pending=0, Fired=0, Skipped=6, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Stopped
Mar  2 10:04:38 pp-obm-sgbd2 pengine[19625]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  2 10:04:38 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: *Promote
pri_pgsql:0#011(Slave -> Master pp-obm-sgbd2.upond.fr)*
Mar  2 10:04:38 pp-obm-sgbd2 pengine[19625]:   notice: LogActions: Start
pri_vip#011(pp-obm-sgbd2.upond.fr)
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:  warning: run_graph: Transition
997 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6,
Source=/var/lib/pacemaker/pengine/pe-input-78.bz2): Terminated
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:  warning: te_graph_trigger:
Transition failed: terminated
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_graph: Graph 997
with 6 actions: batch-limit=6 jobs, network-delay=0ms
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action    9]: Pending rsc op pri_pgsql_monitor_15000             on
pp-obm-sgbd2.upond.fr (priority: 0, waiting:  8)
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action    8]: *Pending rsc op pri_pgsql_promote_0                 on
pp-obm-sgbd2.upond.fr (priority: 0, waiting:  14)*
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   15]:* Pending pseudo op ms_pgsql_promoted_0              on N/A
(priority: 1000000, waiting:  8)*
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   14]:* Pending pseudo op ms_pgsql_promote_0               on N/A
(priority: 0, waiting:  18)*
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   19]: Pending rsc op pri_vip_monitor_10000               on
pp-obm-sgbd2.upond.fr (priority: 0, waiting:  18)
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse:
[Action   18]: Pending rsc op pri_vip_start_0                     on
pp-obm-sgbd2.upond.fr (priority: 0, waiting:  15)
Mar  2 10:04:38 pp-obm-sgbd2 crmd[19626]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar  2 10:04:38 pp-obm-sgbd2 pengine[19625]:   notice: process_pe_message:
Calculated Transition 997: /var/lib/pacemaker/pengine/pe-input-78.bz2

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Pacemaker seems to be trying to achieve promotion of the former slave as
expected, but I don't understand why crmd logs "ms_pgsql_promoted_0" and
"ms_pgsql_promote_0" on "N/A".
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
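
To make the pending graph above easier to read, here is a minimal Python
sketch that parses the six print_synapse entries of transition 997 (copied
from the log above, unwrapped, with the emphasis markers removed) and follows
the "waiting" ids. It assumes that "waiting: N" is the id of the action the
entry is still waiting for; that is my reading of the log, not something
confirmed in this thread. The helper names parse_pending and find_cycle are
made up for illustration.

```python
import re

# Pending actions of transition 997, copied from the crmd log above
# (lines unwrapped, emphasis markers removed).
LOG = """\
[Action    9]: Pending rsc op pri_pgsql_monitor_15000 on pp-obm-sgbd2.upond.fr (priority: 0, waiting:  8)
[Action    8]: Pending rsc op pri_pgsql_promote_0 on pp-obm-sgbd2.upond.fr (priority: 0, waiting:  14)
[Action   15]: Pending pseudo op ms_pgsql_promoted_0 on N/A (priority: 1000000, waiting:  8)
[Action   14]: Pending pseudo op ms_pgsql_promote_0 on N/A (priority: 0, waiting:  18)
[Action   19]: Pending rsc op pri_vip_monitor_10000 on pp-obm-sgbd2.upond.fr (priority: 0, waiting:  18)
[Action   18]: Pending rsc op pri_vip_start_0 on pp-obm-sgbd2.upond.fr (priority: 0, waiting:  15)
"""

PATTERN = re.compile(
    r"\[Action\s+(?P<id>\d+)\]:\s*Pending (?:rsc|pseudo) op (?P<name>\S+)"
    r".*?waiting:\s*(?P<deps>[\d\s]+)\)"
)


def parse_pending(text):
    """Return {action_id: (operation_name, [ids it is waiting on])}."""
    actions = {}
    for m in PATTERN.finditer(text):
        deps = [int(d) for d in m.group("deps").split()]
        actions[int(m.group("id"))] = (m.group("name"), deps)
    return actions


def find_cycle(actions, start):
    """Follow the first 'waiting' id from start; return the loop if one is reached."""
    path = []
    node = start
    while node in actions and node not in path:
        path.append(node)
        deps = actions[node][1]
        if not deps:
            return []  # chain ends without looping back
        node = deps[0]
    if node in path:
        return path[path.index(node):] + [node]
    return []  # chain left the set of logged actions


actions = parse_pending(LOG)
cycle = find_cycle(actions, 8)  # start from the promote action
print(" -> ".join(actions[a][0] for a in cycle))
```

Under that assumption, the chain starting at action 8 comes back to action 8
(pri_pgsql_promote_0 -> ms_pgsql_promote_0 -> pri_vip_start_0 ->
ms_pgsql_promoted_0 -> pri_pgsql_promote_0), which might be related to the
transition terminating with nothing fired.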

==> /var/lib/pgsql/9.1/data/pg_log/postgresql-Mon.log <==
(French logs, translated to English)
cp: cannot stat '/var/lib/pgsql/replication/00000002000000000000002F':
No such file or directory
LOG:  record with zero length at 0/2F000078
cp: cannot stat '/var/lib/pgsql/replication/00000002000000000000002F':
No such file or directory
cp: cannot stat '/var/lib/pgsql/replication/00000003.history':
No such file or directory
FATAL:  could not connect to the primary server: could not connect to
server: Connection timed out
        Is the server running on host "193.50.151.200" and accepting
        TCP/IP connections on port 5432?

cp: cannot stat '/var/lib/pgsql/replication/00000002000000000000002F':
No such file or directory
cp: cannot stat '/var/lib/pgsql/replication/00000002000000000000002F':
No such file or directory
cp: cannot stat '/var/lib/pgsql/replication/00000003.history':
No such file or directory
FATAL:  could not connect to the primary server: could not connect to
server: No route to host
        Is the server running on host "193.50.151.200" and accepting

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Now the RA seems to be trying to make the node a slave again... but there
is no master. The node stays in the state shown below forever:

* Node pp-obm-sgbd2.upond.fr:
    + master-pri_pgsql                  : 100
    + pri_pgsql-data-status             : STREAMING|SYNC
    + pri_pgsql-status                  : HS:sync
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This is the very first time I have seen this behaviour, and I don't
understand what I did differently from other setups. Any advice or help
finding more information or debug traces is very welcome!

Thanks

2015-03-01 14:06 GMT+01:00 Takatoshi MATSUO <matsuo.tak at gmail.com>:

> Hi Alexandre
>
> > pgsql(pri_pgsql)[13223]: WARNING: My data is out-of-date.
> status=DISCONNECT
>
> Did you read Q&A ?
> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>
> -------
> How do I force start Master although pgsql-data-status is "DISCONNECT"?
> # crm_attribute -l forever -N {Node Name} -n "pgsql-data-status" -v
> "LATEST"
> ------
>
> Regards,
> Takatoshi MATSUO
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>