[ClusterLabs] Postgres never promoted
NAKAHIRA Kazutomo
nakahira_kazutomo_b1 at lab.ntt.co.jp
Fri Mar 6 09:23:31 CET 2015
Hi Alexandre,
Could you please send the /var/log/messages files from
both pp-obm-sgbd.upond.fr and pp-obm-sgbd2.upond.fr,
along with the full output of the "crm_mon -frA1" command?
The /var/log/messages excerpt below appears to be trimmed,
and some important messages are missing.
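For reference, something along these lines could gather both items on each node. It is only a sketch: the function name, the default output directory /tmp/cluster-report and the file names are placeholders of mine, not anything Pacemaker provides.

```shell
#!/bin/sh
# Sketch only: collect_cluster_info, the output directory and the file
# names are hypothetical; adjust them to your environment.
collect_cluster_info() {
    log_file="${1:-/var/log/messages}"
    out_dir="${2:-/tmp/cluster-report}"
    node="$(uname -n)"

    mkdir -p "$out_dir" || return 1

    # Copy the whole log file instead of pasting an excerpt.
    cp "$log_file" "$out_dir/messages.$node" || return 1

    # One-shot status (-1) with fail counts (-f), inactive resources (-r)
    # and node attributes (-A).
    if command -v crm_mon >/dev/null 2>&1; then
        crm_mon -frA1 > "$out_dir/crm_mon.$node" 2>&1
    else
        echo "crm_mon not found on $node" > "$out_dir/crm_mon.$node"
    fi
    return 0
}
```

Running collect_cluster_info on each of the two nodes (as a user allowed to read /var/log/messages) should leave one messages.* and one crm_mon.* file per node that can be attached to the reply.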
Best regards,
Kazutomo NAKAHIRA
> Hi Takatoshi,
>
> Thank you for your reply.
> I did read the page you mentioned. Actually, this is the page I use to
> set up such clusters (except that I don't use only one VIP).
> I can get the initial sync working by following this page. The problem is
> that, once the cluster is up (one master PRI and one slave STREAMING|SYNC),
> I put the master node in standby in order to test failover.
> I would then expect the former slave to be promoted, but this never seems to
> happen.
> When I issue "crm node standby" on the first master, I get the logs
> below on the slave to be promoted:
>
> ==> /var/lib/pgsql/9.1/data/pg_log/postgresql-Mon.log <==
> FATAL: réplication terminée par le serveur primaire (French logs, meaning:
> replication ended by master)
>
> ==> /var/log/messages <==
> Mar 2 10:04:37 pp-obm-sgbd2 crmd[19626]: notice: run_graph: Transition
> 995 (Complete=4, Pending=0, Fired=0, Skipped=9, Incomplete=2,
> Source=/var/lib/pacemaker/pengine/pe-input-76.bz2): Stopped
> Mar 2 10:04:37 pp-obm-sgbd2 pengine[19625]: notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Mar 2 10:04:37 pp-obm-sgbd2 pengine[19625]: notice: LogActions: *Promote
> pri_pgsql:0#011(Slave -> Master pp-obm-sgbd2.upond.fr)*
> Mar 2 10:04:37 pp-obm-sgbd2 pengine[19625]: notice: LogActions: *Stop
> pri_pgsql:1#011(pp-obm-sgbd.upond.fr)*
> Mar 2 10:04:37 pp-obm-sgbd2 pengine[19625]: notice: LogActions: Move
> pri_vip#011(Started pp-obm-sgbd.upond.fr -> pp-obm-sgbd2.upond.fr)
> Mar 2 10:04:37 pp-obm-sgbd2 crmd[19626]: notice: te_rsc_command:
> Initiating action 20: stop pri_vip_stop_0 on pp-obm-sgbd.upond.fr
> Mar 2 10:04:37 pp-obm-sgbd2 crmd[19626]: notice: te_rsc_command:
> Initiating action 11: stop pri_pgsql_stop_0 on pp-obm-sgbd.upond.fr
> Mar 2 10:04:37 pp-obm-sgbd2 pengine[19625]: notice: process_pe_message:
> Calculated Transition 996: /var/lib/pacemaker/pengine/pe-input-77.bz2
> Mar 2 10:04:37 pp-obm-sgbd2 crmd[19626]: notice: abort_transition_graph:
> Transition aborted by deletion of
> nvpair[@id='status-pp-obm-sgbd.upond.fr-pri_pgsql-xlog-loc']: Transient
> attribute change (cib=2.61.6, source=te_update_diff:391,
> path=/cib/status/node_state[@id='pp-obm-sgbd.upond.fr
> ']/transient_attributes[@id='pp-obm-sgbd.upond.fr
> ']/instance_attributes[@id='status-pp-obm-sgbd.upond.fr']/nvpair[@id='status-pp-obm-sgbd.upond.fr-pri_pgsql-xlog-loc'],
> 0)
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: run_graph: Transition
> 996 (Complete=4, Pending=0, Fired=0, Skipped=6, Incomplete=1,
> Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Stopped
> Mar 2 10:04:38 pp-obm-sgbd2 pengine[19625]: notice: unpack_config: On
> loss of CCM Quorum: Ignore
> Mar 2 10:04:38 pp-obm-sgbd2 pengine[19625]: notice: LogActions: *Promote
> pri_pgsql:0#011(Slave -> Master pp-obm-sgbd2.upond.fr)*
> Mar 2 10:04:38 pp-obm-sgbd2 pengine[19625]: notice: LogActions: Start
> pri_vip#011(pp-obm-sgbd2.upond.fr)
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: warning: run_graph: Transition
> 997 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6,
> Source=/var/lib/pacemaker/pengine/pe-input-78.bz2): Terminated
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: warning: te_graph_trigger:
> Transition failed: terminated
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_graph: Graph 997
> with 6 actions: batch-limit=6 jobs, network-delay=0ms
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 9]: Pending rsc op pri_pgsql_monitor_15000 on
> pp-obm-sgbd2.upond.fr (priority: 0, waiting: 8)
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 8]: *Pending rsc op pri_pgsql_promote_0 on
> pp-obm-sgbd2.upond.fr (priority: 0, waiting: 14)*
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 15]:* Pending pseudo op ms_pgsql_promoted_0 on N/A
> (priority: 1000000, waiting: 8)*
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 14]:* Pending pseudo op ms_pgsql_promote_0 on N/A
> (priority: 0, waiting: 18)*
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 19]: Pending rsc op pri_vip_monitor_10000 on
> pp-obm-sgbd2.upond.fr (priority: 0, waiting: 18)
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: print_synapse:
> [Action 18]: Pending rsc op pri_vip_start_0 on
> pp-obm-sgbd2.upond.fr (priority: 0, waiting: 15)
> Mar 2 10:04:38 pp-obm-sgbd2 crmd[19626]: notice: do_state_transition:
> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 10:04:38 pp-obm-sgbd2 pengine[19625]: notice: process_pe_message:
> Calculated Transition 997: /var/lib/pacemaker/pengine/pe-input-78.bz2
>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> Pacemaker seems to be trying to promote the former slave as
> expected, but I don't understand why crmd logs "ms_pgsql_promoted_0" and
> "ms_pgsql_promote_0" on "N/A".
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> ==> /var/lib/pgsql/9.1/data/pg_log/postgresql-Mon.log <==
> cp: cannot stat "/var/lib/pgsql/replication/00000002000000000000002F": No such
> file or directory
> LOG: record with zero length at 0/2F000078
> cp: cannot stat "/var/lib/pgsql/replication/00000002000000000000002F": No such
> file or directory
> cp: cannot stat "/var/lib/pgsql/replication/00000003.history": No such file
> or directory
> FATAL: could not connect to the primary server: could not connect to
> server: Connection timed out
> Is the server running on host "193.50.151.200" and accepting
> TCP/IP connections on port 5432?
>
> cp: cannot stat "/var/lib/pgsql/replication/00000002000000000000002F": No such
> file or directory
> cp: cannot stat "/var/lib/pgsql/replication/00000002000000000000002F": No such
> file or directory
> cp: cannot stat "/var/lib/pgsql/replication/00000003.history": No such file
> or directory
> FATAL: could not connect to the primary server: could not connect to
> server: No route to host
> Is the server running on host "193.50.151.200" and
> accepting connections
>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> Now the RA seems to be trying to make the node a slave again... but there
> is no master. The node stays in the state shown below forever:
>
> * Node pp-obm-sgbd2.upond.fr:
> + master-pri_pgsql : 100
> + pri_pgsql-data-status : STREAMING|SYNC
> + pri_pgsql-status : HS:sync
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> This is the very first time I have seen this behaviour, and I don't understand
> what I did differently from other setups. Any advice or help in finding more
> information or debug traces would be very welcome!
>
>
> Thanks
>
>
>
> 2015-03-01 14:06 GMT+01:00 Takatoshi MATSUO <matsuo.tak at gmail.com>:
>
>> Hi Alexandre
>>
>> > pgsql(pri_pgsql)[13223]: WARNING: My data is out-of-date.
>> status=DISCONNECT
>>
>> Did you read the Q&A?
>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>
>> -------
>> How do I force start Master although pgsql-data-status is "DISCONNECT"?
>> # crm_attribute -l forever -N {Node Name} -n "pgsql-data-status" -v "LATEST"
>> ------
>>
>> Regards,
>> Takatoshi MATSUO
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
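
A note on the Q&A command quoted above: the crm_mon output in this thread lists the attribute as "pri_pgsql-data-status", so the attribute name presumably carries the resource name rather than being the literal "pgsql-data-status" from the wiki. Below is a minimal sketch that only prints the command for review. The helper name build_force_latest_cmd is hypothetical, and the "<resource>-data-status" naming is an assumption drawn from that output, so please check it against your own crm_mon -A listing before running anything.

```shell
#!/bin/sh
# Sketch only: build_force_latest_cmd is a hypothetical helper. It prints the
# crm_attribute command from the Q&A without running it, so that the node
# and attribute names can be checked first.
build_force_latest_cmd() {
    node="$1"                 # e.g. pp-obm-sgbd2.upond.fr
    resource="${2:-pgsql}"    # primitive name; "pri_pgsql" in this thread
    if [ -z "$node" ]; then
        echo "usage: build_force_latest_cmd NODE [RESOURCE]" >&2
        return 1
    fi
    # Assumption: the attribute is "<resource>-data-status", as the crm_mon
    # output in this thread ("pri_pgsql-data-status") suggests.
    printf 'crm_attribute -l forever -N %s -n "%s-data-status" -v "LATEST"\n' \
        "$node" "$resource"
}

build_force_latest_cmd pp-obm-sgbd2.upond.fr pri_pgsql
# prints: crm_attribute -l forever -N pp-obm-sgbd2.upond.fr -n "pri_pgsql-data-status" -v "LATEST"
```

Printing instead of executing is deliberate: marking a node as LATEST is a forced override, so it seems safer to run the printed command by hand only once you are sure that node really holds the most recent data.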