[ClusterLabs] Re: The slave not does not promote to master

范国腾 fanguoteng at highgo.com
Mon May 7 02:54:25 EDT 2018


Thank you, Klaus. Per our requirements, there is no fencing device in our network. Is there any other way to configure the cluster so that it works?


From: Klaus Wenninger [mailto:kwenning at redhat.com]
Sent: May 7, 2018 14:40
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; 范国腾 <fanguoteng at highgo.com>
Subject: Re: [ClusterLabs] The slave not does not promote to master

On 05/07/2018 07:39 AM, 范国腾 wrote:

Hi,



We have a two-node cluster that uses PAF to manage PostgreSQL. Node 2 (sds2) is the master:

 Master/Slave Set: pgsql-ha [pgsqld]

     Master: [sds2]

     Slaves: [ sds1 ]



On the master node (sds2), I removed the PostgreSQL data directory. I expected the master (sds2) to stop and the slave (sds1) to be promoted to master.

The sds2 log shows that it executes monitor -> notify -> demote -> notify -> stop. The sds1 log also shows "Promote pgsqld:0#011(Slave -> Master sds1)". However, "pcs status" reports the state shown below. Could you please help me find out what prevents the promotion on sds1? And what should I do to recover the system?

I didn't check every detail, but it looks as if stopping the resource
fails. The cluster therefore doesn't know the state on sds2 and thus
can't promote on sds1.
If you had enabled fencing, this would lead to sds2 being fenced
so that sds1 could take over.

As digimer would say: "use fencing!"

Regards,
Klaus
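
The reasoning in the reply above can be written down as a small decision sketch. This is only a simplified model for illustration, not Pacemaker's actual code; the names `StopResult` and `next_step` are made up here.

```python
from dataclasses import dataclass


@dataclass
class StopResult:
    """Outcome of a stop operation on one node (hypothetical helper type)."""
    node: str
    succeeded: bool


def next_step(stop: StopResult, fencing_enabled: bool) -> str:
    """Describe, in simplified terms, what the cluster can do after a stop."""
    if stop.succeeded:
        # The resource is known to be down, so promoting the peer is safe.
        return f"{stop.node} is stopped; the peer can be promoted"
    if fencing_enabled:
        # A failed stop leaves the state unknown; fencing makes it known.
        return f"fence {stop.node}; once fenced, the peer can be promoted"
    # No fencing: the state stays unknown, so nothing can be promoted.
    return f"state on {stop.node} unknown; resource blocked, no promotion"


if __name__ == "__main__":
    # The situation in this thread: stop failed on sds2, no fencing.
    print(next_step(StopResult("sds2", succeeded=False), fencing_enabled=False))
```

With the failed stop on sds2 and fencing disabled, the sketch ends in the "blocked" branch, which matches the "FAILED Master sds2 (blocked)" state in the status output.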







2 nodes configured

3 resources configured

Online: [ sds1 sds2 ]

Full list of resources:

 Master/Slave Set: pgsql-ha [pgsqld]

     pgsqld     (ocf::heartbeat:pgsqlms):       FAILED Master sds2 (blocked)

     Slaves: [ sds1 ]

 Resource Group: mastergroup

     master-vip (ocf::heartbeat:IPaddr2):       Started sds2

Failed Actions:

* pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',

    last-rc-change='Mon May  7 00:39:06 2018', queued=1ms, exec=72ms
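
When looking at many such entries, it can help to pull the fields out of a "Failed Actions" line programmatically. The following is a minimal sketch that assumes the line has exactly the layout shown above; the regular expression and the function name `parse_failed_action` are illustrative and not part of pcs.

```python
import re

# Matches a line such as:
# * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, exitreason='...',
FAILED_ACTION_RE = re.compile(
    r"^\*\s+(?P<op>\S+) on (?P<node>\S+) '(?P<rc_text>[^']*)' \((?P<rc>\d+)\): "
    r"call=(?P<call>\d+), status=(?P<status>\w+), exitreason='(?P<reason>.*)',?$"
)


def parse_failed_action(line):
    """Return the fields of one failed-action line as a dict, or None."""
    match = FAILED_ACTION_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["rc"] = int(fields["rc"])
    fields["call"] = int(fields["call"])
    return fields


SAMPLE = """* pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',"""

if __name__ == "__main__":
    print(parse_failed_action(SAMPLE))
```

For the line above, this yields the operation `pgsqld_stop_0`, the node `sds2`, and the return code 2, i.e. the failed stop that leaves the resource blocked.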







Here is the sds2 log:

May  7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor and the result 8

May  7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and the result 8

May  7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_monitor_10000:14152:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14162:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_demote_0:14172:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_demote_0:39 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14182:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14192:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_stop_0:14202:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_stop_0:42 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]

May  7 00:40:01 node2 systemd: Started Session 4 of user root.

May  7 00:40:01 node2 systemd: Starting Session 4 of user root.

May  7 00:47:21 node2 pacemakerd[1063]:  notice: Caught 'Terminated' signal

May  7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...

May  7 00:47:21 node2 pacemakerd[1063]:  notice: Shutting down Pacemaker

May  7 00:47:21 node2 pacemakerd[1063]:  notice: Stopping crmd

May  7 00:47:21 node2 crmd[1129]:  notice: Caught 'Terminated' signal

May  7 00:47:21 node2 crmd[1129]:  notice: Shutting down cluster resource manager
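
The sequence of operations and their results can be read from the crmd "Result of ... operation" lines in the log above. Below is a small sketch for extracting them; the function name `operation_results` is hypothetical and the pattern assumes the message format seen in this log.

```python
import re

# Matches e.g. "Result of stop operation for pgsqld on sds2: 2 (invalid parameter)"
RESULT_RE = re.compile(
    r"Result of (?P<op>\w+) operation for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<rc>\d+) \((?P<text>[^)]*)\)"
)


def operation_results(log_text):
    """Return (operation, return code) pairs in the order they were logged."""
    return [(m["op"], int(m["rc"])) for m in RESULT_RE.finditer(log_text)]


# Three lines copied from the sds2 log above.
SAMPLE_LOG = """\
May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)
May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
"""

if __name__ == "__main__":
    for op, rc in operation_results(SAMPLE_LOG):
        print(op, rc)
```

Applied to the full log, this makes it easy to see that both demote and stop returned 2 (invalid parameter) on sds2, while the notify operations returned 0.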



Here is the sds1 log (also in the attachment):

May  7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0

May  7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0

May  7 00:39:06 node1 crmd[1133]:  notice: State transition S_IDLE -> S_POLICY_ENGINE

May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)

May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)

May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)

May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)

May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)

May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2

May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)

May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)

May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)

May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)

May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)

May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2

May  7 00:39:06 node1 crmd[1133]:  notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1

May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1

May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2



