[ClusterLabs] Re: Re: The slave not does not promote to master

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon May 7 06:52:48 UTC 2018


What about this: Configure fencing, then if everything works OK, try without
fencing.
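
As a rough illustration of why fencing matters here: Pacemaker's documented default for a failed stop operation is on-fail=fence when STONITH is enabled and on-fail=block otherwise. The following is only a toy Python model of that rule, not Pacemaker code; the function names are made up for illustration.

```python
def default_on_fail_for_stop(stonith_enabled: bool) -> str:
    """Documented Pacemaker default for on-fail of a stop operation."""
    return "fence" if stonith_enabled else "block"


def outcome_after_stop(stop_rc: int, stonith_enabled: bool) -> str:
    """Toy model (hypothetical helper): what follows a stop that returned stop_rc."""
    if stop_rc == 0:
        return "stopped cleanly; the peer can be promoted"
    if default_on_fail_for_stop(stonith_enabled) == "fence":
        return "node is fenced; the peer can be promoted once fencing succeeds"
    return "resource is blocked; no promotion until an admin intervenes"


# In the report quoted below, the stop on sds2 returned 2 (invalid parameter)
# and fencing is not enabled.
print(outcome_after_stop(2, stonith_enabled=False))
```

With a failed stop and no fencing, the model ends in the blocked state, which matches the "FAILED Master sds2 (blocked)" line in the pcs status quoted below.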

>>> 范国腾 <fanguoteng at highgo.com> wrote on 07.05.2018 at 08:54 in message
<177fb170fe264dbca52df5e25d27cbfe at EX01.highgo.com>:
> Thank you, Klaus. According to our requirements, there is no fencing device
> in our network. Is there any other way to configure the cluster to make it work?
> 
> 
> From: Klaus Wenninger [mailto:kwenning at redhat.com]
> Sent: 7 May 2018 14:40
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> <users at clusterlabs.org>; 范国腾 <fanguoteng at highgo.com>
> Subject: Re: [ClusterLabs] The slave not does not promote to master
> 
> On 05/07/2018 07:39 AM, 范国腾 wrote:
> 
> Hi,
> 
> 
> 
> We have a two-node cluster using PAF to manage PostgreSQL. Node2 is the master.
> Master/Slave Set: pgsql-ha [pgsqld]
> 
>      Master: [sds2]
> 
>      Slaves: [ sds1 ]
> 
> 
> 
> On the master node (sds2), I removed the PostgreSQL data directory. I expected
> the master node (sds2) to stop and the slave node (sds1) to be promoted to master.
> 
> The sds2 log shows that it executes monitor->notify->demote->notify->stop. The
> sds1 log also shows "Promote pgsqld:0#011(Slave -> Master sds1)". But "pcs
> status" shows the status below. Could you please help check what prevents the
> promotion from happening on sds1? What should I do if I want to recover the
> system?
> 
> I didn't check every detail, but it looks as if stopping the resource
> fails, so the cluster doesn't know the state on sds2 and thus can't
> promote on sds1.
> If you had enabled fencing, this would lead to sds2 being fenced
> so that sds1 can take over.
> 
> As digimer would say: "use fencing!"
> 
> Regards,
> Klaus
> 
> 
> 
> 
> 
> 
> 
> 2 nodes configured
> 
> 3 resources configured
> 
> Online: [ sds1 sds2 ]
> 
> Full list of resources:
> 
>  Master/Slave Set: pgsql-ha [pgsqld]
> 
>      pgsqld     (ocf::heartbeat:pgsqlms):       FAILED Master sds2 (blocked)
> 
>      Slaves: [ sds1 ]
> 
>  Resource Group: mastergroup
> 
>      master-vip (ocf::heartbeat:IPaddr2):       Started sds2
> 
> Failed Actions:
> 
> * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',
>     last-rc-change='Mon May  7 00:39:06 2018', queued=1ms, exec=72ms
> 
> 
> 
> 
> 
> 
> 
> Here is the sds2 log:
> 
> May  7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor and the result 8
> May  7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and the result 8
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_monitor_10000:14152:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14162:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_demote_0:14172:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_demote_0:39 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14182:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14192:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_stop_0:14202:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_stop_0:42 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:40:01 node2 systemd: Started Session 4 of user root.
> May  7 00:40:01 node2 systemd: Starting Session 4 of user root.
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Shutting down Pacemaker
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Stopping crmd
> May  7 00:47:21 node2 crmd[1129]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 crmd[1129]:  notice: Shutting down cluster resource manager
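
The numeric results in the sds2 log above follow the OCF resource agent return-code convention (0 is success, 2 is an invalid-argument error, 8 means running as master). A small Python sketch for pulling the crmd result lines out of such a log might look like this; the regular expression and the helper name are illustrative assumptions, and the sample lines are copied from the log with the mail wrapping removed.

```python
import re

# OCF resource agent return codes, named as in the OCF shell helpers.
OCF_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Matches crmd lines such as:
#   "Result of stop operation for pgsqld on sds2: 2 (invalid parameter)"
RESULT_RE = re.compile(
    r"Result of (\w+) operation for (\S+) on ([^\s:]+): (\d+) \("
)


def parse_results(log_text):
    """Return (operation, resource, node, rc, ocf_name) for each result line."""
    results = []
    for op, resource, node, rc in RESULT_RE.findall(log_text):
        code = int(rc)
        results.append((op, resource, node, code, OCF_CODES.get(code, "unknown")))
    return results


SAMPLE = """\
May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)
May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
"""

for row in parse_results(SAMPLE):
    print(row)
```

Read this way, the sequence shows that notify succeeds while both demote and stop fail with the same argument error, because the agent cannot find PGDATA.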
> 
> 
> 
> Here is the sds1 log(in the attachment)
> 
> May  7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0
> May  7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0
> May  7 00:39:06 node1 crmd[1133]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2
> 
> 
> 
> 