[ClusterLabs] The slave does not promote to master

Klaus Wenninger kwenning at redhat.com
Mon May 7 06:40:07 UTC 2018


On 05/07/2018 07:39 AM, 范国腾 wrote:
> Hi,
>
> We have a two-node cluster that uses PAF to manage PostgreSQL. Node 2 (sds2) is the master:
>
>  Master/Slave Set: pgsql-ha [pgsqld]
>      Master: [sds2]
>      Slaves: [ sds1 ]
>
> On the master node (sds2), I removed the PostgreSQL data directory. I expected the master node (sds2) to stop and the slave node (sds1) to be promoted to master.
> The sds2 log shows that it executes monitor->notify->demote->notify->stop. The sds1 log also shows " Promote pgsqld:0#011(Slave -> Master sds1)". But "pcs status" shows the status below. Could you please help check what prevents the promotion from happening on sds1? What should I do if I want to recover the system?

I didn't check every detail, but it looks as if stopping the resource
fails on sds2. The cluster therefore doesn't know the resource's state
on sds2 and can't promote on sds1.
If you had enabled fencing, this would lead to sds2 being fenced
so that sds1 could take over.
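
To make that reasoning concrete, here is a toy Python model. It is not
Pacemaker code, and the function name and the returned keys are made up
for illustration. It only encodes the default reaction to a failed stop
as I understand it: on-fail is "fence" when stonith is enabled and
"block" otherwise.

```python
def react_to_failed_stop(stonith_enabled: bool) -> dict:
    """Toy model of the cluster's reaction when 'stop' fails on the old master.

    Hypothetical helper for illustration only. It mirrors the default
    on-fail behaviour for stop operations (fence if stonith is enabled,
    block otherwise) and nothing more.
    """
    if stonith_enabled:
        # The node gets fenced, so the resource is known to be down there
        # and the peer can safely be promoted.
        return {"old_master": "fenced", "peer_can_promote": True}
    # Without fencing the resource state on the old master stays unknown,
    # so the resource is blocked and the promotion cannot proceed.
    return {"old_master": "blocked", "peer_can_promote": False}


if __name__ == "__main__":
    print(react_to_failed_stop(stonith_enabled=False))
    print(react_to_failed_stop(stonith_enabled=True))
```

The first call corresponds to the "FAILED Master sds2 (blocked)" state in
your pcs status output.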

As digimer would say: "use fencing!"

Regards,
Klaus

>
> 2 nodes configured
> 3 resources configured
> Online: [ sds1 sds2 ]
> Full list of resources:
>  Master/Slave Set: pgsql-ha [pgsqld]
>      pgsqld     (ocf::heartbeat:pgsqlms):       FAILED Master sds2 (blocked)
>      Slaves: [ sds1 ]
>  Resource Group: mastergroup
>      master-vip (ocf::heartbeat:IPaddr2):       Started sds2
> Failed Actions:
> * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',
>     last-rc-change='Mon May  7 00:39:06 2018', queued=1ms, exec=72ms
>
>
>
> Here is the sds2 log:
> May  7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor and the result 8
> May  7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and the result 8
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_monitor_10000:14152:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14162:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_demote_0:14172:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_demote_0:39 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14182:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14192:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_stop_0:14202:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_stop_0:42 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> May  7 00:40:01 node2 systemd: Started Session 4 of user root.
> May  7 00:40:01 node2 systemd: Starting Session 4 of user root.
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Shutting down Pacemaker
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Stopping crmd
> May  7 00:47:21 node2 crmd[1129]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 crmd[1129]:  notice: Shutting down cluster resource manager
>
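
If it helps when reading logs like the one above, here is a rough Python
sketch that pulls the failed operations out of the crmd "Result of ...
operation" lines. The regular expression is only fitted to the line
format shown in this log, and the helper name is my own. Treating every
non-zero return code as a failure is a simplification: those lines carry
the raw OCF return code, and a monitor on a master legitimately returns 8.

```python
import re

# Fitted to crmd lines like:
#   "... notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)"
RESULT_RE = re.compile(
    r"Result of (?P<op>\w+) operation for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<rc>\d+) \((?P<text>[^)]*)\)"
)


def failed_operations(lines):
    """Return (op, resource, node, rc, text) for every non-zero result line."""
    failures = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append((
                match.group("op"),
                match.group("rsc"),
                match.group("node"),
                int(match.group("rc")),
                match.group("text"),
            ))
    return failures


if __name__ == "__main__":
    # Three lines copied from the sds2 log above.
    sample = [
        "May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for pgsqld on sds2: 0 (ok)",
        "May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)",
        "May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)",
    ]
    for failure in failed_operations(sample):
        print(failure)
```

On the lines above this reports the demote and the stop, both with
return code 2. The failed stop is the one that matters here.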
> Here is the sds1 log (in the attachment):
> May  7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0
> May  7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0
> May  7 00:39:06 node1 crmd[1133]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2
