[ClusterLabs] Antw: The slave does not promote to master
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon May 7 02:51:37 EDT 2018
I have no idea about the Postgres stuff, but it seems the logs say it all:
May 7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA
"/home/highgo/highgo/database/4.3.1/data" does not exists
May 7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for
pgsqld:1 on sds2: invalid parameter (2)
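
To pick such lines out of a longer log, something like the following may help. It is only a minimal sketch: it assumes syslog-style lines in the format quoted in this thread, with each entry already joined onto a single line, and the names `AGENT_ERROR` and `agent_errors` are made up for illustration.

```python
import re

# Matches resource agent error lines such as:
#   May 7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA "..." does not exists
# The layout is assumed from the log excerpts quoted in this thread.
AGENT_ERROR = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+'
    r'(?P<agent>\w+)\((?P<rsc>[^)]+)\)\[(?P<pid>\d+)\]:\s+ERROR:\s+(?P<msg>.*)$'
)

def agent_errors(lines):
    """Yield (timestamp, host, resource, message) for each agent ERROR line."""
    for line in lines:
        m = AGENT_ERROR.match(line.strip())
        if m:
            yield m["ts"], m["host"], m["rsc"], m["msg"]

sample = [
    'May 7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and the result 8',
    'May 7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA '
    '"/home/highgo/highgo/database/4.3.1/data" does not exists',
]

errors = list(agent_errors(sample))
for entry in errors:
    print(entry)
```

Only the ERROR line is reported; the INFO line from the earlier, successful monitor is skipped.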
Regards,
Ulrich
>>> <fanguoteng at highgo.com> wrote on 07.05.2018 at 07:39 in message
<69b10bf44e164c11a27ce0bd5d987b5f at EX01.highgo.com>:
> Hi,
>
> We have a two-node cluster using PAF to manage PostgreSQL. Node2 is the master.
> Master/Slave Set: pgsql-ha [pgsqld]
> Master: [sds2]
> Slaves: [ sds1 ]
>
> On the master node (sds2), I removed the PostgreSQL data directory. I expected
> the master node (sds2) to stop and the slave node (sds1) to be promoted to
> master. The sds2 log shows that it executes
> monitor->notify->demote->notify->stop. The sds1 log also shows
> "Promote pgsqld:0#011(Slave -> Master sds1)". But "pcs status" shows the
> following. Could you please help check what prevents the promotion from
> happening on sds1? What should I do if I want to recover the system?
>
> 2 nodes configured
> 3 resources configured
> Online: [ sds1 sds2 ]
> Full list of resources:
> Master/Slave Set: pgsql-ha [pgsqld]
> pgsqld (ocf::heartbeat:pgsqlms): FAILED Master sds2 (blocked)
> Slaves: [ sds1 ]
> Resource Group: mastergroup
> master-vip (ocf::heartbeat:IPaddr2): Started sds2
> Failed Actions:
> * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete,
> exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists',
> last-rc-change='Mon May 7 00:39:06 2018', queued=1ms, exec=72ms
>
>
>
> Here is the sds2 log:
> May 7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor
> and the result 8
> May 7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor
> and the result 8
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_monitor_10000:14152:stderr
> [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_monitor_10000:36 [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists\n ]
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14162:stderr [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for
> pgsqld on sds2: 0 (ok)
> May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_monitor_10000:36 [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists\n ]
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_demote_0:14172:stderr [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: Result of demote operation for
> pgsqld on sds2: 2 (invalid parameter)
> May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_demote_0:39 [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists\n ]
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14182:stderr [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for
> pgsqld on sds2: 0 (ok)
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14192:stderr [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for
> pgsqld on sds2: 0 (ok)
> May 7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_stop_0:14202:stderr [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists ]
> May 7 00:39:06 node2 crmd[1129]: notice: Result of stop operation for
> pgsqld on sds2: 2 (invalid parameter)
> May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_stop_0:42 [
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not
> exists\n ]
> May 7 00:40:01 node2 systemd: Started Session 4 of user root.
> May 7 00:40:01 node2 systemd: Starting Session 4 of user root.
> May 7 00:47:21 node2 pacemakerd[1063]: notice: Caught 'Terminated' signal
> May 7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster
> Manager...
> May 7 00:47:21 node2 pacemakerd[1063]: notice: Shutting down Pacemaker
> May 7 00:47:21 node2 pacemakerd[1063]: notice: Stopping crmd
> May 7 00:47:21 node2 crmd[1129]: notice: Caught 'Terminated' signal
> May 7 00:47:21 node2 crmd[1129]: notice: Shutting down cluster resource
> manager
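
The operation sequence on sds2 can be read off the crmd "Result of ... operation" lines above. The sketch below is one possible way to do that, assuming each log entry has been joined onto a single line. The `OCF_CODES` table lists the standard OCF resource agent return codes; the names `RESULT` and `operation_results` are hypothetical helpers, not part of Pacemaker or PAF. The failed monitor does not appear in this form in the excerpt, so the extracted sequence starts at the first notify.

```python
import re

# Standard OCF resource agent return codes.
OCF_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Matches crmd lines such as:
#   ... crmd[1129]: notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
RESULT = re.compile(
    r'crmd\[\d+\]:\s+notice:\s+Result of (?P<op>\w+) operation for '
    r'(?P<rsc>\S+) on (?P<node>\S+): (?P<rc>\d+) \((?P<text>[^)]*)\)'
)

def operation_results(lines):
    """Return (operation, node, rc, OCF name) tuples in log order."""
    out = []
    for line in lines:
        m = RESULT.search(line)
        if m:
            rc = int(m["rc"])
            out.append((m["op"], m["node"], rc, OCF_CODES.get(rc, "unknown")))
    return out

# Entries taken from the sds2 log above, joined onto single lines.
sds2_log = [
    'May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)',
    'May 7 00:39:06 node2 crmd[1129]: notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)',
    'May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)',
    'May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)',
    'May 7 00:39:06 node2 crmd[1129]: notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)',
]

results = operation_results(sds2_log)
failed = [r for r in results if r[2] != 0]
for r in results:
    print(r)
```

With these entries, `failed` contains the demote and the stop, both with return code 2 (OCF_ERR_ARGS), which matches the `pgsqld_stop_0` entry under "Failed Actions" in the "pcs status" output.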
>
> Here is the sds1 log (in the attachment):
> May 7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0
> May 7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0
> May 7 00:39:06 node1 crmd[1133]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> May 7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May 7 00:39:06 node1 pengine[1132]: error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May 7 00:39:06 node1 pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May 7 00:39:06 node1 pengine[1132]: notice: Demote pgsqld:1#011(Master -> Stopped sds2)
> May 7 00:39:06 node1 pengine[1132]: notice: Move master-vip#011(Started sds2 -> sds1)
> May 7 00:39:06 node1 pengine[1132]: notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2
> May 7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May 7 00:39:06 node1 pengine[1132]: error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May 7 00:39:06 node1 pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May 7 00:39:06 node1 pengine[1132]: notice: Demote pgsqld:1#011(Master -> Stopped sds2)
> May 7 00:39:06 node1 pengine[1132]: notice: Move master-vip#011(Started sds2 -> sds1)
> May 7 00:39:06 node1 pengine[1132]: notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2
> May 7 00:39:06 node1 crmd[1133]: notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> May 7 00:39:06 node1 crmd[1133]: notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1
> May 7 00:39:06 node1 crmd[1133]: notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2
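
The actions the policy engine planned on sds1 can be extracted from the pengine "notice" lines in a similar way. This is a rough sketch under the same assumption of one entry per line. The `#011` in these lines is the syslog daemon's octal escape for a tab character, so the pattern accepts either form. The names `PLANNED` and `planned_actions` are hypothetical.

```python
import re

# Matches pengine lines such as:
#   ... pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)
# "#011" is an escaped tab; a literal tab or plain whitespace is accepted too.
PLANNED = re.compile(
    r'pengine\[\d+\]:\s+notice:\s+(?P<action>Promote|Demote|Move)\s+'
    r'(?P<rsc>\S+?)(?:#011|\s)+\((?P<detail>[^)]*)\)'
)

def planned_actions(lines):
    """Return (action, resource, detail) tuples for planned transitions."""
    out = []
    for line in lines:
        m = PLANNED.search(line)
        if m:
            out.append((m["action"], m["rsc"], m["detail"]))
    return out

# Entries taken from the sds1 log above (transition 31).
sds1_log = [
    'May 7 00:39:06 node1 pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)',
    'May 7 00:39:06 node1 pengine[1132]: notice: Demote pgsqld:1#011(Master -> Stopped sds2)',
    'May 7 00:39:06 node1 pengine[1132]: notice: Move master-vip#011(Started sds2 -> sds1)',
    'May 7 00:39:06 node1 pengine[1132]: notice: Calculated transition 31, saving inputs in '
    '/var/lib/pacemaker/pengine/pe-input-97.bz2',
]

actions = planned_actions(sds1_log)
for a in actions:
    print(a)
```

These lines record what the policy engine intended to do, not what was carried out; whether the promotion actually ran has to be checked against the crmd operation results.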