<div dir="ltr">Thank you for your answer!<div><br></div><div>I can't see something useful in the pgsqlms log:</div><div><br></div><div><pre style="white-space:pre-wrap;margin-top:0px;margin-bottom:20px;font-family:Monaco,"Vera Sans Terminal",monospace;color:rgb(51,51,51);font-size:12px;padding:8px 15px;background:rgb(245,245,245);border-radius:5px;border:1px solid rgb(229,229,229)">[1398] server2        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources:  <lrm_resource id="pgsqld" type="pgsqlms" class="ocf" provider="heartbeat"/><br>[1402] server2    pengine:     info: native_print:           pgsqld     (ocf::heartbeat:pgsqlms):       FAILED Master server1<br>[1402] server2    pengine:     info: native_print:           pgsqld     (ocf::heartbeat:pgsqlms):       FAILED server1<br>1402] server2    pengine:     info: native_print:           pgsqld     (ocf::heartbeat:pgsqlms):       FAILED server1 (blocked)<br>pgsqlms(pgsqld)[3824]:  Mar 06 18:04:21  WARNING: No secondary connected to the master<br>pgsqlms(pgsqld)[3824]:  Mar 06 18:04:21  WARNING: "server1" is not connected to the primary<br>[1402] server2    pengine:     info: native_print:           pgsqld     (ocf::heartbeat:pgsqlms):       FAILED server1 (blocked)
However, here are the general logs:

[1398] server2        cib:     info: cib_perform_op:    ++                                         <lrm_rsc_op id="pgsqld_last_0" operation_key="pgsqld_promote_0" operation="promote" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="6:0:0:bb00f80b-88c5-4453-8291-44dae9fcc635" transition-magic="0:0;6:0:0:bb00f80b-88c5-4453-8291-44dae9fcc635" on_node="server2" call-id="26" rc-code="0" op-status="0" interval="0" last-run="1583514091" last-rc-change="1583514091" exec-tim
[1398] server2        cib:     info: cib_perform_op:    ++                                         <lrm_rsc_op id="pgsqld_monitor_15000" operation_key="pgsqld_monitor_15000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="7:0:8:bb00f80b-88c5-4453-8291-44dae9fcc635" transition-magic="0:8;7:0:8:bb00f80b-88c5-4453-8291-44dae9fcc635" on_node="server2" call-id="29" rc-code="8" op-status="0" interval="15000" last-rc-change="1583514092" exec-time="469"
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_pre_notify_start_0 locally on server2 | action 48
[1403] server2       crmd:     info: do_lrm_rsc_op:     Performing key=48:5:0:bb00f80b-88c5-4453-8291-44dae9fcc635 op=pgsqld_notify_0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (48) confirmed on server2 (rc=0)
[1403] server2       crmd:   notice: process_lrm_event: Result of notify operation for pgsqld on server2: 0 (ok) | call=31 key=pgsqld_notify_0 confirmed=true cib-update=0
[1398] server2        cib:     info: cib_perform_op:    ++                                                                <lrm_rsc_op id="pgsqld_last_failure_0" operation_key="pgsqld_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635" transition-magic="0:9;5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635" exit-reason="Instance &quot;pgsqld&quot; controldata indicates a running primary instance, the
[1398] server2        cib:     info: cib_perform_op:    ++                                                                <lrm_rsc_op id="pgsqld_last_0" operation_key="pgsqld_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635" transition-magic="0:9;5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635" exit-reason="Instance &quot;pgsqld&quot; controldata indicates a running primary instance, the instance
[1403] server2       crmd:   notice: abort_transition_graph:    Transition aborted by operation pgsqld_monitor_0 'create' on server1: Event failed | magic=0:9;5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.17 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_monitor_0 (5) confirmed on server1 (rc=9)
[1403] server2       crmd:     info: process_graph_event:       Detected action (5.5) pgsqld_monitor_0.6=master (failed): failed
[1403] server2       crmd:     info: abort_transition_graph:    Transition aborted by operation pgsqld_monitor_0 'create' on server1: Event failed | magic=0:9;5:5:7:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.17 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_monitor_0 (5) confirmed on server1 (rc=9)
[1403] server2       crmd:     info: process_graph_event:       Detected action (5.5) pgsqld_monitor_0.6=master (failed): failed
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_pre_notify_demote_0 on server1 | action 56
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_pre_notify_demote_0 locally on server2 | action 58
[1403] server2       crmd:     info: do_lrm_rsc_op:     Performing key=58:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635 op=pgsqld_notify_0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (58) confirmed on server2 (rc=0)
[1403] server2       crmd:   notice: process_lrm_event: Result of notify operation for pgsqld on server2: 0 (ok) | call=32 key=pgsqld_notify_0 confirmed=true cib-update=0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (56) confirmed on server1 (rc=0)
[1403] server2       crmd:   notice: te_rsc_command:    Initiating demote operation pgsqld_demote_0 on server1 | action 6
[1398] server2        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_failure_0']:  @operation_key=pgsqld_demote_0, @operation=demote, @transition-key=6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @transition-magic=0:1;6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @call-id=22, @rc-code=1, @last-run=1583514254, @last-rc-change=1583514254, @exec-time=183
[1398] server2        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_0']:  @operation_key=pgsqld_demote_0, @operation=demote, @transition-key=6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @transition-magic=0:1;6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @call-id=22, @rc-code=1, @last-run=1583514254, @last-rc-change=1583514254, @exec-time=183
[1403] server2       crmd:  warning: status_from_rc:    Action 6 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error
[1403] server2       crmd:   notice: abort_transition_graph:    Transition aborted by operation pgsqld_demote_0 'modify' on server1: Event failed | magic=0:1;6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.21 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_demote_0 (6) confirmed on server1 (rc=1)
[1403] server2       crmd:     info: process_graph_event:       Detected action (6.6) pgsqld_demote_0.22=unknown error: failed
[1403] server2       crmd:  warning: status_from_rc:    Action 6 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error
[1403] server2       crmd:     info: abort_transition_graph:    Transition aborted by operation pgsqld_demote_0 'modify' on server1: Event failed | magic=0:1;6:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.21 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_demote_0 (6) confirmed on server1 (rc=1)
[1403] server2       crmd:     info: process_graph_event:       Detected action (6.6) pgsqld_demote_0.22=unknown error: failed
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_post_notify_demote_0 on server1 | action 57
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_post_notify_demote_0 locally on server2 | action 59
[1403] server2       crmd:     info: do_lrm_rsc_op:     Performing key=59:6:0:bb00f80b-88c5-4453-8291-44dae9fcc635 op=pgsqld_notify_0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (57) confirmed on server1 (rc=0)
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (59) confirmed on server2 (rc=0)
[1403] server2       crmd:   notice: process_lrm_event: Result of notify operation for pgsqld on server2: 0 (ok) | call=33 key=pgsqld_notify_0 confirmed=true cib-update=0
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_pre_notify_stop_0 on server1 | action 46
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_pre_notify_stop_0 locally on server2 | action 47
[1403] server2       crmd:     info: do_lrm_rsc_op:     Performing key=47:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635 op=pgsqld_notify_0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (47) confirmed on server2 (rc=0)
[1403] server2       crmd:   notice: process_lrm_event: Result of notify operation for pgsqld on server2: 0 (ok) | call=34 key=pgsqld_notify_0 confirmed=true cib-update=0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (46) confirmed on server1 (rc=0)
[1403] server2       crmd:   notice: te_rsc_command:    Initiating stop operation pgsqld_stop_0 on server1 | action 2
[1398] server2        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_failure_0']:  @operation_key=pgsqld_stop_0, @operation=stop, @transition-key=2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @transition-magic=0:1;2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @exit-reason=Unexpected state for instance "pgsqld" (returned 9), @call-id=25, @last-run=1583514255, @last-rc-change=1583514255, @exec-tim
[1398] server2        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_0']:  @operation_key=pgsqld_stop_0, @operation=stop, @transition-key=2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @transition-magic=0:1;2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635, @exit-reason=Unexpected state for instance "pgsqld" (returned 9), @call-id=25, @last-run=1583514255, @last-rc-change=1583514255, @exec-time=190
[1403] server2       crmd:  warning: status_from_rc:    Action 2 (pgsqld_stop_0) on server1 failed (target: 0 vs. rc: 1): Error
[1403] server2       crmd:   notice: abort_transition_graph:    Transition aborted by operation pgsqld_stop_0 'modify' on server1: Event failed | magic=0:1;2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.24 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_stop_0 (2) confirmed on server1 (rc=1)
[1403] server2       crmd:     info: process_graph_event:       Detected action (7.2) pgsqld_stop_0.25=unknown error: failed
[1403] server2       crmd:  warning: status_from_rc:    Action 2 (pgsqld_stop_0) on server1 failed (target: 0 vs. rc: 1): Error
[1403] server2       crmd:     info: abort_transition_graph:    Transition aborted by operation pgsqld_stop_0 'modify' on server1: Event failed | magic=0:1;2:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635 cib=0.24.24 source=match_graph_event:310 complete=false
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_stop_0 (2) confirmed on server1 (rc=1)
[1403] server2       crmd:     info: process_graph_event:       Detected action (7.2) pgsqld_stop_0.25=unknown error: failed
[1403] server2       crmd:   notice: te_rsc_command:    Initiating notify operation pgsqld_post_notify_stop_0 locally on server2 | action 48
[1403] server2       crmd:     info: do_lrm_rsc_op:     Performing key=48:7:0:bb00f80b-88c5-4453-8291-44dae9fcc635 op=pgsqld_notify_0
[1403] server2       crmd:     info: match_graph_event: Action pgsqld_notify_0 (48) confirmed on server2 (rc=0)
[1403] server2       crmd:   notice: process_lrm_event: Result of notify operation for pgsqld on server2: 0 (ok) | call=35 key=pgsqld_notify_0 confirmed=true cib-update=0
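(In case it is useful for reproducing: the relevant lines can be pulled out of the system log with simple filters like the two below. The log location is an assumption on my side; /var/log/messages is the default on these CentOS nodes.)

grep 'pgsqlms(pgsqld)' /var/log/messages
grep -E 'status_from_rc|abort_transition_graph' /var/log/messages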
Here I can see log entries like these:

Action 6 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error
Action 2 (pgsqld_stop_0) on server1 failed (target: 0 vs. rc: 1): Error

It seems it cannot demote the previous master to a slave. The recovery.conf file is present on this (failed) node.

Should I assume that every ungraceful shutdown scenario (and even a manual fence) will result in a failover, so that I have to rebuild the instance? Rebuilding the instance is also problematic: while the node is in this FAILED/blocked state, stopping the pacemaker/corosync services, or running any related command, simply blocks and never completes. I would find it more intuitive for the node to at least end up in the Stopped state rather than FAILED/blocked. The rebuild procedure I have in mind is sketched below.
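This is only a rough sketch of that rebuild, run on the failed old primary once the fencing loop stops, using the bindir/pgdata paths from my resource configuration (quoted below). The replication user (postgres) and the name of the backup directory are placeholders on my side, not something prescribed by PAF. Is this the expected way to re-add the old primary?

# On the failed old primary (server1); server2 is the current master.
# Paths as configured: bindir=/usr/pgsql-9.6/bin, pgdata=/var/lib/pgsql/9.6/data.

# 1. Move the diverged data directory out of the way.
sudo -iu postgres mv /var/lib/pgsql/9.6/data /var/lib/pgsql/9.6/data.failed

# 2. Clone the current primary over streaming replication.
sudo -iu postgres /usr/pgsql-9.6/bin/pg_basebackup -h server2 -U postgres \
    -X stream -D /var/lib/pgsql/9.6/data

# 3. Put back the recovery.conf, which is present on this node, so the
#    instance comes back up as a standby.
sudo -iu postgres cp /var/lib/pgsql/9.6/data.failed/recovery.conf \
    /var/lib/pgsql/9.6/data/

# 4. Clear the failure history so Pacemaker unblocks the resource and can
#    start it as a slave again.
pcs resource cleanup pgsqld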

Also, as a note, the fencing does work: it successfully reboots any node. It is the rejoin of the previous master that seems to have this problem.

Thank you in advance,
Aleksandra

On Thu, 5 Mar 2020 at 12:40, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> Hello,
>
> On Thu, 5 Mar 2020 12:21:14 +0100
> Aleksandra C <aleksandra29c@gmail.com> wrote:
> [...]
> > I would be very happy to use some help from you.
> >
> > I have configured a PostgreSQL cluster with Pacemaker+PAF. The pacemaker
> > configuration is the following (from
> > https://clusterlabs.github.io/PAF/Quick_Start-CentOS-7.html)
> >
> > # pgsqld
> > pcs -f cluster1.xml resource create pgsqld ocf:heartbeat:pgsqlms \
> >     bindir=/usr/pgsql-9.6/bin pgdata=/var/lib/pgsql/9.6/data     \
> >     op start timeout=60s                                         \
> >     op stop timeout=60s                                          \
> >     op promote timeout=30s                                       \
> >     op demote timeout=120s                                       \
> >     op monitor interval=15s timeout=10s role="Master"            \
> >     op monitor interval=16s timeout=10s role="Slave"             \
> >     op notify timeout=60s
>
> If you can, I would recommend using PostgreSQL v11 or v12. Support for v12 is
> in PAF 2.3rc2, which is supposed to be released next week.
>
>
> [...]
> > The cluster is behaving in a strange way. When I manually fence the master
> > node (or ungracefully shut it down), after unfencing/starting, the node has
> > status Failed/blocked and the node is constantly fenced (restarted) by the
> > fencing agent. Should the fencing recover the cluster as Master/Slave
> > without problems?
>
> I suppose a failover occurred after the ungraceful shutdown? The old primary
> is probably seen as crashed from PAF's point of view.
>
> Could you share the detailed pgsqlms log?
>
> [...]
> > Is this a cluster misconfiguration? Any idea would be greatly appreciated.
>
> I don't think so. Make sure to look at
> https://clusterlabs.github.io/PAF/administration.html#failover
>
> Regards,