<div dir="ltr"><div>Yes I haven't been using the "nodes" element in the XML, only the "resources" element. I couldn't find "<span style="font-size:12.8px">node_state" elements or attributes in the XML, so after some searching I found that it is in the CIB that can be gotten with "pcs cluster cib foo.xml". I will start exploring this as an alternative to crm_mon/"pcs status".</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">However I still find what happens to be confusing, so below I try</span> to better explain what I see:</div><div><br></div><div><br></div><div>Before "pcs cluster start test3" at 10:45:36.362 (test3 has been HW shutdown a minute ago):</div><div><br></div><div>crm_mon -1:</div><div><br></div><div> Stack: corosync</div><div> Current DC: test1 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum</div><div> Last updated: Fri May 12 10:45:36 2017 Last change: Fri May 12 09:18:13 2017 by root via crm_attribute on test1</div><div><br></div><div> 3 nodes and 4 resources configured</div><div><br></div><div> Online: [ test1 test2 ]</div><div> OFFLINE: [ test3 ]</div><div><br></div><div> Active resources:</div><div><br></div><div> Master/Slave Set: pgsql-ha [pgsqld]</div><div> Masters: [ test1 ]</div><div> Slaves: [ test2 ]</div><div> pgsql-master-ip (ocf::heartbeat:IPaddr2): Started test1</div><div><br></div><div> </div><div>crm_mon -X:</div><div><br></div><div> <resources></div><div> <clone id="pgsql-ha" multi_state="true" unique="false" managed="true" failed="false" failure_ignored="false" ></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Master" active="true" orphaned="false" managed="true" f</div><div> ailed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Slave" active="true" orphaned="false" managed="true" fa</div><div> iled="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test2" id="2" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Stopped" active="false" orphaned="false" managed="true"</div><div> failed="false" failure_ignored="false" nodes_running_on="0" /></div><div> </clone></div><div> <resource id="pgsql-master-ip" resource_agent="ocf::heartbeat:IPaddr2" role="Started" active="true" orphaned="false" managed</div><div> ="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> </resources></div><div><br></div><div><br></div><div><br></div><div>At 10:45:39.440, after "pcs cluster start test3", before first "monitor" on test3 (this is where I can't seem to know that resources on test3 are down):</div><div><br></div><div>crm_mon -1:</div><div><br></div><div> Stack: corosync</div><div> Current DC: test1 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum</div><div> Last updated: Fri May 12 10:45:39 2017 Last change: Fri May 12 10:45:39 2017 by root via crm_attribute on test1</div><div><br></div><div> 3 nodes and 4 resources configured</div><div><br></div><div> Online: [ test1 test2 test3 ]</div><div><br></div><div> Active resources:</div><div><br></div><div> Master/Slave Set: pgsql-ha [pgsqld]</div><div> Masters: [ test1 ]</div><div> Slaves: [ test2 test3 ]</div><div> 
pgsql-master-ip (ocf::heartbeat:IPaddr2): Started test1</div><div><br></div><div><br></div><div>crm_mon -X:</div><div><br></div><div> <resources></div><div> <clone id="pgsql-ha" multi_state="true" unique="false" managed="true" failed="false" failure_ignored="false" ></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Master" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Slave" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test2" id="2" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Slave" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test3" id="3" cached="false"/></div><div> </resource></div><div> </clone></div><div> <resource id="pgsql-master-ip" resource_agent="ocf::heartbeat:IPaddr2" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> </resources></div><div><br></div><div><br></div><div> </div><div>At 10:45:41.606, after first "monitor" on test3 (I can now tell the resources on test3 are not ready):</div><div><br></div><div>crm_mon -1:</div><div><br></div><div> Stack: corosync</div><div> Current DC: test1 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum</div><div> Last updated: Fri May 12 10:45:41 2017 Last change: Fri May 12 10:45:39 2017 by root via crm_attribute on test1</div><div><br></div><div> 3 nodes and 4 resources configured</div><div><br></div><div> Online: [ test1 test2 test3 ]</div><div><br></div><div> Active resources:</div><div><br></div><div> Master/Slave Set: pgsql-ha [pgsqld]</div><div> Masters: [ test1 ]</div><div> Slaves: [ test2 ]</div><div> pgsql-master-ip (ocf::heartbeat:IPaddr2): Started test1</div><div><br></div><div><br></div><div>crm_mon -X:</div><div><br></div><div> <resources></div><div> <clone id="pgsql-ha" multi_state="true" unique="false" managed="true" failed="false" failure_ignored="false" ></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Master" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Slave" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test2" id="2" cached="false"/></div><div> </resource></div><div> <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha" role="Stopped" active="false" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="0" /></div><div> </clone></div><div> <resource id="pgsql-master-ip" resource_agent="ocf::heartbeat:IPaddr2" role="Started" active="true" orphaned="false" managed="true" failed="false" failure_ignored="false" nodes_running_on="1" ></div><div> <node name="test1" id="1" cached="false"/></div><div> </resource></div><div> </resources></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 12, 2017 at 
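
In the meantime, here is a minimal sketch of the node_state check I plan
to try against the CIB, based on the in_ccm/crmd attributes you describe
below (the helper name is mine, and I still need to verify the exact
attribute values on my Pacemaker version):

    import subprocess
    import xml.etree.ElementTree as ET

    def node_is_up(node_name):
        # "pcs cluster cib" without a filename writes the CIB XML to stdout.
        cib = ET.fromstring(subprocess.check_output(["pcs", "cluster", "cib"]))
        for state in cib.iter("node_state"):
            if state.get("uname") == node_name:
                # Both levels must be up: the cluster stack (in_ccm) and
                # pacemaker (crmd).
                return (state.get("in_ccm") == "true"
                        and state.get("crmd") == "online")
        return False

The idea would be to poll this every second instead of "pcs status xml",
until it returns True for test3.
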
On Fri, May 12, 2017 at 12:45 AM, Ken Gaillot <kgaillot@redhat.com> wrote:

On 05/11/2017 03:00 PM, Ludovic Vaugeois-Pepin wrote:
> Hi
>
> I translated a PostgreSQL multi-state RA
> (https://github.com/dalibo/PAF) into Python
> (https://github.com/ulodciv/deploy_cluster), and I have been editing it
> heavily.
>
> In parallel I am writing unit tests and functional tests.
>
> I am having an issue with a functional test that abruptly powers off a
> slave named, say, "host3" (hot standby PG instance). Later on I start
> the slave back up. Once it is started, I run "pcs cluster start host3".
> And this is where I start having a problem.
>
> I check every second the output of "pcs status xml" until host3 is said
> to be ready as a slave again. In the following I assume that test3 is
> ready as a slave:
>
> <nodes>
>   <node name="test1" id="1" online="true" standby="false"
>     standby_onfail="false" maintenance="false" pending="false"
>     unclean="false" shutdown="false" expected_up="true" is_dc="false"
>     resources_running="2" type="member" />
>   <node name="test2" id="2" online="true" standby="false"
>     standby_onfail="false" maintenance="false" pending="false"
>     unclean="false" shutdown="false" expected_up="true" is_dc="true"
>     resources_running="1" type="member" />
>   <node name="test3" id="3" online="true" standby="false"
>     standby_onfail="false" maintenance="false" pending="false"
>     unclean="false" shutdown="false" expected_up="true" is_dc="false"
>     resources_running="1" type="member" />
> </nodes>

The <nodes> section says nothing about the current state of the nodes.
Look at the <node_state> entries for that: in_ccm means the cluster
stack level, and crmd means the pacemaker level -- both need to be up.
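
For example, the <node_state> entry for a node that is fully up looks
roughly like this (attribute names as in the 1.1 series; the exact set
varies by version):

    <node_state id="3" uname="test3" in_ccm="true" crmd="online"
                join="member" expected="member">

join="member" means the node has completed the crmd join sequence, which
is another good sign that it is fully up.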
<span class=""><br>
> <resources><br>
> <clone id="pgsql-ha" multi_state="true" unique="false"<br>
> managed="true" failed="false" failure_ignored="false" ><br>
> <resource id="pgsqld" resource_agent="ocf::<wbr>heartbeat:pgha"<br>
> role="Slave" active="true" orphaned="false" managed="true"<br>
> failed="false" failure_ignored="false" nodes_running_on="1" ><br>
> <node name="test3" id="3" cached="false"/><br>
> </resource><br>
> <resource id="pgsqld" resource_agent="ocf::<wbr>heartbeat:pgha"<br>
> role="Master" active="true" orphaned="false" managed="true"<br>
> failed="false" failure_ignored="false" nodes_running_on="1" ><br>
> <node name="test1" id="1" cached="false"/><br>
> </resource><br>
> <resource id="pgsqld" resource_agent="ocf::<wbr>heartbeat:pgha"<br>
> role="Slave" active="true" orphaned="false" managed="true"<br>
> failed="false" failure_ignored="false" nodes_running_on="1" ><br>
> <node name="test2" id="2" cached="false"/><br>
> </resource><br>
> </clone><br>
>
> By "ready to go" I mean that upon running "pcs cluster start test3", the
> following occurs before test3 appears ready in the XML:
>
> pcs cluster start test3
> monitor         -> RA returns unknown error (1)
> notify/pre-stop -> RA returns ok (0)
> stop            -> RA returns ok (0)
> start           -> RA returns ok (0)
<span class="">><br>
> The problem I have is that between "pcs cluster start test3" and<br>
> "monitor", it seems that the XML returned by "pcs status xml" says test3<br>
> is ready (the XML extract above is what I get at that moment). Once<br>
> "monitor" occurs, the returned XML shows test3 to be offline, and not<br>
> until the start is finished do I once again have test3 shown as ready.<br>
><br>
> I am getting anything wrong? Is there a simpler or better way to check<br>
> if test3 is fully functional again, ie OCF start was successful?<br>
><br>
> Thanks<br>
><br>
> Ludovic<br>

-- 
Ludovic Vaugeois-Pepin