<div dir="ltr">First of all, setting the 3rd host to be a standby (this was done before any of the resources were created) didn't stop Pacemaker attempting to start the resources there (that fails as MySQL isn't installed on that server)....<div>
[root@drbd1 billy]# pcs status
Last updated: Wed Jul 10 13:56:20 2013
Last change: Wed Jul 10 13:55:16 2013 via cibadmin on drbd1.localdomain
Stack: cman
Current DC: drbd1.localdomain - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
5 Resources configured.

Node drbd3.localdomain: standby
Online: [ drbd1.localdomain drbd2.localdomain ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ drbd1.localdomain ]
     Slaves: [ drbd2.localdomain ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem): Started drbd1.localdomain
     p_ip_mysql (ocf::heartbeat:IPaddr2): Started drbd1.localdomain
     p_mysql (ocf::heartbeat:mysql): Started drbd1.localdomain

Failed actions:
    p_mysql_monitor_0 (node=drbd3.localdomain, call=18, rc=5, status=complete): not installed

...

Is that a bug?

It does at least let me "pcs resource move" my resources, and they switch between drbd1 and drbd2.

Next, while the resources are running on drbd1, I "ifdown" its network connection. What I'd hope to happen in that scenario is that the cluster recognises that there's still a quorum (drbd2 + drbd3) and migrates the resources to drbd2; instead the resources are stopped...
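
For completeness, the "failure" was induced with nothing more than the following on drbd1 (eth0 is shown purely for illustration - substitute whichever interface carries the cluster traffic):

  # run on drbd1; eth0 is illustrative, i.e. the interface the cluster communicates over
  ifdown eth0

The status as then seen from drbd2:
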
[root@drbd2 billy]# pcs status
Last updated: Wed Jul 10 14:03:03 2013
Last change: Wed Jul 10 13:59:19 2013 via crm_resource on drbd1.localdomain
Stack: cman
Current DC: drbd2.localdomain - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
5 Resources configured.

Node drbd3.localdomain: standby
Online: [ drbd2.localdomain ]
OFFLINE: [ drbd1.localdomain ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ drbd2.localdomain ]
     Stopped: [ p_drbd_mysql:1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem): Stopped
     p_ip_mysql (ocf::heartbeat:IPaddr2): Stopped
     p_mysql (ocf::heartbeat:mysql): Stopped

Failed actions:
    p_mysql_monitor_0 (node=drbd3.localdomain, call=18, rc=5, status=complete): not installed

...
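
As an aside, I assume the vote/quorum state can also be cross-checked on the cman side with something like this (cman_tool syntax from memory, so treat it as approximate):

  # on a surviving node; the output should include Expected votes / Total votes / Quorum
  cman_tool status
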
When I look at the log files, I see that there's an attempt to fence drbd1, even though I have

  <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>

in the CIB. Why would the cluster still be attempting to STONITH?
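
For what it's worth, the property was set and checked along these lines (again, pcs/cibadmin syntax quoted from memory, so treat it as approximate):

  # assumes the property was set via pcs rather than by editing the CIB directly
  pcs property set stonith-enabled=false
  # confirm what actually landed in the CIB
  cibadmin --query | grep stonith-enabled
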
The CIB and the log files from the time I dropped the network connection can be found at http://clusterdb.com/upload/pacemaker_logs.zip

Thanks for the help, Andrew.


On 10 July 2013 12:02, Andrew Beekhof <andrew@beekhof.net> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br>
On 09/07/2013, at 3:59 PM, Andrew Morgan <<a href="mailto:andrewjamesmorgan@gmail.com">andrewjamesmorgan@gmail.com</a>> wrote:<br>
<br>
><br>
><br>
><br>
> On 9 July 2013 04:11, Andrew Beekhof <<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>> wrote:<br>
><br>
> On 08/07/2013, at 11:35 PM, Andrew Morgan <<a href="mailto:andrewjamesmorgan@gmail.com">andrewjamesmorgan@gmail.com</a>> wrote:<br>
><br>
> > Thanks Florian.<br>
> ><br>
> > The problem I have is that I'd like to define a HA configuration that isn't dependent on a specific set of fencing hardware (or any fencing hardware at all for that matter) and as the stack has the quorum capability included I'm hoping that this is an option.<br>
> ><br>
> > I've not been able to find any quorum commands within pcs; the closest I've found is setting a node to "standby" but when I do that, it appears to have lost its quorum vote<br>
><br>
> This is not the case.<br>
><br>
> My test was to have 3 nodes, node 3 defined as being on standby. My resources were running on node 2. I then dropped the network connection on node 2 hoping that node 1 and node 3 would maintain a quorum and that the resources would start on node 1 - instead the resources were stopped.<br>
<br>
</div>I'd like to see logs of that. Because I'm having a really hard time believing it.<br>
<div class="im"><br>
><br>
> I have quorum enabled but on pcs status it says that the number of votes required is unknown - is there something else that I need to configure?<br>
<br>
</div>Something sounds very wrong with your cluster.<br>
<div class="HOEnZb"><div class="h5"><br>
><br>
><br>
><br>
> > - this seems at odds with the help text....<br>
> ><br>
> > standby <node><br>
> > Put specified node into standby mode (the node specified will no longer be able to host resources<br>
> ><br>
> > Regards, Andrew.<br>
> ><br>
> ><br>
> > On 8 July 2013 10:23, Florian Crouzat <<a href="mailto:gentoo@floriancrouzat.net">gentoo@floriancrouzat.net</a>> wrote:<br>
> > Le 08/07/2013 09:49, Andrew Morgan a écrit :<br>
> > >
> > > I'm attempting to implement a 3-node cluster where only 2 nodes are
> > > there to actually run the services and the 3rd is there to form a quorum
> > > (so that the cluster stays up when one of the 2 'workload' nodes fails).
> > >
> > > To this end, I added a location "avoids" constraint so that the services
> > > (including drbd) don't get placed on the 3rd node (drbd3)...
> > >
> > > pcs constraint location ms_drbd avoids drbd3.localdomain
> > >
> > > The problem is that this constraint doesn't appear to be enforced, and I
> > > see failed actions where Pacemaker has attempted to start the services
> > > on drbd3. In most cases I can just ignore the error, but if I attempt to
> > > migrate the services using "pcs resource move" then it causes a fatal
> > > startup loop for drbd. If I migrate by adding an extra location constraint
> > > preferring the other workload node, then I can migrate OK.
> > >
> > > I'm using Oracle Linux 6.4; drbd83-utils 8.3.11; corosync 1.4.1; cman
> > > 3.0.12.1; Pacemaker 1.1.8 & pcs 1.1.8
> > >
> > > I'm no quorum-node expert, but I believe your initial design isn't optimal.
> > > You could probably even run with only two nodes (real nodes) and
> > > no-quorum-policy=ignore + fencing (for data integrity) [1].
> > > This is what most (all?) people with two-node clusters do.
> > >
> > > But if you really believe you need to be quorate, then I think you need to
> > > define your third node as a quorum node in corosync/cman (I'm not sure how
> > > since EL6.4 uses CMAN, and I cannot find a valid link). IIRC, with such a
> > > definition you won't need the location constraints.
> > >
> > > [1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_perform_a_failover.html#_quorum_and_two_node_clusters
> > >
> > > --
> > > Cheers,
> > > Florian Crouzat
> > >
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org