<div dir="ltr"><div>Thanks, that seems to have been the problem in my case. (For some reason the attribute did not reappear on its own, but adding it manually with crm_attribute did work.)<br></div><div><br></div><div>I assume this happened because there was no other node that could become the DC while pacemaker was restarting? If I add another node, the problem doesn't seem to appear.</div><div><br></div><div>Dirk<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 5, 2019 at 3:17 PM Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 2019-06-05 at 13:28 -0700, Dirk Gassen wrote:<br>

> Thanks for your quick reply. I should have been a bit more verbose in<br>

> my problem description.<br>

> <br>

> After starting up pacemaker again and before "crm node ready<br>

> testras3" I did actually monitor the cluster with "crm_mon" and waited<br>

> until it indicated that it knew about the states of the resources.<br>

> <br>

> Here is actually the excerpt from syslog:<br>

> * crm node maintenance testras3<br>

> > 16:14:50 On loss of CCM Quorum: Ignore<br>

> > 16:14:50 Forcing unmanaged master MariaDB:0 to remain promoted on<br>

> testras3<br>

> > 16:14:50 Calculated Transition 12: /var/lib/pacemaker/pengine/pe-<br>

> input-72.bz2<br>

> * systemctl stop pacemaker<br>

> > 16:15:29 On loss of CCM Quorum: Ignore<br>

> > 16:15:29 Forcing unmanaged master MariaDB:0 to remain promoted on<br>

> testras3<br>

<br>

Ah, there is no master score for MariaDB, so when the node leaves<br>

maintenance mode, the resource must be demoted.<br>

<br>

Restarting pacemaker clears all transient node attributes (including<br>

the master score). The next monitor would set it again, but maintenance<br>

mode cancels monitors, so it won't run until it comes out of<br>

maintenance mode, at which point it wants to do the demote.<br>

<br>

A good way around this would be to unmanage the MariaDB resource before<br>

putting the node in maintenance. When you take the node out of<br>

maintenance, the monitor will start up again, but it won't take any<br>

actions. Once the monitor runs and sets the master score (which you can<br>

confirm with crm_master --query --resource MariaDB --node <node>), you<br>

can manage the resource.<br>

<br>
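
A minimal sketch of that sequence, assuming crmsh is installed and using the<br>

node and resource names from this thread. The run and wait_for_master_score<br>

helpers and the DRY_RUN switch are hypothetical, added only for illustration.<br>

The sketch unmanages the ms wrapper (ms_MariaDB), since that is where<br>

is-managed is set in the CIB shown, and it assumes crm_master --query exits<br>

non-zero while the score is still unset. The 10-second poll is arbitrary.<br>

<pre>
```shell
#!/bin/sh
# Sketch: restart pacemaker on one node without demoting a promoted resource.
# NODE, RSC and MS default to the names used in this thread.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
NODE=${NODE:-testras3}
RSC=${RSC:-MariaDB}
MS=${MS:-ms_MariaDB}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: poll until the monitor has set the master score again.
# Assumption: the query exits non-zero while the attribute does not exist.
wait_for_master_score() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        if run crm_master --query --resource "$RSC" --node "$NODE"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

restart_pacemaker() {
    # Unmanage first, so nothing is demoted when maintenance mode ends.
    run crm resource unmanage "$MS" || return 1
    run crm node maintenance "$NODE" || return 1
    run systemctl stop pacemaker || return 1
    run systemctl start pacemaker || return 1
    # In a real run, wait here until crm_mon shows the resources again.
    run crm node ready "$NODE" || return 1
    # Monitors resume after maintenance; wait for the score to come back.
    wait_for_master_score || return 1
    run crm resource manage "$MS"
}

restart_pacemaker
```
</pre>

With the default DRY_RUN=1 this only prints the commands in order, which is a<br>

cheap way to review the sequence before running it with DRY_RUN=0.<br>

<br>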

> > 16:15:29 Scheduling Node testras3 for shutdown<br>

> > 16:15:29 Calculated Transition 13: /var/lib/pacemaker/pengine/pe-<br>

> input-73.bz2<br>

> > 16:15:29 Invoking handler for signal 15: Terminated<br>

> * systemctl start pacemaker<br>

> > 16:15:57 Additional logging available in /var/log/pacemaker.log<br>

> > 16:16:20 On loss of CCM Quorum: Ignore<br>

> > 16:16:20 Calculated Transition 0: /var/lib/pacemaker/pengine/pe-<br>

> input-74.bz2<br>

> > 16:16:20 On loss of CCM Quorum: Ignore<br>

> > 16:16:20 Forcing unmanaged master MariaDB:0 to remain promoted on<br>

> testras3<br>

> > 16:16:20 Calculated Transition 1: /var/lib/pacemaker/pengine/pe-<br>

> input-75.bz2<br>

> * crm node ready testras3<br>

> > 16:18:01 On loss of CCM Quorum: Ignore<br>

> > 16:18:01 Stop    AppserverIP#011(testras3)<br>

> > 16:18:01 Demote  MariaDB:0#011(Master -> Slave testras3)<br>

> > 16:18:01 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-<br>

> input-76.bz2<br>

> > 16:18:01 On loss of CCM Quorum: Ignore<br>

> > 16:18:01 Start   AppserverIP#011(testras3)<br>

> > 16:18:01 Promote MariaDB:0#011(Slave -> Master testras3)<br>

> > 16:18:01 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-<br>

> input-77.bz2<br>

> > 16:18:02 On loss of CCM Quorum: Ignore<br>

> > 16:18:02 Calculated Transition 4: /var/lib/pacemaker/pengine/pe-<br>

> input-78.bz2<br>

> <br>

> So it looks to me like the cluster is demoting ms_MariaDB from<br>

> Master to Slave. I'm not sure if I should have waited for something<br>

> else to occur?<br>

> <br>

> I have attached pe-input-76.bz2.<br>

> <br>

> Dirk<br>

> <br>

> On Wed, Jun 5, 2019 at 10:22 AM Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>

> wrote:<br>

> > On Wed, 2019-06-05 at 07:40 -0700, Dirk Gassen wrote:<br>

> > > Hi,<br>

> > > <br>

> > > I have the following CIB:<br>

> > > > primitive AppserverIP IPaddr \<br>

> > > >         params ip=10.1.8.70 cidr_netmask=255.255.255.192<br>

> > nic=eth0 \<br>

> > > >         op monitor interval=30s<br>

> > > > primitive MariaDB mysql \<br>

> > > >         params binary="/usr/bin/mysqld_safe"<br>

> > > pid="/var/run/mysqld/mysqld.pid"<br>

> > socket="/var/run/mysqld/mysqld.sock"<br>

> > > replication_user=repl replication_passwd="r3plic@tion"<br>

> > > max_slave_lag=15 evict_outdated_slaves=false test_user=repl<br>

> > > test_passwd="r3plic@tion" config="/etc/mysql/my.cnf" user=mysql<br>

> > > group=mysql datadir="/opt/mysql" \<br>

> > > >         op monitor interval=27s role=Master OCF_CHECK_LEVEL=1 \<br>

> > > >         op monitor interval=35s timeout=30 role=Slave<br>

> > > OCF_CHECK_LEVEL=1 \<br>

> > > >         op start interval=0 timeout=130 \<br>

> > > >         op stop interval=0 timeout=130<br>

> > > > ms ms_MariaDB MariaDB \<br>

> > > >         meta master-max=1 master-node-max=1 clone-node-max=1<br>

> > > notify=true globally-unique=false target-role=Started is-<br>

> > managed=true<br>

> > > > colocation colo_sm_aip inf: AppserverIP:Started<br>

> > ms_MariaDB:Master<br>

> > > <br>

> > > When I do "crm node maintenance testras3 && systemctl stop<br>

> > pacemaker<br>

> > > && systemctl start pacemaker && crm node ready testras3" the<br>

> > cluster<br>

> > > decides to demote ms_MariaDB and (because of the colocation) to<br>

> > stop<br>

> > > AppserverIP. It then follows up immediately with promoting<br>

> > ms_MariaDB<br>

> > > and starting AppserverIP again.<br>

> > > <br>

> > > If I leave out restarting pacemaker the cluster does not demote<br>

> > > ms_MariaDB and AppserverIP is left running.<br>

> > > <br>

> > > Why is the demotion happening and is there a way to avoid this?<br>

> > <br>

> > It looks like there isn't enough time between starting pacemaker<br>

> > and<br>

> > taking the node out of maintenance for pacemaker to re-detect the<br>

> > state<br>

> > of all resources. It's best to do that manually, i.e. wait for the<br>

> > status output to show all the resources again, but you could<br>

> > automate<br>

> > it with a fixed sleep or maybe a brief sleep plus crm_resource --<br>

> > wait.<br>

> > <br>

> > > Corosync 2.3.5-3ubuntu2.3 and Pacemaker 1.1.14-2ubuntu1.6<br>

> > > <br>

> > > Sincerely,<br>

> > > Dirk<br>

> > > -- <br>

> > > Dirk Gassen<br>

> > > Senior Software Engineer | GetWellNetwork<br>

> > > o: 240.482.3146<br>

> > > e: <a href="mailto:dgassen@getwellnetwork.com" target="_blank">dgassen@getwellnetwork.com</a><br>

> > > To help people take an active role in their health journey<br>

> > -- <br>

> > Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>

> > <br>

> > _______________________________________________<br>

> > Manage your subscription:<br>

> > <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>

> > <br>

> > ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>

> <br>

> <br>


-- <br>

Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>

<br>


</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><p style="color:rgb(0,0,0);font-family:Times;font-size:medium"><span style="font-size:10pt;font-family:Verdana,sans-serif;color:rgb(102,102,102);font-weight:bold">Dirk Gassen</span><br><span style="font-size:10pt;font-family:Verdana,sans-serif;color:rgb(102,102,102)">Senior Software Engineer | <span style="color:rgb(79,45,127);font-weight:bold">GetWellNetwork</span><br>o: 240.482.3146<br>e: <a href="mailto:bnigmann@getwellnetwork.com" target="_blank">dgassen@getwellnetwork.com</a><br></span></p><p style="color:rgb(0,0,0);font-family:Times;font-size:medium"><span style="color:rgb(68,68,68);font-family:"Times New Roman";font-size:10pt;font-style:italic;border-top:1px dotted">To help people take an active role in their health journey</span></p></div></div>