[ClusterLabs] Maintenance & Pacemaker Restart Demotes MS Resources

Fri Jun 7 10:31:26 EDT 2019

Thanks, that seems to have been the problem in my case. (For some reason
the attribute did not reappear on its own, but adding it manually with
crm_attribute did work.)
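
For anyone hitting the same thing, here is a rough sketch of "check the
score, set it only if it is missing". It uses crm_master, the wrapper around
crm_attribute that appears in Ken's command quoted below, rather than the
exact crm_attribute call I ran. The function name and the score value are
hypothetical, the --update and --lifetime options are taken from the
crm_master help text as I remember it, and it assumes crm_master exits
non-zero when no score is set.

```shell
#!/bin/bash
# Sketch only: the function name and the score passed to it are hypothetical.

# Set a transient master score ($3) for resource $1 on node $2,
# but only if the query finds none.
# Assumption: crm_master exits non-zero when no score is set.
ensure_master_score() {
    local rsc=$1 node=$2 score=$3
    if crm_master --query --resource "$rsc" --node "$node" >/dev/null 2>&1; then
        return 0    # a score is already present; leave it alone
    fi
    # "reboot" lifetime makes this a transient attribute, like the one
    # the resource agent's monitor would normally set.
    crm_master --update "$score" --lifetime reboot --resource "$rsc" --node "$node"
}
```

It would be called as "ensure_master_score MariaDB testras3 100", where 100
is only a placeholder; the right value depends on what the resource agent
would set.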

I assume this happened because I didn't have another node that could
become the DC while pacemaker was restarting? If I add another node, the
problem doesn't seem to appear.

Dirk

On Wed, Jun 5, 2019 at 3:17 PM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Wed, 2019-06-05 at 13:28 -0700, Dirk Gassen wrote:
> > Thanks for your quick reply. I should have been a bit more verbose in
> > my problem description.
> >
> > After starting pacemaker again and before running "crm node ready
> > testras3", I monitored the cluster with "crm_mon" and waited until it
> > indicated that it knew about the states of the resources.
> >
> > Here is actually the excerpt from syslog:
> > * crm node maintenance testras3
> > > 16:14:50 On loss of CCM Quorum: Ignore
> > > 16:14:50 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > > 16:14:50 Calculated Transition 12: /var/lib/pacemaker/pengine/pe-input-72.bz2
> > * systemctl stop pacemaker
> > > 16:15:29 On loss of CCM Quorum: Ignore
> > > 16:15:29 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
>
> Ah, there is no master score for MariaDB, so when the node leaves
> maintenance mode, the resource must be demoted.
>
> Restarting pacemaker clears all transient node attributes (including
> the master score). The next monitor would set it again, but maintenance
> mode cancels monitors, so it won't run until it comes out of
> maintenance mode, at which point it wants to do the demote.
>
> A good way around this would be to unmanage the MariaDB resource before
> putting the node in maintenance. When you take the node out of
> maintenance, the monitor will start up again, but it won't take any
> actions. Once the monitor runs and sets the master score (which you can
> confirm with crm_master --query --resource MariaDB --node <node>), you
> can manage the resource.
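
The steps Ken describes above, written out as a shell sketch. The node and
resource names are the ones from this thread; the function names, retry
counts and sleep intervals are hypothetical, and it assumes crmsh's
"crm resource unmanage/manage" and that crm_master exits non-zero while no
master score is set. Whether to unmanage ms_MariaDB or the MariaDB primitive
depends on the configuration, so both names are parameters.

```shell
#!/bin/bash
# Sketch only: function names, retry counts and sleeps are hypothetical.

# Poll until a master score exists for resource $1 on node $2.
# $3 = maximum attempts (default 30), $4 = seconds between attempts (default 5).
# Assumption: crm_master exits non-zero while no score is set.
wait_for_master_score() {
    local rsc=$1 node=$2 tries=${3:-30} delay=${4:-5}
    while [ "$tries" -gt 0 ]; do
        if crm_master --query --resource "$rsc" --node "$node" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# $1 = resource to unmanage (e.g. ms_MariaDB)
# $2 = resource whose master score is queried (e.g. MariaDB)
# $3 = node (e.g. testras3)
# Stops at the first failing step, which can leave the node in maintenance
# or the resource unmanaged; check the cluster state by hand in that case.
restart_pacemaker_keep_master() {
    local rsc=$1 prim=$2 node=$3
    crm resource unmanage "$rsc" || return 1
    crm node maintenance "$node" || return 1
    systemctl stop pacemaker || return 1
    systemctl start pacemaker || return 1
    # Brief sleep plus crm_resource --wait, so resource states are re-detected.
    sleep 10
    crm_resource --wait || return 1
    crm node ready "$node" || return 1
    # The monitor runs again now, but the unmanaged resource is not demoted.
    wait_for_master_score "$prim" "$node" || return 1
    crm resource manage "$rsc"
}
```

Usage would be something like
"restart_pacemaker_keep_master ms_MariaDB MariaDB testras3".
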
>
> > > 16:15:29 Scheduling Node testras3 for shutdown
> > > 16:15:29 Calculated Transition 13: /var/lib/pacemaker/pengine/pe-input-73.bz2
> > > 16:15:29 Invoking handler for signal 15: Terminated
> > * systemctl start pacemaker
> > > 16:15:57 Additional logging available in /var/log/pacemaker.log
> > > 16:16:20 On loss of CCM Quorum: Ignore
> > > 16:16:20 Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-74.bz2
> > > 16:16:20 On loss of CCM Quorum: Ignore
> > > 16:16:20 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > > 16:16:20 Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-75.bz2
> > * crm node ready testras3
> > > 16:18:01 On loss of CCM Quorum: Ignore
> > > 16:18:01 Stop    AppserverIP (testras3)
> > > 16:18:01 Demote  MariaDB:0 (Master -> Slave testras3)
> > > 16:18:01 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-76.bz2
> > > 16:18:01 On loss of CCM Quorum: Ignore
> > > 16:18:01 Start   AppserverIP (testras3)
> > > 16:18:01 Promote MariaDB:0 (Slave -> Master testras3)
> > > 16:18:01 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-77.bz2
> > > 16:18:02 On loss of CCM Quorum: Ignore
> > > 16:18:02 Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-78.bz2
> >
> > So it looks to me like the cluster is demoting ms_MariaDB from
> > Master to Slave. I'm not sure whether I should have waited for
> > something else to occur?
> >
> > I have attached pe-input-76.bz2.
> >
> > Dirk
> >
> > On Wed, Jun 5, 2019 at 10:22 AM Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > > On Wed, 2019-06-05 at 07:40 -0700, Dirk Gassen wrote:
> > > > Hi,
> > > >
> > > > I have the following CIB:
> > > > > primitive AppserverIP IPaddr \
> > > > >         params ip=10.1.8.70 cidr_netmask=255.255.255.192 nic=eth0 \
> > > > >         op monitor interval=30s
> > > > > primitive MariaDB mysql \
> > > > >         params binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user=repl replication_passwd="r3plic at tion" max_slave_lag=15 evict_outdated_slaves=false test_user=repl test_passwd="r3plic at tion" config="/etc/mysql/my.cnf" user=mysql group=mysql datadir="/opt/mysql" \
> > > > >         op monitor interval=27s role=Master OCF_CHECK_LEVEL=1 \
> > > > >         op monitor interval=35s timeout=30 role=Slave OCF_CHECK_LEVEL=1 \
> > > > >         op start interval=0 timeout=130 \
> > > > >         op stop interval=0 timeout=130
> > > > > ms ms_MariaDB MariaDB \
> > > > >         meta master-max=1 master-node-max=1 clone-node-max=1 notify=true globally-unique=false target-role=Started is-managed=true
> > > > > colocation colo_sm_aip inf: AppserverIP:Started ms_MariaDB:Master
> > > >
> > > > When I do "crm node maintenance testras3 && systemctl stop pacemaker
> > > > && systemctl start pacemaker && crm node ready testras3", the cluster
> > > > decides to demote ms_MariaDB and (because of the colocation) to stop
> > > > AppserverIP. It then follows up immediately by promoting ms_MariaDB
> > > > and starting AppserverIP again.
> > > >
> > > > If I leave out restarting pacemaker the cluster does not demote
> > > > ms_MariaDB and AppserverIP is left running.
> > > >
> > > > Why is the demotion happening and is there a way to avoid this?
> > >
> > > It looks like there isn't enough time between starting pacemaker and
> > > taking the node out of maintenance for pacemaker to re-detect the state
> > > of all resources. It's best to do that manually, i.e. wait for the
> > > status output to show all the resources again, but you could automate
> > > it with a fixed sleep or maybe a brief sleep plus crm_resource --wait.
> > >
> > > > Corosync 2.3.5-3ubuntu2.3 and Pacemaker 1.1.14-2ubuntu1.6
> > > >
> > > > Sincerely,
> > > > Dirk
> > > > --
> > > > Dirk Gassen
> > > > Senior Software Engineer | GetWellNetwork
> > > > o: 240.482.3146
> > > > e: dgassen at getwellnetwork.com
> > > > To help people take an active role in their health journey
> > > --
> > > Ken Gaillot <kgaillot at redhat.com>
> > >
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/
> >
> >
> --
> Ken Gaillot <kgaillot at redhat.com>
>
>

-- 

Dirk Gassen
Senior Software Engineer | GetWellNetwork
o: 240.482.3146
e: dgassen at getwellnetwork.com

To help people take an active role in their health journey