[ClusterLabs] Maintenance & Pacemaker Restart Demotes MS Resources

Wed Jun 5 16:28:40 EDT 2019

Thanks for your quick reply. I should have been a bit more verbose in my
problem description.

After starting pacemaker again, and before running "crm node ready testras3",
I did monitor the cluster with "crm_mon" and waited until it showed the
state of the resources.
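
The manual wait described above could be scripted roughly as follows. This is
only a sketch: it treats a resource as "known" once its name appears in the
one-shot output of "crm_mon -1", which is a crude test and may not be a
sufficient condition. STATUS_CMD, resources_visible and wait_for_resources
are hypothetical names introduced for the illustration.

```shell
#!/bin/sh
# Sketch only: poll a one-shot status report until every expected resource
# name shows up in it.

# Hypothetical hook so the check can be exercised without a cluster.
STATUS_CMD=${STATUS_CMD:-"crm_mon -1"}

# Succeeds once every name given as an argument appears in the status output.
resources_visible() {
    status=$($STATUS_CMD 2>/dev/null) || return 1
    for name in "$@"; do
        printf '%s\n' "$status" | grep -q -- "$name" || return 1
    done
    return 0
}

# Polls up to $1 times, one second apart, for the remaining arguments.
wait_for_resources() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        resources_visible "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example use (commented out so that sourcing this file changes nothing):
# wait_for_resources 60 AppserverIP ms_MariaDB && crm node ready testras3
```

A name match says nothing about whether the recorded role (Master or Slave)
has been re-detected, so this only mirrors what was checked by eye.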

Here is the relevant excerpt from syslog:
* crm node maintenance testras3
> 16:14:50 On loss of CCM Quorum: Ignore
> 16:14:50 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> 16:14:50 Calculated Transition 12: /var/lib/pacemaker/pengine/pe-input-72.bz2
* systemctl stop pacemaker
> 16:15:29 On loss of CCM Quorum: Ignore
> 16:15:29 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> 16:15:29 Scheduling Node testras3 for shutdown
> 16:15:29 Calculated Transition 13: /var/lib/pacemaker/pengine/pe-input-73.bz2
> 16:15:29 Invoking handler for signal 15: Terminated
* systemctl start pacemaker
> 16:15:57 Additional logging available in /var/log/pacemaker.log
> 16:16:20 On loss of CCM Quorum: Ignore
> 16:16:20 Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-74.bz2
> 16:16:20 On loss of CCM Quorum: Ignore
> 16:16:20 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> 16:16:20 Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-75.bz2
* crm node ready testras3
> 16:18:01 On loss of CCM Quorum: Ignore
> 16:18:01 Stop    AppserverIP (testras3)
> 16:18:01 Demote  MariaDB:0 (Master -> Slave testras3)
> 16:18:01 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-76.bz2
> 16:18:01 On loss of CCM Quorum: Ignore
> 16:18:01 Start   AppserverIP (testras3)
> 16:18:01 Promote MariaDB:0 (Slave -> Master testras3)
> 16:18:01 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-77.bz2
> 16:18:02 On loss of CCM Quorum: Ignore
> 16:18:02 Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-78.bz2

So it looks to me as if the cluster demotes ms_MariaDB from Master to Slave
as soon as the node leaves maintenance. Should I have waited for something
else to occur?
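
For reference, the sequence with the settle step suggested in the quoted
reply below (a brief sleep plus "crm_resource --wait") might look like the
following. This is a sketch under assumptions, not a verified fix: the
30-second default is an arbitrary value, and RUN, NODE, SETTLE_SECONDS and
restart_in_maintenance are hypothetical names used only here.

```shell
#!/bin/sh
# Sketch: restart pacemaker on a node in maintenance, then let the cluster
# settle before returning the node to normal operation.

NODE=${NODE:-testras3}
SETTLE_SECONDS=${SETTLE_SECONDS:-30}   # assumed value, not taken from the thread
RUN=${RUN:-}                           # hypothetical hook: RUN=echo gives a dry run

restart_in_maintenance() {
    $RUN crm node maintenance "$NODE" &&
    $RUN systemctl stop pacemaker &&
    $RUN systemctl start pacemaker &&
    $RUN sleep "$SETTLE_SECONDS" &&
    $RUN crm_resource --wait &&
    $RUN crm node ready "$NODE"
}

# Example dry run (commented out so that sourcing this file changes nothing):
# RUN=echo restart_in_maintenance
```

Each step runs only if the previous one succeeded, so a failed restart never
reaches the "ready" step.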

I have attached pe-input-76.bz2.
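
The attached file can be replayed offline with crm_simulate to see which
actions the policy engine derives from it. A minimal sketch, assuming the
file has been saved locally; RUN and replay_transition are hypothetical
names introduced for the example.

```shell
#!/bin/sh
# Sketch: replay a saved policy engine input and print the resulting actions.

RUN=${RUN:-}    # hypothetical hook: RUN=echo prints the command instead

replay_transition() {
    input=$1
    [ -r "$input" ] || { echo "cannot read $input" >&2; return 1; }
    # --simulate replays the transition; --show-scores prints allocation scores.
    $RUN crm_simulate --simulate --show-scores --xml-file="$input"
}

# Example use (commented out so that sourcing this file changes nothing):
# replay_transition pe-input-76.bz2
```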

Dirk

On Wed, Jun 5, 2019 at 10:22 AM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Wed, 2019-06-05 at 07:40 -0700, Dirk Gassen wrote:
> > Hi,
> >
> > I have the following CIB:
> > > primitive AppserverIP IPaddr \
> > >         params ip=10.1.8.70 cidr_netmask=255.255.255.192 nic=eth0 \
> > >         op monitor interval=30s
> > > primitive MariaDB mysql \
> > >         params binary="/usr/bin/mysqld_safe" \
> > >         pid="/var/run/mysqld/mysqld.pid" \
> > >         socket="/var/run/mysqld/mysqld.sock" \
> > >         replication_user=repl replication_passwd="r3plic at tion" \
> > >         max_slave_lag=15 evict_outdated_slaves=false test_user=repl \
> > >         test_passwd="r3plic at tion" config="/etc/mysql/my.cnf" user=mysql \
> > >         group=mysql datadir="/opt/mysql" \
> > >         op monitor interval=27s role=Master OCF_CHECK_LEVEL=1 \
> > >         op monitor interval=35s timeout=30 role=Slave OCF_CHECK_LEVEL=1 \
> > >         op start interval=0 timeout=130 \
> > >         op stop interval=0 timeout=130
> > > ms ms_MariaDB MariaDB \
> > >         meta master-max=1 master-node-max=1 clone-node-max=1 \
> > >         notify=true globally-unique=false target-role=Started is-managed=true
> > > colocation colo_sm_aip inf: AppserverIP:Started ms_MariaDB:Master
> >
> > When I do "crm node maintenance testras3 && systemctl stop pacemaker
> > && systemctl start pacemaker && crm node ready testras3" the cluster
> > decides to demote ms_MariaDB and (because of the colocation) to stop
> > AppserverIP. It then immediately follows up by promoting ms_MariaDB
> > and starting AppserverIP again.
> >
> > If I leave out restarting pacemaker the cluster does not demote
> > ms_MariaDB and AppserverIP is left running.
> >
> > Why is the demotion happening and is there a way to avoid this?
>
> It looks like there isn't enough time between starting pacemaker and
> taking the node out of maintenance for pacemaker to re-detect the state
> of all resources. It's best to do that manually, i.e. wait for the
> status output to show all the resources again, but you could automate
> it with a fixed sleep or maybe a brief sleep plus crm_resource --wait.
>
> > Corosync 2.3.5-3ubuntu2.3 and Pacemaker 1.1.14-2ubuntu1.6
> >
> > Sincerely,
> > Dirk
> > --
> > Dirk Gassen
> > Senior Software Engineer | GetWellNetwork
> > o: 240.482.3146
> > e: dgassen at getwellnetwork.com
> > To help people take an active role in their health journey
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

-- 

Dirk Gassen
Senior Software Engineer | GetWellNetwork
o: 240.482.3146
e: dgassen at getwellnetwork.com

To help people take an active role in their health journey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-76.bz2
Type: application/x-bzip2
Size: 2292 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190605/1abc5571/attachment.bz2>