[ClusterLabs] Maintenance & Pacemaker Restart Demotes MS Resources

Ken Gaillot kgaillot at redhat.com
Wed Jun 5 18:17:37 EDT 2019


On Wed, 2019-06-05 at 13:28 -0700, Dirk Gassen wrote:
> Thanks for your quick reply. I should have been a bit more verbose in
> my problem description.
> 
> After starting up pacemaker again and before running
> "crm node ready testras3", I did actually monitor the cluster with
> "crm_mon" and waited until it indicated that it knew about the states
> of the resources.
> 
> Here is actually the excerpt from syslog:
> * crm node maintenance testras3
> > 16:14:50 On loss of CCM Quorum: Ignore
> > 16:14:50 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > 16:14:50 Calculated Transition 12: /var/lib/pacemaker/pengine/pe-input-72.bz2
> * systemctl stop pacemaker
> > 16:15:29 On loss of CCM Quorum: Ignore
> > 16:15:29 Forcing unmanaged master MariaDB:0 to remain promoted on testras3

Ah, there is no master score for MariaDB, so when the node leaves
maintenance mode, the resource must be demoted.

Restarting pacemaker clears all transient node attributes (including
the master score). The next monitor would set the score again, but
maintenance mode cancels monitors, so the monitor won't run until the
node comes out of maintenance mode, at which point the cluster, still
seeing no master score, wants to demote the resource.

A good way around this would be to unmanage the MariaDB resource before
putting the node in maintenance. When you take the node out of
maintenance, the monitor will start up again, but the cluster won't take
any actions on the unmanaged resource. Once the monitor runs and sets
the master score (which you can confirm with
"crm_master --query --resource MariaDB --node <node>"), you can manage
the resource again.
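
As a rough sketch, that sequence could be scripted as follows. The node
and resource names come from this thread; run, DRY_RUN, and
restart_pacemaker_keep_master are hypothetical helpers, and the loop
assumes crm_master exits non-zero while no score is set. By default it
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: restart pacemaker on a node without demoting the master.
NODE="${NODE:-testras3}"
MS_RSC="${MS_RSC:-ms_MariaDB}"    # assumption: unmanage the ms resource
PRIM_RSC="${PRIM_RSC:-MariaDB}"   # primitive name used for the master score
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only, 0 = execute

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_pacemaker_keep_master() {
    run crm resource unmanage "$MS_RSC" || return 1
    run crm node maintenance "$NODE" || return 1
    run systemctl stop pacemaker || return 1
    run systemctl start pacemaker || return 1
    # In a real run, wait here until crm_mon shows the resource states again.
    run crm node ready "$NODE" || return 1

    # Poll until the monitor has set the master score again (about 2 minutes).
    tries=0
    until run crm_master --query --resource "$PRIM_RSC" --node "$NODE"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 24 ]; then
            echo "master score not set; leaving $MS_RSC unmanaged" >&2
            return 1
        fi
        sleep 5
    done

    run crm resource manage "$MS_RSC"
}
```

Sourcing the file and calling restart_pacemaker_keep_master prints the
planned commands; setting DRY_RUN=0 would execute them. If the score
never appears, the resource is deliberately left unmanaged so that
nothing gets demoted behind your back.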

> > 16:15:29 Scheduling Node testras3 for shutdown
> > 16:15:29 Calculated Transition 13: /var/lib/pacemaker/pengine/pe-input-73.bz2
> > 16:15:29 Invoking handler for signal 15: Terminated
> * systemctl start pacemaker
> > 16:15:57 Additional logging available in /var/log/pacemaker.log
> > 16:16:20 On loss of CCM Quorum: Ignore
> > 16:16:20 Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-74.bz2
> > 16:16:20 On loss of CCM Quorum: Ignore
> > 16:16:20 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > 16:16:20 Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-75.bz2
> * crm node ready testras3
> > 16:18:01 On loss of CCM Quorum: Ignore
> > 16:18:01 Stop    AppserverIP (testras3)
> > 16:18:01 Demote  MariaDB:0 (Master -> Slave testras3)
> > 16:18:01 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-76.bz2
> > 16:18:01 On loss of CCM Quorum: Ignore
> > 16:18:01 Start   AppserverIP (testras3)
> > 16:18:01 Promote MariaDB:0 (Slave -> Master testras3)
> > 16:18:01 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-77.bz2
> > 16:18:02 On loss of CCM Quorum: Ignore
> > 16:18:02 Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-78.bz2
> 
> So it looks to me like the cluster is demoting ms_MariaDB from
> Master to Slave. I'm not sure if I should have waited for something
> else to occur?
> 
> I have attached pe-input-76.bz2.
> 
> Dirk
> 
> On Wed, Jun 5, 2019 at 10:22 AM Ken Gaillot <kgaillot at redhat.com>
> wrote:
> > On Wed, 2019-06-05 at 07:40 -0700, Dirk Gassen wrote:
> > > Hi,
> > > 
> > > I have the following CIB:
> > > > primitive AppserverIP IPaddr \
> > > >         params ip=10.1.8.70 cidr_netmask=255.255.255.192 nic=eth0 \
> > > >         op monitor interval=30s
> > > > primitive MariaDB mysql \
> > > >         params binary="/usr/bin/mysqld_safe" \
> > > >         pid="/var/run/mysqld/mysqld.pid" \
> > > >         socket="/var/run/mysqld/mysqld.sock" \
> > > >         replication_user=repl replication_passwd="r3plic at tion" \
> > > >         max_slave_lag=15 evict_outdated_slaves=false test_user=repl \
> > > >         test_passwd="r3plic at tion" config="/etc/mysql/my.cnf" \
> > > >         user=mysql group=mysql datadir="/opt/mysql" \
> > > >         op monitor interval=27s role=Master OCF_CHECK_LEVEL=1 \
> > > >         op monitor interval=35s timeout=30 role=Slave OCF_CHECK_LEVEL=1 \
> > > >         op start interval=0 timeout=130 \
> > > >         op stop interval=0 timeout=130
> > > > ms ms_MariaDB MariaDB \
> > > >         meta master-max=1 master-node-max=1 clone-node-max=1 \
> > > >         notify=true globally-unique=false target-role=Started \
> > > >         is-managed=true
> > > > colocation colo_sm_aip inf: AppserverIP:Started ms_MariaDB:Master
> > > 
> > > When I do "crm node maintenance testras3 && systemctl stop pacemaker
> > > && systemctl start pacemaker && crm node ready testras3" the cluster
> > > decides to demote ms_MariaDB and (because of the colocation) to stop
> > > AppserverIP. It then follows up immediately with promoting ms_MariaDB
> > > and starting AppserverIP again.
> > > 
> > > If I leave out restarting pacemaker the cluster does not demote
> > > ms_MariaDB and AppserverIP is left running.
> > > 
> > > Why is the demotion happening and is there a way to avoid this?
> > 
> > It looks like there isn't enough time between starting pacemaker and
> > taking the node out of maintenance for pacemaker to re-detect the
> > state of all resources. It's best to do that manually, i.e. wait for
> > the status output to show all the resources again, but you could
> > automate it with a fixed sleep or maybe a brief sleep plus
> > crm_resource --wait.
> > 
> > > Corosync 2.3.5-3ubuntu2.3 and Pacemaker 1.1.14-2ubuntu1.6
> > > 
> > > Sincerely,
> > > Dirk
> > > -- 
> > > Dirk Gassen
> > > Senior Software Engineer | GetWellNetwork
> > > o: 240.482.3146
> > > e: dgassen at getwellnetwork.com
> > > To help people take an active role in their health journey
> > -- 
> > Ken Gaillot <kgaillot at redhat.com>
> > 
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


