[ClusterLabs] big trouble with a DRBD resource
kgaillot at redhat.com
Mon Aug 7 16:26:27 EDT 2017
On Mon, 2017-08-07 at 21:16 +0200, Lentes, Bernd wrote:
> ----- On Aug 4, 2017, at 10:19 PM, kgaillot kgaillot at redhat.com wrote:
> > The cluster reacted promptly:
> > crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params drbd_resource=idcc-devel \
> > > op monitor interval=60
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than the advised 240
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the advised 100
> > WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it may not be supported by the RA
> > Why is it complaining about a missing clone-max? This is a meta attribute for a clone, not for a simple resource!
> > This message is constantly repeated; it still appears although the cluster has been in standby for three days.
> > The "ERROR" message is coming from the DRBD resource agent itself, not
> > pacemaker. Between that message and the two separate monitor operations,
> > it looks like the agent will only run as a master/slave clone.
> This message concerning clone-max still appears once per minute in syslog, although both nodes have been in standby for days and the drbd resource is unmanaged too.
> With stat I checked that the RA is called once per minute. With strace I found out that it is lrmd which calls the RA with the option "monitor".
> But why is it still checking? I thought "standby" for the nodes means the cluster does not care anymore about the resources. And "unmanaged" means the same for a dedicated resource.
> So this should mean a doubled "don't care anymore about drbd".
Unmanaging doesn't stop monitoring of a resource; it only prevents the
cluster from starting and stopping the resource. That lets you see the
current status, even if you're in the middle of maintenance or whatnot.
You can disable monitoring separately by setting the enabled="false"
meta-attribute on the monitor operation.
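As a rough sketch of what that could look like in crm shell syntax (untested
here, and assuming your crmsh version accepts "enabled" as an attribute on
the op, which I haven't checked against 1.1.12), the primitive from earlier
in this thread would end up reading something like:

```
# Sketch only: same primitive as above, with the monitor op disabled.
# "enabled=false" on the op is the only change; everything else is
# copied from the original definition.
primitive prim_drbd_idcc_devel ocf:linbit:drbd \
    params drbd_resource=idcc-devel \
    op monitor interval=60 enabled=false
```

Setting it back to enabled=true (or dropping the attribute) should bring
the recurring monitor back.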
Standby would normally stop all resources from running on a node (and
thus all monitors as well), but if a resource is unmanaged, standby
won't override that -- it'll prevent the cluster from starting any new
resources on the node, but it won't stop the unmanaged resource (or any
of its monitors).
> crm(live)# status
> Last updated: Mon Aug 7 16:05:13 2017
> Last change: Tue Aug 1 18:54:02 2017 by root via cibadmin on ha-idg-2
> Stack: classic openais (with plugin)
> Current DC: ha-idg-2 - partition with quorum
> Version: 1.1.12-f47ea56
> 2 Nodes configured, 2 expected votes
> 18 Resources configured
> Node ha-idg-1: standby
> Node ha-idg-2: standby
> prim_drbd_idcc_devel (ocf::linbit:drbd): FAILED (unmanaged)[ ha-idg-1 ha-idg-2 ]
> What is interesting: Although saying "action monitor not advertised in meta-data, it may not be supported by the RA", it is:
Pacemaker doesn't print that message; it's probably coming from crm.
> case $__OCF_ACTION in
> And the "monitor_Slave" and "monitor_Master" mentioned in "crm ra info ocf:linbit:drbd" I can't find in the RA. Strange.
> Doesn't "crm ra info ocf:linbit:drbd" retrieve its information from the RA?
> In the RA I just find:
> <action name="monitor" depth="0" timeout="20" interval="20" role="Slave" />
> <action name="monitor" depth="0" timeout="20" interval="10" role="Master" />
> This is what "crm ra info ocf:linbit:drbd" says:
> Operations' defaults (advisory minimum):
> monitor_Slave timeout=20 interval=20
> monitor_Master timeout=20 interval=10
Ah, this makes more sense to me now ... it looks like the RA actually
supports "monitor" (which is what I expected) and crm ra info is
displaying that differently due to the two supported roles.
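To illustrate the idea, here is a small sketch of how a front end could
turn those two role-specific <action> entries into the names that
"crm ra info" prints. This is not crmsh's actual code, just my guess at the
kind of mapping involved; the function name ra_monitor_labels is made up,
and it assumes each action sits on one line with its attributes in the
order shown in the meta-data quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of crmsh or the DRBD RA.
# Reads OCF meta-data <action> lines on stdin and prints
# "<name>_<role> timeout=<t> interval=<i>" for each action that
# carries a role attribute. Lines without a role print nothing.
ra_monitor_labels() {
    sed -n 's/.*<action name="\([^"]*\)".* timeout="\([^"]*\)" interval="\([^"]*\)" role="\([^"]*\)".*/\1_\4 timeout=\2 interval=\3/p'
}

# Feed it the two lines quoted from the RA's meta-data.
ra_monitor_labels <<'EOF'
<action name="monitor" depth="0" timeout="20" interval="20" role="Slave" />
<action name="monitor" depth="0" timeout="20" interval="10" role="Master" />
EOF
```

So both entries are still the plain "monitor" action as far as the RA is
concerned; only the label is built from the action name plus the role.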
Ken Gaillot <kgaillot at redhat.com>