[ClusterLabs] big trouble with a DRBD resource

Wed Aug 16 15:20:03 CEST 2017

> Hi,
> 

> 
> What happened:
> I tried to configure a simple drbd resource following
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> I used this simple snippet from the doc:
> configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
>    op monitor interval=60s
> 
> I did it on the live cluster, which is currently in testing. I will never do
> this again. Shadow will be my friend.
> 
> The cluster reacted promptly:
> crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params
> drbd_resource=idcc-devel \
>   > op monitor interval=60
> WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than the
> advised 240
> WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the
> advised 100
> WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it
> may not be supported by the RA
> 
> From what I understand so far, I didn't configure start/stop operations, so
> the cluster used the default from default-action-timeout.
> It didn't configure the monitor operation, because monitor is not advertised
> in the meta-data.

> 
> The log says:
> Aug  1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta
> parameter misconfigured, expected clone-max -le 2, but found unset.
>                                   ^^^^^^^^^
> Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation
> prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6,
> cib-update=37, confirmed=true)
> Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation
> prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6,
> cib-update=38, confirmed=true)
> 

> 
> crm_mon said:
> Failed actions:
>    prim_drbd_idcc_devel_stop_0 on ha-idg-1 'not configured' (6): call=6967,
>    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
>    queued=0ms, exec=41ms
>    prim_drbd_idcc_devel_monitor_60000 on ha-idg-1 'not configured' (6): call=6968,
>    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
>    queued=0ms, exec=41ms
>    prim_drbd_idcc_devel_stop_0 on ha-idg-2 'not configured' (6): call=6963,
>    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
>    queued=0ms, exec=40ms
> 
> A big problem was that I have a ClusterMon resource running on each node. It
> sent about 20,000 SNMP traps in 193 seconds to my management station, which
> triggered 20,000 e-mails ...
> Where does this incredible number of traps come from? Nearly all traps said
> that stop is not configured for the drbd resource. Why does it complain so
> often? And why did it stop after ~20,000 traps?
> And it complained about the unconfigured monitor operation just 8 times.

OK. I configured the drbd resource wrongly/incompletely, and that caused the trouble.
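
For the archive: judging from the "expected clone-max -le 2, but found unset"
error, the primitive presumably has to be wrapped in a master/slave resource,
which is the step that follows the primitive in the same Clusters from Scratch
section. A minimal, untested sketch of what I should probably have entered in
one go. The name ms_drbd_idcc_devel is made up for illustration, and the
start/stop timeouts are simply the advised values from the warnings above:

    crm(live)# configure
    crm(live)configure# primitive prim_drbd_idcc_devel ocf:linbit:drbd \
        params drbd_resource=idcc-devel \
        op start timeout=240 op stop timeout=100 \
        op monitor interval=60
    crm(live)configure# ms ms_drbd_idcc_devel prim_drbd_idcc_devel \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
        notify=true
    crm(live)configure# commit

Working at the configure level and committing once should mean the cluster
never sees the bare primitive without its ms wrapper.
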
What I would like to know:
- Where does crm_mon retrieve its information from?
- Why did I get tons of lines in syslog? One message saying that the resource isn't configured correctly/completely would be enough.
I got thousands and thousands of lines saying the same thing.
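
As for "Shadow will be my friend": a rough sketch of how such a change could
be tried in a shadow CIB with crmsh before it touches the live cluster. The
shadow name drbd-test is made up, and the prompts are shown as I would expect
them, not copied from a real session:

    crm(live)# cib new drbd-test
    crm(drbd-test)# cib use drbd-test
    crm(drbd-test)# configure
    crm(drbd-test)configure# primitive prim_drbd_idcc_devel ocf:linbit:drbd \
        params drbd_resource=idcc-devel op monitor interval=60
    crm(drbd-test)configure# verify
    crm(drbd-test)configure# commit
    crm(drbd-test)configure# up
    crm(drbd-test)# cib commit drbd-test
    crm(drbd-test)# cib use

The commit inside configure only writes to the shadow copy. Nothing should
reach the running cluster until "cib commit", and "cib use" without a name
switches back to the live CIB.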

Bernd
