[ClusterLabs] cleanup of a resource leads to restart of Virtual Domains
Yan Gao
YGao at suse.com
Tue Oct 1 06:26:24 EDT 2019
On 9/30/19 6:45 PM, Lentes, Bernd wrote:
>>>
>>> Hi Yan,
>>> I had a look in the logs, and what happened when I issued a "resource cleanup" of
>>> the GFS2 resource is
>>> that the cluster deleted an entry in the status section:
>>>
>>> Sep 26 14:52:52 [9317] ha-idg-2 cib: info: cib_process_request:
>>> Completed cib_delete operation for section <=================================================
>>> //node_state[@uname='ha-idg-1']//lrm_resource[@id='dlm']: OK (rc=0,
>
>>> and soon afterwards it recognized dlm on ha-idg-1 as stopped (or stopped it):
>
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>> dlm (ocf::pacemaker:controld): Stopped <========================================
>
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>> clvmd (ocf::heartbeat:clvm): Started ha-idg-1
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>
>>>
>>> According to the logs, dlm was running before. Does the deletion of that entry lead
>>> to the stop of the dlm resource?
>>> Is that expected behaviour?
>> First, unless "force" is specified, cleanup issued
>> for a child resource
>> will do the work for the whole resource group.
>
> Ah. Then I will use "force" in the future when I just want to do
> a "resource cleanup" for a single resource in a group.
> But is the initial deletion of the dlm resource in the status section
> the expected behaviour when I do a "resource cleanup"?
> Is it because it is the first resource in that group?
> Sorry for insisting, but I'm interested in really understanding what was going on.
It's supposed to be a feature (arguable :-)) of "crm_resource -C" that
it intelligently cleans up all "relevant" resources together at once.
The behavior/idea of cleanup makes more sense in pacemaker-2.0
(SLE-HA 15 releases), where it does a *real* cleanup only if a resource
has any failures.
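For example, something like this (resource name "dlm" taken from your
logs; the exact flag behaviour can differ between versions, so please
check the crm_resource man page for yours first):

  # clean up only the named group member
  crm_resource --cleanup --resource dlm --force

  # without --force, cleanup of a group member acts on the whole group
  crm_resource --cleanup --resource dlm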
>
>> Cleanup deletes resources' history, which triggers a (re-)probe of the
>> resources. But before the probe of a resource has finished, the
>> resource will be shown as "Stopped", which doesn't necessarily mean it's
>> actually "Stopped". A running resource will be detected as "Started"
>> by the probe.
>
> Deleting history means resetting fail-count and last-failure?
Fail-count, yes, plus all the recorded historical operations of the
resource in the cib status section, rather than just the last-failure.
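If you want to see exactly what gets dropped, one rough way (just a
sketch, using the resource name from your logs) is to dump the relevant
part of the status section before and after the cleanup:

  # recorded operation history of dlm in the cib status section
  cibadmin --query --scope status | grep -A 5 'lrm_resource id="dlm"'

  # fail counts as the cluster currently sees them
  crm_mon --one-shot --failcounts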
>
>> The restart of the VM happened because pengine/crmd thought the resources
>> it depended on were really "Stopped" and weren't patient enough to wait
>> for the probes of them to finish. That's what the pull request resolved.
>>
>
> I installed it. Is there a way to test it?
Simply clean up any resource, such as gfs2, from the resource group that
the VM depends on, as you did before, and see whether the VM remains
untouched.
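For example (the gfs2 resource name below is a placeholder, use the real
one from your configuration):

  # in one terminal, watch resources and fail counts
  crm_mon --failcounts

  # in another terminal, clean up the gfs2 resource as you did before
  crm_resource --cleanup --resource <your-gfs2-resource>

The VM resources should now stay running instead of being restarted.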
Regards,
Yan
>
> Thanks.
>
> Bernd
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>