[ClusterLabs] cleanup of a resource leads to restart of Virtual Domains
Yan Gao
YGao at suse.com
Tue Oct 1 06:26:24 EDT 2019
On 9/30/19 6:45 PM, Lentes, Bernd wrote:
>>>
>>> Hi Yan,
>>> I had a look in the logs, and what happened when I issued a "resource cleanup" of
>>> the GFS2 resource is
>>> that the cluster deleted an entry in the status section:
>>>
>>> Sep 26 14:52:52 [9317] ha-idg-2 cib: info: cib_process_request:
>>> Completed cib_delete operation for section <=================================================
>>> //node_state[@uname='ha-idg-1']//lrm_resource[@id='dlm']: OK (rc=0,
>
>>> and soon afterwards it recognized dlm on ha-idg-1 as stopped (or stopped it):
>
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>> dlm (ocf::pacemaker:controld): Stopped <========================================
>
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>> clvmd (ocf::heartbeat:clvm): Started ha-idg-1
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>
>>>
>>> According to the logs, dlm was running before. Does the deletion of that entry lead
>>> to the stop of the dlm resource?
>>> Is that expected behaviour?
>> First, unless "force" is specified, cleanup issued
>> for a child resource
>> will do the work for the whole resource group.
>
> Ah. Then I will use "force" in the future when I just want to do
> a "resource cleanup" for a single resource in a group.
> But is the initial deletion of the dlm resource in the status section
> the expected behaviour when I do a "resource cleanup"?
> Is it because it is the first resource in that group?
> Sorry for insisting, but I'm interested in really understanding what was going on.
It's supposed to be a feature (arguable :-)) of "crm_resource -C" that
it intelligently cleans up all "relevant" resources together at once.
The behavior/idea of cleanup makes more sense in pacemaker-2.0
(SLE-HA 15 releases), where it does a *real* cleanup only if a resource
has any failures.
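For example, something like this (resource name "dlm" taken from your
logs; the exact flag behaviour can differ between versions, so please
check the crm_resource man page for yours first):

  # clean up only the named group member
  crm_resource --cleanup --resource dlm --force

  # without --force, cleanup of a group member acts on the whole group
  crm_resource --cleanup --resource dlm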
>
>> Cleanup deletes resources' history, which triggers a (re-)probe of the
>> resources. But before the probe of a resource has finished, the
>> resource will be shown as "Stopped", which doesn't necessarily mean it's
>> actually "Stopped". A running resource will be detected as "Started"
>> by the probe.
>
> Deleting history means resetting fail-count and last-failure?
Fail-count, yes, plus all the recorded historical operations of the
resource in the cib status section, rather than just the last-failure.
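If you want to see exactly what gets dropped, one rough way (just a
sketch, using the resource name from your logs) is to dump the relevant
part of the status section before and after the cleanup:

  # recorded operation history of dlm in the cib status section
  cibadmin --query --scope status | grep -A 5 'lrm_resource id="dlm"'

  # fail counts as the cluster currently sees them
  crm_mon --one-shot --failcounts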
>
>> The restart of the VM happened because pengine/crmd thought the resources
>> it depended on were really "Stopped" and weren't patient enough to wait
>> for the probes of them to finish. That's what the pull request resolved.
>>
>
> I installed it. Is there a way to test it?
Simply clean up any resource, such as gfs2, from the resource group that
the VM depends on, as you did before, and see whether the VM remains
untouched.
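For example (the gfs2 resource name below is a placeholder, use the real
one from your configuration):

  # in one terminal, watch resources and fail counts
  crm_mon --failcounts

  # in another terminal, clean up the gfs2 resource as you did before
  crm_resource --cleanup --resource <your-gfs2-resource>

The VM resources should now stay running instead of being restarted.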
Regards,
Yan
>
> Thanks.
>
> Bernd
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>