[ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

Fri Feb 10 17:33:01 EST 2017

On 02/10/2017 06:49 AM, Lentes, Bernd wrote:
> 
> 
> ----- On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgaillot at redhat.com wrote:
> 
>> On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
>>> Hi,
>>>
>>> I have a two-node cluster with a VM as a resource. Currently I'm just testing
>>> and playing. My VM boots and shuts down again at 15-minute intervals.
>>> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
>>> (900000ms)" found in the logs. I googled, and it is said that this
>>> is due to time-based rules
>>> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
>>> But I don't have any time-based rules.
>>> This is the config for my vm:
>>>
>>> primitive prim_vm_mausdb VirtualDomain \
>>>         params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>>         params hypervisor="qemu:///system" \
>>>         params migration_transport=ssh \
>>>         op start interval=0 timeout=90 \
>>>         op stop interval=0 timeout=95 \
>>>         op monitor interval=30 timeout=30 \
>>>         op migrate_from interval=0 timeout=100 \
>>>         op migrate_to interval=0 timeout=120 \
>>>         meta allow-migrate=true \
>>>         meta target-role=Started \
>>>         utilization cpu=2 hv_memory=4099
>>>
>>> The only constraint I had concerning the VM was a location constraint
>>> (which I didn't create).
>>
>> What is the constraint? If its ID starts with "cli-", it was created by
>> a command-line tool (such as crm_resource, crm shell or pcs, generally
>> for a "move" or "ban" command).
>>
> I deleted the one I mentioned, but now I have two again. I didn't create them.
> Does crm create constraints by itself?
> 
> location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: ha-idg-2
> location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2

The command-line tool you use creates them.

If you're using crm_resource, they're created by crm_resource
--move/--ban. If you're using pcs, they're created by pcs resource
move/ban. Other tools behave the same way.

> One location constraint with inf and one with -inf, for the same resource on the same node.
> Isn't that pointless?

Yes, but that's what you told it to do :-)

The command-line tools move or ban resources by setting constraints to
achieve that effect. Those constraints are permanent until you remove them.
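
On what happens when an inf and a -inf constraint apply to the same node: Pacemaker's documented score arithmetic gives INFINITY - INFINITY = -INFINITY, so the ban wins over the preference. A minimal sketch of that rule follows. The add_scores helper is hypothetical (not a Pacemaker command), and it ignores the capping of finite sums that the real scheduler applies.

```shell
#!/bin/sh
# Combine two Pacemaker-style scores.
# Arguments are integers, INFINITY, or -INFINITY.
# Hypothetical helper for illustration only.
add_scores() {
    a=$1
    b=$2
    if [ "$a" = "-INFINITY" ] || [ "$b" = "-INFINITY" ]; then
        # -INFINITY wins over everything, including +INFINITY
        printf '%s\n' "-INFINITY"
    elif [ "$a" = "INFINITY" ] || [ "$b" = "INFINITY" ]; then
        # +INFINITY wins over any finite score
        printf '%s\n' "INFINITY"
    else
        # Two finite scores are simply added
        printf '%s\n' "$((a + b))"
    fi
}

# The cli-prefer (inf) and cli-ban (-inf) constraints on the same node:
add_scores INFINITY -INFINITY
```

The last line prints -INFINITY, meaning the resource may not run on that node.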

How to clear them again depends on which tool you use: crm_resource
--clear, pcs resource clear, etc.
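
As a rough sketch of how to spot such tool-created constraints before clearing them, the snippet below filters crm-shell-style configuration text for location constraints whose ID starts with "cli-". The list_cli_constraints function and the loc_example constraint are hypothetical and only for illustration. The other two constraints are the ones quoted in this thread.

```shell
#!/bin/sh
# Print the IDs of location constraints whose ID starts with "cli-".
# Reads crm-shell-style configuration text on stdin.
list_cli_constraints() {
    awk '$1 == "location" && $2 ~ /^cli-/ { print $2 }'
}

# Sample input: the two constraints from this thread plus one
# hypothetical hand-written constraint that should not be listed.
sample_config='location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: ha-idg-2
location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2
location loc_example prim_vm_mausdb 100: ha-idg-1'

printf '%s\n' "$sample_config" | list_cli_constraints

# On a live cluster, the constraints could then be removed with the
# tool you normally use, for example:
#   crm_resource --clear --resource prim_vm_mausdb
#   pcs resource clear prim_vm_mausdb
```

With the sample input this prints the two cli- IDs and skips loc_example.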

> 
> "crm resource scores" shows -inf for that resource on that node:
> native_color: prim_vm_mausdb allocation score on ha-idg-1: 100
> native_color: prim_vm_mausdb allocation score on ha-idg-2: -INFINITY
> 
> Is -inf stronger?
> Is it true that only the values for "native_color" matter?
> 
> A general question: when I have trouble starting/stopping/migrating resources,
> does it make sense to do a "crm resource cleanup" before trying again
> (besides finding the reason for the trouble)?

It's best to figure out what the problem is first, make sure that's
taken care of, then clean up. The cluster might or might not do anything
when you clean up, depending on what stickiness you have, your failure
handling settings, etc.

> Sorry for asking basic stuff. I read a lot beforehand, but in practice it's totally different.
> Although I just have a VM as a resource, and I'm only testing, I'm sometimes astonished by the
> complexity of a simple two-node cluster: scores, failcounts, constraints, default values for a lot of variables ...
> You have to keep an eye on a lot of stuff.
> 
> Bernd