[ClusterLabs] Antw: Re: Antw: [EXT] resource cloned group colocations

Gerald Vogt vogt at spamcop.net
Thu Mar 2 11:27:17 EST 2023


On 02.03.23 14:51, Ulrich Windl wrote:
>>>> Gerald Vogt <vogt at spamcop.net> wrote on 02.03.2023 at 14:43 in message
> <9ba5cd78-7b3d-32ef-38cf-5c5632c46b9a at spamcop.net>:
>> On 02.03.23 14:30, Ulrich Windl wrote:
>>>>>> Gerald Vogt <vogt at spamcop.net> wrote on 02.03.2023 at 08:41 in message
>>> <624d0b70-5983-4d21-6777-55be91688bbe at spamcop.net>:
>>>> Hi,
>>>>
>>>> I am setting up a mail relay cluster whose main purpose is to maintain
>>>> the service IPs via IPaddr2 and move them between cluster nodes when
>>>> necessary.
>>>>
>>>> The service IPs should only be active on nodes which are running all
>>>> necessary mail (systemd) services.
>>>>
>>>> So I have set up a resource for each of those services, put them into a
>>>> group in the order they should start, and cloned the group, as they are
>>>> normally supposed to run on all nodes at all times.
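>>>>
>>>> In pcs-style shorthand, that part of the setup might look roughly like
>>>> this (the service names here are just placeholders, not the actual
>>>> units):
>>>>
>>>>      resource create mail-postfix systemd:postfix op monitor interval=30s
>>>>      resource create mail-smtpd systemd:smtpd op monitor interval=30s
>>>>      resource group add mail-services mail-postfix mail-smtpd
>>>>      resource clone mail-services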
>>>>
>>>> Then I added an order constraint
>>>>      start mail-services-clone then start mail1-ip
>>>>      start mail-services-clone then start mail2-ip
>>>>
>>>> and colocations to prefer running the IPs on different nodes, but only
>>>> with the clone running:
>>>>
>>>>      colocation add mail2-ip with mail1-ip -1000
>>>>      colocation add mail1-ip with mail-services-clone
>>>>      colocation add mail2-ip with mail-services-clone
>>>>
>>>> as well as location constraints to prefer running the first IP on the
>>>> first node and the second IP on the second:
>>>>
>>>>      location mail1-ip prefers ha1=2000
>>>>      location mail2-ip prefers ha2=2000
>>>>
>>>> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
>>>> fine. mail2-ip will be moved immediately to ha3. Good.
>>>>
>>>> However, if pacemaker on ha2 starts up again, it will immediately remove
>>>> mail2-ip from ha3 and keep it offline, while the services in the group are
>>>> starting on ha2. As the services unfortunately take some time to come
>>>> up, mail2-ip is offline for more than a minute.
>>>
>>> That is because you wanted "mail2-ip prefers ha2=2000", so if the cluster
>>> _can_ run it there, then it will, even if it's running elsewhere.
>>>
>>> Maybe explain what you really want.
>>
>> As I wrote before (and I have "fixed" my copy&paste error above to use
>> consistent resource names now):
>>
>> 1. I want to run all required services on all running nodes at all times.
>>
>> 2. I want two service IPs mail1-ip (ip1) and mail2-ip (ip2) running on
>> the cluster but only on nodes where all required services are already
>> running (and not just starting)
>>
>> 3. Both IPs should be running on two different nodes if possible.
>>
>> 4. Preferably mail1-ip should be on node ha1 if ha1 is running with all
>> required services.
>>
>> 5. Preferably mail2-ip should be on node ha2 if ha2 is running with all
>> required services.
>>
>> So most importantly: I want the IP resources mail1-ip and mail2-ip to be
>> active only on nodes which are already running all services. They should
>> only be moved to nodes on which all services are already running.
> 
> Hi!
> 
> Usually I prefer simple solutions over highly complex ones.
> Would it work to use a negative colocation for both IPs, as well as a
> stickiness of maybe 500, then reducing the "prefer" value to something
> small like 5 or 10?
> Then the IP will stay elsewhere as long as the "basement services" run
> there.
> 
> This approach does not change the order of resource operations; instead it kind of minimizes them.
> In my experience most people overspecify what the cluster should do.
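
In the same pcs-style shorthand as above, that suggestion might look
roughly like this (the scores are only examples, not tested values):

     resource defaults resource-stickiness=500
     colocation add mail2-ip with mail1-ip -1000
     location mail1-ip prefers ha1=10
     location mail2-ip prefers ha2=10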

Well, I guess it's not possible using a group. A group resource seems to 
satisfy a colocation the moment the first resource in the group has been 
started (or even: is starting?). For a group which takes a longer time 
to start completely, that just doesn't work.

So I suppose there are only two options left. The first would be to 
ungroup everything and create colocation constraints between each 
individual service and the IP address, although I'm not sure whether 
that would just have the same issue on a smaller scale.
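
Ungrouped, that would mean one colocation per service, something like
this (again with placeholder service names):

     colocation add mail1-ip with mail-postfix-clone
     colocation add mail1-ip with mail-smtpd-clone
     colocation add mail2-ip with mail-postfix-clone
     colocation add mail2-ip with mail-smtpd-clone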

The other would be to start the services through systemd and make 
pacemaker depend on them, so that it starts only after all services are 
running. Pacemaker would then only handle the IP addresses...
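
For example with a systemd drop-in for pacemaker.service along these
lines (unit names are placeholders again):

     # /etc/systemd/system/pacemaker.service.d/mail-deps.conf
     [Unit]
     After=postfix.service smtpd.service
     Wants=postfix.service smtpd.service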

Thanks,

Gerald

