[ClusterLabs] pacemaker resources under systemd

Tue Aug 27 10:22:57 EDT 2019

On 27/08/19 15:27 +0200, Ulrich Windl wrote:
> Systemd think he's the boss, doing what he wants: Today I noticed that all
> resources are run inside control group "pacemaker.service" like this:
>   ├─pacemaker.service
>   │ ├─ 26582 isredir-ML1: listening on 172.20.17.238/12503 (2/1)
>   │ ├─ 26601 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ldirectord/mail.conf start
>   │ ├─ 26628 ldirectord tcp:172.20.17.238:25
>   │ ├─ 28963 isredir-DS1: handling 172.20.16.33/10475 -- 172.20.17.200/389
>   │ ├─ 40548 /usr/sbin/pacemakerd -f
>   │ ├─ 40550 /usr/lib/pacemaker/cib
>   │ ├─ 40551 /usr/lib/pacemaker/stonithd
>   │ ├─ 40552 /usr/lib/pacemaker/lrmd
>   │ ├─ 40553 /usr/lib/pacemaker/attrd
>   │ ├─ 40554 /usr/lib/pacemaker/pengine
>   │ ├─ 40555 /usr/lib/pacemaker/crmd
>   │ ├─ 53948 isredir-DS2: handling 172.20.16.33/10570 -- 172.20.17.201/389
>   │ ├─ 92472 isredir-DS1: listening on 172.20.17.204/12511 (13049/3)
> ...
> 
> (that "isredir" stuff is my own resource that forks processes and creates
> threads on demand, thus modifying process (and thread) titles to help
> understanding what's going on...)
> 
> My resources are started via OCF RA (shell script), not a systemd unit.
> 
> Wouldn't it make much more sense if each resource would run in its
> own control group?

While a listing like the one above may be confusing, the main problem
is perhaps that any resource restrictions you specify in the pacemaker
service file are accounted against the whole mix of stack-native and
stack-managed resources (unless the latter are of the systemd class).
That makes the containment and supervision features of systemd rather
unusable, since there is no tight (as opposed to rather open-ended)
black box to reason about.
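
To see the shared accounting for oneself, here is a minimal Python
sketch (not anything pacemaker ships; the helper names are made up for
illustration) that groups PIDs by the cgroup systemd tracks them in,
based on the documented "hierarchy-ID:controller-list:path" line format
of /proc/<pid>/cgroup.  It assumes Linux with either the cgroup v1
"name=systemd" hierarchy or the unified v2 hierarchy:

```python
from collections import defaultdict


def systemd_cgroup(proc_cgroup_text):
    """Return the cgroup path systemd tracks for a process, or None.

    Expects the content of /proc/<pid>/cgroup, one
    "hierarchy-ID:controller-list:path" entry per line.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not a well-formed entry
        hierarchy_id, controllers, path = parts
        # cgroup v1 named hierarchy: "N:name=systemd:<path>"
        # cgroup v2 unified hierarchy: "0::<path>"
        if controllers == "name=systemd" or (hierarchy_id == "0" and controllers == ""):
            return path
    return None


def group_by_cgroup(cgroup_text_by_pid):
    """Map each cgroup path to the sorted list of PIDs found in it."""
    groups = defaultdict(list)
    for pid, text in sorted(cgroup_text_by_pid.items()):
        groups[systemd_cgroup(text)].append(pid)
    return dict(groups)


def read_proc_cgroup(pid):
    """Read /proc/<pid>/cgroup for a live process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return f.read()
```

On a node like the one quoted, feeding it something like
{pid: read_proc_cgroup(pid) for pid in (40548, 26601)} would be
expected to put pacemakerd and ldirectord under the very same key,
which is exactly the accounting problem described above.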

There have been some thoughts in the past, however, that pacemaker
could become the delegated controller of its own cgroup subtrees.

There is a nice document detailing the various possibilities, though
it looks pretty overwhelming at first glance:
https://systemd.io/CGROUP_DELEGATION
Naively, the i-like-continents integration option there looks the most
appealing to me at this point.

If anyone has insights into cgroups, how they pair with systemd and
how they could pair with pacemaker, please do speak up; it could be
a great help in sketching the design in this area.

> I mean: If systemd thinks everything MUST run in some control group,
> why not pick the "correct " one? Having the pacemaker infrastructure
> in the same control group as all the resources seems to be a bad
> idea IMHO.

No doubt it is suboptimal.

> The other "discussable feature" are "high PIDs" like "92472". While port
> numbers are still 16 bit (in IPv4 at least), I see little sense in having
> millions of processes or threads.

I have seen you questioning this on the systemd ML, but I can't think
of any inconvenience in that regard, modulo pre-existing real bugs.
It actually helps slightly to unbreak designs that rely on PID
liveness without any firm guarantees, since a larger PID range makes
PID reuse less likely (the risk of process ID recycling is still
better than downright crazy "process grep'ing", which is totally
unsuitable when chroots, PID namespaces or containers rooted on that
very host get into the picture, and not much better otherwise[1]!).
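
For illustration only, one common way to make a PID-liveness check
less fragile is to remember the process start time along with the PID
and compare both later.  The Python sketch below assumes Linux procfs
(start time is field 22 of /proc/<pid>/stat); the function names are
hypothetical, and this is not a description of what pacemaker or any
resource agent actually does:

```python
def parse_starttime(stat_text):
    """Extract the start time (field 22) from /proc/<pid>/stat content.

    Field 2 (the command name in parentheses) may itself contain
    spaces or ')', so split only after the LAST ')'.
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), hence field 22 sits at index 19
    return int(rest[19])


def read_identity(pid):
    """Return (pid, starttime) for a live process, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return (pid, parse_starttime(f.read()))
    except (FileNotFoundError, ProcessLookupError):
        return None


def same_process(recorded_identity, pid):
    """True if pid still refers to the process recorded earlier."""
    current = read_identity(pid)
    return current is not None and current == recorded_identity
```

This only narrows the window: a race remains between the check and
whatever is done with the PID afterwards, and /proc only reflects the
PID namespace of the reader, so the caveats about namespaces and
containers above still apply.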

[1] https://lists.clusterlabs.org/pipermail/users/2019-July/025978.html

-- 
Jan (Poki)