[ClusterLabs] Antw: Re: pacemaker resources under systemd
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 12 05:40:04 EDT 2019
Hi!
I just discovered an unpleasant side-effect of this:
SLES has "zypper ps" to show processes that still use deleted (replaced)
binaries. Now if any resource binary is replaced, zypper suggests restarting
pacemaker (which is nonsense, of course).
Example:
# zypper ps
The following running processes use deleted files:
PID  | PPID  | UID | User | Command           | Service   | Files
-----+-------+-----+------+-------------------+-----------+---------------------------
2558 | 92480 | 0   | root | isredir (deleted) | pacemaker | /usr/bin/isredir (deleted)
The file definitely is not a part of pacemaker!
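The attribution is presumably a consequence of the control group the process
lives in. A minimal sketch for checking that by hand, assuming the "Service"
column reflects the systemd control group recorded in /proc/<pid>/cgroup (I
have not verified how zypper derives it); the function names are made up for
illustration:

```python
from pathlib import Path


def parse_cgroup(text):
    """Map controller list -> cgroup path from the contents of /proc/<pid>/cgroup."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        _hierarchy, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def systemd_unit(text):
    """Guess the unit name from the name=systemd (v1) or unified (v2) entry."""
    groups = parse_cgroup(text)
    # cgroup v1 uses a named hierarchy, cgroup v2 an empty controller list.
    path = groups.get("name=systemd", groups.get("", ""))
    for part in reversed(path.split("/")):
        if part.endswith((".service", ".scope")):
            return part
    return None


def unit_of_pid(pid):
    """Read /proc/<pid>/cgroup for a live process and return its unit, if any."""
    return systemd_unit(Path(f"/proc/{pid}/cgroup").read_text())
```

For the process above, unit_of_pid(2558) would be expected to return
"pacemaker.service" if the resource really lives in pacemaker's control group.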
Regards,
Ulrich
>>> Jan Pokorný <jpokorny at redhat.com> wrote on 27.08.2019 at 16:22 in
message <20190827142256.GA26851 at redhat.com>:
> On 27/08/19 15:27 +0200, Ulrich Windl wrote:
>> Systemd thinks he's the boss, doing what he wants: Today I noticed that all
>> resources are run inside control group "pacemaker.service" like this:
>> ├─pacemaker.service
>> │ ├─ 26582 isredir-ML1: listening on 172.20.17.238/12503 (2/1)
>> │ ├─ 26601 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ldirectord/mail.conf start
>> │ ├─ 26628 ldirectord tcp:172.20.17.238:25
>> │ ├─ 28963 isredir-DS1: handling 172.20.16.33/10475 -- 172.20.17.200/389
>> │ ├─ 40548 /usr/sbin/pacemakerd -f
>> │ ├─ 40550 /usr/lib/pacemaker/cib
>> │ ├─ 40551 /usr/lib/pacemaker/stonithd
>> │ ├─ 40552 /usr/lib/pacemaker/lrmd
>> │ ├─ 40553 /usr/lib/pacemaker/attrd
>> │ ├─ 40554 /usr/lib/pacemaker/pengine
>> │ ├─ 40555 /usr/lib/pacemaker/crmd
>> │ ├─ 53948 isredir-DS2: handling 172.20.16.33/10570 -- 172.20.17.201/389
>> │ ├─ 92472 isredir-DS1: listening on 172.20.17.204/12511 (13049/3)
>> ...
>>
>> (that "isredir" stuff is my own resource that forks processes and creates
>> threads on demand, thus modifying process (and thread) titles to help
>> understanding what's going on...)
>>
>> My resources are started via OCF RA (shell script), not a systemd unit.
>>
>> Wouldn't it make much more sense if each resource would run in its
>> own control group?
>
> While a listing like the above may be confusing, the main problem is
> perhaps that any resource restrictions you specify in the pacemaker
> service file are applied to the mix of stack-native processes and
> stack-managed resources (except resources of the systemd class), which
> makes systemd's containment and supervision features rather unusable,
> since there is no tight (as opposed to open-ended) black box to reason
> about.
>
> There have been some thoughts in the past that pacemaker could become
> the delegated controller of its own cgroup subtrees, however.
>
> There is a nice document detailing the various possibilities, but
> it also looks pretty overwhelming at first glance:
> https://systemd.io/CGROUP_DELEGATION
> Naively, the "i-like-continents" integration option there looks most
> appealing to me at this point.
>
> If anyone has insights into cgroups, how they pair with systemd
> and how they could pair with pacemaker, please do speak up; it could
> be a great help in sketching the design in this area.
>
>> I mean: If systemd thinks everything MUST run in some control group,
>> why not pick the "correct" one? Having the pacemaker infrastructure
>> in the same control group as all the resources seems to be a bad
>> idea IMHO.
>
> No doubt it is suboptimal.
>
>> The other "discussable feature" is "high PIDs" like "92472". While port
>> numbers are still 16 bit (in IPv4 at least), I see little sense in having
>> millions of processes or threads.
>
> I have seen you questioning this on the systemd ML, but I can't think
> of any inconvenience in that regard, modulo pre-existing real
> bugs. It actually helps slightly to unbreak designs that rely on PID
> liveness without firm guarantees (the risk of process ID recycling is
> still better than downright crazy "process grep'ing", which is totally
> unsuitable when chroots, PID namespaces or containers rooted on that
> very host get into the picture, but not much better otherwise[1]!).
>
> [1] https://lists.clusterlabs.org/pipermail/users/2019-July/025978.html
>
> --
> Jan (Poki)