[ClusterLabs] Antw: Re: pacemaker resources under systemd

Thu Sep 12 05:40:04 EDT 2019

Hi!

I just discovered an unpleasant side-effect of this:
SLES has "zypper ps" to show processes that use obsoleted binaries. Now if any
resource binary has been replaced, zypper suggests restarting pacemaker (which
is nonsense, of course).

Example:
# zypper ps
The following running processes use deleted files:

PID    | PPID  | UID | User  | Command           | Service   | Files
-------+-------+-----+-------+-------------------+-----------+-----------------------------
2558   | 92480 | 0   | root  | isredir (deleted) | pacemaker | /usr/bin/isredir (deleted)

The file definitely is not a part of pacemaker!
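
A plausible explanation, assuming zypper fills the "Service" column from the
control group a process runs in rather than from the package that owns the
file (an assumption, not checked against the zypper sources): every resource
agent child sits in the pacemaker.service cgroup, so it is reported under that
service. A minimal sketch of such a lookup follows; the function names and the
sample cgroup path are hypothetical, only the /proc/<pid>/cgroup line format
and the " (deleted)" suffix on /proc/<pid>/exe come from the Linux kernel.

```python
def service_from_cgroup(cgroup_text):
    """Return the systemd service named in /proc/<pid>/cgroup content, or None.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    Only the systemd-named hierarchy (cgroup v1) and the unified
    hierarchy (cgroup v2, empty controller list) are considered.
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy, controllers, path = parts
        if controllers not in ("", "name=systemd"):
            continue
        # Walk from the innermost path component outwards.
        for component in reversed(path.split("/")):
            if component.endswith(".service"):
                return component
    return None


def uses_deleted_binary(exe_link_target):
    """True if a /proc/<pid>/exe link target is marked as deleted by the kernel."""
    return exe_link_target.endswith(" (deleted)")


if __name__ == "__main__":
    # Sample values modelled on PID 2558 above; the cgroup path is assumed.
    sample_cgroup = "1:name=systemd:/system.slice/pacemaker.service\n"
    sample_exe = "/usr/bin/isredir (deleted)"
    if uses_deleted_binary(sample_exe):
        print(sample_exe, "->", service_from_cgroup(sample_cgroup))
```

With a lookup like this, a file that belongs to no pacemaker package is still
attributed to pacemaker, simply because the process was started from within
that unit's cgroup.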

Regards,
Ulrich

>>> Jan Pokorný <jpokorny at redhat.com> wrote on 27.08.2019 at 16:22 in message
<20190827142256.GA26851 at redhat.com>:
> On 27/08/19 15:27 +0200, Ulrich Windl wrote:
>> Systemd thinks it's the boss, doing what it wants: Today I noticed that all
>> resources are run inside control group "pacemaker.service" like this:
>>   ├─pacemaker.service
>>   │ ├─ 26582 isredir-ML1: listening on 172.20.17.238/12503 (2/1)
>>   │ ├─ 26601 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ldirectord/mail.conf start
>>   │ ├─ 26628 ldirectord tcp:172.20.17.238:25
>>   │ ├─ 28963 isredir-DS1: handling 172.20.16.33/10475 -- 172.20.17.200/389
>>   │ ├─ 40548 /usr/sbin/pacemakerd -f
>>   │ ├─ 40550 /usr/lib/pacemaker/cib
>>   │ ├─ 40551 /usr/lib/pacemaker/stonithd
>>   │ ├─ 40552 /usr/lib/pacemaker/lrmd
>>   │ ├─ 40553 /usr/lib/pacemaker/attrd
>>   │ ├─ 40554 /usr/lib/pacemaker/pengine
>>   │ ├─ 40555 /usr/lib/pacemaker/crmd
>>   │ ├─ 53948 isredir-DS2: handling 172.20.16.33/10570 -- 172.20.17.201/389
>>   │ ├─ 92472 isredir-DS1: listening on 172.20.17.204/12511 (13049/3)
>> ...
>> 
>> (that "isredir" stuff is my own resource that forks processes and creates
>> threads on demand, thus modifying process (and thread) titles to help
>> understand what's going on...)
>> 
>> My resources are started via OCF RA (shell script), not a systemd unit.
>> 
>> Wouldn't it make much more sense if each resource ran in its
>> own control group?
> 
> While a listing like the one above may be confusing, the main problem
> is perhaps that any resource restrictions you specify in the pacemaker
> service file are accounted against the mix of stack-native processes
> and stack-managed resources (unless those are of the systemd class).
> That makes the containment and supervision features of systemd rather
> unusable, since there is no tight (as opposed to open-ended) black box
> to reason about.
> 
> That said, there have been some thoughts in the past that pacemaker
> could become the delegated controller of its own cgroup subtrees.
> 
> There is a nice document detailing the various possibilities, but it
> also looks pretty overwhelming at first glance:
> https://systemd.io/CGROUP_DELEGATION 
> Naively, i-like-continents integration option there looks most
> appealing to me at this point.
> 
> If anyone has insights into cgroups and how it pairs with systemd
> and could pair with pacemaker, please do speak up, it could be
> a great help in sketching the design in this area.
> 
>> I mean: If systemd thinks everything MUST run in some control group,
>> why not pick the "correct" one? Having the pacemaker infrastructure
>> in the same control group as all the resources seems to be a bad
>> idea IMHO.
> 
> No doubt it is suboptimal.
> 
>> The other "discussable feature" is "high PIDs" like "92472". While port
>> numbers are still 16 bit (in IPv4 at least), I see little sense in having
>> millions of processes or threads.
> 
> I have seen you question this on the systemd ML, but I cannot think of
> any inconvenience in that regard, modulo pre-existing real bugs.  It
> actually helps slightly to unbreak designs that rely on PID liveness
> without firm guarantees, since a larger PID space makes ID reuse less
> likely (the risk of process ID recycling is still better than downright
> crazy "process grep'ing", which is totally unsuitable once chroots, PID
> namespaces or containers rooted on that very host get into the picture,
> and not much better otherwise[1]!).
> 
> [1] https://lists.clusterlabs.org/pipermail/users/2019-July/025978.html 
> 
> -- 
> Jan (Poki)
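
On the PID liveness point quoted above: one common way to reduce (not
eliminate) the risk of mistaking a recycled PID for the original process is to
record the process start time together with the PID and compare both later. The
following is only a sketch under that assumption, not something pacemaker is
claimed to do; the function names are hypothetical, while the layout of
/proc/<pid>/stat (start time as field 22) is as documented in proc(5).

```python
def start_time_from_stat(stat_text):
    """Extract the start time (field 22, in clock ticks) from /proc/<pid>/stat.

    The command name (field 2) may itself contain spaces and parentheses,
    so everything up to the LAST ')' is skipped before splitting.
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), hence field 22 is rest[19].
    return int(rest[19])


def read_identity(pid):
    """Return (pid, start_time) for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return pid, start_time_from_stat(f.read())


def still_same_process(identity):
    """True if the PID exists and still has the recorded start time."""
    pid, started = identity
    try:
        return read_identity(pid)[1] == started
    except (FileNotFoundError, ProcessLookupError):
        # The process has exited in the meantime.
        return False
```

A supervisor would call read_identity() once after starting a child and
still_same_process() before signalling it. A small race window remains between
the check and the signal, so this narrows the problem rather than solving it.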