[ClusterLabs] Antw: Re: pacemaker resources under systemd

Thu Sep 12 08:21:26 EDT 2019

On Thu, Sep 12, 2019 at 12:40 PM Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
> Hi!
>
> I just discovered an unpleasant side effect of this:
> SLES has "zypper ps" to show processes that use obsolete binaries. Now if any
> resource binary has been replaced, zypper suggests restarting pacemaker (which
> is nonsense, of course).
>
> Example:
> # zypper ps
> The following running processes use deleted files:
>
> PID    | PPID  | UID | User  | Command           | Service   | Files
> -------+-------+-----+-------+-------------------+-----------+-----------------------------
> 2558   | 92480 | 0   | root  | isredir (deleted) | pacemaker | /usr/bin/isredir (deleted)
>
> The file definitely is not a part of pacemaker!
>

Nor does zypper claim that it is. All zypper tells you is that the
binary of some process started as part of the pacemaker *SERVICE*
was deleted, and that if you want to make sure that process uses the
updated binary, you need to restart the pacemaker *SERVICE*. Which
is absolutely correct: restarting the pacemaker service makes sure
everything used by pacemaker is up to date. zypper does not know that
pacemaker is capable of restarting only some processes. While it
sure is possible to extend zypper to recognize pacemaker, parse the
current configuration and suggest restarting a specific resource, that
time is probably better spent somewhere else.
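
To illustrate why the "Service" column says pacemaker: systemd tracks
service membership by control group, so any tool can attribute a PID to
a unit just by reading /proc/<PID>/cgroup, without knowing anything about
what pacemaker manages. Below is a minimal Python sketch of that idea. I
have not checked how zypper actually does it, so treat this as an
assumption; the helper names unit_of_cgroup and exe_is_deleted and the
sample lines are made up for illustration.

```python
def unit_of_cgroup(cgroup_text):
    """Return the innermost systemd unit (*.service or *.scope) named in the
    contents of a /proc/<PID>/cgroup file, or None if there is none."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        # Look only at systemd's named hierarchy (cgroup v1) or at the
        # unified hierarchy (cgroup v2, empty controller list).
        if controllers not in ("name=systemd", ""):
            continue
        units = [c for c in path.split("/")
                 if c.endswith((".service", ".scope"))]
        if units:
            return units[-1]
    return None


def exe_is_deleted(exe_link_target):
    """The kernel appends " (deleted)" to the target of /proc/<PID>/exe
    once the binary has been unlinked or replaced on disk."""
    return exe_link_target.endswith(" (deleted)")


# Hypothetical sample data resembling the process from the example above.
sample_cgroup = "1:name=systemd:/system.slice/pacemaker.service\n"
sample_exe = "/usr/bin/isredir (deleted)"

if exe_is_deleted(sample_exe):
    print("uses a deleted binary; service:", unit_of_cgroup(sample_cgroup))
```

On a live system the two inputs would come from reading
/proc/<PID>/cgroup and from os.readlink("/proc/<PID>/exe"). Nothing in
there tells the tool which pacemaker resource the process belongs to,
only which service's control group it sits in.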

> Regards,
> Ulrich
>
>
> >>> Jan Pokorný <jpokorny at redhat.com> wrote on 27.08.2019 at 16:22 in
> message <20190827142256.GA26851 at redhat.com>:
> > On 27/08/19 15:27 +0200, Ulrich Windl wrote:
> >> Systemd thinks it's the boss, doing what it wants: Today I noticed that all
> >> resources are run inside the control group "pacemaker.service" like this:
> >>   ├─pacemaker.service
> >>   │ ├─ 26582 isredir-ML1: listening on 172.20.17.238/12503 (2/1)
> >>   │ ├─ 26601 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ldirectord/mail.conf start
> >>   │ ├─ 26628 ldirectord tcp:172.20.17.238:25
> >>   │ ├─ 28963 isredir-DS1: handling 172.20.16.33/10475 -- 172.20.17.200/389
> >>   │ ├─ 40548 /usr/sbin/pacemakerd -f
> >>   │ ├─ 40550 /usr/lib/pacemaker/cib
> >>   │ ├─ 40551 /usr/lib/pacemaker/stonithd
> >>   │ ├─ 40552 /usr/lib/pacemaker/lrmd
> >>   │ ├─ 40553 /usr/lib/pacemaker/attrd
> >>   │ ├─ 40554 /usr/lib/pacemaker/pengine
> >>   │ ├─ 40555 /usr/lib/pacemaker/crmd
> >>   │ ├─ 53948 isredir-DS2: handling 172.20.16.33/10570 -- 172.20.17.201/389
> >>   │ ├─ 92472 isredir-DS1: listening on 172.20.17.204/12511 (13049/3)
> >> ...
> >>
> >> (that "isredir" stuff is my own resource that forks processes and creates
> >> threads on demand, modifying process (and thread) titles to help
> >> understand what's going on...)
> >>
> >> My resources are started via OCF RA (shell script), not a systemd unit.
> >>
> >> Wouldn't it make much more sense if each resource ran in its
> >> own control group?
> >
> > While a listing like the above may be confusing, the main problem perhaps
> > is that all the resource restrictions you specify in the pacemaker service
> > file will be accounted against the mix of stack-native and stack-managed
> > resources (unless they are of the systemd class), making all those
> > containment and supervision features of systemd rather unusable, since
> > there's no tight (vs. rather open-ended) black box to reason about.
> >
> > In the past, however, there have been some thoughts that pacemaker
> > could become the delegated controller of its own cgroup
> > subtrees.
> >
> > There is a nice document detailing the various possibilities, though it
> > also looks pretty overwhelming at first glance:
> > https://systemd.io/CGROUP_DELEGATION
> > Naively, the i-like-continents integration option there looks most
> > appealing to me at this point.
> >
> > If anyone has insights into cgroups and how they pair with systemd
> > and could pair with pacemaker, please do speak up; it could be
> > a great help in sketching the design in this area.
> >
> >> I mean: If systemd thinks everything MUST run in some control group,
> >> why not pick the "correct" one? Having the pacemaker infrastructure
> >> in the same control group as all the resources seems to be a bad
> >> idea IMHO.
> >
> > No doubt it is suboptimal.
> >
> >> The other "discussable feature" is "high PIDs" like "92472". While port
> >> numbers are still 16 bit (in IPv4 at least), I see little sense in having
> >> millions of processes or threads.
> >
> > I have seen you question this on the systemd ML, but I can't think
> > of any inconvenience in that regard, modulo pre-existing real
> > bugs.  It actually slightly helps to unbreak designs that rely on
> > PID liveness without firm guarantees (the risk of process ID
> > recycling is still better than downright crazy "process grep'ing",
> > which is totally unsuitable when chroots, PID namespaces or containers
> > rooted on that very host get into the picture, but not much better
> > otherwise[1]!).
> >
> > [1] https://lists.clusterlabs.org/pipermail/users/2019-July/025978.html
> >
> > --
> > Jan (Poki)