[ClusterLabs] pacemaker systemd resource

Andrei Borzenkov arvidjaar at gmail.com
Wed Jul 22 05:17:22 EDT 2020


On Wed, Jul 22, 2020 at 10:59 AM Хиль Эдуард <i.am.test at mail.ru> wrote:

> Hi there! I have 2 nodes with Pacemaker 2.0.3 and corosync 3.0.3 on Ubuntu 20,
> plus 1 qdevice. I want to define a new resource as the systemd unit
> dummy.service:
>
> [Unit]
> Description=Dummy
> [Service]
> Restart=on-failure
> StartLimitInterval=20
> StartLimitBurst=5
> TimeoutStartSec=0
> RestartSec=5
> Environment="HOME=/root"
> SyslogIdentifier=dummy
> ExecStart=/usr/local/sbin/dummy.sh
> [Install]
> WantedBy=multi-user.target
>
> and /usr/local/sbin/dummy.sh :
>
> #!/bin/bash
> CNT=0
> while true; do
>   let CNT++
>   echo "hello world $CNT"
>   sleep 5
> done
>
> and then I try to define it with:
> pcs resource create dummy.service systemd:dummy op monitor interval="10s" timeout="15s"
> After 2 seconds node2 reboots.
>

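The node reboots because the stop operation failed, not the start. A failed
stop is what makes pacemaker fence the node (the default reaction when fencing
is enabled); a failed start alone would not.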
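
To see this in the log, it can help to pull out just the controld result
lines that were logged as errors. A minimal sketch in plain shell, assuming
the journal format quoted further down (failed_ops is a made-up helper name,
not a pacemaker tool; the sample lines are the ones from this thread,
unwrapped to one line each):

```shell
# List failed operations from pacemaker-controld messages read on stdin.
# Prints one line per error: <operation> <resource> <node> <result>
failed_ops() {
  sed -n 's/.*pacemaker-controld\[[0-9]*\]: *error: Result of \([a-z]*\) operation for \([^ ]*\) on \([^ :]*\): \(.*\)$/\1 \2 \3 \4/p'
}

# Sample input: the probe is a notice and is skipped, the two errors are kept.
failed_ops <<'EOF'
Jul 21 15:53:41 node2.local pacemaker-controld[1813]:  notice: Result of probe operation for dummy.service on node2.local: 7 (not running)
Jul 21 15:53:42 node2.local pacemaker-controld[1813]:  error: Result of start operation for dummy.service on node2.local: Timed Out
Jul 21 15:53:42 node2.local pacemaker-controld[1813]:  error: Result of stop operation for dummy.service on node2.local: Timed Out
EOF
```

Both start and stop show up as Timed Out; it is the stop entry that matters
for the reboot.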



> In the logs I see that within 2 seconds pacemaker tried to start this unit,
> and it started, but pacemaker somehow thinks it «Timed Out». What am I doing
> wrong? Logs below.
>
>
> Jul 21 15:53:41 node2.local pacemaker-controld[1813]:  notice: Result of
> probe operation for dummy.service on node2.local: 7 (not running)
> Jul 21 15:53:41 node2.local systemd[1]: Reloading.
> Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/dbus.socket:5:
> ListenStream= references a path below legacy directory /var/run/, updating
> /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please
> update the unit file accordingly.
> Jul 21 15:53:42 node2.local systemd[1]:
> /lib/systemd/system/docker.socket:6: ListenStream= references a path below
> legacy directory /var/run/, updating /var/run/docker.sock →
> /run/docker.sock; please update the unit file accordingly.
> Jul 21 15:53:42 node2.local pacemaker-execd[1808]:  notice: Giving up on
> dummy.service start (rc=0): timeout (elapsed=259719ms, remaining=-159719ms)
> Jul 21 15:53:42 node2.local pacemaker-controld[1813]:  error: Result of
> start operation for dummy.service on node2.local: Timed Out
> Jul 21 15:53:42 node2.local systemd[1]: Started Cluster Controlled dummy.
> Jul 21 15:53:42 node2.local dummy[9330]: hello world 1
> Jul 21 15:53:42 node2.local systemd-udevd[922]: Network interface
> NamePolicy= disabled on kernel command line, ignoring.
> Jul 21 15:53:42 node2.local pacemaker-attrd[1809]:  notice: Setting
> fail-count-dummy.service#start_0[node2.local]: (unset) -> INFINITY
> Jul 21 15:53:42 node2.local pacemaker-attrd[1809]:  notice: Setting
> last-failure-dummy.service#start_0[node2.local]: (unset) -> 1595336022
> Jul 21 15:53:42 node2.local systemd[1]: Reloading.
> Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/dbus.socket:5:
> ListenStream= references a path below legacy directory /var/run/, updating
> /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please
> update the unit file accordingly.
> Jul 21 15:53:42 node2.local systemd[1]:
> /lib/systemd/system/docker.socket:6: ListenStream= references a path below
> legacy directory /var/run/, updating /var/run/docker.sock →
> /run/docker.sock; please update the unit file accordingly.
> Jul 21 15:53:42 node2.local pacemaker-execd[1808]:  notice: Giving up on
> dummy.service stop (rc=0): timeout (elapsed=317181ms, remaining=-217181ms)
>

317181ms is a little over 5 minutes. Barring a pacemaker bug, you need to show
the pacemaker log from the very first start operation so we can see the actual
timing. Since systemd was reloaded in between, it is quite possible that
systemd lost track of the pending job, so any client waiting for confirmation
hangs forever. Such problems are known; I am not sure what the current status
is (or whether it was ever fixed).
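
For reference, the numbers in the two "Giving up" messages can be worked out
with a few lines of plain shell. This is only a sketch that restates what the
log already says; the function names are made up for illustration and are not
part of pacemaker. In both messages elapsed + remaining comes to 100000ms, so
the operation timeout in effect appears to be 100s, while execd believes the
operation began more than 4 minutes before the log entry.

```shell
# Hypothetical helpers for reading pacemaker-execd "Giving up" messages.

# Print elapsed + remaining (ms), i.e. the timeout the message implies.
implied_timeout_ms() {
  elapsed=$(printf '%s\n' "$1" | sed -n 's/.*elapsed=\(-\{0,1\}[0-9]\{1,\}\)ms.*/\1/p')
  remaining=$(printf '%s\n' "$1" | sed -n 's/.*remaining=\(-\{0,1\}[0-9]\{1,\}\)ms.*/\1/p')
  # Fail if the line does not carry both fields.
  [ -n "$elapsed" ] && [ -n "$remaining" ] || return 1
  echo $(( $elapsed + $remaining ))
}

# Convert milliseconds to a rough "XmYYs" string (fractions of a second dropped).
ms_to_min_sec() {
  printf '%dm%02ds\n' $(( $1 / 60000 )) $(( ($1 % 60000) / 1000 ))
}

line='timeout (elapsed=317181ms, remaining=-217181ms)'
ms_to_min_sec 317181          # 5m17s
implied_timeout_ms "$line"    # 100000
```

This does not explain why the elapsed time is so large; it only shows that the
timestamps in the journal and the elapsed values do not agree, which is why the
earlier part of the log is needed.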



> Jul 21 15:53:42 node2.local pacemaker-controld[1813]:  error: Result of
> stop operation for dummy.service on node2.local: Timed Out
> Jul 21 15:53:42 node2.local systemd[1]: Stopping Daemon for dummy...
> Jul 21 15:53:42 node2.local pacemaker-attrd[1809]:  notice: Setting
> fail-count-dummy.service#stop_0[node2.local]: (unset) -> INFINITY
> Jul 21 15:53:42 node2.local pacemaker-attrd[1809]:  notice: Setting
> last-failure-dummy.service#stop_0[node2.local]: (unset) -> 1595336022
> Jul 21 15:53:42 node2.local systemd[1]: dummy.service: Succeeded.
> Jul 21 15:53:42 node2.local systemd[1]: Stopped Daemon for dummy.
> ... lost connection (node rebooting)