<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 22, 2020 at 10:59 AM Хиль Эдуард <<a href="mailto:i.am.test@mail.ru">i.am.test@mail.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div>Hi there! I have 2 nodes running Pacemaker 2.0.3 and Corosync 3.0.3 on Ubuntu 20, plus 1 qdevice. I want to define a new resource as the systemd unit <b>dummy.service </b>:</div><div> </div><div><div>[Unit]<br>Description=Dummy</div><div>[Service]<br>Restart=on-failure<br>StartLimitInterval=20<br>StartLimitBurst=5<br>TimeoutStartSec=0<br>RestartSec=5<br>Environment="HOME=/root"<br>SyslogIdentifier=dummy<br>ExecStart=/usr/local/sbin/dummy.sh</div><div>[Install]<br>WantedBy=multi-user.target</div><div> </div><div>and /usr/local/sbin/dummy.sh :</div><div> </div><div><div>#!/bin/bash</div><div>CNT=0<br>while true; do<br> let CNT++<br> echo "hello world $CNT"<br> sleep 5<br>done</div><div> </div><div>and then I try to define it with: pcs resource create dummy.service systemd:dummy op monitor interval="10s" timeout="15s"</div><div>After 2 seconds, node2 reboots.</div></div></div></div></blockquote><div><br></div><div><div style="font-family:arial,sans-serif" class="gmail_default">The node reboots because the stop operation failed, not the start operation.</div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div><div> In the logs I see that within 2 seconds Pacemaker tried to start this unit, and it started, but Pacemaker somehow thinks it «Timed Out». What am I doing wrong? 
Logs below.</div><div> </div><div> </div><div>Jul 21 15:53:41 node2.local pacemaker-controld[1813]: notice: Result of probe operation for dummy.service on node2.local: 7 (not running) <br>Jul 21 15:53:41 node2.local systemd[1]: Reloading.<br>Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.<br>Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/docker.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/docker.sock → /run/docker.sock; please update the unit file accordingly.<br>Jul 21 15:53:42 node2.local pacemaker-execd[1808]: notice: Giving up on dummy.service start (rc=0): timeout (elapsed=259719ms, remaining=-159719ms)<br>Jul 21 15:53:42 node2.local pacemaker-controld[1813]: error: Result of start operation for dummy.service on node2.local: Timed Out <br>Jul 21 15:53:42 node2.local systemd[1]: Started Cluster Controlled dummy.<br>Jul 21 15:53:42 node2.local dummy[9330]: hello world 1<br>Jul 21 15:53:42 node2.local systemd-udevd[922]: Network interface NamePolicy= disabled on kernel command line, ignoring.<br>Jul 21 15:53:42 node2.local pacemaker-attrd[1809]: notice: Setting fail-count-dummy.service#start_0[node2.local]: (unset) -> INFINITY <br>Jul 21 15:53:42 node2.local pacemaker-attrd[1809]: notice: Setting last-failure-dummy.service#start_0[node2.local]: (unset) -> 1595336022 <br>Jul 21 15:53:42 node2.local systemd[1]: Reloading.<br>Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.<br>Jul 21 15:53:42 node2.local systemd[1]: /lib/systemd/system/docker.socket:6: ListenStream= references a path below legacy 
directory /var/run/, updating /var/run/docker.sock → /run/docker.sock; please update the unit file accordingly.<br>Jul 21 15:53:42 node2.local pacemaker-execd[1808]: notice: Giving up on dummy.service stop (rc=0): timeout (elapsed=317181ms, remaining=-217181ms)<br></div></div></div></div></blockquote><div><br></div><div><div style="font-family:arial,sans-serif" class="gmail_default">317181 ms is a little over 5 minutes. Barring a Pacemaker bug, you need to show the Pacemaker log from the very first start operation so we can see the actual timing. Since systemd was reloaded in between, it is quite possible that systemd lost track of the pending job, so any client waiting for confirmation hangs forever. Such problems were known; I am not sure what the current status is (or whether it was ever fixed).</div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div><div>Jul 21 15:53:42 node2.local pacemaker-controld[1813]: error: Result of stop operation for dummy.service on node2.local: Timed Out <br>Jul 21 15:53:42 node2.local systemd[1]: Stopping Daemon for dummy...<br>Jul 21 15:53:42 node2.local pacemaker-attrd[1809]: notice: Setting fail-count-dummy.service#stop_0[node2.local]: (unset) -> INFINITY <br>Jul 21 15:53:42 node2.local pacemaker-attrd[1809]: notice: Setting last-failure-dummy.service#stop_0[node2.local]: (unset) -> 1595336022 <br>Jul 21 15:53:42 node2.local systemd[1]: dummy.service: Succeeded.<br>Jul 21 15:53:42 node2.local systemd[1]: Stopped Daemon for dummy.<br>... lost connection (node rebooting)</div></div></div><div> </div><div> </div></div>
</blockquote></div></div>