[ClusterLabs] A systemd resource monitor is still in progress: re-scheduling

Mon Jun 14 11:20:00 EDT 2021

On Sun, 2021-06-13 at 22:43 +0800, Acewind wrote:
> Dear guys,
>  I'm using pacemaker-1.1.20 to build an OpenStack HA system.
> After I stop and start the cluster, the monitor operation is always
> reported as in progress for the cinder-volume and cinder-scheduler
> services, even though the systemd services are active and OpenStack
> is working well. How does Pacemaker monitor a normal systemd resource?

Pacemaker does something similar to "systemctl status" (using systemd's
native DBus interface rather than that command directly).
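
As a rough illustration only (this is not Pacemaker's code), the same kind
of state can be read over the bus with busctl. The helper names below are
made up for this sketch; the unit-name escaping follows systemd's object
path convention, and the unit is assumed to be loaded.

```python
import subprocess

UNIT_PATH_PREFIX = "/org/freedesktop/systemd1/unit/"


def unit_object_path(unit: str) -> str:
    """Encode a unit name into its systemd bus object path (hypothetical helper)."""
    if not unit:
        return UNIT_PATH_PREFIX + "_"
    parts = []
    for i, ch in enumerate(unit):
        # ASCII letters pass through; digits pass through except in first position.
        if ch.isascii() and (ch.isalpha() or (ch.isdigit() and i > 0)):
            parts.append(ch)
        else:
            # Everything else becomes _XX (lowercase hex) per byte.
            parts.append("".join(f"_{b:02x}" for b in ch.encode("utf-8")))
    return UNIT_PATH_PREFIX + "".join(parts)


def active_state_command(unit: str) -> list:
    """Build the busctl command that reads the unit's ActiveState property."""
    return [
        "busctl", "get-property",
        "org.freedesktop.systemd1",
        unit_object_path(unit),
        "org.freedesktop.systemd1.Unit",
        "ActiveState",
    ]


def parse_active_state(output: str) -> str:
    """busctl prints a type signature and a quoted value, e.g.: s "active"."""
    return output.split(None, 1)[1].strip().strip('"')


def active_state(unit: str) -> str:
    """Run busctl and return the state string; raises if busctl fails."""
    result = subprocess.run(
        active_state_command(unit), capture_output=True, text=True, check=True
    )
    return parse_active_state(result.stdout)
```

Calling active_state("openstack-cinder-scheduler.service") on the node
should return a state string such as "active" or "inactive".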

> > pcs resource show r_systemd_openstack-cinder-scheduler
>  Resource: r_systemd_openstack-cinder-scheduler (class=systemd
> type=openstack-cinder-scheduler)
>   Operations: monitor interval=10s timeout=100s (r_systemd_openstack-
> cinder-scheduler-monitor-interval-10s)
>               stop interval=0s timeout=100s (r_systemd_openstack-
> cinder-scheduler-stop-interval-0s)
>  
> 2021-06-13 20:50:42 pcs cluster stop --all
> 2021-06-13 20:50:56 pcs cluster start --all 
> 
> Jun 13 20:53:16 [4057851] host001       lrmd:     info:
> action_complete: r_systemd_openstack-cinder-scheduler monitor is
> still in progress: re-scheduling (elapsed=54372ms, remaining=45628ms,
> start_delay=2000ms)
> Jun 13 20:53:18 [4057851] host001       lrmd:     info:
> action_complete: r_systemd_openstack-cinder-scheduler monitor is
> still in progress: re-scheduling (elapsed=56374ms, remaining=43626ms,
> start_delay=2000ms)
> Jun 13 20:53:20 [4057851] host001       lrmd:     info:
> action_complete: r_systemd_openstack-cinder-scheduler monitor is
> still in progress: re-scheduling (elapsed=58375ms, remaining=41625ms,
> start_delay=2000ms)
> Jun 13 20:53:22 [4057854] host001       crmd:   notice:
> process_lrm_event: Result of stop operation for r_systemd_openstack-
> cinder-scheduler on host001: 0 (ok) | call=71
> key=r_systemd_openstack-cinder-scheduler_stop_0 confirmed=true cib-
> update=59

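I don't see anything wrong in the above. The final line says the stop
completed successfully. When doing a start or stop for a systemd
resource, Pacemaker repeatedly checks the status until the service is
actually up or down before returning success for the start or stop.
The "monitor is still in progress: re-scheduling" messages are most
likely those status checks rather than your recurring monitor
operation.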
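
The polling described above can be sketched roughly as follows. This is a
simplified model, not the lrmd implementation: wait_for_state and its
parameters are invented for the example, with the 2-second re-check and
100-second timeout taken from the start_delay and timeout values in your
log and configuration.

```python
import time


def wait_for_state(get_state, wanted, timeout_s=100.0, poll_s=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `wanted` or the timeout expires.

    Returns True if the wanted state was seen, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested
    without actually waiting.
    """
    start = clock()
    while True:
        if get_state() == wanted:
            return True
        remaining = timeout_s - (clock() - start)
        if remaining <= 0:
            return False
        # "still in progress: re-scheduling" corresponds to this wait.
        sleep(min(poll_s, remaining))
```

For a stop, get_state would be something that reports the unit's current
state and wanted would be "inactive"; the stop result is only reported
once the loop returns.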

It would be more efficient and less error-prone to use DBus's signal
feature to get notified when the action finishes, rather than
repeatedly polling the status, but that's a big project that we
haven't gotten to yet.

> The whole log file is included in attachment. Thanks!
-- 
Ken Gaillot <kgaillot at redhat.com>