[ClusterLabs] VirtualDomain craziness

Wed Apr 28 07:40:33 EDT 2021

Hi!

I just discovered a problem after relocating the configuration file of a running VirtualDomain resource:
The cluster wanted to restart VM v14 on h18 due to the configuration change (which is correct).
The stop went OK, but the start failed with the error "already running". !!??
Still, the cluster insisted on "recovering" v14 on h18, but the stop "succeeded" with
INFO: Configuration file /etc/libvirt/libxl/v14.xml not readable, resource considered stopped.
(The path was the old path, so that part was OK again.)

Then the cluster moved v14 away to h16 (and later from h16 to h19), all successfully (which is OK).
Still, despite "fail-count=1000000", the cluster kept complaining on h18:
Apr 21 15:59:20 h18 pacemaker-schedulerd[7031]:  warning: Forcing prm_xen_v14 away from h18 after 1000000 failures (max=1000000)

But v14 wasn't running on h18 at that time.
The messages have continued up to now...

That's absolutely not OK.

Seen in SLES15 SP2 with pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.21516.1.PTF.1182607.x86_64
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64

Regards,
Ulrich