[ClusterLabs] cluster log not unambiguous about state of VirtualDomains

Ken Gaillot kgaillot at redhat.com
Wed Aug 3 12:29:51 EDT 2022


The "found ... active" messages mean that the resource was found active
on that node at some point in the past, not necessarily that it is
active there now. Newer versions log much clearer messages like:

 info: Probe found rsc1 active on node1 at Aug  1 15:41:34 2022

so you can see it is a historical result. The later "Started" messages
show where the cluster believes the resources are currently running.
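
As a rough illustration of the distinction (a sketch only, not anything
shipped with Pacemaker), the two kinds of messages can be separated with
a small script. The function names summarize() and historical_only() and
the regular expressions are my own assumptions, written against the
older log format quoted below, with each log entry on a single line:

```python
import re

# Historical operation results, e.g.
# "Operation monitor found resource vm-seneca active on ha-idg-2"
FOUND_RE = re.compile(r"Operation \w+ found resource (\S+) active on (\S+)")

# Current cluster view, e.g.
# "common_print: vm-seneca (ocf::lentes:VirtualDomain): Started ha-idg-1"
STATUS_RE = re.compile(
    r"common_print:\s+(\S+)\s+\(\S+\):\s+(Started|Stopped)(?:\s+([\w.-]+))?"
)

# A few entries in the style of the quoted log (unwrapped).
SAMPLE = [
    "Aug 03 00:14:04 [19367] ha-idg-1 pengine: info: determine_op_status: "
    "Operation monitor found resource vm-seneca active on ha-idg-2",
    "Aug 03 00:14:04 [19367] ha-idg-1 pengine: info: determine_op_status: "
    "Operation monitor found resource vm-encore active on ha-idg-1",
    "Aug 03 00:14:04 [19367] ha-idg-1 pengine: info: common_print: "
    "vm-seneca (ocf::lentes:VirtualDomain): Started ha-idg-1  <===",
    "Aug 03 00:14:04 [19367] ha-idg-1 pengine: info: common_print: "
    "vm-encore (ocf::lentes:VirtualDomain): Started ha-idg-1",
    "Aug 03 00:14:04 [19367] ha-idg-1 pengine: info: common_print: "
    "ping_19216810010 (ocf::pacemaker:ping): Stopped (disabled)",
]


def summarize(lines):
    """Split log lines into historical results and the current view."""
    history = {}  # resource -> set of nodes where it was once found active
    current = {}  # resource -> (state, node or None)
    for line in lines:
        m = FOUND_RE.search(line)
        if m:
            history.setdefault(m.group(1), set()).add(m.group(2))
            continue
        m = STATUS_RE.search(line)
        if m:
            current[m.group(1)] = (m.group(2), m.group(3))
    return history, current


def historical_only(history, current):
    """Return nodes that appear only in historical results.

    A resource with no status line at all keeps every historical node.
    """
    result = {}
    for rsc, nodes in history.items():
        _state, node = current.get(rsc, (None, None))
        stale = nodes - {node}
        if stale:
            result[rsc] = sorted(stale)
    return result
```

Calling historical_only(*summarize(SAMPLE)) lists the resources whose
"found ... active" node differs from the node in the "Started" line,
which are the entries that only look contradictory.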

On Wed, 2022-08-03 at 17:01 +0200, Lentes, Bernd wrote:
> Hi,
> 
> I found some strange behaviour in the cluster log
> (/var/log/cluster/corosync.log).
> I KNOW that I put one node (ha-idg-2) into standby mode and stopped the
> pacemaker service on that node.
> The shell history says:
> 993  2022-08-02 18:28:25 crm node standby ha-idg-2
> 994  2022-08-02 18:28:58 systemctl stop pacemaker.service
> 
> Later on I had some trouble with high load.
> I found contradictory entries in the log on the DC (ha-idg-1):
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-documents-oo active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-documents-oo active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-mausdb active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-mausdb active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-photoshop active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-photoshop active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-encore active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-encore active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource dlm:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-seneca active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-pathway active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-dietrich active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-sim active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-ssh active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-nextcloud active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource fs_ocfs2:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource gfs2_share:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-geneious active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource gfs2_snap:1 active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource vm-geneious-license-mcd active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:     Operation monitor found resource clvmd:1 active on ha-idg-2
> 
> The log says some VirtualDomains are running on ha-idg-2!?
> 
> But just a few lines later the log says all VirtualDomains are
> running on ha-idg-1:
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-mausdb       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-sim  (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious     (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-idcc-devel   (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-genetrap     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-mouseidgenes (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-greensql     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-severin      (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    ping_19216810010        (ocf::pacemaker:ping):  Stopped (disabled)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    ping_19216810020        (ocf::pacemaker:ping):  Stopped (disabled)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm_crispor      (ocf::heartbeat:VirtualDomain): Stopped (unmanaged)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-dietrich     (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-pathway      (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-crispor-server       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious-license     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-nextcloud    (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-amok (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious-license-mcd (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-documents-oo (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    fs_test_ocfs2   (ocf::lentes:Filesystem.new):   Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-ssh  (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm_snipanalysis (ocf::lentes:VirtualDomain):    Stopped (disabled, unmanaged)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-seneca       (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-photoshop    (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-check-mk     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-encore       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> 
> Why is the information contradictory?
> 
> 
> Bernd
> 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


