[ClusterLabs] Antw: [EXT] cluster log not unambiguous about state of VirtualDomains

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Aug 4 02:11:31 EDT 2022


>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 03.08.2022 at 17:01 in message
<987208047.150503130.1659538882681.JavaMail.zimbra at helmholtz-muenchen.de>:
> Hi,
> 
> I found some strange behaviour in the cluster log
> (/var/log/cluster/corosync.log).
> I KNOW that I put one node (ha-idg-2) into standby mode and stopped the
> pacemaker service on that node.
> The shell history says:
> 993  2022-08-02 18:28:25 crm node standby ha-idg-2
> 994  2022-08-02 18:28:58 systemctl stop pacemaker.service

I think the logs of ha-idg-2 around these commands would be interesting, as
well as any pacemaker messages in between.
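
To pull exactly that window out of a large log, something like the following
might help. It is only a minimal sketch: it assumes the log lines begin with a
"Mon DD HH:MM:SS" stamp without a year (as in the excerpts quoted in this
thread), that the year is 2022, and that month names parse in the current
locale. The names parse_stamp and lines_between are made up for illustration.

```python
from datetime import datetime

# Assumption: the log carries no year, so one has to be supplied.
ASSUMED_YEAR = 2022


def parse_stamp(line):
    """Return the timestamp at the start of a log line, or None."""
    try:
        # "Aug 03 00:14:04" is 15 characters long.
        return datetime.strptime(
            f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        # Continuation lines or anything without a leading stamp.
        return None


def lines_between(lines, start, end):
    """Yield the lines whose timestamp lies in [start, end]."""
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and start <= stamp <= end:
            yield line
```

For example, with start = datetime(2022, 8, 2, 18, 28, 25) and
end = datetime(2022, 8, 2, 18, 28, 58), iterating over
lines_between(open("/var/log/cluster/corosync.log"), start, end) would give the
entries logged between the two commands. Widening the window by a few minutes
on each side is probably sensible.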

BTW, a different, but also confusing message:
Aug 03 15:16:58 h12 pengine[14727]:  warning: Processing failed start of
prm_cwd_w49_sap on h12: unknown error
Aug 03 15:16:58 h12 pengine[14727]:  warning: Forcing prm_cwd_w49_sap away
from h12 after 1000000 failures (max=3)

The point here is that h12 is the only active node in a two-node cluster, so
"forcing away" probably means "stop":
# crm_mon -1Arfj |grep w49
     prm_cwd_w49_sap    (ocf::heartbeat:SAPInstance):   Stopped

Sometimes these messages can be confusing. You could have provided the crm_mon
output too, BTW.

Regards,
Ulrich

> 
> Later on I had some trouble with high load.
> I found contradictory entries in the log on the DC (ha-idg-1):
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-documents-oo active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-documents-oo active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-mausdb active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-mausdb active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-photoshop active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-photoshop active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-encore active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-encore active on ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource dlm:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-seneca active on ha-idg-2    <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-pathway active on ha-idg-2   <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-dietrich active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-sim active on ha-idg-2       <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-ssh active on ha-idg-2       <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-nextcloud active on ha-idg-2 <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource fs_ocfs2:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource gfs2_share:1 active on ha-idg-2
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-geneious active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource gfs2_snap:1 active on ha-idg-2  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource vm-geneious-license-mcd active on ha-idg-2 <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: determine_op_status:    Operation monitor found resource clvmd:1 active on ha-idg-2
> 
> The log says some VirtualDomains are running on ha-idg-2 !?!
> 
> But just a few lines later the log says all VirtualDomains are running on 
> ha-idg-1:
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-mausdb       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-sim  (ocf::lentes:VirtualDomain):    Started ha-idg-1          <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious     (ocf::lentes:VirtualDomain):    Started ha-idg-1  <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-idcc-devel   (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-genetrap     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-mouseidgenes (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-greensql     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-severin      (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    ping_19216810010        (ocf::pacemaker:ping):  Stopped (disabled)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    ping_19216810020        (ocf::pacemaker:ping):  Stopped (disabled)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm_crispor      (ocf::heartbeat:VirtualDomain): Stopped (unmanaged)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-dietrich     (ocf::lentes:VirtualDomain):    Started ha-idg-1   <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-pathway      (ocf::lentes:VirtualDomain):    Started ha-idg-1   <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-crispor-server       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious-license     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-nextcloud    (ocf::lentes:VirtualDomain):    Started ha-idg-1   <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-amok (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-geneious-license-mcd (ocf::lentes:VirtualDomain):    Started ha-idg-1 <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-documents-oo (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    fs_test_ocfs2   (ocf::lentes:Filesystem.new):   Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-ssh  (ocf::lentes:VirtualDomain):    Started ha-idg-1           <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm_snipanalysis (ocf::lentes:VirtualDomain):    Stopped (disabled, unmanaged)
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-seneca       (ocf::lentes:VirtualDomain):    Started ha-idg-1   <===
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-photoshop    (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-check-mk     (ocf::lentes:VirtualDomain):    Started ha-idg-1
> Aug 03 00:14:04 [19367] ha-idg-1    pengine:     info: common_print:    vm-encore       (ocf::lentes:VirtualDomain):    Started ha-idg-1
> 
> Why is this information contradictory?
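
Rather than marking such lines by hand, the two message types could be
compared mechanically. The following is a rough sketch, not an official tool:
the regular expressions are modelled only on the determine_op_status and
common_print lines quoted above (other Pacemaker versions may word them
differently), each log entry is assumed to sit on a single line as in the
actual log file, and the function name conflicting_resources is invented here.

```python
import re
from collections import defaultdict

# Patterns modelled on the quoted log lines (assumption, see above).
FOUND = re.compile(
    r"determine_op_status:\s+Operation monitor found resource (\S+) active on (\S+)"
)
STATE = re.compile(
    r"common_print:\s+(\S+)\s+\(ocf::[^)]+\):\s+Started (\S+)"
)


def conflicting_resources(lines):
    """Map resource -> (nodes from 'found ... active on', node from 'Started')
    for every resource where the two message types name different nodes."""
    found_on = defaultdict(set)   # resource -> nodes named by determine_op_status
    started_on = {}               # resource -> node named by common_print
    for line in lines:
        m = FOUND.search(line)
        if m:
            found_on[m.group(1)].add(m.group(2))
            continue
        m = STATE.search(line)
        if m:
            started_on[m.group(1)] = m.group(2)
    return {
        name: (sorted(nodes), started_on[name])
        for name, nodes in found_on.items()
        if name in started_on and nodes != {started_on[name]}
    }
```

Fed with the log of a single transition, this returns only the resources for
which the two messages disagree, for instance vm-sim with ha-idg-2 on one side
and ha-idg-1 on the other. Resources that appear in only one of the two
message types (such as the clone instances dlm:1 or clvmd:1) are skipped.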
> 
> 
> Bernd
> 
> 
> -- 
> Bernd Lentes 
> System Administrator 
> Institute for Metabolism and Cell Death (MCD) 
> Building 25 - office 122 
> HelmholtzZentrum München 
> bernd.lentes at helmholtz-muenchen.de 
> phone: +49 89 3187 1241
>        +49 89 3187 49123 
> fax:   +49 89 3187 2294 
> http://www.helmholtz-muenchen.de/mcd 
> 




