[ClusterLabs] Cluster does not start resources

Tue Aug 23 21:51:20 EDT 2022

Hi,

Currently I can't start resources on our 2-node cluster.
The cluster seems to be OK:

Stack: corosync
Current DC: ha-idg-1 (version 1.1.24+20210811.f5abda0ee-3.21.9-1.1.24+20210811.f5abda0ee) - partition with quorum
Last updated: Wed Aug 24 02:56:46 2022
Last change: Wed Aug 24 02:56:41 2022 by hacluster via crmd on ha-idg-1

2 nodes configured
40 resource instances configured (26 DISABLED)

Node ha-idg-1: online
Node ha-idg-2: online

Inactive resources:

fence_ilo_ha-idg-2      (stonith:fence_ilo2):   Stopped
fence_ilo_ha-idg-1      (stonith:fence_ilo4):   Stopped
 Clone Set: cl_share [gr_share]
     Stopped: [ ha-idg-1 ha-idg-2 ]
 Clone Set: ClusterMon-clone [ClusterMon-SMTP]
     Stopped (disabled): [ ha-idg-1 ha-idg-2 ]
vm-mausdb       (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-sim  (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-geneious     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-idcc-devel   (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-genetrap     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-mouseidgenes (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-greensql     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-severin      (ocf::lentes:VirtualDomain):    Stopped (disabled)
ping_19216810010        (ocf::pacemaker:ping):  Stopped (disabled)
ping_19216810020        (ocf::pacemaker:ping):  Stopped (disabled)
vm_crispor      (ocf::heartbeat:VirtualDomain): Stopped (unmanaged)
vm-dietrich     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-pathway      (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-crispor-server       (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-geneious-license     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-nc-mcd       (ocf::lentes:VirtualDomain):    Stopped (disabled, unmanaged)
vm-amok (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-geneious-license-mcd (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-documents-oo (ocf::lentes:VirtualDomain):    Stopped (disabled)
fs_test_ocfs2   (ocf::lentes:Filesystem.new):   Stopped
vm-ssh  (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm_snipanalysis (ocf::lentes:VirtualDomain):    Stopped (disabled, unmanaged)
vm-seneca       (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-photoshop    (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-check-mk     (ocf::lentes:VirtualDomain):    Stopped (disabled)
vm-encore       (ocf::lentes:VirtualDomain):    Stopped (disabled)

Migration Summary:
* Node ha-idg-1:
* Node ha-idg-2:

Fencing History:
* Off of ha-idg-2 successful: delegate=ha-idg-1, client=crmd.27356, origin=ha-idg-1,
    last-successful='Wed Aug 24 01:53:49 2022'

When I try to start e.g. cl_share, which is a prerequisite for the virtual domains, nothing happens.
I ran "crm resource cleanup" (although crm_mon shows no error), hoping it would help, but it didn't.
My command history:
 1471  2022-08-24 03:11:27 crm resource cleanup
 1472  2022-08-24 03:11:52 crm resource cleanup cl_share
 1473  2022-08-24 03:12:45 crm resource start cl_share
(to correlate with the log)
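
A minimal sketch of how that correlation could be scripted, assuming the history and log line layouts shown in this message. The helper names (parse_history_line, parse_log_time, lines_after) and the 10-second window are illustrative assumptions, not part of any Pacemaker tooling; the log omits the year, so it is taken from the history entry.

```python
from datetime import datetime, timedelta


def parse_history_line(line):
    """Split a shell history line such as
    ' 1471  2022-08-24 03:11:27 crm resource cleanup'
    into (timestamp, command)."""
    _number, date, clock, command = line.split(None, 3)
    return datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S"), command


def parse_log_time(line, year):
    """Read the 15-character 'Aug 24 03:11:28' prefix of a log line.
    The year is not in the log, so the caller supplies it (assumption)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def lines_after(command_time, log_lines, window_seconds=10):
    """Return the log lines stamped within window_seconds after command_time."""
    end = command_time + timedelta(seconds=window_seconds)
    return [
        line
        for line in log_lines
        if command_time <= parse_log_time(line, command_time.year) <= end
    ]


# Sample data copied (shortened) from this message.
history = " 1471  2022-08-24 03:11:27 crm resource cleanup"
log = [
    "Aug 24 03:11:28 [27351] ha-idg-1        cib:  warning: do_local_notify: A-Sync reply to crmd failed",
    "Aug 24 03:11:33 [27351] ha-idg-1        cib:     info: cib_process_ping:        Reporting our current digest",
    "Aug 24 03:11:52 [27353] ha-idg-1       lrmd:     info: process_lrmd_get_rsc_info:       Resource 'dlm:0' not found",
]

when, command = parse_history_line(history)
matches = lines_after(when, log)
for line in matches:
    print(command, "->", line)
```

With the sample data, only the entries stamped within ten seconds of the first cleanup are kept; widening window_seconds would pull in the later lrmd entry as well.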

I found some weird entries in the log after the "crm resource cleanup":

Aug 24 03:11:28 [27351] ha-idg-1        cib:  warning: do_local_notify: A-Sync reply to crmd failed: No message of desired type
Aug 24 03:11:33 [27351] ha-idg-1        cib:     info: cib_process_ping:        Reporting our current digest to ha-idg-1: ed5bb7d32532ebf1ce3c45d0067c55b3 for 7.28627.70 (0x15073e0 0)
Aug 24 03:11:52 [27353] ha-idg-1       lrmd:     info: process_lrmd_get_rsc_info:       Resource 'dlm:0' not found (0 active resources)
Aug 24 03:11:52 [27356] ha-idg-1       crmd:   notice: do_lrm_invoke:   Not registering resource 'dlm:0' for a delete event | get-rc=-19 (No such device) transition-key=(null)

What does "Resource not found" mean?

 ...
Aug 24 03:11:57 [27351] ha-idg-1        cib:     info: cib_process_ping:        Reporting our current digest to ha-idg-1: 0b3e9ad9ad8103ce2da3b6b8d41e6716 for 7.28628.0 (0x1352bf0 0)
Aug 24 03:11:58 [27356] ha-idg-1       crmd:    error: do_pe_invoke_callback:   Could not retrieve the Cluster Information Base: Timer expired | rc=-62 call=222
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: register_fsa_error_adv:  Resetting the current action list
Aug 24 03:11:58 [27356] ha-idg-1       crmd:    error: do_log:  Input I_ERROR received in state S_POLICY_ENGINE from do_pe_invoke_callback
Aug 24 03:11:58 [27356] ha-idg-1       crmd:  warning: do_state_transition:     State transition S_POLICY_ENGINE -> S_RECOVERY | input=I_ERROR cause=C_FSA_INTERNAL origin=do_pe_invoke_callback
Aug 24 03:11:58 [27356] ha-idg-1       crmd:  warning: do_recover:      Fast-tracking shutdown in response to errors
Aug 24 03:11:58 [27356] ha-idg-1       crmd:  warning: do_election_vote:        Not voting in election, we're in state S_RECOVERY
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_dc_release:   DC role released
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: pe_ipc_destroy:  Connection to the Policy Engine released
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_te_control:   Transitioner is now inactive
Aug 24 03:11:58 [27356] ha-idg-1       crmd:    error: do_log:  Input I_TERMINATE received in state S_RECOVERY from do_recover
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_state_transition:     State transition S_RECOVERY -> S_TERMINATE | input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_shutdown:     Disconnecting STONITH...
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: tengine_stonith_connection_destroy:      Fencing daemon disconnected
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_lrm_control:  Disconnecting from the LRM
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: lrmd_api_disconnect:     Disconnecting IPC LRM connection to local
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: lrmd_ipc_connection_destroy:     IPC connection destroyed
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: lrm_connection_destroy:  LRM Connection disconnected
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: lrmd_api_disconnect:     Disconnecting IPC LRM connection to local
Aug 24 03:11:58 [27356] ha-idg-1       crmd:   notice: do_lrm_control:  Disconnected from the LRM
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crm_cluster_disconnect:  Disconnecting from cluster infrastructure: corosync
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: terminate_cs_connection: Disconnecting from Corosync
Aug 24 03:11:58 [27356] ha-idg-1       crmd:   notice: terminate_cs_connection: Disconnected from Corosync
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crm_cluster_disconnect:  Disconnected from corosync
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_ha_control:   Disconnected from the cluster
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_cib_control:  Disconnecting CIB
Aug 24 03:11:58 [27351] ha-idg-1        cib:     info: cib_process_readwrite:   We are now in R/O mode
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_cib_connection_destroy:     Connection to the CIB terminated...
Aug 24 03:11:58 [27356] ha-idg-1       crmd:   notice: do_cib_control:  Disconnected from the CIB
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: qb_ipcs_us_withdraw:     withdrawing server sockets
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: do_exit: [crmd] stopped (0)
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_exit:       Dropping I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ]
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_exit:       Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_exit:       Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_quorum_destroy:     connection closed
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_cs_destroy: connection closed
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crmd_init:       27356 stopped: OK (0)
Aug 24 03:11:58 [27356] ha-idg-1       crmd:    error: crmd_fast_exit:  Could not recover from internal error
Aug 24 03:11:58 [27356] ha-idg-1       crmd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:    error: pcmk_child_exit: crmd[27356] exited with status 201 (Generic Pacemaker error)
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:     info: pcmk__ipc_is_authentic_process_active:   Could not connect to crmd IPC: Connection refused
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:   notice: pcmk_process_exit:       Respawning failed child process: crmd
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:     info: start_child:     Using uid=90 and group=90 for process crmd
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:     info: start_child:     Forked child 18222 for process crmd
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Aug 24 03:11:58 [18222] ha-idg-1       crmd:     info: crm_log_init:    Changed active directory to /var/lib/pacemaker/cores
Aug 24 03:11:58 [18222] ha-idg-1       crmd:     info: main:    CRM Git Version: 1.1.24+20210811.f5abda0ee-3.21.9 (1.1.24+20210811.f5abda0ee)
Aug 24 03:11:58 [18222] ha-idg-1       crmd:     info: get_cluster_type:        Verifying cluster type: 'corosync'
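
To make the relevant entries in an excerpt like this easier to spot, here is a small sketch that keeps only the error and warning lines. It assumes the line layout seen above (timestamp, [pid], host, daemon, severity, function, message); the regular expression and the names LOG_LINE and filter_severity are my own illustration, not a Pacemaker interface.

```python
import re

# Assumed layout: 'Aug 24 03:11:58 [27356] ha-idg-1  crmd:  error: func:  message'
LOG_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d\d:\d\d:\d\d) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+)\s+(?P<daemon>\w+):"
    r"\s+(?P<severity>\w+):\s+(?P<function>\w+):\s*(?P<message>.*)$"
)


def filter_severity(lines, wanted=("error", "warning")):
    """Return the parsed fields of every line whose severity is in wanted.
    Lines that do not match the assumed layout are skipped."""
    kept = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("severity") in wanted:
            kept.append(match.groupdict())
    return kept


# Sample lines copied from the excerpt above.
sample = [
    "Aug 24 03:11:57 [27351] ha-idg-1        cib:     info: cib_process_ping:        Reporting our current digest to ha-idg-1",
    "Aug 24 03:11:58 [27356] ha-idg-1       crmd:    error: do_pe_invoke_callback:   Could not retrieve the Cluster Information Base: Timer expired | rc=-62 call=222",
    "Aug 24 03:11:58 [27356] ha-idg-1       crmd:  warning: do_recover:      Fast-tracking shutdown in response to errors",
    "Aug 24 03:11:58 [27350] ha-idg-1 pacemakerd:    error: pcmk_child_exit: crmd[27356] exited with status 201 (Generic Pacemaker error)",
]

problems = filter_severity(sample)
for entry in problems:
    print(entry["time"], entry["daemon"], entry["severity"], entry["function"])
```

Run over the excerpt, this reduces it to the sequence that matters here: the CIB retrieval timeout in crmd, the move to S_RECOVERY, and pacemakerd reporting the crmd exit.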

I appreciate any help.

Thanks.

Bernd
-- 
Bernd Lentes 
System Administrator 
Institute for Metabolism and Cell Death (MCD) 
Building 25 - office 122 
HelmholtzZentrum München 
bernd.lentes at helmholtz-muenchen.de 
phone: +49 89 3187 1241
       +49 89 3187 49123 
fax:   +49 89 3187 2294 
http://www.helmholtz-muenchen.de/mcd 
