[ClusterLabs] pacemaker startup problem

Ken Gaillot kgaillot at redhat.com
Fri Jul 24 18:46:52 EDT 2020


On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
> Hello,
>  
> after a long time I'm back to running heartbeat/pacemaker/corosync on our
> XStreamOS/illumos distro.
> I rebuilt the original components I built in 2016 on our latest release
> (probably a bit outdated, but I want to start from where I left off).
> It looks like pacemaker is having trouble starting up, showing these logs:
> 
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
> (e174ec8)
> Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
> state S_STARTING from crmd_init
> Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
> cluster type: 'heartbeat'
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
> active 'heartbeat' cluster
> Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
> Connecting to cluster infrastructure: heartbeat


> Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
> start lrmd IPC server: Operation not supported (-48)

This is repeated for all the subdaemons. The error is coming from
qb_ipcs_run(), and it looks like the cause is a PCMK_ipc_type value
that is invalid for illumos. If you set it to "socket", it should work.
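
For reference, a minimal sketch of that setting. It assumes your build
picks up its environment from a sysconfig-style file of shell variable
assignments; on Linux packages that is usually /etc/sysconfig/pacemaker
or /etc/default/pacemaker, and the location in an illumos build may well
differ, so treat the path as an assumption.

```shell
# Pacemaker environment file. The path is an assumption; adjust for your
# build (e.g. /etc/sysconfig/pacemaker or /etc/default/pacemaker on Linux).
# Ask libqb to use socket-based IPC instead of the shared-memory default.
PCMK_ipc_type=socket
```

The variable has to be present in the environment of the daemons when
they start, so however they are launched in your setup, it needs to be
set there and the cluster stack restarted before it takes effect.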


> Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server:
> shutting down and inhibiting respawn
> Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory
> from libxml2
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster
> type: 'heartbeat'
> Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an
> active 'heartbeat' cluster
> Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-
> system "pengine"
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry
> 25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1
> total)
> Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid
> d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
> Hostname: xstorage1
> Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
> UUID: d426a730-5229-6758-853a-99d4d491514a
> Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting
> to cluster infrastructure: heartbeat
> Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could
> not start attrd IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to
> create attrd servers: exiting and inhibiting respawn.
> Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify
> pacemaker and pacemaker_remote are not both enabled.
> Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could
> not start pengine IPC server: Operation not supported (-48)
> Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC
> server: shutting down and inhibiting respawn
> Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up
> memory from libxml2
> Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process
> pengine exited (pid=972, rc=100)
> Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 2 times... pause and retry
> Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 3 times... pause and retry
> Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped (2000ms)
> Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect
> to the CIB service: Transport endpoint is not connected
> Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't
> complete CIB registration 4 times... pause and retry
> Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect
> to the CIB service: Transport endpoint is not connected (-134)
> Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server:
> Could not start stonith-ng IPC server: Operation not supported (-48)
> Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init:
> Failed to create stonith-ng servers: exiting and inhibiting respawn.
> Jul 24 18:21:42 [968] stonith-ng: warning: stonith_ipc_server_init:
> Verify pacemaker and pacemaker_remote are not both enabled.
>  
> Any idea what's happening?
> Gabriele
> 
> 
>  
>  
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
-- 
Ken Gaillot <kgaillot at redhat.com>
