[ClusterLabs] pacemaker startup problem

Gabriele Bulfon gbulfon at sonicle.com
Sun Jul 26 05:52:58 EDT 2020


Sorry, the problem is actually not gone yet.
Corosync and pacemaker are now running happily, but those IPC errors come out of heartbeat and crmd as soon as I start it.
The pacemakerd process has PCMK_ipc_type=socket, so what's wrong with heartbeat or crmd?
 
Here's the env of the process:
 
sonicle at xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
4222: /usr/sbin/pacemakerd
envp[0]: PCMK_respawned=true
envp[1]: PCMK_watchdog=false
envp[2]: HA_LOGFACILITY=none
envp[3]: HA_logfacility=none
envp[4]: PCMK_logfacility=none
envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
envp[7]: HA_debug=0
envp[8]: PCMK_debug=0
envp[9]: HA_quorum_type=corosync
envp[10]: PCMK_quorum_type=corosync
envp[11]: HA_cluster_type=corosync
envp[12]: PCMK_cluster_type=corosync
envp[13]: HA_use_logd=off
envp[14]: PCMK_use_logd=off
envp[15]: HA_mcp=true
envp[16]: PCMK_mcp=true
envp[17]: HA_LOGD=no
envp[18]: LC_ALL=C
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
envp[21]: SMF_ZONENAME=global
envp[22]: PWD=/
envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
envp[24]: _=/usr/sbin/pacemakerd
envp[25]: TZ=Europe/Rome
envp[26]: LANG=en_US.UTF-8
envp[27]: SMF_METHOD=start
envp[28]: SHLVL=2
envp[29]: PATH=/usr/sbin:/usr/bin
envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[31]: A__z="*SHLVL
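
To compare this environment with that of another process (for example the heartbeat process that launches crmd), the penv output can be parsed mechanically. The following is only a minimal sketch: the helper names parse_penv and ipc_type_ok are hypothetical, the sample text is a shortened copy of the listing above, and it assumes each daemon picks up PCMK_ipc_type from its own process environment.

```python
import re

# Matches penv lines such as "envp[20]: PCMK_ipc_type=socket".
ENV_LINE = re.compile(r"^envp\[\d+\]:\s*([^=]+)=(.*)$")


def parse_penv(text):
    """Turn `penv <pid>` output into a dict of variable name -> value."""
    env = {}
    for line in text.splitlines():
        match = ENV_LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


def ipc_type_ok(env, wanted="socket"):
    """True if PCMK_ipc_type is present and equals the wanted value."""
    return env.get("PCMK_ipc_type") == wanted


# Shortened sample taken from the pacemakerd listing above.
sample = """4222: /usr/sbin/pacemakerd
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
"""

env = parse_penv(sample)
print(env.get("PCMK_ipc_type"), ipc_type_ok(env))
```

Running the same check on the penv output of the heartbeat process would show whether the variable is missing there, which is one possible reason (not confirmed here) why crmd behaves differently from pacemakerd.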
 
 
Here are crmd complaints:
 
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node xstorage1 state is now member
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not start crmd IPC server: Operation not supported (-48)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to create IPC server: shutting down and inhibiting respawn
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The local CRM is operational
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_ERROR received in state S_STARTING from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State transition S_STARTING -> S_RECOVERY
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Fast-tracking shutdown in response to errors
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input I_PENDING received in state S_RECOVERY from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Disconnected from the LRM
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child process pengine exited (pid=4316, rc=100)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not recover from internal error
Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
----------------------------------------------------------------------------------
From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 25 July 2020 00:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello,
after a long time I'm back to run heartbeat/pacemaker/corosync on our
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release
(probably a bit outdated, but I want to start from where I left).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
(e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
state S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
cluster type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
start lrmd IPC server: Operation not supported (-48)
This is repeated for all the subdaemons. The error is coming from
qb_ipcs_run(), so the issue looks like an invalid PCMK_ipc_type
for illumos. If you set it to "socket", it should work.
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server:
shutting down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory
from libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster
type: 'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-
system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry
25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1
total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
Hostname: xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
UUID: d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting
to cluster infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could
not start attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to
create attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify
pacemaker and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could
not start pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC
server: shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up
memory from libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process
pengine exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect
to the CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server:
Could not start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init:
Failed to create stonith-ng servers: exiting and inhibiting respawn.
Jul 24 18:21:42 [968] stonith-ng: warning: stonith_ipc_server_init:
Verify pacemaker and pacemaker_remote are not both enabled.
Any idea what's happening?
Gabriele
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot