[ClusterLabs] pacemaker startup problem

Gabriele Bulfon gbulfon at sonicle.com
Mon Jul 27 11:31:52 EDT 2020


Solved this: I don't actually need the heartbeat component or its service running.
I just use corosync and pacemaker, and this seems to work.
Now moving on to the crm configuration.
 
Thanks!
Gabriele
 
 
Sonicle S.r.l. 
: 
http://www.sonicle.com
Music: 
http://www.gabrielebulfon.com
Quantum Mechanics : 
http://www.cdbaby.com/cd/gabrielebulfon
From: Reid Wahl
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 26 July 2020 12:25:20 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
Hmm. If it's reading PCMK_ipc_type and matching the server type to QB_IPC_SOCKET, then the only other place I can see the "Operation not supported" error coming from is qb_ipc_auth_creds.
 
qb_ipcs_run -> qb_ipcs_us_publish -> qb_ipcs_us_connection_acceptor -> qb_ipcs_uc_recv_and_auth -> process_auth -> qb_ipc_auth_creds ->
 
static int32_t
qb_ipc_auth_creds(struct ipc_auth_data *data)
{
...
#ifdef HAVE_GETPEERUCRED
        /*
         * Solaris and some BSD systems
...
#elif defined(HAVE_GETPEEREID)
        /*
        * Usually MacOSX systems
...
#elif defined(SO_PASSCRED)
        /*
        * Usually Linux systems
...
#else /* no credentials */
        data->ugp.pid = 0;
        data->ugp.uid = 0;
        data->ugp.gid = 0;
        res = -ENOTSUP;
#endif /* no credentials */
        return res;
 
I'll leave it to Ken to say whether that's likely and what it implies if so.
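
To make the fallback above concrete, here is a minimal sketch (not libqb code) of the same preprocessor ladder. The names cred_method and cred_result are hypothetical. The HAVE_* macros would normally come from the build's configure step, so compiling this standalone without them only exercises the SO_PASSCRED and no-credentials branches.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>   /* defines SO_PASSCRED where the platform has it */

/* Hypothetical helper: names the branch of the #ifdef ladder that this
 * build would take, or returns NULL if no credential method is known. */
const char *cred_method(void)
{
#if defined(HAVE_GETPEERUCRED)
    return "getpeerucred";      /* Solaris and some BSD systems */
#elif defined(HAVE_GETPEEREID)
    return "getpeereid";        /* usually Mac OS X systems */
#elif defined(SO_PASSCRED)
    return "SO_PASSCRED";       /* usually Linux systems */
#else
    return NULL;                /* no credentials available */
#endif
}

/* Hypothetical helper: mirrors the quoted fallback, where a build with
 * no credential method ends up returning -ENOTSUP. */
int cred_result(void)
{
    return cred_method() != NULL ? 0 : -ENOTSUP;
}
```

If cred_method() comes back NULL on the illumos build, that would suggest configure did not detect getpeerucred and the last branch is the one being compiled in.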
On Sun, Jul 26, 2020 at 2:53 AM Gabriele Bulfon (gbulfon at sonicle.com) wrote:
Sorry, actually the problem is not gone yet.
Now corosync and pacemaker are running happily, but those IPC errors are coming out of heartbeat and crmd as soon as I start it.
The pacemakerd process has PCMK_ipc_type=socket, what's wrong with heartbeat or crmd?
 
Here's the env of the process:
 
sonicle at xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
4222: /usr/sbin/pacemakerd
envp[0]: PCMK_respawned=true
envp[1]: PCMK_watchdog=false
envp[2]: HA_LOGFACILITY=none
envp[3]: HA_logfacility=none
envp[4]: PCMK_logfacility=none
envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
envp[7]: HA_debug=0
envp[8]: PCMK_debug=0
envp[9]: HA_quorum_type=corosync
envp[10]: PCMK_quorum_type=corosync
envp[11]: HA_cluster_type=corosync
envp[12]: PCMK_cluster_type=corosync
envp[13]: HA_use_logd=off
envp[14]: PCMK_use_logd=off
envp[15]: HA_mcp=true
envp[16]: PCMK_mcp=true
envp[17]: HA_LOGD=no
envp[18]: LC_ALL=C
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
envp[21]: SMF_ZONENAME=global
envp[22]: PWD=/
envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
envp[24]: _=/usr/sbin/pacemakerd
envp[25]: TZ=Europe/Rome
envp[26]: LANG=en_US.UTF-8
envp[27]: SMF_METHOD=start
envp[28]: SHLVL=2
envp[29]: PATH=/usr/sbin:/usr/bin
envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[31]: A__z="*SHLVL
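
As a rough illustration of how a daemon might turn the PCMK_ipc_type variable shown above into an IPC transport choice, here is a minimal sketch. The enum and function names are hypothetical stand-ins, not Pacemaker or libqb identifiers. Only the "socket" value is confirmed by this thread; "shared-mem" is an assumption included to show a second branch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the IPC transport types. */
enum ipc_kind { IPC_DEFAULT, IPC_SOCKET, IPC_SHM };

/* Map a PCMK_ipc_type string to a transport; unknown or missing values
 * fall back to the default. */
enum ipc_kind ipc_kind_from_string(const char *value)
{
    if (value == NULL) {
        return IPC_DEFAULT;
    }
    if (strcmp(value, "socket") == 0) {
        return IPC_SOCKET;      /* the value suggested in this thread */
    }
    if (strcmp(value, "shared-mem") == 0) {
        return IPC_SHM;         /* assumed value, not confirmed here */
    }
    return IPC_DEFAULT;
}

/* Read the variable from the current process environment. */
enum ipc_kind ipc_kind_from_env(void)
{
    return ipc_kind_from_string(getenv("PCMK_ipc_type"));
}
```

Note that the lookup happens in each daemon's own environment, so the variable has to reach every process that creates an IPC server, not only pacemakerd.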
 
 
Here are crmd complaints:
 
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node xstorage1 state is now member
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not start crmd IPC server: Operation not supported (-48)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to create IPC server: shutting down and inhibiting respawn
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The local CRM is operational
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_ERROR received in state S_STARTING from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State transition S_STARTING -> S_RECOVERY
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Fast-tracking shutdown in response to errors
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input I_PENDING received in state S_RECOVERY from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Disconnected from the LRM
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child process pengine exited (pid=4316, rc=100)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not recover from internal error
Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
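
The "(-48)" in the log lines above looks like a negated errno value. As far as I know, 48 is ENOTSUP in the illumos headers, while other platforms use different numbers, so the sketch below goes through the symbolic constant rather than the literal. describe_rc is a hypothetical helper, not a Pacemaker function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a negative-errno return code in the same
 * "message (code)" shape seen in the log. Returns what snprintf returns. */
int describe_rc(int rc, char *buf, size_t len)
{
    int err = rc < 0 ? -rc : rc;    /* the logged code is -errno */
    return snprintf(buf, len, "%s (%d)", strerror(err), rc);
}
```

Calling describe_rc(-ENOTSUP, buf, sizeof buf) on the affected host and comparing the result with the logged message is a quick way to check that the code really is ENOTSUP there.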
 
 
----------------------------------------------------------------------------------
From: Ken Gaillot
kgaillot at redhat.com
To: Cluster Labs - All topics related to open-source clustering welcomed
users at clusterlabs.org
Date: 25 July 2020 00:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello,
after a long time I'm back to run heartbeat/pacemaker/corosync on our
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release
(probably a bit outdated, but I want to start from where I left).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
(e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
state S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
cluster type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
start lrmd IPC server: Operation not supported (-48)
This is repeated for all the subdaemons. The error is coming from
qb_ipcs_run(), and it looks like the issue is an invalid PCMK_ipc_type
for illumos. If you set it to "socket" it should work.
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server:
shutting down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory
from libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster
type: 'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-
system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry
25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1
total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
Hostname: xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
UUID: d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting
to cluster infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could
not start attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to
create attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify
pacemaker and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could
not start pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC
server: shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up
memory from libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process
pengine exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect
to the CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server:
Could not start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init:
Failed to create stonith-ng servers: exiting and inhibiting respawn.
Jul 24 18:21:42 [968] stonith-ng: warning: stonith_ipc_server_init:
Verify pacemaker and pacemaker_remote are not both enabled.
Any idea what's happening?
Gabriele
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home:
https://www.clusterlabs.org/
--
Ken Gaillot
kgaillot at redhat.com
--
Regards,
Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA