[ClusterLabs] pacemaker startup problem

Sun Jul 26 05:41:09 EDT 2020

Sorry, I was using the wrong hostnames for those networks. Using the debug log, I found that it was not finding "this node" in the conf file.
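
A quick way to check for this kind of mismatch, as a minimal sketch only: resolve each nodelist name and see whether it lands on the cluster network. The function name and the /24 netmask are assumptions made for illustration (the thread only gives bindnetaddr 10.100.100.0), and this is not how corosync itself performs the lookup.

```python
import ipaddress
import socket


def names_in_bind_network(ring_addrs, bind_network, resolve=socket.gethostbyname):
    """Return (name, ip) pairs whose resolved IPv4 address lies in bind_network."""
    network = ipaddress.ip_network(bind_network)
    matches = []
    for name in ring_addrs:
        try:
            ip = ipaddress.ip_address(resolve(name))
        except OSError:
            # The name does not resolve at all, so it cannot match any interface.
            continue
        if ip in network:
            matches.append((name, str(ip)))
    return matches


if __name__ == "__main__":
    # Hypothetical netmask: /24 is assumed, it is not stated in the thread.
    print(names_in_bind_network(["xstorage1", "xstorage2"], "10.100.100.0/24"))
```

If a node's own name is missing from the result, that is a hint the hostname points at a different network than the one corosync binds to.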

Gabriele

Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
From: Gabriele Bulfon
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 26 July 2020 11:23:53 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem

Thanks. I had run it manually, which is why I got those errors; when run from the service script, PCMK_ipc_type is correctly set to socket.
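
For a manual run, the same variable can be set by hand. A minimal sketch, assuming the daemon is started from a wrapper and only needs PCMK_ipc_type added to its environment; the helper name is hypothetical and the service script itself is not shown in this thread.

```python
import os


def pacemaker_env(base=None):
    """Return a copy of the environment with PCMK_ipc_type forced to 'socket'."""
    env = dict(os.environ if base is None else base)
    env["PCMK_ipc_type"] = "socket"
    return env


# Possible use (not executed here): subprocess.run(["pacemakerd"], env=pacemaker_env())
```

The copy keeps the caller's environment untouched, so the setting only affects the process that is launched with it.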

But now I see these:
Jul 26 11:08:16 [4039] pacemakerd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 26 11:08:16 [4039] pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_LIBRARY. Retrying in 1s
Jul 26 11:08:17 [4039] pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_LIBRARY. Retrying in 2s
Jul 26 11:08:19 [4039] pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_LIBRARY. Retrying in 3s
Jul 26 11:08:22 [4039] pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_LIBRARY. Retrying in 4s
Jul 26 11:08:26 [4039] pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_LIBRARY. Retrying in 5s
Jul 26 11:08:31 [4039] pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 2
Jul 26 11:08:31 [4039] pacemakerd: notice: main: Could not obtain corosync config data, exiting
Jul 26 11:08:31 [4039] pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
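
The timestamps above fit a retry loop that waits 1s, 2s, 3s, 4s and 5s between attempts and gives up after the sixth failure (about 15 seconds in total). A minimal sketch of that pattern, reconstructed from the log timing only; it is not pacemaker's actual code, and the function and parameter names are made up for illustration.

```python
import time


def connect_with_retries(connect, max_retries=5, sleep=time.sleep):
    """Try connect(); after each failure wait 1s, 2s, ... then try again."""
    for delay in range(1, max_retries + 1):
        if connect():
            return True
        sleep(delay)  # linearly increasing wait, as in "Retrying in Ns"
    # Final attempt after the last wait; a failure here means giving up.
    return connect()
```

With a connect() that always fails, this makes six attempts and sleeps for 1+2+3+4+5 seconds, which matches 11:08:16 through 11:08:31 in the log.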

So I think I need to start corosync first (right?) but it dies with this:

Jul 26 11:07:06 [4027] xstorage1 corosync notice [MAIN ] Corosync Cluster Engine ('2.4.1'): started and ready to provide service.
Jul 26 11:07:06 [4027] xstorage1 corosync info [MAIN ] Corosync built-in features: bindnow
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] The network interface [10.100.100.1] is now up.
Jul 26 11:07:06 [4027] xstorage1 corosync notice [SERV ] Service engine loaded: corosync configuration map access [0]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: corosync configuration service [1]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: corosync profile loading service [4]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [QUORUM] Using quorum provider corosync_votequorum
Jul 26 11:07:06 [4027] xstorage1 corosync crit [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 26 11:07:06 [4027] xstorage1 corosync error [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 26 11:07:06 [4027] xstorage1 corosync error [MAIN ] Corosync Cluster Engine exiting with status 20 at /data/sources/sonicle/xstream-storage-gate/components/cluster/corosync/corosync-2.4.1/exec/service.c:356.
My corosync conf has nodelist configured! Here it is:

service {
    ver: 1
    name: pacemaker
    use_mgmtd: no
    use_logd: no
}
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 10.100.100.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
    node {
        ring0_addr: xstorage1
        nodeid: 1
    }
    node {
        ring0_addr: xstorage2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /sonicle/var/log/cluster/corosync.log
    to_syslog: no
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
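
To double-check that a nodelist is syntactically present, the file can be parsed with a few lines of Python. This is a minimal sketch, not corosync's own parser: it assumes one `name {`, `key: value` or `}` per line, and the function names and the embedded sample are for illustration only. Note that a nodelist that parses fine can still fail at startup if its names do not resolve to a local address.

```python
def parse_corosync_conf(text):
    """Parse corosync.conf-style text into nested dicts.

    Every section name maps to a list of dicts, so repeated
    sections such as `node` are all kept.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            section = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(section)
            stack.append(section)
        elif line == "}":
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    return root


def nodelist_nodes(conf):
    """Return all node sections found under any nodelist section."""
    return [n for nl in conf.get("nodelist", []) for n in nl.get("node", [])]


SAMPLE = """
nodelist {
    node {
        ring0_addr: xstorage1
        nodeid: 1
    }
    node {
        ring0_addr: xstorage2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
"""

if __name__ == "__main__":
    conf = parse_corosync_conf(SAMPLE)
    for node in nodelist_nodes(conf):
        print(node["nodeid"], node["ring0_addr"])
```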

----------------------------------------------------------------------------------
From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 25 July 2020 0:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello,
after a long time I'm back to run heartbeat/pacemaker/corosync on our
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release
(probably a bit outdated, but I want to start from where I left).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
(e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
state S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
cluster type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
start lrmd IPC server: Operation not supported (-48)
This is repeated for all the subdaemons ... the error is coming from
qb_ipcs_run(), which suggests the issue is an invalid PCMK_ipc_type
for illumos. If you set it to "socket" it should work.
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server:
shutting down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory
from libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster
type: 'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-
system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry
25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1
total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
Hostname: xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn:
UUID: d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting
to cluster infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could
not start attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to
create attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify
pacemaker and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could
not start pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC
server: shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up
memory from libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process
pengine exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer
(I_NULL) just popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect
to the CIB service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't
complete CIB registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect
to the CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server:
Could not start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init:
Failed to create stonith-ng servers: exiting and inhibiting respawn.
Jul 24 18:21:42 [968] stonith-ng: warning: stonith_ipc_server_init:
Verify pacemaker and pacemaker_remote are not both enabled.
Any idea what's happening?
Gabriele
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot