[ClusterLabs] getting "Totem is unable to form a cluster" error

Muhammad Sharfuddin M.Sharfuddin at nds.com.pk
Thu Apr 7 14:24:13 EDT 2016


pacemaker 1.1.12-11.12
openais 1.1.4-5.24.5
corosync 1.4.7-0.23.5

It's a two-node active/passive cluster. We just upgraded from SLES 11 
SP3 to SLES 11 SP4 (nothing else), but when we try to start the cluster 
service we get the following error:

"Totem is unable to form a cluster because of an operating system or 
network fault."

The firewall is stopped and disabled on both nodes. Both nodes can 
ping/ssh/vnc each other.
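
(Ping and ssh only exercise ICMP and TCP; Totem's udpu transport uses UDP 
on the configured mcastport, 5405 here. For reference, a minimal sketch to 
check raw UDP reachability between the nodes -- Python, with the peer 
address passed as a placeholder argument, and corosync stopped on the 
listener so the port is free:

    import socket, sys

    PORT = 5405  # mcastport from corosync.conf

    if sys.argv[1] == "listen":
        # run this on one node (corosync stopped, so UDP/5405 is free)
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("0.0.0.0", PORT))
        print("waiting for a datagram on UDP/%d ..." % PORT)
        data, peer = s.recvfrom(1024)
        print("received %r from %s" % (data, peer[0]))
    else:
        # run "send <peer-address>" on the other node
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.sendto(b"totem-port-check", (sys.argv[2], PORT))
        print("datagram sent to %s:%d" % (sys.argv[2], PORT))

If the listener never sees the datagram, something is still dropping UDP 
between the nodes even though ICMP/TCP work.)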

corosync.conf:
aisexec {
     group:    root
     user:    root
}
service {
     use_mgmtd:    yes
     use_logd:    yes
     ver:    0
     name:    pacemaker
}
totem {
     rrp_mode:    none
     join:    60
     max_messages:    20
     vsftype:    none
     token:    5000
     consensus:    6000

     interface {
         bindnetaddr:    192.168.150.0

         member {
             memberaddr:     192.168.150.12
         }
         member {
             memberaddr:      192.168.150.13
         }
         mcastport:    5405

         ringnumber:    0

     }
     secauth:    off
     version:    2
     transport:    udpu
     token_retransmits_before_loss_const:    10
     clear_node_high_bit:    new
}
logging {
     to_logfile:    no
     to_syslog:    yes
     debug:    off
     timestamp:    off
     to_stderr:    no
     fileline:    off
     syslog_facility:    daemon
}
amf {
     mode:    disable
}
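
(With transport udpu, corosync binds to the local interface whose address 
falls inside the bindnetaddr network -- 192.168.150.0 above, with the 
prefix taken from the interface. A minimal sanity-check sketch, assuming 
Python 3's ipaddress module; the /24 prefix and the node address are 
assumptions and should be taken from "ip addr show" on this node:

    import ipaddress

    bindnet = ipaddress.ip_network("192.168.150.0/24")  # bindnetaddr; /24 prefix assumed
    local   = ipaddress.ip_address("192.168.150.12")    # this node's address, per "ip addr show"

    # corosync picks the interface whose address lies inside bindnetaddr's
    # network; if none matches, or that interface is down, it cannot bind there
    print("address matches bindnetaddr:", local in bindnet)

)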

/var/log/messages:
Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync Cluster Engine 
('1.4.7'): started and ready to provide service.
Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync built-in 
features: nss
Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully configured 
openais services to load
Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully read main 
configuration file '/etc/corosync/corosync.conf'.
Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing transport 
(UDP/IP Unicast).
Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing 
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] The network interface is 
down.
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais cluster membership service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais event service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais checkpoint service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais availability management framework B.01.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais message service B.03.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais distributed locking service B.03.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
openais timer service A.01.01
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: process_ais_conf: 
Reading configure
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_init: 
Local handle: 7685269064754659330 for logging
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional logging options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'off' for option: debug
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'no' for option: to_logfile
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'yes' for option: to_syslog
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'daemon' for option: syslog_facility
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_init: 
Local handle: 8535092201842016259 for quorum
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
No additional configuration supplied for: quorum
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: No 
default for option: provider
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_init: 
Local handle: 8054506479773810692 for service
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found '0' for option: ver
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Defaulting to 'pcmk' for option: clustername
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'yes' for option: use_logd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: get_config_opt: 
Found 'yes' for option: use_mgmtd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: pcmk_startup: CRM: 
Initialized
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] Logging: Initialized 
pcmk_startup
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: pcmk_startup: 
Maximum core file size is: 18446744073709551615
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: pcmk_startup: 
Service: 9
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: pcmk_startup: Local 
hostname: prd1
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: pcmk_update_nodeid: 
Local node id: 2130706433
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: update_member: 
Creating entry for node 2130706433 born on 0
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: update_member: 
0x64c9c0 Node 2130706433 now known as prd1 (was: (null))
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: update_member: Node 
prd1 now has 1 quorum votes (was 0)
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: update_member: Node 
2130706433/prd1 is now: member
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Using 
uid=90 and group=90 for process cib
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8677 for process cib
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8678 for process stonith-ng
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8679 for process lrmd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Using 
uid=90 and group=90 for process attrd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8680 for process attrd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Using 
uid=90 and group=90 for process pengine
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8681 for process pengine
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Using 
uid=90 and group=90 for process crmd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8682 for process crmd
Apr  6 17:51:49 prd1 corosync[8672]:  [pcmk  ] info: spawn_child: Forked 
child 8683 for process mgmtd
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
Pacemaker Cluster Manager 1.1.12
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync extended virtual synchrony service
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync configuration service
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync cluster closed process group service v1.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync cluster config database access v1.01
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync profile loading service
Apr  6 17:51:49 prd1 corosync[8672]:  [SERV  ] Service engine loaded: 
corosync cluster quorum service v0.1
Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Compatibility mode set to 
whitetank.  Using V1 and V2 of the synchronization engine.
Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] adding new UDPU member 
{192.168.150.12}
Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] adding new UDPU member 
{192.168.150.13}
Apr  6 17:51:50 prd1 lrmd[8679]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 mgmtd: [8683]: info: Pacemaker-mgmt Git Version: 
969d213
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Core dumps could be lost if 
multiple dumps occur.
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Consider setting non-default 
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum 
supportability
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Consider setting 
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum 
supportability
Apr  6 17:51:50 prd1 attrd[8680]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 pengine[8681]:   notice: crm_add_logfile: 
Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 attrd[8680]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 cib[8677]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 crmd[8682]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 attrd[8680]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x7f944c04acf0 for attrd/8680
Apr  6 17:51:50 prd1 crmd[8682]:   notice: main: CRM Git Version: f47ea56
Apr  6 17:51:50 prd1 attrd[8680]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 attrd[8680]:   notice: main: Starting mainloop...
Apr  6 17:51:50 prd1 stonith-ng[8678]:   notice: crm_add_logfile: 
Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 stonith-ng[8678]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 stonith-ng[8678]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x658190 for stonith-ng/8678
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: update_member: Node 
prd1 now has process list: 00000000000000000000000000151312 (1381138)
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 0 to stonith-ng
Apr  6 17:51:50 prd1 stonith-ng[8678]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 cib[8677]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 cib[8677]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x65d450 for cib/8677
Apr  6 17:51:50 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 0 to cib
Apr  6 17:51:50 prd1 cib[8677]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 cib[8677]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node prd1[2130706433] - state is now lost (was 
(null))
Apr  6 17:51:50 prd1 cib[8677]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node prd1[2130706433] - state is now member 
(was lost)
Apr  6 17:51:50 prd1 mgmtd: [8683]: info: Started.
Apr  6 17:51:51 prd1 crmd[8682]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:51 prd1 crmd[8682]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:51 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x661b00 for crmd/8682
Apr  6 17:51:51 prd1 corosync[8672]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 0 to crmd
Apr  6 17:51:51 prd1 crmd[8682]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:51 prd1 stonith-ng[8678]:   notice: setup_cib: Watching for 
stonith topology changes
Apr  6 17:51:51 prd1 stonith-ng[8678]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node prd1[2130706433] - state is now lost (was 
(null))
Apr  6 17:51:51 prd1 stonith-ng[8678]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node prd1[2130706433] - state is now member 
(was lost)
Apr  6 17:51:51 prd1 crmd[8682]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node prd1[2130706433] - state is now member 
(was (null))
Apr  6 17:51:51 prd1 crmd[8682]:   notice: do_started: The local CRM is 
operational
Apr  6 17:51:51 prd1 crmd[8682]:   notice: do_state_transition: State 
transition S_STARTING -> S_PENDING [ input=I_PENDING 
cause=C_FSA_INTERNAL origin=do_started ]
Apr  6 17:51:51 prd1 stonith-ng[8678]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Apr  6 17:52:12 prd1 crmd[8682]:  warning: do_log: FSA: Input 
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Apr  6 17:52:35 prd1 corosync[8672]:  [MAIN  ] Totem is unable to form a 
cluster because of an operating system or network fault. The most common 
cause of this message is that the local firewall is configured improperly.
Apr  6 17:52:36 prd1 corosync[8672]:  [MAIN  ] Totem is unable to form a 
cluster because of an operating system or network fault. The most common 
cause of this message is that the local firewall is configured improperly.
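
(For what it's worth, the "Local node id" reported by pcmk_update_nodeid 
above decodes to the loopback address, which fits the earlier "The network 
interface is down" message. A quick check in Python:

    import socket, struct

    node_id = 2130706433  # "Local node id" from pcmk_update_nodeid above
    print(socket.inet_ntoa(struct.pack("!I", node_id)))  # prints 127.0.0.1

)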


-- 
Regards,

Muhammad Sharfuddin
<http://www.nds.com.pk>
