<div dir="ltr"><div><div>Hello Bernardo<br><br></div>I don't know if this is the problem, but try this option<br><br> clear_node_high_bit<br> This configuration option is optional and is only relevant when no nodeid is specified. Some openais clients require a signed 32 bit nodeid that is<br>
greater than zero however by default openais uses all 32 bits of the IPv4 address space when generating a nodeid. Set this option to yes to force the high<br> bit to be zero and therefor ensure the nodeid is a positive signed 32 bit integer.<br>
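
For what it's worth, the nodeids in your logs do look like raw IPv4
addresses: 168385827 is 0x0A095D23, i.e. 10.9.93.35, and 168385835 is
10.9.93.43, so corosync is auto-generating them from the interface
address. The option goes in the totem section; a minimal sketch based on
your posted config (untested, please double-check against
corosync.conf(5)):

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha
    # Force the high bit of auto-generated nodeids to zero so the
    # nodeid is always a positive signed 32-bit integer.
    clear_node_high_bit: yes
    interface {
        ringnumber: 0
        ttl: 1
        bindnetaddr: 10.9.93.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}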
    WARNING: The cluster's behavior is undefined if this option is
    enabled on only a subset of the cluster (for example during a
    rolling upgrade).
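
If that doesn't help, another route (just a sketch; the name/id
assignments below are my assumption, see the nodelist section of
corosync.conf(5)) is to stop relying on auto-generated ids and declare
the nodes explicitly:

nodelist {
    node {
        ring0_addr: 10.9.93.35
        # name is what pacemaker will use for this node
        name: selavi
        nodeid: 1
    }
    node {
        ring0_addr: 10.9.93.43
        name: turifel
        nodeid: 2
    }
}

With explicit nodeids and names, pacemaker no longer has to map a
32-bit id back to an address, which is where your crm_get_peer warnings
point.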
<div class="gmail_quote">2013/6/27 Bernardo Cabezas Serra <span dir="ltr"><<a href="mailto:bcabezas@apsl.net" target="_blank">bcabezas@apsl.net</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello,

Our cluster was working OK on the corosync stack, with corosync 2.3.0
and pacemaker 1.1.8.

After upgrading (full versions and configs below), we began to have
problems with node names.
It's a two-node cluster, with node names "turifel" (DC) and "selavi".

When selavi joins the cluster, we get this warning in the selavi log:

-----
Jun 27 11:54:29 selavi attrd[11998]: notice: corosync_node_name:
Unable to get node name for nodeid 168385827
Jun 27 11:54:29 selavi attrd[11998]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
-----

This is OK, and also happened with version 1.1.8.

At the corosync level, all seems OK:
----
Jun 27 11:51:18 turifel corosync[6725]: [TOTEM ] A processor joined or
left the membership and a new membership (10.9.93.35:1184) was formed.
Jun 27 11:51:18 turifel corosync[6725]: [QUORUM] Members[2]: 168385827
168385835
Jun 27 11:51:18 turifel corosync[6725]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 27 11:51:18 turifel crmd[19526]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node selavi[168385827] - state is now member
(was lost)
-------

But when starting pacemaker on selavi (the new node), the turifel log
shows this:

----
Jun 27 11:54:28 turifel crmd[19526]: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jun 27 11:54:28 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:28 turifel crmd[19526]: warning: crmd_cs_dispatch:
Recieving messages from a node we think is dead: selavi[0]
Jun 27 11:54:29 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:29 turifel crmd[19526]: warning: do_state_transition: Only
1 of 2 cluster nodes are eligible to run resources - continue 0
Jun 27 11:54:29 turifel attrd[19524]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
----

And selavi remains in pending state. Sometimes turifel (DC) fences
selavi, but other times it remains pending forever.

On the turifel node, all resources give warnings like this one:
warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is
unrunnable (pending)

On both nodes, uname -n and crm_node -n give the correct node names
(selavi and turifel respectively).

Do you think it's a configuration problem?

Below I give information about versions and configurations.

Best regards,
Bernardo.

-----
Versions (git/hg compiled versions):

corosync: 2.3.0.66-615d
pacemaker: 1.1.9-61e4b8f
cluster-glue: 1.0.11
libqb: 0.14.4.43-bb4c3
resource-agents: 3.9.5.98-3b051
crmsh: 1.2.5

The cluster also has drbd, dlm and gfs2, but I think their versions are
irrelevant here.

--------
Output of pacemaker configuration:
./configure --prefix=/opt/ha --without-cman \
--without-heartbeat --with-corosync \
--enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso

pacemaker configuration:
Version = 1.1.9 (Build: 61e4b8f)
Features = generated-manpages ascii-docs ncurses
libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp
libesmtp

Prefix = /opt/ha
Executables = /opt/ha/sbin
Man pages = /opt/ha/share/man
Libraries = /opt/ha/lib
Header files = /opt/ha/include
Arch-independent files = /opt/ha/share
State information = /opt/ha/var
System configuration = /opt/ha/etc
Corosync Plugins = /opt/ha/lib

Use system LTDL = yes

HA group name = haclient
HA user name = hacluster

CFLAGS = -I/opt/ha/include -I/opt/ha/include
-I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include
-ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return
-Wbad-function-cast -Wcast-align -Wdeclaration-after-statement
-Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security
-Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations
-Wnested-externs -Wno-long-long -Wno-strict-aliasing
-Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes
-Wwrite-strings
Libraries = -lgnutls -lcorosync_common -lplumb -lpils
-lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl -lglib-2.0 -lltdl
-L/opt/ha/lib -lqb -ldl -lrt -lpthread
Stack Libraries = -L/opt/ha/lib -lqb -ldl -lrt -lpthread
-L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap
-L/opt/ha/lib -lquorum

----
Corosync config:

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha
    interface {
        ringnumber: 0
        ttl: 1
        bindnetaddr: 10.9.93.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local7
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
}

--
APSL
Bernardo Cabezas Serra
Systems Manager
Camí Vell de Bunyola 37, esc. A, local 7
07009 Polígono de Son Castelló, Palma
Mail: bcabezas@apsl.net
Skype: bernat.cabezas
Tel: 971439771

--
this is my life and I live it as long as God wills