[Pacemaker] Two cloned VMs, only one of the two shows online when starting corosync/pacemaker

Guillaume Chanaud guillaume.chanaud at connecting-nature.com
Fri Jul 16 11:44:27 EDT 2010


Hello,

I have a working configuration for a DRBD/NFS active/passive cluster.

I added a webserver to the cluster. This webserver is hosted in a Fedora
13 VM.
The nodes communicate over a VLAN (physical and virtual nodes talk to
each other directly through the VLAN).
I configured the first webserver node and everything went fine: it comes
online and starts httpd (and Pure-FTPd). As I said, this httpd node is a
KVM virtual machine.
The DRBD/NFS active/passive pair are both physical nodes (running Gentoo).
Every node runs corosync 1.2.3; the Fedora nodes use Pacemaker 1.1.1 and
the Gentoo nodes Pacemaker 1.0.8.
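(For reference, I cross-check the versions on each node with something like:

corosync -v                     # corosync version string
cibadmin -Q | grep dc-version   # pacemaker version of the current DC
)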

Here is the CIB:
node filer1.connecting-nature.com \
         attributes standby="off"
node filer2.connecting-nature.com \
         attributes standby="off"
node www01.connecting-nature.com \
         attributes standby="off"
node www02.connecting-nature.com \
         attributes standby="off"
primitive drbd_nfs ocf:linbit:drbd \
         params drbd_resource="r0" \
         op monitor interval="59s" role="Master" timeout="60s" \
         op monitor interval="60s" role="Slave" timeout="60s"
primitive fs_nfs ocf:heartbeat:Filesystem \
         op monitor interval="120s" timeout="60s" \
         params device="/dev/drbd0" directory="/data" fstype="ext4"
primitive httpd_server ocf:heartbeat:apache \
         op monitor interval="40s" timeout="20s" \
         params configfile="/etc/httpd/conf/httpd.conf"
primitive ip_failover heartbeat:OVHfailover.py \
         op monitor interval="120s" timeout="60s" \
         params 1="xxxxxx-ovh" 2="xxxxxx" 3="nsxxxxx.ovh.net" 4="ip.fail.ov.er" \
         meta target-role="Started"
primitive ip_nfs ocf:heartbeat:IPaddr2 \
         op monitor interval="60s" timeout="20s" \
         params ip="192.168.0.20" cidr_netmask="24" nic="vlan2019"
primitive nfs_server lsb:nfs \
         op monitor interval="120s" timeout="60s"
primitive pureftpd_server ocf:heartbeat:Pure-FTPd \
         op monitor interval="30s" timeout="10s" \
         params script="/usr/sbin/pure-config.pl" daemon_type="ldap"
group group_nfs ip_nfs fs_nfs nfs_server ip_failover \
         meta target-role="Started"
ms ms_drbd_nfs drbd_nfs \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" tar
ms ms_pureftpd_server pureftpd_server \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" tar
clone httpd-clone httpd_server
location drbd-filer1 ms_drbd_nfs \
         rule $id="drbd-filer1-rule" 90: #uname eq filer1.connecting-nature.com
location drbd-filer2 ms_drbd_nfs \
         rule $id="drbd-filer2-rule" 90: #uname eq filer2.connecting-nature.com
location httpd-www01 httpd-clone \
         rule $id="httpd-www01-rule" 90: #uname eq www01.connecting-nature.com
location httpd-www02 httpd-clone \
         rule $id="httpd-www02-rule" 90: #uname eq www02.connecting-nature.com
location nfs-filer1 group_nfs \
         rule $id="nfs-filer1-rule" 90: #uname eq filer1.connecting-nature.com
location nfs-filer2 group_nfs \
         rule $id="nfs-filer2-rule" 90: #uname eq filer2.connecting-nature.com
location pureftpd_www01 ms_pureftpd_server \
         rule $id="pureftpd-www01-rule" 90: #uname eq www01.connecting-nature.com
location pureftpd_www02 ms_pureftpd_server \
         rule $id="pureftpd-www02-rule" 90: #uname eq www02.connecting-nature.com
colocation nfs_on_drbd inf: group_nfs ms_drbd_nfs:Master
order nfs_after_drbd inf: ms_drbd_nfs:promote group_nfs:start
property $id="cib-bootstrap-options" \
         symmetric-cluster="false" \
         no-quorum-policy="ignore" \
         default-resource-stickiness="100" \
         default-resource-failure-stickiness="10" \
         stonith-enabled="false" \
         stonith-action="reboot" \
         stop-orphan-resources="true" \
         stop-orphan-actions="true" \
         remove-after-stop="false" \
         short-resource-names="true" \
         transition-idle-timeout="3min" \
         default-action-timeout="30s" \
         is-managed-default="true" \
         startup-fencing="true" \
         cluster-delay="60s" \
         expected-nodes="1" \
         expected-quorum-votes="4" \
         dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
         cluster-infrastructure="openais" \
         last-lrm-refresh="1279241420" \
         election-timeout="50s"
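(In case someone wants to reproduce this: I manage the configuration with the
crm shell, roughly like so, with an arbitrary scratch file name:

crm configure show > cib.txt         # dump the configuration shown above
crm configure load replace cib.txt   # load it on a test cluster
)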

OK, when starting www01 (for the first time), crm_mon shows:
Online: [ filer2.connecting-nature.com filer1.connecting-nature.com www01.connecting-nature.com ]
Resource Group: group_nfs
      ip_nfs     (ocf::heartbeat:IPaddr2):       Started filer1.connecting-nature.com
      fs_nfs     (ocf::heartbeat:Filesystem):    Started filer1.connecting-nature.com
      nfs_server (lsb:nfs):      Started filer1.connecting-nature.com
      ip_failover        (heartbeat:OVHfailover.py):     Started filer1.connecting-nature.com
  Master/Slave Set: ms_drbd_nfs
      Masters: [ filer1.connecting-nature.com ]
      Slaves: [ filer2.connecting-nature.com ]
  Clone Set: httpd-clone
      Started: [ www01.connecting-nature.com ]
      Stopped: [ httpd_server:1 ]
  Master/Slave Set: ms_pureftpd_server
      Slaves: [ www01.connecting-nature.com ]
      Stopped: [ pureftpd_server:1 ]

Nice, everything is OK.
Now I want to start www02, which is a VM cloned from www01. Only the
hostname and IP address changed!
Online: [ filer2.connecting-nature.com filer1.connecting-nature.com www01.connecting-nature.com ]
OFFLINE: [ www02.connecting-nature.com ]
...

The machine correctly joins the cluster, but always stays stopped... So I
tested by stopping www02, stopping www01, and then starting www02... It
works: www02 goes online, but now, if I want to start www01, it stays
offline.
I can't start the two nodes together; each time, one or the other stays
offline.
If I shut down the online node, after some time the offline node comes
online, and when I restart the node I just shut down, it stays offline.
Sometimes services get started on the "offline" node while it remains
offline as far as corosync is concerned! (This happens when I shut
down/launch the www nodes in random order to see what happens.)
I tried without any services; that doesn't change anything.
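(To compare what each side thinks the membership is, I run something like
this on both www nodes:

corosync-cfgtool -s   # local node id and ring status as corosync sees it
crm_node -p           # members of the current partition
crm_mon -1            # one-shot status snapshot
)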

Here is my corosync.conf:
# Please read the corosync.conf.5 manual page
compatibility: none
aisexec {
         # Run as root - this is necessary to be able to manage resources with Pacemaker
         user: root
         group: root
}
service {
         # Load the Pacemaker Cluster Resource Manager
         name: pacemaker
         ver: 0
         use_logd: yes
}
totem {
         version: 2
         # How long before declaring a token lost (ms)
         token: 5000
         hold: 40
         # How many token retransmits before forming a new configuration
         token_retransmits_before_loss_const: 30
         # How long to wait for join messages in the membership protocol (ms)
         join: 1000
         # How long to wait for consensus to be achieved before
         # starting a new round of membership configuration (ms)
         consensus: 7500
         # Turn off the virtual synchrony filter
         vsftype: none
         # Number of messages that may be sent by one processor on receipt of the token
         max_messages: 20
         # Limit generated nodeids to 31-bits (positive signed integers)
         clear_node_high_bit: yes
         # Enable encryption
         secauth: off
         threads: 0
         # Optionally assign a fixed node id (integer)
         nodeid: 30283707487
         interface {
                 ringnumber: 0
                 bindnetaddr: 192.168.0.60
                 mcastport: 5406
                 mcastaddr: 225.0.0.1
         }
}
logging {
         fileline: on
         to_stderr: yes
         to_logfile: yes
         to_syslog: yes
         logfile: /var/log/corosync.log
         debug: off
         timestamp: on
         logger_subsys {
                 subsys: AMF
                 debug: off
         }
}
amf {
         mode: disabled
}

the "bindnetaddress" change for every node, and the "nodeid" too (i 
tried to put a fixed value to see if it was not a matter of auto 
generated node id value, but it didn't change anything).
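(As a sanity check that the two clones really differ only where intended,
something like:

diff <(ssh www01 cat /etc/corosync/corosync.conf) \
     <(ssh www02 cat /etc/corosync/corosync.conf)

I don't know if it matters, but corosync.conf(5) seems to describe nodeid
as a 32-bit value, which 30283707487 would overflow, and bindnetaddr as
the network address, e.g. 192.168.0.0 for a /24, rather than the host IP.)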
After some time, on the offline node, the only two processes running are:

/usr/lib64/heartbeat/lrmd
/usr/lib64/heartbeat/pengine

(there should be:

corosync
/usr/lib64/heartbeat/stonithd
/usr/lib64/heartbeat/cib
/usr/lib64/heartbeat/lrmd
/usr/lib64/heartbeat/attrd
/usr/lib64/heartbeat/pengine
/usr/lib64/heartbeat/crmd
)
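(I check the process list with a simple ps, e.g.:

ps -ef | egrep 'corosync|stonithd|cib|lrmd|attrd|pengine|crmd'
)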

I attach the log below:

Jul 16 17:37:57 www01 corosync[25654]:   [MAIN  ] main.c:1307 Corosync 
Cluster Engine ('1.2.3'): started and ready to provide service.
Jul 16 17:37:57 www01 corosync[25654]:   [MAIN  ] main.c:1308 Corosync 
built-in features: nss rdma
Jul 16 17:37:57 www01 corosync[25654]:   [MAIN  ] main.c:1383 
Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jul 16 17:37:57 www01 corosync[25654]:   [TOTEM ] totemnet.c:204 
Initializing transport (UDP/IP).
Jul 16 17:37:57 www01 corosync[25654]:   [TOTEM ] totemudp.c:321 
Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC 
(mode 0).
Jul 16 17:37:58 www01 corosync[25654]:   [TOTEM ] totemudp.c:1378 The 
network interface [192.168.0.60] is now up.
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:330 info: 
process_ais_conf: Reading configure
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:556 info: 
config_find_init: Local handle: 2013064636357672962 for logging
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:582 info: 
config_find_next: Processing additional logging options...
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:608 info: 
get_config_opt: Found 'on' for option: debug
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:622 info: 
get_config_opt: Defaulting to 'off' for option: to_file
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:608 info: 
get_config_opt: Found 'yes' for option: to_syslog
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:622 info: 
get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:556 info: 
config_find_init: Local handle: 4730966301143465987 for service
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:582 info: 
config_find_next: Processing additional service options...
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:622 info: 
get_config_opt: Defaulting to 'pcmk' for option: clustername
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:608 info: 
get_config_opt: Found 'yes' for option: use_logd
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:622 info: 
get_config_opt: Defaulting to 'no' for option: use_mgmtd
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:554 info: 
pcmk_startup: CRM: Initialized
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:555 Logging: 
Initialized pcmk_startup
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:573 info: 
pcmk_startup: Maximum core file size is: 18446744073709551615
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:594 info: 
pcmk_startup: Service: 9
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:595 info: 
pcmk_startup: Local hostname: www01.connecting-nature.com
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:510 info: 
pcmk_update_nodeid: Local node id: 218936415
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:243 info: 
update_member: Creating entry for node 218936415 born on 0
Jul 16 17:37:58 www01 stonith-ng: [25661]: WARN: Initializing connection 
to logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: Invoked: 
/usr/lib64/heartbeat/stonithd
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: 
G_main_add_SignalHandler: Added signal handler for signal 17
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_cluster_connect: 
Connecting to OpenAIS
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: init_ais_connection: 
Creating connection to our AIS plugin
Jul 16 17:37:58 www01 lrmd: [25663]: WARN: Initializing connection to 
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler: 
Added signal handler for signal 15
Jul 16 17:37:58 www01 cib: [25662]: WARN: Initializing connection to 
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 cib: [25662]: info: Invoked: /usr/lib64/heartbeat/cib
Jul 16 17:37:58 www01 cib: [25662]: info: G_main_add_TriggerHandler: 
Added signal manual handler
Jul 16 17:37:58 www01 cib: [25662]: info: G_main_add_SignalHandler: 
Added signal handler for signal 17
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler: 
Added signal handler for signal 17
Jul 16 17:37:58 www01 attrd: [25664]: WARN: Initializing connection to 
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 pengine: [25665]: WARN: Initializing connection to 
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:270 info: 
update_member: 0x1259e10 Node 218936415 now known as 
www01.connecting-nature.com (was: (null))
Jul 16 17:37:58 www01 crmd: [25666]: WARN: Initializing connection to 
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: init_ais_connection: 
AIS connection established
Jul 16 17:37:58 www01 cib: [25662]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
/var/lib/heartbeat/crm/cib.xml.sig)
Jul 16 17:37:58 www01 attrd: [25664]: info: Invoked: 
/usr/lib64/heartbeat/attrd
Jul 16 17:37:58 www01 pengine: [25665]: info: Invoked: 
/usr/lib64/heartbeat/pengine
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:286 info: 
update_member: Node www01.connecting-nature.com now has 1 quorum votes 
(was 0)
Jul 16 17:37:58 www01 crmd: [25666]: info: Invoked: 
/usr/lib64/heartbeat/crmd
Jul 16 17:37:58 www01 cib: [25662]: WARN: retrieveCib: Cluster 
configuration not found: /var/lib/heartbeat/crm/cib.xml
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: get_ais_nodeid: Server 
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler: 
Added signal handler for signal 10
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Starting up
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:296 info: 
update_member: Node 218936415/www01.connecting-nature.com is now: member
Jul 16 17:37:58 www01 crmd: [25666]: info: main: CRM Hg Version: 
972b9a5f68606f632893fceed658efa085062f55
Jul 16 17:37:58 www01 cib: [25662]: WARN: readCibXmlFile: Primary 
configuration corrupt or unusable, trying backup...
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler: 
Added signal handler for signal 12
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_cluster_connect: 
Connecting to OpenAIS
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25661 for process stonith-ng
Jul 16 17:37:58 www01 crmd: [25666]: info: crmd_init: Starting crmd
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_new_peer: Node 
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 attrd: [25664]: info: init_ais_connection: 
Creating connection to our AIS plugin
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25662 for process cib
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_new_peer: Node 
218936415 is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 lrmd: [25663]: info: Started.
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25663 for process lrmd
Jul 16 17:37:58 www01 attrd: [25664]: info: init_ais_connection: AIS 
connection established
Jul 16 17:37:58 www01 cib: [25662]: WARN: readCibXmlFile: Continuing 
with an empty configuration.
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: main: Starting 
stonith-ng mainloop
Jul 16 17:37:58 www01 pengine: [25665]: info: main: Starting pengine
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25664 for process attrd
Jul 16 17:37:58 www01 attrd: [25664]: info: get_ais_nodeid: Server 
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25665 for process pengine
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_new_peer: Node 
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:132 info: 
spawn_child: Forked child 25666 for process crmd
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_new_peer: Node 218936415 
is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: Pacemaker Cluster Manager 1.1.1
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Cluster connection active
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync extended virtual synchrony service
Jul 16 17:37:58 www01 crmd: [25666]: info: G_main_add_SignalHandler: 
Added signal handler for signal 17
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Accepting attribute 
updates
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync configuration service
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Starting mainloop...
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync cluster closed process group service v1.01
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync cluster config database access v1.01
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync profile loading service
Jul 16 17:37:58 www01 corosync[25654]:   [SERV  ] service.c:265 Service 
engine loaded: corosync cluster quorum service v0.1
Jul 16 17:37:58 www01 corosync[25654]:   [MAIN  ] main.c:1220 
Compatibility mode set to none.  Using V2 of the synchronization engine.
Jul 16 17:37:58 www01 cib: [25662]: info: startCib: CIB Initialization 
completed successfully
Jul 16 17:37:58 www01 cib: [25662]: info: crm_cluster_connect: 
Connecting to OpenAIS
Jul 16 17:37:58 www01 cib: [25662]: info: init_ais_connection: Creating 
connection to our AIS plugin
Jul 16 17:37:58 www01 cib: [25662]: info: init_ais_connection: AIS 
connection established
Jul 16 17:37:58 www01 cib: [25662]: info: get_ais_nodeid: Server 
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 cib: [25662]: info: crm_new_peer: Node 
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 cib: [25662]: info: crm_new_peer: Node 218936415 
is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 cib: [25662]: info: cib_init: Starting cib mainloop
Jul 16 17:37:58 www01 cib: [25662]: info: ais_dispatch: Membership 
145204: quorum still lost
Jul 16 17:37:58 www01 cib: [25662]: info: crm_update_peer: Node 
www01.connecting-nature.com: id=218936415 state=member (new) addr=r(0) 
ip(192.168.0.60)  (new) votes=1 (new) born=0 seen=145204 
proc=00000000000000000000000000111312 (new)
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Transitional membership event on ring 145204: memb=0, 
new=0, lost=0
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Stable membership event on ring 145204: memb=1, new=1, 
lost=0
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:727 info: 
pcmk_peer_update: NEW:  www01.connecting-nature.com 218936415
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] utils.c:279 info: 
update_member: Node www01.connecting-nature.com now has process list: 
00000000000000000000000000111312 (1118994)
Jul 16 17:37:58 www01 corosync[25654]:   [TOTEM ] totemsrp.c:1786 A 
processor joined or left the membership and a new membership was formed.
Jul 16 17:37:58 www01 corosync[25654]:   [MAIN  ] main.c:261 Completed 
service synchronization, ready to provide service.
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:973 info: 
pcmk_ipc: Recorded connection 0x1263600 for stonith-ng/25661
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:973 info: 
pcmk_ipc: Recorded connection 0x1265f70 for attrd/25664
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:973 info: 
pcmk_ipc: Recorded connection 0x1266a20 for cib/25662
Jul 16 17:37:58 www01 corosync[25654]:   [pcmk  ] plugin.c:982 info: 
pcmk_ipc: Sending membership update 145204 to cib
Jul 16 17:37:58 www01 cib: [25670]: info: write_cib_contents: Wrote 
version 0.0.0 of the CIB to disk (digest: d73fab8054ae39ca7743184041837550)
Jul 16 17:37:58 www01 cib: [25670]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.zo9HRa (digest: 
/var/lib/heartbeat/crm/cib.2sBGDK)
Jul 16 17:37:58 www01 cib: [25662]: info: Managed write_cib_contents 
process 25670 exited with return code 0.
Jul 16 17:37:59 www01 crmd: [25666]: info: do_cib_control: CIB 
connection established
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_cluster_connect: 
Connecting to OpenAIS
Jul 16 17:37:59 www01 crmd: [25666]: info: init_ais_connection: Creating 
connection to our AIS plugin
Jul 16 17:37:59 www01 crmd: [25666]: info: init_ais_connection: AIS 
connection established
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:973 info: 
pcmk_ipc: Recorded connection 0x12659e0 for crmd/25666
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:982 info: 
pcmk_ipc: Sending membership update 145204 to crmd
Jul 16 17:37:59 www01 crmd: [25666]: info: get_ais_nodeid: Server 
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node 
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node 218936415 
is now known as www01.connecting-nature.com
Jul 16 17:37:59 www01 crmd: [25666]: info: do_ha_control: Connected to 
the cluster
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Transitional membership event on ring 145212: memb=1, 
new=0, lost=0
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:710 info: 
pcmk_peer_update: memb: www01.connecting-nature.com 218936415
Jul 16 17:37:59 www01 cib: [25662]: info: ais_dispatch: Membership 
145212: quorum still lost
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Stable membership event on ring 145212: memb=4, new=3, 
lost=0
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:243 info: 
update_member: Creating entry for node 83929280 born on 145212
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now 
has id: 83929280
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:296 info: 
update_member: Node 83929280/unknown is now: member
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null): 
id=83929280 state=member (new) addr=r(0) ip(192.168.0.5)  votes=0 born=0 
seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:727 info: 
pcmk_peer_update: NEW:  .pending. 83929280
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: Delaying start, 
CCM (0000000000100000) not connected
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now 
has id: 100706496
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:243 info: 
update_member: Creating entry for node 100706496 born on 145212
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null): 
id=100706496 state=member (new) addr=r(0) ip(192.168.0.6)  votes=0 
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:296 info: 
update_member: Node 100706496/unknown is now: member
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:727 info: 
pcmk_peer_update: NEW:  .pending. 100706496
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now 
has id: 218437511
Jul 16 17:37:59 www01 crmd: [25666]: info: crmd_init: Starting crmd's 
mainloop
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null): 
id=218437511 state=member (new) addr=r(0) ip(192.168.0.70)  votes=0 
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:243 info: 
update_member: Creating entry for node 218437511 born on 145212
Jul 16 17:37:59 www01 crmd: [25666]: info: ais_dispatch: Membership 
145204: quorum still lost
Jul 16 17:37:59 www01 cib: [25662]: notice: ais_dispatch: Membership 
145212: quorum acquired
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:296 info: 
update_member: Node 218437511/unknown is now: member
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node 
www01.connecting-nature.com: id=218936415 state=member (new) addr=r(0) 
ip(192.168.0.60)  (new) votes=1 (new) born=0 seen=145204 
proc=00000000000000000000000000111312 (new)
Jul 16 17:37:59 www01 cib: [25662]: info: crm_get_peer: Node 100706496 
is now known as filer2.connecting-nature.com
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:727 info: 
pcmk_peer_update: NEW:  .pending. 218437511
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node 
filer2.connecting-nature.com: id=100706496 state=member addr=r(0) 
ip(192.168.0.6)  votes=1 (new) born=15948 seen=145212 
proc=00000000000000000000000000013312 (new)
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: .pending. 83929280
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: .pending. 100706496
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: .pending. 218437511
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:1288 info: 
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: Delaying start, 
Config not read (0000000000000040)
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:259 info: 
update_member: 0x1259e10 Node 218936415 ((null)) born on: 145212
Jul 16 17:37:59 www01 corosync[25654]:   [TOTEM ] totemsrp.c:1786 A 
processor joined or left the membership and a new membership was formed.
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:259 info: 
update_member: 0x1265450 Node 100706496 (filer2.connecting-nature.com) 
born on: 15948
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:270 info: 
update_member: 0x1265450 Node 100706496 now known as 
filer2.connecting-nature.com (was: (null))
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:279 info: 
update_member: Node filer2.connecting-nature.com now has process list: 
00000000000000000000000000013312 (78610)
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] utils.c:286 info: 
update_member: Node filer2.connecting-nature.com now has 1 quorum votes 
(was 0)
Jul 16 17:37:59 www01 crmd: [25666]: info: config_query_callback: 
Checking for expired actions every 900000ms
Jul 16 17:37:59 www01 corosync[25654]:   [pcmk  ] plugin.c:1288 info: 
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:37:59 www01 crmd: [25666]: info: config_query_callback: 
Sending expected-votes=2 to corosync
Jul 16 17:37:59 www01 crmd: [25666]: info: ais_dispatch: Membership 
145212: quorum still lost
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now 
has id: 83929280
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null): 
id=83929280 state=member (new) addr=r(0) ip(192.168.0.5)  votes=0 born=0 
seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now 
has id: 100706496
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null): 
id=100706496 state=member (new) addr=r(0) ip(192.168.0.6)  votes=0 
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now 
has id: 218437511
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null): 
id=218437511 state=member (new) addr=r(0) ip(192.168.0.70)  votes=0 
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: The local CRM is 
operational
Jul 16 17:37:59 www01 crmd: [25666]: info: do_state_transition: State 
transition S_STARTING -> S_PENDING [ input=I_PENDING 
cause=C_FSA_INTERNAL origin=do_started ]
Jul 16 17:38:00 www01 crmd: [25666]: notice: ais_dispatch: Membership 
145212: quorum acquired
Jul 16 17:38:00 www01 crmd: [25666]: info: crm_get_peer: Node 100706496 
is now known as filer2.connecting-nature.com
Jul 16 17:38:00 www01 crmd: [25666]: info: crm_update_peer: Node 
filer2.connecting-nature.com: id=100706496 state=member addr=r(0) 
ip(192.168.0.6)  votes=1 (new) born=15948 seen=145212 
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:00 www01 crmd: [25666]: info: ais_dispatch: Membership 
145212: quorum retained
Jul 16 17:38:00 www01 crmd: [25666]: info: te_connect_stonith: 
Attempting connection to fencing daemon...
Jul 16 17:38:01 www01 crmd: [25666]: info: te_connect_stonith: Connected
Jul 16 17:38:03 www01 attrd: [25664]: info: cib_connect: Connected to 
the CIB after 1 signon attempts
Jul 16 17:38:03 www01 attrd: [25664]: info: cib_connect: Sending full 
refresh
Jul 16 17:38:08 www01 cib: [25662]: info: ais_dispatch: Membership 
145212: quorum retained
Jul 16 17:38:08 www01 crmd: [25666]: info: ais_dispatch: Membership 
145212: quorum retained
Jul 16 17:38:08 www01 cib: [25662]: info: crm_get_peer: Node 218437511 
is now known as www02.connecting-nature.com
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_get_peer: Node 218437511 
is now known as www02.connecting-nature.com
Jul 16 17:38:08 www01 cib: [25662]: info: crm_update_peer: Node 
www02.connecting-nature.com: id=218437511 state=member addr=r(0) 
ip(192.168.0.70)  votes=1 (new) born=145144 seen=145212 
proc=00000000000000000000000000111312 (new)
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_update_peer: Node 
www02.connecting-nature.com: id=218437511 state=member addr=r(0) 
ip(192.168.0.70)  votes=1 (new) born=145144 seen=145212 
proc=00000000000000000000000000111312 (new)
Jul 16 17:38:08 www01 cib: [25662]: info: ais_dispatch: Membership 
145212: quorum retained
Jul 16 17:38:08 www01 cib: [25662]: info: crm_get_peer: Node 83929280 is 
now known as filer1.connecting-nature.com
Jul 16 17:38:08 www01 cib: [25662]: info: crm_update_peer: Node 
filer1.connecting-nature.com: id=83929280 state=member addr=r(0) 
ip(192.168.0.5)  votes=1 (new) born=15948 seen=145212 
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:08 www01 cib: [25662]: info: cib_process_diff: Diff 
0.755.35 -> 0.755.36 not applied to 0.0.0: current "epoch" is less than 
required
Jul 16 17:38:08 www01 cib: [25662]: info: cib_server_process_diff: 
Requesting re-sync from peer
Jul 16 17:38:08 www01 cib: [25662]: WARN: cib_diff_notify: Local-only 
Change (client:crmd, call: 5456): 0.0.0 (Application of an update diff 
failed, requesting a full refresh)
Jul 16 17:38:08 www01 crmd: [25666]: info: ais_dispatch: Membership 
145212: quorum retained
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_get_peer: Node 83929280 
is now known as filer1.connecting-nature.com
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_update_peer: Node 
filer1.connecting-nature.com: id=83929280 state=member addr=r(0) 
ip(192.168.0.5)  votes=1 (new) born=15948 seen=145212 
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:259 info: 
update_member: 0x1265700 Node 218437511 (www02.connecting-nature.com) 
born on: 145144
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:270 info: 
update_member: 0x1265700 Node 218437511 now known as 
www02.connecting-nature.com (was: (null))
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:279 info: 
update_member: Node www02.connecting-nature.com now has process list: 
00000000000000000000000000111312 (1118994)
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:286 info: 
update_member: Node www02.connecting-nature.com now has 1 quorum votes 
(was 0)
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:317 info: 
update_expected_votes: Expected quorum votes 2 -> 3
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:1288 info: 
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:259 info: 
update_member: 0x1265140 Node 83929280 (filer1.connecting-nature.com) 
born on: 15948
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:270 info: 
update_member: 0x1265140 Node 83929280 now known as 
filer1.connecting-nature.com (was: (null))
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:279 info: 
update_member: Node filer1.connecting-nature.com now has process list: 
00000000000000000000000000013312 (78610)
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] utils.c:286 info: 
update_member: Node filer1.connecting-nature.com now has 1 quorum votes 
(was 0)
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:317 info: 
update_expected_votes: Expected quorum votes 3 -> 4
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:1288 info: 
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Transitional membership event on ring 145224: memb=3, 
new=0, lost=1
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:710 info: 
pcmk_peer_update: memb: filer2.connecting-nature.com 100706496
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:710 info: 
pcmk_peer_update: memb: www01.connecting-nature.com 218936415
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:710 info: 
pcmk_peer_update: memb: www02.connecting-nature.com 218437511
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:715 info: 
pcmk_peer_update: lost: filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:699 notice: 
pcmk_peer_update: Stable membership event on ring 145224: memb=4, new=1, 
lost=0
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:727 info: 
pcmk_peer_update: NEW:  filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: filer2.connecting-nature.com 100706496
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:38:08 www01 corosync[25654]:   [pcmk  ] plugin.c:743 info: 
pcmk_peer_update: MEMB: www02.connecting-nature.com 218437511
Jul 16 17:38:08 www01 corosync[25654]:   [TOTEM ] totemsrp.c:1786 A 
processor joined or left the membership and a new membership was formed.
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR: ais_dispatch: 
Receiving message body failed: (2) Library error: Resource temporarily 
unavailable (11)
Jul 16 17:38:11 www01 cib: [25662]: ERROR: ais_dispatch: Receiving 
message body failed: (2) Library error: Resource temporarily unavailable 
(11)
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR: ais_dispatch: AIS 
connection failed
Jul 16 17:38:11 www01 cib: [25662]: ERROR: ais_dispatch: AIS connection 
failed
Jul 16 17:38:11 www01 cib: [25662]: ERROR: cib_ais_destroy: AIS 
connection terminated
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR: 
stonith_peer_ais_destroy: AIS connection terminated
Jul 16 17:38:11 www01 attrd: [25664]: ERROR: ais_dispatch: Receiving 
message body failed: (2) Library error: Resource temporarily unavailable 
(11)
Jul 16 17:38:11 www01 attrd: [25664]: ERROR: ais_dispatch: AIS 
connection failed
Jul 16 17:38:11 www01 attrd: [25664]: CRIT: attrd_ais_destroy: Lost 
connection to OpenAIS service!
Jul 16 17:38:11 www01 attrd: [25664]: info: main: Exiting...
Jul 16 17:38:11 www01 crmd: [25666]: info: cib_native_msgready: Lost 
connection to the CIB service [25662].
Jul 16 17:38:11 www01 crmd: [25666]: CRIT: cib_native_dispatch: Lost 
connection to the CIB service [25662/callback].
Jul 16 17:38:11 www01 crmd: [25666]: CRIT: cib_native_dispatch: Lost 
connection to the CIB service [25662/command].
Jul 16 17:38:11 www01 attrd: [25664]: ERROR: 
attrd_cib_connection_destroy: Connection to the CIB terminated...
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: crmd_cib_connection_destroy: 
Connection to the CIB terminated...
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: ais_dispatch: Receiving 
message body failed: (2) Library error: Invalid argument (22)
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: ais_dispatch: AIS connection 
failed
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: crm_ais_destroy: AIS 
connection terminated

Sorry for this looooong post, but I tried to be as exhaustive as possible.


Thanks in advance for any help.
Guillaume

