[Pacemaker] Master/Slave resource cannot start

Diego Remolina diego.remolina at physics.gatech.edu
Wed Aug 12 08:13:23 EDT 2009


> Can you define "not correctly" please?
> I'd rather not ignore such behavior.

The machine would come up and not join the cluster. Checking the status
of openais would show it as "Running", but crm status would show:

Connection to cluster failed: connection failed

A look at the log file shows:

Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service RELEASE 'subrev 1152 version 0.80'
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service: started and ready to provide service.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Token Timeout (3000 ms) retransmit timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] token hold (225 ms) retransmits before loss (10 retrans)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] join (60 ms) send_join (0 ms) consensus (1500 ms) merge (200 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (20 messages)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] send threads (0 threads)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token expired timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token problem counter (2000 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP threshold (10 problem count)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP mode set to passive.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] heartbeat_failures_allowed (0)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] max_network_delay (50 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.0.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Created or loaded sequence id 112.10.0.0.22 for this ring.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.1.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] entering GATHER state from 15.
Aug 12 07:57:17 phys-file02 openais[9380]: [crm  ] info: process_ais_conf: Reading configure
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: config_find_next: Processing additional logging options...
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Found 'on' for option: debug
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Defaulting to 'off' for option: to_file
Aug 12 07:57:21 phys-file02 crm_shadow: [9396]: info: Invoked: crm_shadow
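
In the excerpt above, the last totem state transition logged by aisexec PID 9380 is "entering GATHER state from 15." A quick way to compare a failed start with a good one is to pull out that last transition per PID. The helper below is a hypothetical sketch, not part of openais; it assumes one syslog entry per line (as in the real log file, before mail wrapping) and assumes that a node which has formed a membership logs a later "entering OPERATIONAL state" line, which should be checked against a healthy node's log.

```shell
# Hypothetical helper (not shipped with openais): print the state named in the
# most recent "[TOTEM] entering <STATE> state" line logged by one aisexec PID.
# Assumes one syslog entry per line, in the format shown above.
last_totem_state() {
    logfile=$1   # log file to scan, e.g. /var/log/messages (assumption)
    pid=$2       # aisexec PID, e.g. 9380 in the excerpt above
    grep -F "openais[$pid]: [TOTEM] entering " "$logfile" \
        | tail -n 1 \
        | sed -n 's/.*entering \([A-Z]*\) state.*/\1/p'
}
```

For example, `last_totem_state /var/log/messages 9380` run against a log containing the excerpt above would print GATHER; an empty result means that PID logged no state transitions at all.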

I try to stop openais, but it hangs; the dots in the stop command's
progress output just keep appearing:

[root@phys-file02 log]# /etc/init.d/openais stop
Stopping OpenAIS daemon (aisexec): ..............................................

I have to Ctrl+C out of it, then kill aisexec by hand and check that
nothing is left running:

[root@phys-file02 log]# pkill -9 aisexec
[root@phys-file02 log]# ps -ef | grep ais
root      9639  5760  0 08:01 pts/1    00:00:00 grep ais
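
Until the cause of the hang is found, the manual recovery above (interrupt the init script, then pkill -9 aisexec) could be scripted so it does not depend on someone watching the dots. This is a hypothetical wrapper, not part of the openais init script; it assumes GNU coreutils `timeout` and procps `pkill` are installed, and the 60-second limit in the usage line is an arbitrary choice. It only automates the workaround and does not explain why aisexec stops responding.

```shell
# Hypothetical wrapper: run a stop command with a time limit. If the command
# exits non-zero or is still running after $1 seconds, SIGKILL every process
# whose name is exactly $2. Requires coreutils `timeout` and procps `pkill`.
stop_with_fallback() {
    limit=$1   # seconds to allow the stop command
    name=$2    # process name to kill if the stop fails, e.g. aisexec
    shift 2    # the remaining arguments are the stop command itself
    if timeout "$limit" "$@"; then
        return 0
    fi
    echo "stop command failed or timed out after ${limit}s; sending SIGKILL to $name" >&2
    pkill -9 -x "$name"
    return 1   # report that the clean stop did not succeed
}
```

Usage would be `stop_with_fallback 60 aisexec /etc/init.d/openais stop`. A non-zero return tells the caller that aisexec had to be killed rather than stopped cleanly.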

Then I start openais again and crm starts correctly.

[root@phys-file02 log]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[root@phys-file02 log]# crm status


============
Last updated: Wed Aug 12 08:01:33 2009
Stack: openais
Current DC: phys-file01.physics.gatech.edu - partition with quorum
Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]

Master/Slave Set: ms-drbd_export
         Masters: [ phys-file01.physics.gatech.edu ]
         Slaves: [ phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_scratch
         Masters: [ phys-file01.physics.gatech.edu ]
         Slaves: [ phys-file02.physics.gatech.edu ]
Resource Group: fileserver
     fs_export   (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
     fs_scratch  (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
     virtual-ip-1        (ocf::heartbeat:IPaddr2):       Started phys-file01.physics.gatech.edu
     nfs (lsb:nfs):      Started phys-file01.physics.gatech.edu
     samba       (lsb:smb):      Started phys-file01.physics.gatech.edu
Clone Set: pingd-clone
         Started: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]

I am not sure how to fix this so that openais always starts crm
correctly. My DRBD interfaces are bonded, but in mode 1 (active-backup),
which is failover only: no round-robin or any other load-balancing mode.

[root@phys-file02 log]# cat /proc/net/bonding/bond1 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[root@phys-file02 log]# cat /proc/net/bonding/bond2 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[root@phys-file02 log]# ifconfig bond1 | grep "inet addr"
           inet addr:10.0.0.22  Bcast:10.0.0.255  Mask:255.255.255.0
[root@phys-file02 log]# ifconfig bond2 | grep "inet addr"
           inet addr:10.0.1.22  Bcast:10.0.1.255  Mask:255.255.255.0
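
The same mode check can be run over every bond at once, which is handy when comparing the two nodes. A minimal sketch; the function name is hypothetical, and it assumes each file under /proc/net/bonding contains a "Bonding Mode:" line, as in the output above. The directory is a parameter so the function can also be pointed at copies of those files.

```shell
# Hypothetical helper: print "<bond>: <mode>" for each bond file in a directory.
bond_modes() {
    dir=${1:-/proc/net/bonding}   # defaults to the bonding driver's proc directory
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip the unexpanded pattern if nothing matched
        mode=$(sed -n 's/^Bonding Mode: //p' "$f")
        printf '%s: %s\n' "$(basename "$f")" "$mode"
    done
}
```

Running `bond_modes` on both nodes and diffing the output shows at a glance whether the ring interfaces are configured the same way.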

[root@phys-file02 log]# grep addr /etc/ais/openais.conf
                 bindnetaddr: 10.0.0.0
                 mcastaddr: 226.94.0.1
                 bindnetaddr: 10.0.1.0
                 mcastaddr: 226.94.1.1
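
To cross-check the two bindnetaddr values against the ifconfig output, compute each interface's network address (its address ANDed with its netmask). This sketch assumes openais matches bindnetaddr against that network address; the function names are hypothetical and the arithmetic is plain POSIX shell.

```shell
# Hypothetical helpers: dotted-quad IPv4 <-> integer, and the network address
# obtained by ANDing an interface address with its netmask.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

int_to_ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

network_of() {         # $1 = interface address, $2 = netmask
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}
```

With the values above, `network_of 10.0.0.22 255.255.255.0` prints 10.0.0.0 and `network_of 10.0.1.22 255.255.255.0` prints 10.0.1.0, matching the two bindnetaddr lines, so under that assumption each ring is bound to the intended bond.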

On the other node:

[root@phys-file01 ~]# /etc/init.d/openais restart
Stopping OpenAIS daemon (aisexec): ..........OK
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[root@phys-file01 ~]# crm status

Connection to cluster failed: connection failed

Diego



