[Pacemaker] Corosync & IPAddr problems(?)

Stephan-Frank Henry Frank.Henry at gmx.net
Wed Feb 9 08:28:38 EST 2011


Howdy,

> On Mon, 7 Feb 2011 16:36:46 +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Mon, Feb 07, 2011 at 02:01:11PM +0100, Stephan-Frank Henry wrote:
> > Hello again,
> > 
> > I am having some possible problems with Corosync and IPAddr.
> > To be more specific, when I do a /etc/init.d/corosync stop, while
> > everything shuts down more or less gracefully, the virtual ip never is
> > released (still visible with ifconfig).
> > 
> > if I do a 'sudo ifdown --force eth0:0' it works. So there should be no
> > direct reason for this.
> > 
> > This might not by itself be a problem, but I fear it could also be
> > related to a 'split-brain' corosync handling due to network cable disconnect.
> > Though that might be something else, I'd rather remove all other
> > problems and then see if it fixes itself.
> > 
> > I have checked syslog, but nothing really jumps out.
> > Are there any other logs or places where I can look?
> > 
> > thanks everyone!
> > 
> > Frank
> > 
> > (pls scream if more or other info is needed)
> > 
> > -------------------------------------------------------------
> > 
> > OS: Debian Lenny 64bit, kernel version: 2.6.33.3
> > Corosync: 1.2.1-1~bpo50+1
> > cluster-glue: 1.0.6-1~bpo50+1
> > libheartbeat2: 1:3.0.3-2~bpo50+1
> > 
> > relevant cib.xml entry:
> > <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
> >   <instance_attributes id="virtual-ip-attribs">
> >     <attributes>
> >       <nvpair id="virtual-ip-addr" name="ip" value="150.158.183.30"/>
> >       <nvpair id="virtual-ip-addr-nic" name="nic" value="eth0"/>
> >       <nvpair id="virtual-ip-addr-netmask" name="cidr_netmask" value="22"/>
> >     </attributes>
> >   </instance_attributes>
> >   <operations>
> >     <op id="virtual-ip-monitor-10s" interval="10s" name="monitor"/>
> >   </operations>
> > </primitive>
> > 
> > here is a reduced log (only the ip stuff):
> > Feb  7 13:39:40 serverA pengine: [8695]: notice: unpack_rsc_op: Operation ip_resource_monitor_0 found resource ip_resource active on serverA
> > Feb  7 13:39:40 serverA pengine: [8695]: notice: native_print: ip_resource#011(ocf::heartbeat:IPaddr):#011Started serverA
> > Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> > Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> > Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ip_resource: Rolling back scores from fs0
> > Feb  7 13:39:40 serverA pengine: [8695]: info: native_color: Resource ip_resource cannot run anywhere
> > Feb  7 13:39:40 serverA pengine: [8695]: notice: LogActions: Stop resource ip_resource#011(serverA)
> > Feb  7 13:39:40 serverA crmd: [8696]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Feb  7 13:39:42 serverA crmd: [8696]: info: te_rsc_command: Initiating action 33: stop ip_resource_stop_0 on serverA (local)
> > Feb  7 13:39:42 serverA lrmd: [8693]: info: cancel_op: operation monitor[7] on ocf::IPaddr::ip_resource for client 8696, its parameters: CRM_meta_interval=[10000] ip=[150.158.183.30]
> > Feb  7 13:39:42 serverA crmd: [8696]: info: do_lrm_rsc_op: Performing key=33:13:0:0dff3321-22f5-411c-a50a-e95fcfa4dd6f op=ip_resource_stop_0 )
> > Feb  7 13:39:42 serverA lrmd: [8693]: info: rsc:ip_resource:14: stop
> > Feb  7 13:39:42 serverA crmd: [8696]: info: process_lrm_event: LRM operation ip_resource_monitor_10000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled
> > Feb  7 13:40:02 serverA lrmd: [8693]: WARN: ip_resource:stop process (PID 10541) timed out (try 1).  Killing with signal SIGTERM (15).
> 
> The stop action times out. You should check why. Note that
> ifdown ... is not what IPaddr uses, but ifconfig down. You can
> also test the resource using ocf-tester outside of cluster.

Yeah, I had seen that but was at a loss as to what the cause was.
Would there have been a way to find out why the stop timed out?
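
For later readers, Dejan's ocf-tester suggestion could look roughly like the sketch below. The parameters are the ones from the cib.xml entry quoted above. The agent path is an assumption about where the packages install it, so adjust it for your system. ocf-tester starts and stops the resource itself, so it is best run on a node where the cluster is not currently managing that address.

```shell
#!/bin/sh
# Sketch: exercise the IPaddr agent outside the cluster with ocf-tester.
# RA is an assumed install path for the agent; adjust it if yours differs.
RA="${RA:-/usr/lib/ocf/resource.d/heartbeat/IPaddr}"

# Same parameters as the ip_resource primitive in cib.xml.
TESTER_CMD="ocf-tester -v -n ip_resource -o ip=150.158.183.30 -o nic=eth0 -o cidr_netmask=22 $RA"

# Show the command, then run it only where ocf-tester is installed.
echo "$TESTER_CMD"
if command -v ocf-tester >/dev/null 2>&1; then
    $TESTER_CMD
fi
```

If the stop step hangs here as well, that would point at the agent or the interface setup rather than at Pacemaker itself.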

For now I followed Shravan's suggestion and switched to IPaddr2. I had used it in my first versions, but the interface did not show up in ifconfig.
After some googling I also added the iflabel parameter. Yay.
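
A rough sketch of what such an IPaddr2 setup could look like in crm shell syntax follows. The iflabel value "0" is an assumption (the value actually used here is not stated); IPaddr2 appends the label to the interface name, so this would give eth0:0. Without a label the address is only visible through ip(8), which is why ifconfig does not list it.

```shell
#!/bin/sh
# Sketch: IPaddr2 primitive with an interface label, in crm shell syntax.
# IFLABEL=0 is an assumed value; IPaddr2 appends it to the nic name (eth0:0).
IFLABEL="${IFLABEL:-0}"

CRM_CMD="crm configure primitive ip_resource ocf:heartbeat:IPaddr2 params ip=150.158.183.30 nic=eth0 cidr_netmask=22 iflabel=$IFLABEL op monitor interval=10s"

# Only print the command so it can be reviewed before running it by hand.
echo "$CRM_CMD"

# ip(8) lists every address on the device, with or without a label.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show dev eth0 || true
fi
```

The crm command is printed rather than executed because an ip_resource primitive may already exist in the CIB; in that case editing the existing entry is the safer route.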

Now everything is working well. I just have some 'issues' with how Corosync & DRBD work, or rather with my expectations and how they might differ from my config. :D
I'll write a new post for this.

Thanks again!

Frank