[Pacemaker] Corosync & IPAddr problems(?)

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Feb 7 10:36:46 EST 2011


Hi,

On Mon, Feb 07, 2011 at 02:01:11PM +0100, Stephan-Frank Henry wrote:
> Hello again,
> 
> I am having some possible problems with Corosync and IPAddr.
> To be more specific, when I do a /etc/init.d/corosync stop, everything shuts down more or less gracefully, but the virtual IP is never released (it is still visible with ifconfig).
> 
> If I do a 'sudo ifdown --force eth0:0' it works, so there should be no direct reason for this.
> 
> This might not by itself be a problem, but I fear it could also be related to a 'split-brain' corosync handling due to network cable disconnect.
> Though that might be something else, I'd rather remove all other problems and then see if it fixes itself.
> 
> I have checked syslog, but nothing really jumps out.
> Are there any other logs or places where I can look?
> 
> thanks everyone!
> 
> Frank
> 
> (pls scream if more or other info is needed)
> 
> -------------------------------------------------------------
> 
> OS: Debian Lenny 64bit, kernel version: 2.6.33.3
> Corosync: 1.2.1-1~bpo50+1
> cluster-glue: 1.0.6-1~bpo50+1
> libheartbeat2: 1:3.0.3-2~bpo50+1
> 
> relevant cib.xml entry:
> <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
>   <instance_attributes id="virtual-ip-attribs">
>     <attributes>
>       <nvpair id="virtual-ip-addr" name="ip" value="150.158.183.30"/>
>       <nvpair id="virtual-ip-addr-nic" name="nic" value="eth0"/>
>       <nvpair id="virtual-ip-addr-netmask" name="cidr_netmask" value="22"/>
>     </attributes>
>   </instance_attributes>
>   <operations>
>     <op id="virtual-ip-monitor-10s" interval="10s" name="monitor"/>
>   </operations>
> </primitive>
> 
> here is a reduced log (only the ip stuff):
> Feb  7 13:39:40 serverA pengine: [8695]: notice: unpack_rsc_op: Operation ip_resource_monitor_0 found resource ip_resource active on serverA
> Feb  7 13:39:40 serverA pengine: [8695]: notice: native_print:      ip_resource#011(ocf::heartbeat:IPaddr):#011Started serverA
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ip_resource: Rolling back scores from fs0
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_color: Resource ip_resource cannot run anywhere
> Feb  7 13:39:40 serverA pengine: [8695]: notice: LogActions: Stop resource ip_resource#011(serverA)
> Feb  7 13:39:40 serverA crmd: [8696]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb  7 13:39:42 serverA crmd: [8696]: info: te_rsc_command: Initiating action 33: stop ip_resource_stop_0 on serverA (local)
> Feb  7 13:39:42 serverA lrmd: [8693]: info: cancel_op: operation monitor[7] on ocf::IPaddr::ip_resource for client 8696, its parameters: CRM_meta_interval=[10000] ip=[150.158.183.30] 
> Feb  7 13:39:42 serverA crmd: [8696]: info: do_lrm_rsc_op: Performing key=33:13:0:0dff3321-22f5-411c-a50a-e95fcfa4dd6f op=ip_resource_stop_0 )
> Feb  7 13:39:42 serverA lrmd: [8693]: info: rsc:ip_resource:14: stop
> Feb  7 13:39:42 serverA crmd: [8696]: info: process_lrm_event: LRM operation ip_resource_monitor_10000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled
> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: ip_resource:stop process (PID 10541) timed out (try 1).  Killing with signal SIGTERM (15).

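The stop action times out: lrmd kills it after 20 seconds (started at
13:39:42, killed at 13:40:02, matching CRM_meta_timeout=[20000] in
the log), which would explain why the address stays configured. You
should check why the stop hangs. Note that IPaddr does not use
ifdown; it uses ifconfig down, so your manual test does not exercise
the same code path. You can also test the resource agent outside of
the cluster using ocf-tester.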
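
For reference, a minimal sketch of how such a test could look. It assumes the agent is installed at /usr/lib/ocf/resource.d/heartbeat/IPaddr (the usual location, but check your package) and reuses the parameters from the cib.xml entry quoted above. The helper names ocf_tester_cmd and manual_stop_cmd are made up for illustration. The functions only print the commands, so you can review them before running them as root on a node where the cluster is not managing the resource.

```shell
#!/bin/sh
# Assumed agent location; adjust if your package installs it elsewhere.
RA="${RA:-/usr/lib/ocf/resource.d/heartbeat/IPaddr}"

# Parameters taken from the ip_resource primitive in cib.xml.
IP="150.158.183.30"
NIC="eth0"
NETMASK="22"

# Print the ocf-tester invocation (hypothetical helper name).
# Usage: ocf-tester -n <name> [-o param=value]... <agent path>
ocf_tester_cmd() {
    printf 'ocf-tester -n ip_resource -o ip=%s -o nic=%s -o cidr_netmask=%s %s\n' \
        "$IP" "$NIC" "$NETMASK" "$RA"
}

# Print a command that runs only the stop action by hand, traced with
# sh -x and timed (hypothetical helper name). Resource agents read
# their parameters from OCF_RESKEY_* environment variables.
manual_stop_cmd() {
    printf 'time env OCF_ROOT=/usr/lib/ocf OCF_RESKEY_ip=%s OCF_RESKEY_nic=%s OCF_RESKEY_cidr_netmask=%s sh -x %s stop\n' \
        "$IP" "$NIC" "$NETMASK" "$RA"
}

ocf_tester_cmd
manual_stop_cmd
```

If the hand-run stop also hangs, the sh -x trace should show which command inside the agent is blocking.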

Thanks,

Dejan

> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: operation stop[14] on ocf::IPaddr::ip_resource for client 8696, its parameters: ip=[150.158.183.30] cidr_netmask=[22] CRM_meta_timeout=[20000] 
> Feb  7 13:40:02 serverA lrmd: [8693]: info: record_op_completion: cannot record operation stop[14] on ocf::IPaddr::ip_resource for client 8696: the client is gone
> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: notify_client: client for the operation operation stop[14] on ocf::IPaddr::ip_resource for client 8696, its parameters: ip=[150.158.183.30] 
> 



