[Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

Steven Dake sdake at redhat.com
Thu Aug 4 16:21:37 UTC 2011


On 08/04/2011 05:46 AM, Sebastian Kaps wrote:
> Hello,
> 
> here's another problem we're having:
> 
> Jul 31 03:51:02 node01 corosync[5870]:  [TOTEM ] Process pause detected
> for 11149 ms, flushing membership messages.

This process pause message indicates that the scheduler did not schedule
corosync for 11 seconds, which is greater than the failure detection
timeouts.  What does your config file look like?  What load are you running?
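
To check a log for this condition, something along these lines might help.
It is only a rough sketch: it pulls the reported pause out of a syslog
excerpt and compares it with a token timeout. The helper names and the
1000 ms value are assumptions for illustration; the timeout actually
configured on these nodes is not shown in this thread.

```python
import re

# Matches the TOTEM message quoted in this thread.
PAUSE_RE = re.compile(r"Process pause detected\s+for (\d+) ms")

# Assumed value for illustration only; use the token timeout from your
# own corosync.conf.
ASSUMED_TOKEN_TIMEOUT_MS = 1000


def pause_ms(log_text):
    """Return the reported pause in milliseconds, or None if not found."""
    # Mail clients wrap long log lines and prefix them with "> ", so strip
    # the quote markers and join everything onto one line before matching.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    match = PAUSE_RE.search(flat)
    return int(match.group(1)) if match else None


def exceeds_token_timeout(pause, token_timeout_ms=ASSUMED_TOKEN_TIMEOUT_MS):
    """True if the pause was longer than the given token timeout."""
    return pause > token_timeout_ms


sample = (
    "> Jul 31 03:51:02 node01 corosync[5870]:  [TOTEM ] Process pause detected\n"
    "> for 11149 ms, flushing membership messages."
)
pause = pause_ms(sample)
print(pause, exceeds_token_timeout(pause))
```

If the reported pause is longer than the token timeout, the other node
will already have declared this one failed by the time corosync runs again,
which fits the "A processor failed" message on node02.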

Regards
-steve

> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] CLM CONFIGURATION CHANGE
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ]   r(0) ip(192.168.1.1)
> r(1) ip(x.y.z.3)
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ]   r(0) ip(192.168.1.2)
> r(1) ip(x.y.z.1)
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:
> Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] notice:
> pcmk_peer_update: Transitional membership event on ring 9708: memb=1,
> new=0, lost=1
> Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] info: pcmk_peer_update:
> memb: node01 16885952
> Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] info: pcmk_peer_update:
> lost: node02 33663168
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] CLM CONFIGURATION CHANGE
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ]   r(0) ip(192.168.1.1)
> r(1) ip(x.y.z.3)
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:
> Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:
> Jul 31 03:51:11 node01 crmd: [5912]: notice: ais_dispatch_message:
> Membership 9708: quorum lost
> 
> Node01 gets Stonith'd shortly after that. There is no indication
> whatsoever that this would happen in the logs.
> For at least half an hour before that there's only the normal
> status-message noise from monitor ops etc.
> 
> Jul 31 03:51:01 node02 corosync[5810]:  [TOTEM ] A processor failed,
> forming new configuration.
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] CLM CONFIGURATION CHANGE
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ]   r(0) ip(192.168.1.2)
> r(1) ip(x.y.z.1)
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ]   r(0) ip(192.168.1.1)
> r(1) ip(x.y.z.3)
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:
> Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] notice:
> pcmk_peer_update: Transitional membership event on ring 9708: memb=1,
> new=0, lost=1
> Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] info: pcmk_peer_update:
> memb: node02 33663168
> Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] info: pcmk_peer_update:
> lost: node01 16885952
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] CLM CONFIGURATION CHANGE
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ]   r(0) ip(192.168.1.2)
> r(1) ip(x.y.z.1)
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:
> Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:
> 
> What does "Process pause detected" mean?
> 
> Quoting from my other recent post regarding the backup ring being marked
> faulty sporadically:
> 
> |We're running a two-node cluster with redundant rings.
> |Ring 0 is a 10 Gbit direct connection; ring 1 consists of two 1 Gbit
> |interfaces that are bonded in active-backup mode and routed through
> |two independent switches for each node. The ring 1 network is our
> |"normal" 1G LAN and should only be used in case the direct 10G
> |connection should fail.
> |
> |Corosync Cluster Engine, version '1.3.1'
> |Copyright (c) 2006-2009 Red Hat, Inc.
> |
> |It's the version that comes with SLES11-SP1-HA.
> 
> Thanks in advance!
> 




