[Pacemaker] server lockup failures

Bernd Schubert bs_lists at aakef.fastmail.fm
Wed Oct 28 12:44:42 UTC 2009


On Wednesday 28 October 2009, Andrew Beekhof wrote:
> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert
> <bs_lists at aakef.fastmail.fm> wrote:
> > Hello,
> >
> > I think there is a severe server failure that Pacemaker doesn't detect.
> > Overnight, a Lustre server failed in shrink_icache_memory(), probably
> > while holding dcache_lock. This is a global filesystem lock, and when
> > a filesystem fails while holding it, any IO on the system just
> > hangs.
> 
> And the FS in question was / so Pacemaker basically hung?

I couldn't log in any more, but my guess is 'yes, it hung'. It was not the
root (/) FS, though. If any FS crashes while it holds dcache_lock, every other
filesystem will hang as well. There is nothing we can do about that except
rewrite the Linux VFS ;) My question is just what we can do to get Pacemaker
to stonith that node.
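
To illustrate the kind of failure described above, here is a minimal sketch of
a user-space probe that treats "IO does not return in time" as a failure. It is
only an assumption about how such a check could look, not part of Pacemaker:
the function name io_responds, the probe_cmd parameter and the timeouts are
hypothetical. Whether it would have caught this particular dcache_lock hang is
not certain, since a wedged kernel may block the probe itself.

```python
import subprocess
import sys
import tempfile


def io_responds(path, timeout=10.0, probe_cmd=None):
    """Return True if a child process can list `path` within `timeout` seconds.

    The IO is done in a separate process so that the caller is not stuck
    inside a blocking system call if the filesystem layer hangs.
    """
    if probe_cmd is None:
        # Hypothetical default probe: list the directory in a child interpreter.
        probe_cmd = [
            sys.executable,
            "-c",
            "import os, sys; os.listdir(sys.argv[1])",
            path,
        ]
    proc = subprocess.Popen(
        probe_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        # Exit status 0 means the IO completed; anything else is a failure.
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process in uninterruptible sleep ignores the
        # signal until it leaves that state, so do not wait for it here.
        proc.kill()
        return False


if __name__ == "__main__":
    ok = io_responds(tempfile.gettempdir(), timeout=5.0)
    print("IO responds" if ok else "IO hung or failed")
```

A check like this would have to run on the affected node and its result would
still need to reach the rest of the cluster, so a node that is hung this badly
may never report anything. The sketch only shows the timeout idea.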


Cheers,
Bernd



-- 
Bernd Schubert
DataDirect Networks



