[Pacemaker] Cluster Volume Group is stuck

Karl Rößmann K.Roessmann at fkf.mpg.de
Thu May 12 09:16:52 EDT 2011


This is an update to my last mail:

Normally, SBD is running on one node only:

Online: [ multix246 multix244 multix245 ]

  Clone Set: dlm_clone [dlm]
      Started: [ multix244 multix245 multix246 ]
  Clone Set: clvm_clone [clvm]
      Started: [ multix244 multix245 multix246 ]
  Clone Set: vgsmet_clone [vgsmet]
      Started: [ multix244 multix245 multix246 ]
  smetserv       (ocf::heartbeat:Xen):   Started multix244
  SBD_Stonith    (stonith:external/sbd): Started multix245  <----------

But after powering off node multix246, it is reported as running on two nodes:


Node multix246: UNCLEAN (offline)
Online: [ multix244 multix245 ]

  Clone Set: dlm_clone [dlm]
      Started: [ multix244 multix245 ]
      Stopped: [ dlm:2 ]
  Clone Set: clvm_clone [clvm]
      Started: [ multix244 multix245 ]
      Stopped: [ clvm:2 ]
  Clone Set: vgsmet_clone [vgsmet]
      Started: [ multix244 multix245 ]
      Stopped: [ vgsmet:2 ]
  smetserv       (ocf::heartbeat:Xen):   Started multix244
  SBD_Stonith    (stonith:external/sbd) Started [ multix245  multix246 ] <-----
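The symptom above, a primitive shown with a bracketed list of nodes, can be
spotted mechanically. This is only a minimal sketch: it assumes the crm_mon
text layout shown in this mail (other versions may format the line
differently), and multi_active is a hypothetical helper name, not part of
Pacemaker.

```shell
#!/bin/sh
# multi_active: hypothetical helper, not a Pacemaker tool.
# Reads one-shot crm_mon text on stdin and prints every primitive that is
# reported as started on more than one node.
# Assumption: the layout matches the output in this mail, i.e.
#   NAME (class:type) Started [ nodeA nodeB ]
# Clone member lines ("Started: [ ... ]") have no ")" before "Started"
# and are therefore skipped.
multi_active() {
    awk '
        /\) *:? *Started \[/ {
            list = $0
            sub(/^.*Started \[ */, "", list)   # drop everything up to "["
            sub(/ *\].*$/, "", list)           # drop "]" and any trailing text
            n = split(list, nodes, / +/)       # count the node names
            if (n > 1) {
                out = nodes[1]
                for (i = 2; i <= n; i++) out = out " " nodes[i]
                print $1 ": " out
            }
        }
    '
}
```

Possible usage: crm_mon -1 | multi_active
With the status above it would print "SBD_Stonith: multix245 multix246".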

When I power the lost node back on, everything recovers: only one SBD
resource is running, and all slots on the SBD device are clear:

  sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 list
0       multix244       clear
1       multix245       clear
2       multix246       clear
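To check the slot list without reading it by eye, something like the
following could be used. It is a sketch that assumes the three-column
"slot, node, message" layout shown above; sbd_pending is a hypothetical
helper name, not part of sbd.

```shell
#!/bin/sh
# sbd_pending: hypothetical helper, not part of sbd.
# Reads "sbd -d <device> list" output on stdin and prints every slot whose
# message is something other than "clear".
# Assumption: columns are slot number, node name, message, as shown above.
sbd_pending() {
    awk 'NF >= 3 && $3 != "clear" { print $2 ": " $3 }'
}
```

Possible usage:
  sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 list | sbd_pending
It prints nothing when every slot is clear, as in the listing above.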



> On 2011-05-12T09:51:21, Karl Rößmann <K.Roessmann at fkf.mpg.de> wrote:
>
>> Hi David,
>>
>>
>> startup-fencing is true
>> stonith is enabled
>> stonith-timeout is 60s
>> stonith-action is reboot
>>
>> We have a Fibre Channel SAN with the multipath driver as the shared
>> device for the volume groups.
>>
>> I use SBD for STONITH.
>> --------------- This is the SBD setting: --------------------------
>>
>> multix244:~ # sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 dump
>> Header version     : 2
>> Number of slots    : 255
>> Sector size        : 512
>> Timeout (watchdog) : 60
>> Timeout (allocate) : 2
>> Timeout (loop)     : 1
>> Timeout (msgwait)  : 120
>>
>> On a similar cluster with an iSCSI device and no multipath driver,
>> there is no problem.
>
> Is the sbd daemon running on all nodes?
>
> What does "sbd -d ... list" show?
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde



-- 
Karl Rößmann				Tel. +49-711-689-1657
Max-Planck-Institut FKF       		Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart				email K.Roessmann at fkf.mpg.de



