No subject
Sun Apr  3 06:52:37 UTC 2011

> sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 list
> 0       multix244       clear
> 1       multix245       clear
> 2       multix246       reset   multix245
This suggests that the reset request actually was sent to multix246, which
should therefore be considered 'fenced' by the remaining cluster.
Looking back in your mails further:
>> /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 dump
>> Header version     : 2
>> Number of slots    : 255
>> Sector size        : 512
>> Timeout (watchdog) : 60
>> Timeout (allocate) : 2
>> Timeout (loop)     : 1
>> Timeout (msgwait)  : 120
You've set extremely long timeouts for the watchdog and, in particular,
for msgwait. This means that sbd will only consider a fence completed
after 120s. At the same time, you've set stonith-timeout to 60s, so any
fence that takes longer than that is considered failed.
You've set up your cluster so that it can never complete a successful
fence - congratulations! ;-)
If you've got a legitimate reason for setting the msgwait timeout to
120s, you need to set the stonith-timeout to more than 120s; 140s, for
example.
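To make the arithmetic concrete, here is a rough sketch of the check. The
msgwait and stonith-timeout values are the ones from this thread; the 20s
margin is only my assumption (it happens to give the 140s example), and the
commented-out last line assumes you manage the cluster with the crm shell.

```shell
#!/bin/sh
# Values from the sbd dump and the cluster configuration discussed above.
msgwait=120          # "Timeout (msgwait)" from the sbd header, in seconds
stonith_timeout=60   # current stonith-timeout, in seconds

# sbd only reports a fence as complete after msgwait has elapsed,
# so stonith-timeout has to be strictly larger than msgwait.
if [ "$stonith_timeout" -le "$msgwait" ]; then
    verdict="too short"
else
    verdict="ok"
fi
echo "stonith-timeout=${stonith_timeout}s vs msgwait=${msgwait}s: $verdict"

# Assumption: a 20s safety margin on top of msgwait.
margin=20
suggested=$((msgwait + margin))
echo "suggested stonith-timeout: ${suggested}s"

# Applying it with the crm shell would look like this (left commented out,
# since it changes the live cluster configuration):
# crm configure property stonith-timeout="${suggested}s"
```

Treat the margin as a starting point, not a rule; the only hard requirement
is that stonith-timeout ends up larger than msgwait.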
Regards,
    Lars
-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
    
    