[Pacemaker] Failover when storage fails

Tim Serong tserong at novell.com
Sat May 14 06:14:49 EDT 2011


On 13/05/11 18:54, Max Williams wrote:
> Well, this is not what I am seeing here. Perhaps a bug?
> I also tried adding "op stop interval=0 timeout=10" to the LVM
> resources, but when the storage disappears the cluster still just
> stops where it is and those log entries (below) get printed in a
> loop.
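> For reference, the stop op I added looks roughly like this (crm shell
> syntax, shown here on one of the LVM primitives):
>
>    primitive MyApp_lvm_graph ocf:heartbeat:LVM \
>            params volgrpname="VolGroupB00" exclusive="yes" \
>            op monitor interval="10" timeout="10" on-fail="fence" depth="0" \
>            op stop interval="0" timeout="10"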
> Cheers,
> Max

OK, that's just weird (unless I'm missing something - anyone else seen 
this?).  Do you mind sending me an hb_report tarball (offlist)?  I'd 
suggest starting everything up cleanly, knocking the storage over, 
waiting a few minutes, then getting the hb_report for that entire time 
period.
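Something like the following should capture it (just a sketch; adjust
the -f argument to the time you actually start the test, and use
whatever destination you like):

   hb_report -f "2011-05-16 10:00" /tmp/storage-failure-report

and then send me the tarball it produces.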

Regards,

Tim

>
> -----Original Message-----
> From: Tim Serong [mailto:tserong at novell.com]
> Sent: 13 May 2011 04:22
> To: The Pacemaker cluster resource manager (pacemaker at oss.clusterlabs.org)
> Subject: Re: [Pacemaker] Failover when storage fails
>
> On 5/12/2011 at 02:28 AM, Max Williams <Max.Williams at betfair.com> wrote:
>> After further testing, even with stonith enabled, the cluster still gets
>> stuck in this state, presumably waiting on IO. I can get around it by
>> setting "on-fail=fence" on the LVM resources, but shouldn't Pacemaker
>> be smart enough to realise the host is effectively offline?
>
> If you've got STONITH enabled, nodes should just get fenced when this
> occurs, without your having to specify on-fail=fence for the monitor op.
> What *should* happen is: the monitor fails or times out, then Pacemaker
> tries to stop the resource.  If the stop also fails or times out, the
> node is fenced.  See:
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html
>
> Also, http://ourobengr.com/ha#causes is relevant here.
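>
> It's also worth confirming that the fence devices themselves actually
> work before relying on that escalation.  As a rough sketch (assuming
> your Pacemaker build ships stonith_admin; run it from the surviving
> node and expect the target to reboot):
>
>    stonith_admin --reboot host002.domain
>
> If that doesn't power-cycle the node, no amount of on-fail tuning will help.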
>
> Regards,
>
> Tim
>
>> Or am I missing some timeout
>> value that would fix this situation?
>>
>> pacemaker-1.1.2-7.el6.x86_64
>> corosync-1.2.3-21.el6.x86_64
>> RHEL 6.0
>>
>> Config:
>>
>> node host001.domain \
>>          attributes standby="off"
>> node host002.domain \
>>          attributes standby="off"
>> primitive MyApp_IP ocf:heartbeat:IPaddr \
>>          params ip="192.168.104.26" \
>>          op monitor interval="10s"
>> primitive MyApp_fs_graph ocf:heartbeat:Filesystem \
>>          params device="/dev/VolGroupB00/AppLV2" directory="/naab1" fstype="ext4" \
>>          op monitor interval="10" timeout="10"
>> primitive MyApp_fs_landing ocf:heartbeat:Filesystem \
>>          params device="/dev/VolGroupB01/AppLV1" directory="/naab2" fstype="ext4" \
>>          op monitor interval="10" timeout="10"
>> primitive MyApp_lvm_graph ocf:heartbeat:LVM \
>>          params volgrpname="VolGroupB00" exclusive="yes" \
>>          op monitor interval="10" timeout="10" on-fail="fence" depth="0"
>> primitive MyApp_lvm_landing ocf:heartbeat:LVM \
>>          params volgrpname="VolGroupB01" exclusive="yes" \
>>          op monitor interval="10" timeout="10" on-fail="fence" depth="0"
>> primitive MyApp_scsi_reservation ocf:heartbeat:sg_persist \
>>          params sg_persist_resource="scsi_reservation0" devs="/dev/dm-6 /dev/dm-7" required_devs_nof="2" reservation_type="1"
>> primitive MyApp_init_script lsb:AppInitScript \
>>          op monitor interval="10" timeout="10"
>> primitive fence_host001.domain stonith:fence_ipmilan \
>>          params ipaddr="192.168.16.148" passwd="password" login="root" pcmk_host_list="host001.domain" pcmk_host_check="static-list" \
>>          meta target-role="Started"
>> primitive fence_host002.domain stonith:fence_ipmilan \
>>          params ipaddr="192.168.16.149" passwd="password" login="root" pcmk_host_list="host002.domain" pcmk_host_check="static-list" \
>>          meta target-role="Started"
>> group MyApp_group MyApp_lvm_graph MyApp_lvm_landing MyApp_fs_graph MyApp_fs_landing MyApp_IP MyApp_init_script \
>>          meta target-role="Started" migration-threshold="2" on-fail="restart" failure-timeout="300s"
>> ms ms_MyApp_scsi_reservation MyApp_scsi_reservation \
>>          meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> colocation MyApp_group_on_scsi_reservation inf: MyApp_group ms_MyApp_scsi_reservation:Master
>> order MyApp_group_after_scsi_reservation inf: ms_MyApp_scsi_reservation:promote MyApp_group:start
>> property $id="cib-bootstrap-options" \
>>          dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>>          cluster-infrastructure="openais" \
>>          expected-quorum-votes="2" \
>>          no-quorum-policy="ignore" \
>>          stonith-enabled="true" \
>>          last-lrm-refresh="1305129673"
>> rsc_defaults $id="rsc-options" \
>>          resource-stickiness="1"
>>
>>
>>
>>
>>
>> From: Max Williams [mailto:Max.Williams at betfair.com]
>> Sent: 11 May 2011 13:55
>> To: The Pacemaker cluster resource manager
>> (pacemaker at oss.clusterlabs.org)
>> Subject: [Pacemaker] Failover when storage fails
>>
>> Hi,
>> I want to configure Pacemaker to fail over a group of resources and
>> sg_persist (master/slave) when there is a problem with the storage, but
>> when I cause the iSCSI LUN to disappear to simulate a failure, the
>> cluster always gets stuck in this state:
>>
>> Last updated: Wed May 11 10:52:43 2011
>> Stack: openais
>> Current DC: host001.domain - partition with quorum
>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ host002.domain host001.domain ]
>>
>> fence_host002.domain     (stonith:fence_ipmilan):        Started host001.domain
>> fence_host001.domain     (stonith:fence_ipmilan):        Started host001.domain
>> Resource Group: MyApp_group
>>       MyApp_lvm_graph    (ocf::heartbeat:LVM):   Started host002.domain FAILED
>>       MyApp_lvm_landing  (ocf::heartbeat:LVM):   Started host002.domain FAILED
>>       MyApp_fs_graph     (ocf::heartbeat:Filesystem):    Started host002.domain
>>       MyApp_fs_landing   (ocf::heartbeat:Filesystem):    Started host002.domain
>>       MyApp_IP   (ocf::heartbeat:IPaddr):        Stopped
>>       MyApp_init_script   (lsb:abworkload):              Stopped
>> Master/Slave Set: ms_MyApp_scsi_reservation
>>       Masters: [ host002.domain ]
>>       Slaves: [ host001.domain ]
>>
>> Failed actions:
>>      MyApp_lvm_graph_monitor_10000 (node=host002.domain, call=129, rc=-2, status=Timed Out): unknown exec error
>>      MyApp_lvm_landing_monitor_10000 (node=host002.domain, call=130, rc=-2, status=Timed Out): unknown exec error
>>
>> This is printed over and over in the logs:
>>
>> May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
>> May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
>> May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
>> May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
>> May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
>> May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
>> May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_graph:monitor process (PID 1938) timed out (try 1).  Killing with signal SIGTERM (15).
>> May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_landing:monitor process (PID 1939) timed out (try 1).  Killing with signal SIGTERM (15).
>> May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[190] on ocf::LVM::MyApp_lvm_graph for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB00] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1938] timed out
>> May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[191] on ocf::LVM::MyApp_lvm_landing for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB01] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1939] timed out
>> May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_graph_monitor_10000 (190) Timed Out (timeout=10000ms)
>> May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_landing_monitor_10000 (191) Timed Out (timeout=10000ms)
>> May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
>> May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
>>
>> And I noticed there are about 100 vgdisplay processes running in D state.
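>> (A quick way to list those, just as a sketch: something like
>> "ps axo pid,stat,wchan:32,args | awk '$2 ~ /D/'" shows every process
>> stuck in uninterruptible sleep and roughly where in the kernel it is
>> blocked.)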
>>
>> How can I configure Pacemaker so the other host forces sg_persist to
>> be a master and then just takes the whole resource group without fencing?
>>
>> I've tried "on-fail=standby" or "migration-threshold=0" but it just
>> always gets stuck in this state. If I reconnect the LUN everything
>> resumes and it instantly fails over but this is less than ideal.
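>> (Roughly, what I tried looked like this in crm shell syntax:
>>
>>          op monitor interval="10" timeout="10" on-fail="standby" depth="0"
>>
>> on the LVM primitives, and migration-threshold="0" in the group meta,
>> instead of the values shown in the config above.)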
>>
>> Thanks,
>> Max
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
> --
> Tim Serong <tserong at novell.com>
> Senior Clustering Engineer, OPS Engineering, Novell Inc.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>


-- 
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.



