[Pacemaker] R: R: R: R: Stonith external/sbd problem

Nicola Sabatelli n.sabatelli at ct.rupar.puglia.it
Mon May 10 07:21:03 UTC 2010


Hi,

I have solved my problem.

I found a small problem in the script '/usr/lib64/stonith/plugins/external/sbd' where it retrieves the host list.

I replaced these lines:

 

nodes=$(
if is_heartbeat; then
    crm_node -H -p
else
    crm_node -p
fi)

 

With these:

 

if is_heartbeat; then
    nodes=$(crm_node -H -p)
else
    nodes=$(crm_node -p)
fi

 

and now the 'external/sbd' resource works very well.
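
For what it's worth, here is a minimal sketch of why the two forms can behave differently (this is only an assumption about the failure, with a hypothetical stand-in for is_heartbeat): a command substitution wrapped around the whole if-block captures everything the block writes to stdout, including any output from is_heartbeat itself, while assigning inside each branch captures only the crm_node output.

# Hypothetical stand-in for the plugin's is_heartbeat helper;
# assume it also echoes a diagnostic line to stdout.
is_heartbeat() {
    echo "checking cluster stack"
    return 0
}

# Original form: the substitution captures ALL stdout of the if-block,
# so the diagnostic line would end up inside $nodes.
nodes=$(
if is_heartbeat; then
    echo "node-a node-b"        # stands in for 'crm_node -H -p'
else
    echo "node-a node-b"        # stands in for 'crm_node -p'
fi)
echo "original: [$nodes]"       # the stray diagnostic line is included

# Fixed form: only the node-listing command itself is captured.
if is_heartbeat; then
    nodes=$(echo "node-a node-b")
else
    nodes=$(echo "node-a node-b")
fi
echo "fixed:    [$nodes]"       # -> [node-a node-b]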

 

 

 

Best regards, Nicola.

 

  _____  

From: Michael Brown [mailto:michael at netdirect.ca]
Sent: Thursday, April 29, 2010 16:53
To: n.sabatelli at ct.rupar.puglia.it
Subject: Re: R: [Pacemaker] R: R: Stonith external/sbd problem

 

Hrm, my limited knowledge is exhausted. Good luck!

M.

  _____  

From: Nicola Sabatelli 
To: 'Michael Brown' 
Sent: Thu Apr 29 10:36:15 2010
Subject: R: [Pacemaker] R: R: Stonith external/sbd problem

The response to the query

/usr/sbin/sbd -d /dev/mapper/mpath1p1 list

is

0       clover-a.rsr.rupar.puglia.it    clear
1       clover-h.rsr.rupar.puglia.it    clear

 

 

Ciao, Nicola.

  _____  

From: Michael Brown [mailto:michael at netdirect.ca]
Sent: Thursday, April 29, 2010 16:33
To: The Pacemaker cluster resource manager
Cc: Nicola Sabatelli
Subject: Re: [Pacemaker] R: R: Stonith external/sbd problem

 

FWIW, here's my setup for sbd on shared storage:

in /etc/init.d/boot.local:
sbd -d /dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428 -D -W watch

xenhost1:~ # sbd -d /dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428 list
0       xenhost1        clear
1       xenhost2        clear

excerpt from 'crm configure show':
primitive sbd stonith:external/sbd \
        operations $id="sbd-operations" \
        op monitor interval="15" timeout="15" start-delay="15" \
        params sbd_device="/dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428"
clone sbd-clone sbd \
        meta interleave="true"

What do you see if you run '/usr/sbin/sbd -d /dev/mapper/mpath1p1 list'?

M.

On 04/29/2010 10:23 AM, Nicola Sabatelli wrote: 

Yes, I created the disk and allocated the node, and I created a resource on the cluster in this way:

<clone id="cl_external_sbd_1">
        <meta_attributes id="cl_external_sbd_1-meta_attributes">
          <nvpair id="cl_external_sbd_1-meta_attributes-clone-max" name="clone-max" value="2"/>
        </meta_attributes>
        <primitive class="stonith" type="external/sbd" id="stonith_external_sbd_LOCK_LUN">
          <instance_attributes id="stonith_external_sbd_LOCK_LUN-instance_attributes">
            <nvpair id="nvpair-stonith_external_sbd_LOCK_LUN-sbd_device" name="sbd_device" value="/dev/mapper/mpath1p1"/>
          </instance_attributes>
          <operations id="stonith_external_sbd_LOCK_LUN-operations">
            <op id="op-stonith_external_sbd_LOCK_LUN-stop" interval="0" name="stop" timeout="60"/>
            <op id="op-stonith_external_sbd_LOCK_LUN-monitor" interval="60" name="monitor" start-delay="0" timeout="60"/>
            <op id="op-stonith_external_sbd_LOCK_LUN-start" interval="0" name="start" timeout="60"/>
          </operations>
          <meta_attributes id="stonith_external_sbd_LOCK_LUN-meta_attributes">
            <nvpair name="target-role" id="stonith_external_sbd_LOCK_LUN-meta_attributes-target-role" value="stopped"/>
          </meta_attributes>
        </primitive>
      </clone>

 

 

Ciao, Nicola.

  _____  

From: Vit Pelcak [mailto:vpelcak at suse.cz]
Sent: Thursday, April 29, 2010 16:08
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] R: Stonith external/sbd problem

 

Also, the stonith resource needs to be added to the CIB:

crm configure primitive sbd_stonith stonith:external/sbd meta target-role="Started" op monitor interval="15" timeout="15" start-delay="15" params sbd_device="/dev/sda1"
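
Adapted to the multipath device used in this thread, the same command would presumably look like this (an untested sketch; only the sbd_device value changes):

crm configure primitive sbd_stonith stonith:external/sbd \
        meta target-role="Started" \
        op monitor interval="15" timeout="15" start-delay="15" \
        params sbd_device="/dev/mapper/mpath1p1"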


On 29.4.2010 15:46, Nicola Sabatelli wrote: 

I have done exactly the configuration described in the SBD_Fencing documentation.

That is, in /etc/sysconfig/sbd:

SBD_DEVICE="/dev/mapper/mpath1p1"
SBD_OPTS="-W"

And I start the daemon in this manner:

/usr/sbin/sbd -d /dev/mapper/mpath1p1 -D -W watch

Is this correct?

 

Ciao, Nicola.

  _____  

From: Vit Pelcak [mailto:vpelcak at suse.cz]
Sent: Thursday, April 29, 2010 15:02
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Stonith external/sbd problem

 

cat /etc/sysconfig/sbd

SBD_DEVICE="/dev/sda1"
SBD_OPTS="-W"


sbd -d /dev/shared_disk create
sbd -d /dev/shared_disk allocate your_machine
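
For the two-node cluster discussed in this thread, the corresponding sequence would presumably be (a sketch using the device path and host names reported elsewhere in the thread):

sbd -d /dev/mapper/mpath1p1 create
sbd -d /dev/mapper/mpath1p1 allocate clover-a.rsr.rupar.puglia.it
sbd -d /dev/mapper/mpath1p1 allocate clover-h.rsr.rupar.puglia.it
sbd -d /dev/mapper/mpath1p1 list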


On 29.4.2010 14:55, Michael Brown wrote: 

Oh, I forgot a piece: I had similar trouble until I actually started sbd properly, and then it worked.

M.

  _____  

From: Michael Brown 
To: pacemaker at oss.clusterlabs.org 
Sent: Thu Apr 29 08:53:32 2010
Subject: Re: [Pacemaker] Stonith external/sbd problem 




I just set this up myself and it worked fine for me.

Did you follow the guide? You need to configure the sbd daemon to run on bootup with appropriate options before external/sbd can use it.
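
For example, the watcher can be started at boot with the same command used elsewhere in this thread (a sketch; the device path is the multipath device discussed here):

/usr/sbin/sbd -d /dev/mapper/mpath1p1 -D -W watch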

M.

  _____  

From: Nicola Sabatelli 
To: pacemaker at oss.clusterlabs.org 
Sent: Thu Apr 29 08:47:04 2010
Subject: [Pacemaker] Stonith external/sbd problem 


I have a problem with the STONITH plugin external/sbd.

I have configured the system according to the directions at http://www.linux-ha.org/wiki/SBD_Fencing, and the device I use is configured with multipath software because the disk resides on a storage system.

I have created a resource on my cluster using the clone directive.

But when I try to start the resource I get these errors:

 

from the ha-log file:

 

Apr 29 14:37:51 clover-h stonithd: [16811]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/sbd status' returned 256
Apr 29 14:37:51 clover-h stonithd: [16811]: CRIT: external_status: 'sbd status' failed with rc 256
Apr 29 14:37:51 clover-h stonithd: [10615]: WARN: start stonith_external_sbd_LOCK_LUN:0 failed, because its hostlist is empty

 

from crm_verify:

 

crm_verify[18607]: 2010/04/29_14:39:27 info: main: =#=#=#=#= Getting XML =#=#=#=#=
crm_verify[18607]: 2010/04/29_14:39:27 info: main: Reading XML from: live cluster
crm_verify[18607]: 2010/04/29_14:39:27 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_verify[18607]: 2010/04/29_14:39:27 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
crm_verify[18607]: 2010/04/29_14:39:27 info: determine_online_status: Node clover-a.rsr.rupar.puglia.it is online
crm_verify[18607]: 2010/04/29_14:39:27 WARN: unpack_rsc_op: Processing failed op stonith_external_sbd_LOCK_LUN:1_start_0 on clover-a.rsr.rupar.puglia.it: unknown error (1)
crm_verify[18607]: 2010/04/29_14:39:27 info: find_clone: Internally renamed stonith_external_sbd_LOCK_LUN:0 on clover-a.rsr.rupar.puglia.it to stonith_external_sbd_LOCK_LUN:2 (ORPHAN)
crm_verify[18607]: 2010/04/29_14:39:27 info: determine_online_status: Node clover-h.rsr.rupar.puglia.it is online
crm_verify[18607]: 2010/04/29_14:39:27 WARN: unpack_rsc_op: Processing failed op stonith_external_sbd_LOCK_LUN:0_start_0 on clover-h.rsr.rupar.puglia.it: unknown error (1)
crm_verify[18607]: 2010/04/29_14:39:27 notice: clone_print:  Master/Slave Set: ms_drbd_1
crm_verify[18607]: 2010/04/29_14:39:27 notice: short_print:      Stopped: [ res_drbd_1:0 res_drbd_1:1 ]
crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: res_Filesystem_TEST        (ocf::heartbeat:Filesystem):    Stopped
crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: res_IPaddr2_ip_clover      (ocf::heartbeat:IPaddr2):       Stopped
crm_verify[18607]: 2010/04/29_14:39:27 notice: clone_print:  Clone Set: cl_external_sbd_1
crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print:      stonith_external_sbd_LOCK_LUN:0       (stonith:external/sbd): Started clover-h.rsr.rupar.puglia.it FAILED
crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print:      stonith_external_sbd_LOCK_LUN:1       (stonith:external/sbd): Started clover-a.rsr.rupar.puglia.it FAILED
crm_verify[18607]: 2010/04/29_14:39:27 info: get_failcount: cl_external_sbd_1 has failed 1000000 times on clover-h.rsr.rupar.puglia.it
crm_verify[18607]: 2010/04/29_14:39:27 WARN: common_apply_stickiness: Forcing cl_external_sbd_1 away from clover-h.rsr.rupar.puglia.it after 1000000 failures (max=1000000)
crm_verify[18607]: 2010/04/29_14:39:27 info: get_failcount: cl_external_sbd_1 has failed 1000000 times on clover-a.rsr.rupar.puglia.it
crm_verify[18607]: 2010/04/29_14:39:27 WARN: common_apply_stickiness: Forcing cl_external_sbd_1 away from clover-a.rsr.rupar.puglia.it after 1000000 failures (max=1000000)
crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: ms_drbd_1: Rolling back scores from res_Filesystem_TEST
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_drbd_1:0 cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_drbd_1:1 cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: ms_drbd_1: Rolling back scores from res_Filesystem_TEST
crm_verify[18607]: 2010/04/29_14:39:27 info: master_color: ms_drbd_1: Promoted 0 instances of a possible 1 to master
crm_verify[18607]: 2010/04/29_14:39:27 info: master_color: ms_drbd_1: Promoted 0 instances of a possible 1 to master
crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: res_Filesystem_TEST: Rolling back scores from res_IPaddr2_ip_clover
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_Filesystem_TEST cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_IPaddr2_ip_clover cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource stonith_external_sbd_LOCK_LUN:0 cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource stonith_external_sbd_LOCK_LUN:1 cannot run anywhere
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_drbd_1:0  (Stopped)
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_drbd_1:1  (Stopped)
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_Filesystem_TEST   (Stopped)
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_IPaddr2_ip_clover (Stopped)
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Stop resource stonith_external_sbd_LOCK_LUN:0        (clover-h.rsr.rupar.puglia.it)
crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Stop resource stonith_external_sbd_LOCK_LUN:1        (clover-a.rsr.rupar.puglia.it)

Warnings found during check: config may not be valid

 

and from crm_mon:

 

============
Last updated: Thu Apr 29 14:39:57 2010
Stack: Heartbeat
Current DC: clover-h.rsr.rupar.puglia.it (e39bb201-2a6f-457a-a308-be6bfe71309c) - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, unknown expected votes
4 Resources configured.
============

Online: [ clover-h.rsr.rupar.puglia.it clover-a.rsr.rupar.puglia.it ]

 Clone Set: cl_external_sbd_1
     stonith_external_sbd_LOCK_LUN:0    (stonith:external/sbd): Started clover-h.rsr.rupar.puglia.it FAILED
     stonith_external_sbd_LOCK_LUN:1    (stonith:external/sbd): Started clover-a.rsr.rupar.puglia.it FAILED

Operations:
* Node clover-a.rsr.rupar.puglia.it:
   stonith_external_sbd_LOCK_LUN:1: migration-threshold=1000000 fail-count=1000000
    + (24) start: rc=1 (unknown error)
* Node clover-h.rsr.rupar.puglia.it:
   stonith_external_sbd_LOCK_LUN:0: migration-threshold=1000000 fail-count=1000000
    + (25) start: rc=1 (unknown error)

Failed actions:
    stonith_external_sbd_LOCK_LUN:1_start_0 (node=clover-a.rsr.rupar.puglia.it, call=24, rc=1, status=complete): unknown error
    stonith_external_sbd_LOCK_LUN:0_start_0 (node=clover-h.rsr.rupar.puglia.it, call=25, rc=1, status=complete): unknown error

 

 

 

 

Ciao, Nicola.

 
 
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

 

 
 





-- 
Michael Brown               | `One of the main causes of the fall of
Systems Consultant          | the Roman Empire was that, lacking zero,
Net Direct Inc.             | they had no way to indicate successful
☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth

