<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    <div class="moz-cite-prefix">On 03/21/2016 03:46 PM, Ken Gaillot

      wrote:<br>

    </div>

    <blockquote cite="mid:56F00945.5090007@redhat.com" type="cite">

      <pre wrap="">On 03/21/2016 08:39 AM, marvin wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">

On 03/15/2016 03:39 PM, Ken Gaillot wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">On 03/15/2016 09:10 AM, marvin wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">Hi,

I'm trying to get fence_scsi working, but I get a "No such device" error.
It's a two-node cluster with nodes called "node01" and "node03". The OS
is RHEL 7.2.

Here is some relevant info:

# pcs status
Cluster name: testrhel7cluster
Last updated: Tue Mar 15 15:05:40 2016          Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
Stack: corosync
Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 23 resources configured

Online: [ node01 node03 ]

Full list of resources:

  Clone Set: dlm-clone [dlm]
      Started: [ node01 node03 ]
  Clone Set: clvmd-clone [clvmd]
      Started: [ node01 node03 ]
  fence-node1    (stonith:fence_ipmilan):        Started node03
  fence-node3    (stonith:fence_ipmilan):        Started node01
  Resource Group: test_grupa
      test_ip    (ocf::heartbeat:IPaddr):        Started node01
      lv_testdbcl        (ocf::heartbeat:LVM):   Started node01
      fs_testdbcl        (ocf::heartbeat:Filesystem):    Started node01
      oracle11_baza      (ocf::heartbeat:oracle):        Started node01
      oracle11_lsnr      (ocf::heartbeat:oralsnr):       Started node01
  fence-scsi-node1       (stonith:fence_scsi):   Started node03
  fence-scsi-node3       (stonith:fence_scsi):   Started node01

PCSD Status:
   node01: Online
   node03: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

# pcs stonith show
  fence-node1    (stonith:fence_ipmilan):        Started node03
  fence-node3    (stonith:fence_ipmilan):        Started node01
  fence-scsi-node1       (stonith:fence_scsi):   Started node03
  fence-scsi-node3       (stonith:fence_scsi):   Started node01
  Node: node01
   Level 1 - fence-scsi-node3
   Level 2 - fence-node3
  Node: node03
   Level 1 - fence-scsi-node1
   Level 2 - fence-node1

# pcs stonith show fence-scsi-node1 --all
  Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata pcmk_reboot_action=off
   Meta Attrs: provides=unfencing
   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)

# pcs stonith show fence-scsi-node3 --all
  Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata pcmk_reboot_action=off
   Meta Attrs: provides=unfencing
   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)

node01 # pcs stonith fence node03
Error: unable to fence 'node03'
Command failed: No such device

node01 # tail /var/log/messages
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with device '(any)'
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can fence (reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence (reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options to fence node03 for stonith_admin.29191@node01.d1df9201 failed
</pre>

          </blockquote>

          <pre wrap="">The above line is the key. Both of the devices registered for node03
returned failure. Pacemaker then looked for any other device capable of
fencing node03, found none, and so reported "No such device" (an
admittedly obscure message).

It looks like the fence agents require more configuration options than
you have set. If you run "/path/to/fence/agent -o metadata", you can see
the available options. It's a good idea to first get the agent running
successfully by hand on the command line (the "status" command is
usually sufficient), then put those same options in the cluster
configuration.
</pre>
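          <pre wrap="">
A minimal sketch of that manual check, under a few assumptions: the
agent is installed at /usr/sbin/fence_scsi, and -n names the node to
check (confirm both against "fence_scsi -h" on your system). AGENT,
NODE, DRY_RUN, run and try_agent are hypothetical names used only for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a fence agent by hand before copying the same
# options into the cluster configuration.
# Assumptions (not from this thread): the agent path and the -n flag for the
# target node. Set DRY_RUN=1 to print the commands instead of running them.

AGENT="${AGENT:-/usr/sbin/fence_scsi}"
NODE="${NODE:-node03}"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

try_agent() {
    # Step 1: list the options the agent accepts.
    run "$AGENT" -o metadata || return 1
    # Step 2: query the target node's status; a clean exit suggests the
    # options are complete enough to reuse in the cluster configuration.
    run "$AGENT" -o status -n "$NODE"
}
```

Source the file and run "DRY_RUN=1 try_agent" to review the commands,
then "try_agent" to run them for real.
</pre>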

        </blockquote>

        <pre wrap="">Made some progress, but found a new issue.

So I got fence_scsi to work: it unfences at start, and fences when I
tell it to.

The problem is when I, for instance, fence node01. It stops pacemaker
but leaves corosync running, so node01 is in "pending" state and node03
won't stop services until node01 is restarted. The keys seem to be
handled correctly.
</pre>

      </blockquote>

      <pre wrap="">
Technically, fence_scsi won't stop pacemaker or corosync; it just cuts
off the node's disk access and lets the cluster know it's safe to
recover resources.

I haven't used fence_scsi myself, so I'm not sure of the exact details,
but your configuration needs some changes. The pcmk_host_list option
should list only the one node that the fence device can fence (one
device configured for each node). You need more attributes, such as
"devices" to specify which SCSI devices to cut off, and either "key" or
"nodename" to specify the node key for SCSI reservations.
</pre>
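      <pre wrap="">
A minimal sketch of that suggestion, printing the pcs commands rather
than running them. SHARED_DEVS and its default /dev/mapper/mpatha are
placeholders, not values from this thread, and scsi_update_cmd is a
hypothetical helper. Whether "key" or "nodename" also needs to be set
is left open here.

```shell
#!/bin/sh
# Hypothetical sketch of the suggestion above: one fence_scsi device per
# node, each listing only the node it fences, with the shared disks named
# explicitly. SHARED_DEVS is a placeholder; "devices" takes a
# comma-separated list. The pcs commands are printed, not executed.

SHARED_DEVS="${SHARED_DEVS:-/dev/mapper/mpatha}"

# Print the pcs command that points one fence device at a single node.
scsi_update_cmd() {
    device_id=$1   # stonith resource name, e.g. fence-scsi-node1
    target=$2      # the one node this device fences, e.g. node01
    echo "pcs stonith update $device_id pcmk_host_list=$target devices=$SHARED_DEVS"
}

scsi_update_cmd fence-scsi-node1 node01
scsi_update_cmd fence-scsi-node3 node03
```
</pre>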

    </blockquote>

    I'll give it a try, but all of those should be automagic if you
    don't define them (it obviously detects the disks, since I don't
    have them defined, and the keys are different on each node). At
    least that's what the documentation claims.<br>

    According to <a class="moz-txt-link-freetext" href="https://access.redhat.com/articles/530533">https://access.redhat.com/articles/530533</a><br>

    "Normally <code>fence_scsi</code> will automatically detect which
    devices to manage, by checking which are physical volumes in volume
    groups marked with the "clustered" attribute."<br>
    And this seems to work.<br>

    <br>

    The other thing is, if I put only a single node in pcmk_host_list,
    either fencing or unfencing doesn't work. Additionally, from the
    same Red Hat documentation:<br>

    "<code>pcmk_host_list="node1.example.com node2.example.com"</code>:

    <code>stonith-ng</code> tries to dynamically determine which nodes

    can fence which nodes, but <code>fence_scsi</code> does not support

    this. Instead, we tell it which nodes are managed by this device.

    All nodes should be listed here."<br>

    Now, does this mean all nodes that get fenced, or all nodes in the
    cluster?<br>

    <br>

    The tests:<br>

    1. Changing pcmk_host_list to the opposite node fences exactly the
    same as with all nodes listed, yet unfencing doesn't work.<br>
    2. Changing pcmk_host_list to the same node fails at fencing with:<br>
    ERROR:root:Failed: keys cannot be same. You can not fence yourself.<br>
    Failed: keys cannot be same. You can not fence yourself.<br>
    as one would expect.<br>

    <br>

    I can do additional tests, adding keys and disks by hand, but those
    seem to be handled correctly by the agent.<br>
    The only reasonable conclusion I can draw from this is that I should
    use fence_scsi as level-two fencing, for when the connection between
    nodes breaks and IPMI fencing doesn't confirm (a node without power,
    for example), so that the live node can run the services.<br>
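    <br>
    A minimal sketch of that arrangement, with IPMI as level 1 and
    fence_scsi as level 2 for each node. The device-to-node pairs are an
    assumption drawn from the pcmk_host_list values and the stonith-ng
    log earlier in the thread (fence-*-node1 for node01, fence-*-node3
    for node03); topology_cmds is a hypothetical helper that only prints
    the pcs commands so they can be reviewed before running them.<br>
    <pre wrap="">
```shell
#!/bin/sh
# Hypothetical sketch: IPMI as the first fencing level, fence_scsi as the
# fallback. Device-to-node pairs are assumptions based on pcmk_host_list
# and the stonith-ng log; verify them against your own configuration.
# The pcs commands are printed, not executed.

# Print the commands that reset one node's levels and add the two levels.
topology_cmds() {
    node=$1 ipmi_dev=$2 scsi_dev=$3
    echo "pcs stonith level clear $node"
    echo "pcs stonith level add 1 $node $ipmi_dev"
    echo "pcs stonith level add 2 $node $scsi_dev"
}

topology_cmds node01 fence-node1 fence-scsi-node1
topology_cmds node03 fence-node3 fence-scsi-node3
```
</pre>
    These pairs differ from the levels shown by "pcs stonith show"
    earlier, which list fence-scsi-node3 and fence-node3 under node01,
    so that mapping may be worth double-checking.<br>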

  </body>

</html>