<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Dear Andrei Borzenkov,</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Thank you very much for your answer.

      I have been checking the logs the whole time, but there is nothing helpful in them,

      just a bunch of heartbeat messages.</div>
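
    <div class="moz-cite-prefix">A rough sketch of how the resource agent messages could be separated from the heartbeat noise. It assumes that the agents log under their own names (iSCSITarget, iSCSILogicalUnit) and that Pacemaker writes to the logfile set in corosync.conf (/var/log/cluster/corosync.log); both are assumptions, and filter_iscsi_lines is only a hypothetical helper name.</div>

<pre>
```shell
#!/bin/sh
# Assumption: Pacemaker logs to the "logfile" path from corosync.conf.
LOGFILE="${LOGFILE:-/var/log/cluster/corosync.log}"

# Print only the lines that mention one of the two iSCSI resource agents.
# $1: path of the log file to scan
filter_iscsi_lines() {
    grep -E 'iSCSITarget|iSCSILogicalUnit' "$1"
}

# Show the 50 most recent matching lines if the log file is readable.
if [ -r "$LOGFILE" ]; then
    filter_iscsi_lines "$LOGFILE" | tail -n 50
fi
```
</pre>

    <div class="moz-cite-prefix">Running it on both nodes right after a failed start may show the agent's own error text, which "pcs status" does not print when exitreason is empty.</div>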

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Anyway, I have read the book "Packt -

      CentOS High Availability", published in 2015, got some new

      ideas from it, and tried them out. The situation has now changed.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">

      <hr width="100%" size="2"></div>

    <div class="moz-cite-prefix">pcs resource create p_iSCSITarget

      ocf:heartbeat:iSCSITarget implementation="tgt"

      iqn="iqn.2018-08.s-ka.local:disk" tid="1"<br>

      pcs resource create p_iSCSILogicalUnit

      ocf:heartbeat:iSCSILogicalUnit implementation="tgt"

      target_iqn="iqn.2018-08.s-ka.local:disk" lun="10"

      path="/dev/drbd/by-disk/vg0/ipstor0"<br>

      pcs resource group add p_iSCSI ClusterIP p_iSCSITarget

      p_iSCSILogicalUnit <br>

      pcs constraint colocation set ClusterIP p_iSCSITarget 

      p_iSCSILogicalUnit<br>

    </div>

    <div class="moz-cite-prefix">

      <hr width="100%" size="2"><br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">The difference from the previous version is

      the IQN: I now use "iqn.2018-08.s-ka.local:disk" instead of

      "iqn.2018-08.s-ka.local:disk.1", where the trailing ".1" perhaps means

      the "tid".</div>
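
    <div class="moz-cite-prefix">To check which tid and IQN tgtd actually holds, something like the sketch below could be used. It assumes that "tgtadm --lld iscsi --mode target --op show" prints one header line per target in the form "Target 1: iqn...."; that output format is an assumption, and list_targets is a hypothetical helper, not part of tgt.</div>

<pre>
```shell
#!/bin/sh
# Read "tgtadm ... --op show" output on stdin and print "tid iqn" pairs.
# Assumption: target header lines look like "Target 1: iqn.2018-08.s-ka.local:disk".
list_targets() {
    sed -n 's/^Target \([0-9][0-9]*\): \(.*\)$/\1 \2/p'
}

# Only query tgtd where the tool is installed.
if [ -x /usr/sbin/tgtadm ]; then
    /usr/sbin/tgtadm --lld iscsi --mode target --op show | list_targets
fi
```
</pre>

    <div class="moz-cite-prefix">If the tid printed there matches the tid parameter of p_iSCSITarget regardless of the IQN suffix, the ".1" was probably not the cause.</div>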

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Now I have a new problem: the

      resources and tgtd do start, but although I set a colocation

      constraint, Pacemaker always tries to start tgtd on the other

      node.</div>

    <div class="moz-cite-prefix">How can I solve this? Thank you all

      in advance!</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Here is the output of "pcs status":</div>

    <div class="moz-cite-prefix">

      <hr width="100%" size="2">[root@drbd0 /]# pcs status<br>

      Cluster name: cluster1<br>

      Stack: corosync<br>

      Current DC: drbd0-ha.s-ka.local (version

      1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum<br>

      Last updated: Wed Oct 24 08:43:29 2018<br>

      Last change: Wed Oct 24 08:43:24 2018 by root via cibadmin on

      drbd0-ha.s-ka.local<br>

      <br>

      2 nodes configured<br>

      5 resources configured<br>

      <br>

      Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]<br>

      <br>

      Full list of resources:<br>

      <br>

       Master/Slave Set: ipstor0Clone [ipstor0]<br>

           Masters: [ drbd0-ha.s-ka.local ]<br>

           Slaves: [ drbd1-ha.s-ka.local ]<br>

       Resource Group: p_iSCSI<br>

           ClusterIP    (ocf::heartbeat:IPaddr2):    Started

      drbd0-ha.s-ka.local<br>

           p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Started

      drbd0-ha.s-ka.local<br>

           p_iSCSILogicalUnit    (ocf::heartbeat:iSCSILogicalUnit):   

      Started drbd0-ha.s-ka.local<br>

      <br>

      Failed Actions:<br>

      * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error'

      (1): call=32, status=complete, exitreason='',<br>

          last-rc-change='Wed Oct 24 08:37:25 2018', queued=0ms,

      exec=23ms<br>

      * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown

      error' (1): call=38, status=complete, exitreason='',<br>

          last-rc-change='Wed Oct 24 08:37:55 2018', queued=0ms,

      exec=28ms<br>

      <br>

      <br>

      Daemon Status:<br>

        corosync: active/enabled<br>

        pacemaker: active/enabled<br>

        pcsd: active/enabled<br>

      [root@drbd0 /]</div>
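
    <div class="moz-cite-prefix">Both Failed Actions above carry a last-rc-change of 08:37, which is earlier than the last configuration change at 08:43:24, so they may simply be stale records from before that change. If migration-threshold is still 1, as in the quoted pcs config, a single failure is enough to keep a resource away from drbd1 until it is cleaned up. A minimal sketch for inspecting and clearing them follows; it is a dry run unless APPLY=1 is set, and the run helper is only an illustration.</div>

<pre>
```shell
#!/bin/sh
# Dry run by default: commands are printed only. Set APPLY=1 to execute them.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Show the fail count of each iSCSI resource, then clear its failure history.
clear_stale_failures() {
    for rsc in p_iSCSITarget p_iSCSILogicalUnit; do
        run pcs resource failcount show "$rsc"
        run pcs resource cleanup "$rsc"
    done
}

clear_stale_failures
```
</pre>

    <div class="moz-cite-prefix">If the failures reappear with a fresh timestamp after the cleanup, Pacemaker really is still trying drbd1, and the constraints are the next thing to look at.</div>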

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">[root@drbd0 /]# pcs constraint show

      --full<br>

      Location Constraints:<br>

      Ordering Constraints:<br>

      Colocation Constraints:<br>

        Resource Sets:<br>

          set ClusterIP p_iSCSITarget p_iSCSILogicalUnit

      (id:pcs_rsc_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit)

      setoptions score=INFINITY

      (id:pcs_rsc_colocation_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit)<br>

      Ticket Constraints:<br>

      [root@drbd0 /]# <br>

      <hr width="100%" size="2"></div>
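
    <div class="moz-cite-prefix">The listing above shows no ordering constraints and no colocation that mentions ipstor0Clone, so nothing ties the p_iSCSI group to the DRBD Master. The set over ClusterIP, p_iSCSITarget and p_iSCSILogicalUnit only repeats what the group already implies. One thing that might be worth trying, as an untested sketch assuming pcs 0.9 syntax: colocate the group with the Master role and promote DRBD before starting the group. It is a dry run unless APPLY=1 is set; run and constraint_cmds are hypothetical helper names.</div>

<pre>
```shell
#!/bin/sh
# Dry run by default: commands are printed only. Set APPLY=1 to execute them.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

constraint_cmds() {
    # Keep the iSCSI group on the node where the DRBD clone is Master.
    run pcs constraint colocation add p_iSCSI with master ipstor0Clone INFINITY
    # Promote DRBD first, then start the group on that node.
    run pcs constraint order promote ipstor0Clone then start p_iSCSI
}

constraint_cmds
```
</pre>

    <div class="moz-cite-prefix">With these two constraints in place the existing colocation set could probably be removed, since group members are already kept together and started in order.</div>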

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Best Regards</div>

    <div class="moz-cite-prefix">Lifeng<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">On 2018/10/19 06:02, Andrei Borzenkov

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:590058dd-8ef9-412f-a79c-501f03b42f3b@gmail.com">

      <pre class="moz-quote-pre" wrap="">16.10.2018 15:29, LiFeng Zhang wrote:

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">Hi all, dear friends,

I need your help to enable hot switching of iSCSI under a

Pacemaker/Corosync cluster, which has an iSCSI device based on a two-node

DRBD replication.

I've got the Pacemaker/Corosync cluster working, and DRBD replication is also

working, but I am stuck at iSCSI. I can manually start tgtd on one node,

so that the VCSA can recognize the iSCSI disk and create a VMFS/StorageObject

on it, and then I can create a test VM on that VMFS.

But when I switch the Primary/Secondary roles of DRBD, the test VM keeps

running, but the underlying disk becomes read-only. As far as I

know, tgtd should be handled by Pacemaker so that it automatically

starts on the Primary DRBD instance, but in my situation it sadly does NOT.

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">

pacemaker only handles resources that were started by pacemaker.

According to your output below, in all cases the resource was stopped from

pacemaker's point of view and all pacemaker attempts to start the resource

failed. You should troubleshoot why they failed. This requires knowledge

of the specific resource agent; sadly I am not familiar with the iSCSI target.

pacemaker logs may include more information from the resource agent than

just "unknown error".

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">

I've tried all kinds of resources/manuals/documents, but they are all mixed

with extra information, other systems, and other software versions.

One of my BEST references (the closest configuration to mine) is this

url: <a class="moz-txt-link-freetext" href="https://nnc3.com/mags/LJ_1994-2014/LJ/217/11275.html">https://nnc3.com/mags/LJ_1994-2014/LJ/217/11275.html</a>

The difference between my setup and this article, I think, is that I don't have an LVM

volume but only a raw iSCSI disk, and I have to translate CRM commands

into PCS commands.

But after I "copied" the configuration from this article, my cluster could

not start anymore. I tried removing the LVM resource (which caused a

"device not found" error), but the resource group still can't start,

without any explicit "reason" from Pacemaker.

1. The whole setup runs on a two-node ESXi 6.5 cluster, which

has a VCSA installed on one of the ESXi hosts.

I have attached a simple diagram, which may show the deployment

better.

2. Starting point:

All involved hosts are mapped through the local DNS, which also

includes the floating VIP. The local domain is s-ka.local:

------------------------------------------------------------------------

firewall:    fw01.s-ka.local.        IN    A    192.168.95.249

vcsa:    vc01.s-ka.local.        IN    A    192.168.95.30

esxi:     esx01.s-ka.local.        IN    A    192.168.95.5

esxi:     esx02.s-ka.local.        IN    A    192.168.95.7

drbd:    drbd0.s-ka.local.        IN    A    192.168.95.45

drbd:    drbd1.s-ka.local.        IN    A    192.168.95.47

vip:      ipstor0.s-ka.local.        IN    A    192.168.95.48

heartbeat:    drbd0-ha.s-ka.local.    IN    A    192.168.96.45

heartbeat:    drbd1-ha.s-ka.local.    IN    A    192.168.96.47

------------------------------------------------------------------------

Both DRBD servers run CentOS 7.5; the installed packages are listed here:

------------------------------------------------------------------------

[root@drbd0 ~]# cat /etc/centos-release

CentOS Linux release 7.5.1804 (Core)

[root@drbd0 ~]# uname -a

Linux drbd0.s-ka.local 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16

16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@drbd1 ~]# yum list installed|grep pacemaker

pacemaker.x86_64 1.1.18-11.el7_5.3              @updates

pacemaker-cli.x86_64 1.1.18-11.el7_5.3              @updates

pacemaker-cluster-libs.x86_64 1.1.18-11.el7_5.3              @updates

pacemaker-libs.x86_64 1.1.18-11.el7_5.3              @updates

[root@drbd1 ~]# yum list installed|grep coro

corosync.x86_64 2.4.3-2.el7_5.1                @updates

corosynclib.x86_64 2.4.3-2.el7_5.1                @updates

[root@drbd1 ~]# yum list installed|grep drbd

drbd90-utils.x86_64 9.3.1-1.el7.elrepo             @elrepo

kmod-drbd90.x86_64 9.0.14-1.el7_5.elrepo          @elrepo

[root@drbd1 ~]# yum list installed|grep -i scsi

lsscsi.x86_64 0.27-6.el7                     @anaconda

scsi-target-utils.x86_64 1.0.55-4.el7                   @epel

------------------------------------------------------------------------

3. configurations

3.1 First, the DRBD configuration

------------------------------------------------------------------------

[root@drbd1 ~]# cat /etc/drbd.conf

# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";

include "drbd.d/*.res";

[root@drbd1 ~]# cat /etc/drbd.d/r0.res

resource iscsivg01 {

  protocol C;

  device /dev/drbd0;

  disk /dev/vg0/ipstor0;

  flexible-meta-disk internal;

  on drbd0.s-ka.local {

    #volume 0 {

      #device /dev/drbd0;

      #disk /dev/vg0/ipstor0;

      #flexible-meta-disk internal;

    #}

    address 192.168.96.45:7788;

  }

  on drbd1.s-ka.local {

    #volume 0 {

      #device /dev/drbd0;

      #disk /dev/vg0/ipstor0;

      #flexible-meta-disk internal;

    #}

    address 192.168.96.47:7788;

  }

}

------------------------------------------------------------------------

3.2 Then the DRBD device

------------------------------------------------------------------------

[root@drbd1 ~]# lsblk

NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT

sda               8:0    0   25G  0 disk

├─sda1            8:1    0    1G  0 part /boot

└─sda2            8:2    0   24G  0 part

  ├─centos-root 253:0    0   22G  0 lvm  /

  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]

sdb               8:16   0  500G  0 disk

└─sdb1            8:17   0  500G  0 part

  └─vg0-ipstor0 253:2    0  500G  0 lvm

    └─drbd0     147:0    0  500G  1 disk

sr0              11:0    1 1024M  0 rom

[root@drbd1 ~]# tree /dev/drbd

/dev/drbd

├── by-disk

│   └── vg0

│       └── ipstor0 -> ../../../drbd0

└── by-res

    └── iscsivg01

        └── 0 -> ../../../drbd0

4 directories, 2 files

------------------------------------------------------------------------

3.3 DRBD status

------------------------------------------------------------------------

[root@drbd1 ~]# drbdadm status

iscsivg01 role:Secondary

  disk:UpToDate

  drbd0.s-ka.local role:Primary

    peer-disk:UpToDate

[root@drbd0 ~]# drbdadm status

iscsivg01 role:Primary

  disk:UpToDate

  drbd1.s-ka.local role:Secondary

    peer-disk:UpToDate

[root@drbd0 ~]# cat /proc/drbd

version: 9.0.14-1 (api:2/proto:86-113)

GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by mockbuild@,

2018-05-04 03:32:42

Transports (api:16): tcp (9.0.14-1)

------------------------------------------------------------------------

3.4 Corosync configuration

------------------------------------------------------------------------

[root@drbd0 corosync]# cat /etc/corosync/corosync.conf

totem {

    version: 2

    cluster_name: cluster1

    secauth: off

    transport: udpu

}

nodelist {

    node {

        ring0_addr: drbd0-ha.s-ka.local

        nodeid: 1

    }

    node {

        ring0_addr: drbd1-ha.s-ka.local

        nodeid: 2

    }

}

quorum {

    provider: corosync_votequorum

    two_node: 1

}

logging {

    to_logfile: yes

    logfile: /var/log/cluster/corosync.log

    to_syslog: yes

}

------------------------------------------------------------------------

3.5 Corosync status:

------------------------------------------------------------------------

[root@drbd0 corosync]# systemctl status corosync

● corosync.service - Corosync Cluster Engine

   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled;

vendor preset: disabled)

   Active: active (running) since Sun 2018-10-14 02:58:01 CEST; 2 days ago

     Docs: man:corosync

           man:corosync.conf

           man:corosync_overview

  Process: 1095 ExecStart=/usr/share/corosync/corosync start

(code=exited, status=0/SUCCESS)

 Main PID: 1167 (corosync)

   CGroup: /system.slice/corosync.service

           └─1167 corosync

Oct 14 02:58:00 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed

service synchronization, ready to provide service.

Oct 14 02:58:01 drbd0.s-ka.local corosync[1095]: Starting Corosync

Cluster Engine (corosync): [  OK  ]

Oct 14 02:58:01 drbd0.s-ka.local systemd[1]: Started Corosync Cluster

Engine.

Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [TOTEM ] A new

membership (192.168.96.45:384) was formed. Members left: 2

Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [QUORUM] Members[1]: 1

Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed

service synchronization, ready to provide service.

Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [TOTEM ] A new

membership (192.168.96.45:388) was formed. Members joined: 2

Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [CPG   ] downlist

left_list: 0 received in state 0

Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [QUORUM] Members[2]: 1 2

Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed

service synchronization, ready to provide service.

------------------------------------------------------------------------

3.6 tgtd configuration:

------------------------------------------------------------------------

[root@drbd0 corosync]# cat /etc/tgt/targets.conf

# This is a sample config file for tgt-admin.

#

# The "#" symbol disables the processing of a line.

# Set the driver. If not specified, defaults to "iscsi".

default-driver iscsi

# Set iSNS parameters, if needed

#iSNSServerIP 192.168.111.222

#iSNSServerPort 3205

#iSNSAccessControl On

#iSNS On

# Continue if tgtadm exits with non-zero code (equivalent of

# --ignore-errors command line option)

#ignore-errors yes

<target iqn.2018-08.s-ka.local:disk.1>

    lun 10

    backing-store /dev/drbd0

    initiator-address 192.168.96.0/24

    initiator-address 192.168.95.0/24

    target-address 192.168.95.48

</target>

------------------------------------------------------------------------

3.7 tgtd has been disabled on both servers; it can only be started on the current

Primary DRBD node.

------------------------------------------------------------------------

Secondary Node:

[root@drbd1 ~]# systemctl status tgtd

● tgtd.service - tgtd iSCSI target daemon

   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;

vendor preset: disabled)

   Active: inactive (dead)

[root@drbd1 ~]# systemctl restart tgtd

Job for tgtd.service failed because the control process exited with

error code. See "systemctl status tgtd.service" and "journalctl -xe" for

details.

Primary Node:

[root@drbd0 corosync]# systemctl status tgtd

● tgtd.service - tgtd iSCSI target daemon

   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;

vendor preset: disabled)

   Active: inactive (dead)

[root@drbd0 corosync]# systemctl restart tgtd

[root@drbd0 corosync]# systemctl status  tgtd

● tgtd.service - tgtd iSCSI target daemon

   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;

vendor preset: disabled)

   Active: active (running) since Tue 2018-10-16 14:09:47 CEST; 2min 29s

ago

  Process: 22300 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys

--name State -v ready (code=exited, status=0/SUCCESS)

  Process: 22272 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG

(code=exited, status=0/SUCCESS)

  Process: 22271 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys

--name State -v offline (code=exited, status=0/SUCCESS)

  Process: 22270 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)

 Main PID: 22269 (tgtd)

   CGroup: /system.slice/tgtd.service

           └─22269 /usr/sbin/tgtd -f

Oct 16 14:09:42 drbd0.s-ka.local systemd[1]: Starting tgtd iSCSI target

daemon...

Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: iser_ib_init(3436)

Failed to initialize RDMA; load kernel modules?

Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:

work_timer_start(146) use timer_fd based scheduler

Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:

bs_init_signalfd(267) could not open backing-store module directory

/usr/lib64/tgt/backing-store

Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: bs_init(386) use

signalfd notification

Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: device_mgmt(246)

sz:16 params:path=/dev/drbd0

Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: bs_thread_open(408) 16

Oct 16 14:09:47 drbd0.s-ka.local systemd[1]: Started tgtd iSCSI target

daemon.

------------------------------------------------------------------------

3.8 Up to this point everything was working, but as soon as I switched the DRBD

Primary node, it stopped working (the file system of the test node became

read-only),

so I changed the pcs configuration according to the previously mentioned

article:

------------------------------------------------------------------------

</pre>

        <pre class="moz-quote-pre" wrap="">pcs resource create p_iscsivg01 ocf:heartbeat:LVM volgrpname="vg0" op monitor interval="30"

pcs resource group add p_iSCSI p_iscsivg01 p_iSCSITarget p_iSCSILogicalUnit ClusterIP

pcs constraint order start ipstor0Clone then start p_iSCSI then start ipstor0Clone:Master

[root@drbd0 ~]# pcs status

    Cluster name: cluster1

    Stack: corosync

    Current DC: drbd0-ha.s-ka.local (version

1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

    Last updated: Sun Oct 14 01:38:18 2018

    Last change: Sun Oct 14 01:37:58 2018 by root via cibadmin on

drbd0-ha.s-ka.local

    2 nodes configured

    6 resources configured

    Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

    Full list of resources:

     Master/Slave Set: ipstor0Clone [ipstor0]

         Masters: [ drbd0-ha.s-ka.local ]

         Slaves: [ drbd1-ha.s-ka.local ]

     Resource Group: p_iSCSI

         p_iscsivg01    (ocf::heartbeat:LVM):    Stopped

         p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped

         p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit):    Stopped

         ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

    Failed Actions:

    * p_iSCSILogicalUnit_start_0 on drbd0-ha.s-ka.local 'unknown error'

(1): call=42, status=complete, exitreason='',

        last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=28ms

    * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):

call=40, status=complete, exitreason='',

        last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=23ms

    * p_iscsivg01_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):

call=48, status=complete, exitreason='Volume group [iscsivg01] does not

exist or contains error!   Volume group "iscsivg01" not found',

        last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=47ms

    * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error'

(1): call=41, status=complete, exitreason='',

        last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=31ms

    * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):

call=39, status=complete, exitreason='',

        last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=24ms

    * p_iscsivg01_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):

call=47, status=complete, exitreason='Volume group [iscsivg01] does not

exist or contains error!   Volume group "iscsivg01" not found',

        last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=50ms

    Daemon Status:

      corosync: active/enabled

      pacemaker: active/enabled

      pcsd: active/enabled

    [root@drbd0 ~]#

------------------------------------------------------------------------

3.9 Because of the "device not found" error I removed the LVM resource; it looks

like this now.

I also switched the path between /dev/drbd/by-disk and /dev/drbd/by-res,

but it had no effect.

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs status

Cluster name: cluster1

Stack: corosync

Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -

partition with quorum

Last updated: Tue Oct 16 14:18:09 2018

Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on

drbd0-ha.s-ka.local

2 nodes configured

5 resources configured

Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

Full list of resources:

 Master/Slave Set: ipstor0Clone [ipstor0]

     Masters: [ drbd0-ha.s-ka.local ]

     Slaves: [ drbd1-ha.s-ka.local ]

 Resource Group: p_iSCSI

     p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped

     p_iSCSILogicalUnit    (ocf::heartbeat:iSCSILogicalUnit):  Stopped

     ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

Failed Actions:

* p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):

call=12, status=complete, exitreason='',

    last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms

* p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):

call=12, status=complete, exitreason='',

    last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms

Daemon Status:

  corosync: active/enabled

  pacemaker: active/enabled

  pcsd: active/enabled

[root@drbd0 corosync]#

------------------------------------------------------------------------

3.10 I've tried "pcs resource debug-start xxx --full" on the DRBD

Primary node:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs resource debug-start p_iSCSI --full

Error: unable to debug-start a group, try one of the group's resource(s)

(p_iSCSITarget,p_iSCSILogicalUnit,ClusterIP)

[root@drbd0 corosync]# pcs resource debug-start p_iSCSITarget --full

Operation start for p_iSCSITarget (ocf:heartbeat:iSCSITarget) returned:

'ok' (0)

 >  stderr: DEBUG: p_iSCSITarget start : 0

[root@drbd0 corosync]# pcs resource debug-start p_iSCSILogicalUnit --full

Operation start for p_iSCSILogicalUnit (ocf:heartbeat:iSCSILogicalUnit)

returned: 'unknown error' (1)

 >  stderr: ERROR: tgtadm: this logical unit number already exists

[root@drbd0 corosync]# pcs resource debug-start ClusterIP --full

Operation start for ClusterIP (ocf:heartbeat:IPaddr2) returned: 'ok' (0)

 >  stderr: INFO: Adding inet address 192.168.95.48/32 with broadcast

address 192.168.95.255 to device ens192

 >  stderr: INFO: Bringing device ens192 up

 >  stderr: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -p

/var/run/resource-agents/send_arp-192.168.95.48 -I ens192 -m auto

192.168.95.48

[root@drbd0 corosync]#

------------------------------------------------------------------------

3.11 As you can see, there are errors, but "p_iSCSITarget" was

started successfully. Yet "pcs status" still shows "Stopped".

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs status

Cluster name: cluster1

Stack: corosync

Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -

partition with quorum

Last updated: Tue Oct 16 14:22:38 2018

Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on

drbd0-ha.s-ka.local

2 nodes configured

5 resources configured

Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

Full list of resources:

 Master/Slave Set: ipstor0Clone [ipstor0]

     Masters: [ drbd0-ha.s-ka.local ]

     Slaves: [ drbd1-ha.s-ka.local ]

 Resource Group: p_iSCSI

     p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped

     p_iSCSILogicalUnit    (ocf::heartbeat:iSCSILogicalUnit): Stopped

     ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

Failed Actions:

* p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):

call=12, status=complete, exitreason='',

    last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms

* p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):

call=12, status=complete, exitreason='',

    last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms

Daemon Status:

  corosync: active/enabled

  pacemaker: active/enabled

  pcsd: active/enabled

[root@drbd0 corosync]#

------------------------------------------------------------------------

3.12 the pcs config is:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs config

Cluster Name: cluster1

Corosync Nodes:

 drbd0-ha.s-ka.local drbd1-ha.s-ka.local

Pacemaker Nodes:

 drbd0-ha.s-ka.local drbd1-ha.s-ka.local

Resources:

 Master: ipstor0Clone

  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1

clone-node-max=1

  Resource: ipstor0 (class=ocf provider=linbit type=drbd)

   Attributes: drbd_resource=iscsivg01

   Operations: demote interval=0s timeout=90 (ipstor0-demote-interval-0s)

               monitor interval=60s (ipstor0-monitor-interval-60s)

               notify interval=0s timeout=90 (ipstor0-notify-interval-0s)

               promote interval=0s timeout=90 (ipstor0-promote-interval-0s)

               reload interval=0s timeout=30 (ipstor0-reload-interval-0s)

               start interval=0s timeout=240 (ipstor0-start-interval-0s)

               stop interval=0s timeout=100 (ipstor0-stop-interval-0s)

 Group: p_iSCSI

  Resource: p_iSCSITarget (class=ocf provider=heartbeat type=iSCSITarget)

   Attributes: implementation=tgt iqn=iqn.2018-08.s-ka.local:disk.1 tid=1

   Operations: monitor interval=30 timeout=60

(p_iSCSITarget-monitor-interval-30)

               start interval=0 timeout=60 (p_iSCSITarget-start-interval-0)

               stop interval=0 timeout=60 (p_iSCSITarget-stop-interval-0)

  Resource: p_iSCSILogicalUnit (class=ocf provider=heartbeat

type=iSCSILogicalUnit)

   Attributes: implementation=tgt lun=10

path=/dev/drbd/by-disk/vg0/ipstor0 target_iqn=iqn.2018-08.s-ka.local:disk.1

   Operations: monitor interval=30 timeout=60

(p_iSCSILogicalUnit-monitor-interval-30)

               start interval=0 timeout=60

(p_iSCSILogicalUnit-start-interval-0)

               stop interval=0 timeout=60

(p_iSCSILogicalUnit-stop-interval-0)

  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)

   Attributes: cidr_netmask=32 ip=192.168.95.48

   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)

               start interval=0s timeout=20s (ClusterIP-start-interval-0s)

               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:

Fencing Levels:

Location Constraints:

Ordering Constraints:

  start ipstor0Clone then start p_iSCSI (kind:Mandatory)

Colocation Constraints:

Ticket Constraints:

Alerts:

 No alerts defined

Resources Defaults:

 migration-threshold: 1

Operations Defaults:

 No defaults set

Cluster Properties:

 cluster-infrastructure: corosync

 cluster-name: cluster1

 dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9

 have-watchdog: false

 last-lrm-refresh: 1539474248

 no-quorum-policy: ignore

 stonith-enabled: false

Quorum:

  Options:

[root@drbd0 corosync]#

------------------------------------------------------------------------

4. So I am out of ideas and don't know what to do. Maybe I should just dive into

pacemaker's source code??

I hope to get some feedback or tips from you. Thank you very much in

advance :)

Best Regards

Zhang

_______________________________________________

Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a>

<a class="moz-txt-link-freetext" href="https://lists.clusterlabs.org/mailman/listinfo/users">https://lists.clusterlabs.org/mailman/listinfo/users</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>

Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>

Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>

</pre>

      </blockquote>


    </blockquote>

    <p><br>

    </p>

  </body>

</html>