[ClusterLabs] Need help to enable hot switch of iSCSI (tgtd) under two node Pacemaker + DRBD 9.0 under CentOS 7.5 in ESXi 6.5 Environment

LiFeng Zhang zhang@linux-systeme.de
Tue Oct 16 08:29:17 EDT 2018


Hi all, dear friends,

I need your help to enable hot switching of an iSCSI target under a 
Pacemaker/Corosync cluster, which exports an iSCSI device backed by a 
two-node DRBD replication.

I've got the Pacemaker/Corosync cluster working and the DRBD replication 
working too, but I am stuck at iSCSI. I can manually start tgtd on one 
node, so the VCSA recognizes the iSCSI disk and creates a VMFS/storage 
object on it, and I can then create a test VM on that VMFS.

But when I switch the DRBD Primary/Secondary, the test VM keeps running, 
yet its underlying disk becomes read-only. As far as I know, tgtd should 
be handled by Pacemaker so that it automatically starts on the current 
DRBD Primary, but in my situation it sadly does NOT.


I've tried all kinds of resources/manuals/documents, but they are all 
mixed with extra information, other systems, or other software versions.

My BEST reference (the configuration closest to mine) is this article: 
https://nnc3.com/mags/LJ_1994-2014/LJ/217/11275.html

The differences between my setup and this article are, I think, that I 
don't have an LVM volume but only a raw iSCSI disk, and that I have to 
translate the CRM commands into PCS commands.
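
To give an idea of the translation, here is a rough sketch of how I 
mapped one of the article's CRM primitives to pcs (the crm side is 
paraphrased from memory, not quoted from the article):

------------------------------------------------------------------------

# crm (article, roughly):
#   primitive p_iSCSITarget ocf:heartbeat:iSCSITarget \
#     params implementation="tgt" iqn="iqn.2018-08.s-ka.local:disk.1" tid="1" \
#     op monitor interval="30" timeout="60"

# my pcs translation (pcs 0.9 on CentOS 7):
pcs resource create p_iSCSITarget ocf:heartbeat:iSCSITarget \
    implementation=tgt iqn=iqn.2018-08.s-ka.local:disk.1 tid=1 \
    op monitor interval=30 timeout=60

------------------------------------------------------------------------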

But after I "copied" the configuration from this article, my cluster 
could not start anymore. I tried removing the LVM resource (which had 
caused a "device not found" error), but the resource group still can't 
start, and Pacemaker gives no explicit "reason".


1. The whole setup runs on a two-node ESXi 6.5 cluster, with the VCSA 
installed on one of the ESXi hosts.

I have attached a simple diagram, which may illustrate the deployment 
better.

2. Starting point:

All involved hosts are mapped through local DNS, including the floating 
VIP; the local domain is s-ka.local:

------------------------------------------------------------------------

firewall:     fw01.s-ka.local.        IN    A    192.168.95.249

vcsa:         vc01.s-ka.local.        IN    A    192.168.95.30
esxi:         esx01.s-ka.local.       IN    A    192.168.95.5
esxi:         esx02.s-ka.local.       IN    A    192.168.95.7

drbd:         drbd0.s-ka.local.       IN    A    192.168.95.45
drbd:         drbd1.s-ka.local.       IN    A    192.168.95.47
vip:          ipstor0.s-ka.local.     IN    A    192.168.95.48

heartbeat:    drbd0-ha.s-ka.local.    IN    A    192.168.96.45
heartbeat:    drbd1-ha.s-ka.local.    IN    A    192.168.96.47

------------------------------------------------------------------------


Both DRBD servers run CentOS 7.5; the installed packages are:

------------------------------------------------------------------------

[root@drbd0 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)

[root@drbd0 ~]# uname -a
Linux drbd0.s-ka.local 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 
16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@drbd1 ~]# yum list installed|grep pacemaker
pacemaker.x86_64 1.1.18-11.el7_5.3              @updates
pacemaker-cli.x86_64 1.1.18-11.el7_5.3              @updates
pacemaker-cluster-libs.x86_64 1.1.18-11.el7_5.3              @updates
pacemaker-libs.x86_64 1.1.18-11.el7_5.3              @updates

[root@drbd1 ~]# yum list installed|grep coro
corosync.x86_64 2.4.3-2.el7_5.1                @updates
corosynclib.x86_64 2.4.3-2.el7_5.1                @updates

[root@drbd1 ~]# yum list installed|grep drbd
drbd90-utils.x86_64 9.3.1-1.el7.elrepo             @elrepo
kmod-drbd90.x86_64 9.0.14-1.el7_5.elrepo          @elrepo

[root@drbd1 ~]# yum list installed|grep -i scsi
lsscsi.x86_64 0.27-6.el7                     @anaconda
scsi-target-utils.x86_64 1.0.55-4.el7                   @epel

------------------------------------------------------------------------
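
For reference, the stack was installed roughly like this (repository 
setup for elrepo/epel omitted, so treat this as a sketch rather than an 
exact history):

------------------------------------------------------------------------

# cluster stack from the base/updates repos
yum install -y pacemaker pcs corosync

# DRBD 9 kernel module and userland from elrepo
yum install -y kmod-drbd90 drbd90-utils

# tgtd (scsi-target-utils) from epel
yum install -y scsi-target-utils

------------------------------------------------------------------------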


3. Configurations

3.1 OK, first the DRBD configuration:

------------------------------------------------------------------------

[root@drbd1 ~]# cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

[root@drbd1 ~]# cat /etc/drbd.d/r0.res
resource iscsivg01 {
   protocol C;
   device /dev/drbd0;
   disk /dev/vg0/ipstor0;
   flexible-meta-disk internal;
   on drbd0.s-ka.local {
     #volume 0 {
       #device /dev/drbd0;
       #disk /dev/vg0/ipstor0;
       #flexible-meta-disk internal;
     #}
     address 192.168.96.45:7788;
   }
   on drbd1.s-ka.local {
     #volume 0 {
       #device /dev/drbd0;
       #disk /dev/vg0/ipstor0;
       #flexible-meta-disk internal;
     #}
     address 192.168.96.47:7788;
   }
}

------------------------------------------------------------------------
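
For completeness, the resource was initialized roughly as follows 
(reconstructed from memory; drbdadm syntax as in DRBD 9):

------------------------------------------------------------------------

# on both nodes:
drbdadm create-md iscsivg01
drbdadm up iscsivg01

# on drbd0 only, to force the initial sync:
drbdadm primary --force iscsivg01

------------------------------------------------------------------------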

3.2 Then the DRBD device:

------------------------------------------------------------------------

[root@drbd1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   25G  0 disk
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   24G  0 part
   ├─centos-root 253:0    0   22G  0 lvm  /
   └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sdb               8:16   0  500G  0 disk
└─sdb1            8:17   0  500G  0 part
   └─vg0-ipstor0 253:2    0  500G  0 lvm
     └─drbd0     147:0    0  500G  1 disk
sr0              11:0    1 1024M  0 rom

[root@drbd1 ~]# tree /dev/drbd
/dev/drbd
├── by-disk
│   └── vg0
│       └── ipstor0 -> ../../../drbd0
└── by-res
     └── iscsivg01
         └── 0 -> ../../../drbd0

4 directories, 2 files

------------------------------------------------------------------------
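
The backing LV was created along these lines (again reconstructed; the 
sizes match the lsblk output above):

------------------------------------------------------------------------

pvcreate /dev/sdb1
vgcreate vg0 /dev/sdb1
lvcreate -l 100%FREE -n ipstor0 vg0

------------------------------------------------------------------------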

3.3 DRBD status:

------------------------------------------------------------------------

[root@drbd1 ~]# drbdadm status
iscsivg01 role:Secondary
   disk:UpToDate
   drbd0.s-ka.local role:Primary
     peer-disk:UpToDate

[root@drbd0 ~]# drbdadm status
iscsivg01 role:Primary
   disk:UpToDate
   drbd1.s-ka.local role:Secondary
     peer-disk:UpToDate

[root@drbd0 ~]# cat /proc/drbd
version: 9.0.14-1 (api:2/proto:86-113)
GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by mockbuild@, 
2018-05-04 03:32:42
Transports (api:16): tcp (9.0.14-1)

------------------------------------------------------------------------
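
When I say "switch the Primary/Secondary", I mean a manual switchover 
along these lines (tgtd was started by hand at this stage, not yet by 
Pacemaker):

------------------------------------------------------------------------

# on the current Primary (drbd0):
drbdadm secondary iscsivg01

# on the other node (drbd1):
drbdadm primary iscsivg01

------------------------------------------------------------------------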

3.4 Corosync configuration

------------------------------------------------------------------------

[root@drbd0 corosync]# cat /etc/corosync/corosync.conf
totem {
     version: 2
     cluster_name: cluster1
     secauth: off
     transport: udpu
}

nodelist {
     node {
         ring0_addr: drbd0-ha.s-ka.local
         nodeid: 1
     }

     node {
         ring0_addr: drbd1-ha.s-ka.local
         nodeid: 2
     }
}

quorum {
     provider: corosync_votequorum
     two_node: 1
}

logging {
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log
     to_syslog: yes
}

------------------------------------------------------------------------
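
As far as I understand, this corosync.conf is essentially what pcs 
generates; I set the cluster up with something like:

------------------------------------------------------------------------

pcs cluster auth drbd0-ha.s-ka.local drbd1-ha.s-ka.local -u hacluster
pcs cluster setup --name cluster1 drbd0-ha.s-ka.local drbd1-ha.s-ka.local
pcs cluster start --all

------------------------------------------------------------------------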


3.5 Corosync status:

------------------------------------------------------------------------

[root@drbd0 corosync]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
    Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; 
vendor preset: disabled)
    Active: active (running) since Sun 2018-10-14 02:58:01 CEST; 2 days ago
      Docs: man:corosync
            man:corosync.conf
            man:corosync_overview
   Process: 1095 ExecStart=/usr/share/corosync/corosync start 
(code=exited, status=0/SUCCESS)
  Main PID: 1167 (corosync)
    CGroup: /system.slice/corosync.service
            └─1167 corosync

Oct 14 02:58:00 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed 
service synchronization, ready to provide service.
Oct 14 02:58:01 drbd0.s-ka.local corosync[1095]: Starting Corosync 
Cluster Engine (corosync): [  OK  ]
Oct 14 02:58:01 drbd0.s-ka.local systemd[1]: Started Corosync Cluster 
Engine.
Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [TOTEM ] A new 
membership (192.168.96.45:384) was formed. Members left: 2
Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [QUORUM] Members[1]: 1
Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed 
service synchronization, ready to provide service.
Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [TOTEM ] A new 
membership (192.168.96.45:388) was formed. Members joined: 2
Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [CPG   ] downlist 
left_list: 0 received in state 0
Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [QUORUM] Members[2]: 1 2
Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]:  [MAIN  ] Completed 
service synchronization, ready to provide service.

------------------------------------------------------------------------

3.6 tgtd configuration:

------------------------------------------------------------------------

[root@drbd0 corosync]# cat /etc/tgt/targets.conf
# This is a sample config file for tgt-admin.
#
# The "#" symbol disables the processing of a line.

# Set the driver. If not specified, defaults to "iscsi".
default-driver iscsi

# Set iSNS parameters, if needed
#iSNSServerIP 192.168.111.222
#iSNSServerPort 3205
#iSNSAccessControl On
#iSNS On

# Continue if tgtadm exits with non-zero code (equivalent of
# --ignore-errors command line option)
#ignore-errors yes


<target iqn.2018-08.s-ka.local:disk.1>
     lun 10
     backing-store /dev/drbd0
     initiator-address 192.168.96.0/24
     initiator-address 192.168.95.0/24
     target-address 192.168.95.48
</target>

------------------------------------------------------------------------
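
As far as I understand, the ocf:heartbeat:iSCSITarget and 
iSCSILogicalUnit agents create the target and LUN dynamically through 
tgtadm, roughly like this (my sketch, not the agents' exact invocation). 
A static <target> block that tgt-admin loads at service start (the 
ExecStartPost in tgtd.service below) could therefore collide with them:

------------------------------------------------------------------------

# create the target (what iSCSITarget with implementation=tgt does, roughly):
tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.2018-08.s-ka.local:disk.1

# attach the LUN (what iSCSILogicalUnit does, roughly):
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 10 \
    -b /dev/drbd/by-disk/vg0/ipstor0

------------------------------------------------------------------------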


3.7 tgtd has been disabled on both servers; it can only be started on 
the current DRBD Primary node.

------------------------------------------------------------------------

Secondary Node:

[root@drbd1 ~]# systemctl status tgtd
● tgtd.service - tgtd iSCSI target daemon
    Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; 
vendor preset: disabled)
    Active: inactive (dead)
[root@drbd1 ~]# systemctl restart tgtd
Job for tgtd.service failed because the control process exited with 
error code. See "systemctl status tgtd.service" and "journalctl -xe" for 
details.


Primary Node:

[root@drbd0 corosync]# systemctl status tgtd
● tgtd.service - tgtd iSCSI target daemon
    Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; 
vendor preset: disabled)
    Active: inactive (dead)
[root@drbd0 corosync]# systemctl restart tgtd
[root@drbd0 corosync]# systemctl status tgtd
● tgtd.service - tgtd iSCSI target daemon
    Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; 
vendor preset: disabled)
    Active: active (running) since Tue 2018-10-16 14:09:47 CEST; 2min 
29s ago
   Process: 22300 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys 
--name State -v ready (code=exited, status=0/SUCCESS)
   Process: 22272 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG 
(code=exited, status=0/SUCCESS)
   Process: 22271 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys 
--name State -v offline (code=exited, status=0/SUCCESS)
   Process: 22270 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)
  Main PID: 22269 (tgtd)
    CGroup: /system.slice/tgtd.service
            └─22269 /usr/sbin/tgtd -f

Oct 16 14:09:42 drbd0.s-ka.local systemd[1]: Starting tgtd iSCSI target 
daemon...
Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: iser_ib_init(3436) 
Failed to initialize RDMA; load kernel modules?
Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: 
work_timer_start(146) use timer_fd based scheduler
Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: 
bs_init_signalfd(267) could not open backing-store module directory 
/usr/lib64/tgt/backing-store
Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: bs_init(386) use 
signalfd notification
Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: device_mgmt(246) 
sz:16 params:path=/dev/drbd0
Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: bs_thread_open(408) 16
Oct 16 14:09:47 drbd0.s-ka.local systemd[1]: Started tgtd iSCSI target 
daemon.

------------------------------------------------------------------------
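
(The failed restart on the Secondary is expected, I assume, because 
tgt-admin cannot open /dev/drbd0 as a backing store while DRBD is 
Secondary. The disabling itself was simply:

------------------------------------------------------------------------

# on both nodes:
systemctl disable tgtd

------------------------------------------------------------------------

so only a manual "systemctl start tgtd" on the current Primary brings 
the target up.)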

3.8 Up to this point everything was working, but when I switched the 
DRBD Primary node it stopped working (the file system of the test VM 
became read-only).

So I changed the pcs configuration according to the previously mentioned 
article:

------------------------------------------------------------------------

 > pcs resource create p_iscsivg01 ocf:heartbeat:LVM volgrpname="vg0" op monitor interval="30"

 > pcs resource group add p_iSCSI p_iscsivg01 p_iSCSITarget p_iSCSILogicalUnit ClusterIP

 > pcs constraint order start ipstor0Clone then start p_iSCSI then start ipstor0Clone:Master


[root@drbd0 ~]# pcs status
     Cluster name: cluster1
     Stack: corosync
     Current DC: drbd0-ha.s-ka.local (version 
1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
     Last updated: Sun Oct 14 01:38:18 2018
     Last change: Sun Oct 14 01:37:58 2018 by root via cibadmin on 
drbd0-ha.s-ka.local

     2 nodes configured
     6 resources configured

     Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

     Full list of resources:

      Master/Slave Set: ipstor0Clone [ipstor0]
          Masters: [ drbd0-ha.s-ka.local ]
          Slaves: [ drbd1-ha.s-ka.local ]
      Resource Group: p_iSCSI
          p_iscsivg01    (ocf::heartbeat:LVM):    Stopped
          p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped
          p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit):    Stopped
          ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

     Failed Actions:
     * p_iSCSILogicalUnit_start_0 on drbd0-ha.s-ka.local 'unknown error' 
(1): call=42, status=complete, exitreason='',
         last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=28ms
     * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1): 
call=40, status=complete, exitreason='',
         last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=23ms
     * p_iscsivg01_start_0 on drbd0-ha.s-ka.local 'unknown error' (1): 
call=48, status=complete, exitreason='Volume group [iscsivg01] does not 
exist or contains error!   Volume group "iscsivg01" not found',
         last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=47ms
     * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error' 
(1): call=41, status=complete, exitreason='',
         last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=31ms
     * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1): 
call=39, status=complete, exitreason='',
         last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=24ms
     * p_iscsivg01_start_0 on drbd1-ha.s-ka.local 'unknown error' (1): 
call=47, status=complete, exitreason='Volume group [iscsivg01] does not 
exist or contains error!   Volume group "iscsivg01" not found',
         last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=50ms


     Daemon Status:
       corosync: active/enabled
       pacemaker: active/enabled
       pcsd: active/enabled
     [root@drbd0 ~]#

------------------------------------------------------------------------
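
Re-reading the article, I now suspect my constraint translation is 
wrong: as far as I understand, the group must not only be ordered after 
the DRBD promotion but also colocated with the Master role, which in pcs 
would be roughly:

------------------------------------------------------------------------

pcs constraint order promote ipstor0Clone then start p_iSCSI
pcs constraint colocation add p_iSCSI with master ipstor0Clone INFINITY

------------------------------------------------------------------------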


3.9 Because of the "device not found" error I removed the LVM resource; 
it now looks like this.

I also tried switching the path between /dev/drbd/by-disk and 
/dev/drbd/by-res, but that had no effect:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) - 
partition with quorum
Last updated: Tue Oct 16 14:18:09 2018
Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on 
drbd0-ha.s-ka.local

2 nodes configured
5 resources configured

Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

Full list of resources:

  Master/Slave Set: ipstor0Clone [ipstor0]
      Masters: [ drbd0-ha.s-ka.local ]
      Slaves: [ drbd1-ha.s-ka.local ]
  Resource Group: p_iSCSI
      p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped
      p_iSCSILogicalUnit    (ocf::heartbeat:iSCSILogicalUnit):  Stopped
      ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

Failed Actions:
* p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1): 
call=12, status=complete, exitreason='',
     last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
* p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1): 
call=12, status=complete, exitreason='',
     last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms


Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
[root@drbd0 corosync]#

------------------------------------------------------------------------

3.10 I've tried "pcs resource debug-start xxx --full" on the DRBD 
Primary node:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs resource debug-start p_iSCSI --full
Error: unable to debug-start a group, try one of the group's resource(s) 
(p_iSCSITarget,p_iSCSILogicalUnit,ClusterIP)

[root@drbd0 corosync]# pcs resource debug-start p_iSCSITarget --full
Operation start for p_iSCSITarget (ocf:heartbeat:iSCSITarget) returned: 
'ok' (0)
  >  stderr: DEBUG: p_iSCSITarget start : 0

[root@drbd0 corosync]# pcs resource debug-start p_iSCSILogicalUnit --full
Operation start for p_iSCSILogicalUnit (ocf:heartbeat:iSCSILogicalUnit) 
returned: 'unknown error' (1)
  >  stderr: ERROR: tgtadm: this logical unit number already exists

[root@drbd0 corosync]# pcs resource debug-start ClusterIP --full
Operation start for ClusterIP (ocf:heartbeat:IPaddr2) returned: 'ok' (0)
  >  stderr: INFO: Adding inet address 192.168.95.48/32 with broadcast 
address 192.168.95.255 to device ens192
  >  stderr: INFO: Bringing device ens192 up
  >  stderr: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -p 
/var/run/resource-agents/send_arp-192.168.95.48 -I ens192 -m auto 
192.168.95.48
[root@drbd0 corosync]#

------------------------------------------------------------------------
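
I assume the "logical unit number already exists" error means the LUN 
from the static targets.conf was already attached when tgtd was started 
earlier; what the running daemon actually exports can be checked with, 
for example:

------------------------------------------------------------------------

tgtadm --lld iscsi --op show --mode target

------------------------------------------------------------------------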

3.11 As you can see, there are errors, but p_iSCSITarget was 
successfully started by debug-start. Yet "pcs status" still shows it as 
Stopped:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) - 
partition with quorum
Last updated: Tue Oct 16 14:22:38 2018
Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on 
drbd0-ha.s-ka.local

2 nodes configured
5 resources configured

Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]

Full list of resources:

  Master/Slave Set: ipstor0Clone [ipstor0]
      Masters: [ drbd0-ha.s-ka.local ]
      Slaves: [ drbd1-ha.s-ka.local ]
  Resource Group: p_iSCSI
      p_iSCSITarget    (ocf::heartbeat:iSCSITarget):    Stopped
      p_iSCSILogicalUnit    (ocf::heartbeat:iSCSILogicalUnit): Stopped
      ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped

Failed Actions:
* p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1): 
call=12, status=complete, exitreason='',
     last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
* p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1): 
call=12, status=complete, exitreason='',
     last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms


Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
[root@drbd0 corosync]#

------------------------------------------------------------------------

3.12 The pcs config is:

------------------------------------------------------------------------

[root@drbd0 corosync]# pcs config
Cluster Name: cluster1
Corosync Nodes:
  drbd0-ha.s-ka.local drbd1-ha.s-ka.local
Pacemaker Nodes:
  drbd0-ha.s-ka.local drbd1-ha.s-ka.local

Resources:
  Master: ipstor0Clone
   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 
clone-node-max=1
   Resource: ipstor0 (class=ocf provider=linbit type=drbd)
    Attributes: drbd_resource=iscsivg01
    Operations: demote interval=0s timeout=90 (ipstor0-demote-interval-0s)
                monitor interval=60s (ipstor0-monitor-interval-60s)
                notify interval=0s timeout=90 (ipstor0-notify-interval-0s)
                promote interval=0s timeout=90 (ipstor0-promote-interval-0s)
                reload interval=0s timeout=30 (ipstor0-reload-interval-0s)
                start interval=0s timeout=240 (ipstor0-start-interval-0s)
                stop interval=0s timeout=100 (ipstor0-stop-interval-0s)
  Group: p_iSCSI
   Resource: p_iSCSITarget (class=ocf provider=heartbeat type=iSCSITarget)
    Attributes: implementation=tgt iqn=iqn.2018-08.s-ka.local:disk.1 tid=1
    Operations: monitor interval=30 timeout=60 
(p_iSCSITarget-monitor-interval-30)
                start interval=0 timeout=60 (p_iSCSITarget-start-interval-0)
                stop interval=0 timeout=60 (p_iSCSITarget-stop-interval-0)
   Resource: p_iSCSILogicalUnit (class=ocf provider=heartbeat 
type=iSCSILogicalUnit)
    Attributes: implementation=tgt lun=10 
path=/dev/drbd/by-disk/vg0/ipstor0 target_iqn=iqn.2018-08.s-ka.local:disk.1
    Operations: monitor interval=30 timeout=60 
(p_iSCSILogicalUnit-monitor-interval-30)
                start interval=0 timeout=60 
(p_iSCSILogicalUnit-start-interval-0)
                stop interval=0 timeout=60 
(p_iSCSILogicalUnit-stop-interval-0)
   Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: cidr_netmask=32 ip=192.168.95.48
    Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
                start interval=0s timeout=20s (ClusterIP-start-interval-0s)
                stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
   start ipstor0Clone then start p_iSCSI (kind:Mandatory)
Colocation Constraints:
Ticket Constraints:

Alerts:
  No alerts defined

Resources Defaults:
  migration-threshold: 1
Operations Defaults:
  No defaults set

Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: cluster1
  dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
  have-watchdog: false
  last-lrm-refresh: 1539474248
  no-quorum-policy: ignore
  stonith-enabled: false

Quorum:
   Options:
[root@drbd0 corosync]#

------------------------------------------------------------------------
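
One more thing I notice in this config: the resource defaults contain 
migration-threshold: 1, so as far as I understand a single failed start 
bans a resource from that node until the failure is cleared. After any 
fix I would therefore also have to run something like:

------------------------------------------------------------------------

pcs resource cleanup p_iSCSI
pcs status --full

------------------------------------------------------------------------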


4. So I have run out of ideas and don't know what to do. Should I just 
dive into Pacemaker's source code?

I hope to get some feedback or tips from you; thank you very much in advance :)


Best Regards

Zhang

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ClusterLabs Problem.png
Type: image/png
Size: 92917 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20181016/2daed806/attachment-0001.png>

