[ClusterLabs] Need help to enable hot switch of iSCSI (tgtd) under two node Pacemaker + DRBD 9.0 under CentOS 7.5 in ESXi 6.5 Environment
LiFeng Zhang
zhang at linux-systeme.de
Wed Oct 24 02:53:10 EDT 2018
Hello Dear Andrei Borzenkov,
Thank you very much for your answer. I've checked the logs the whole time,
but there is nothing helpful in them, just a bunch of heartbeat messages.
Anyway, I've read the book "CentOS High Availability" (Packt, 2015),
got some new ideas, and tried them out; the situation is now somewhat
different.
------------------------------------------------------------------------
pcs resource create p_iSCSITarget ocf:heartbeat:iSCSITarget
implementation="tgt" iqn="iqn.2018-08.s-ka.local:disk" tid="1"
pcs resource create p_iSCSILogicalUnit ocf:heartbeat:iSCSILogicalUnit
implementation="tgt" target_iqn="iqn.2018-08.s-ka.local:disk" lun="10"
path="/dev/drbd/by-disk/vg0/ipstor0"
pcs resource group add p_iSCSI ClusterIP p_iSCSITarget p_iSCSILogicalUnit
pcs constraint colocation set ClusterIP p_iSCSITarget p_iSCSILogicalUnit
------------------------------------------------------------------------
The difference from the previous version is here: I use the iqn
"iqn.2018-08.s-ka.local:disk" instead of
"iqn.2018-08.s-ka.local:disk.1", where the trailing ".1" probably stood for the "tid".
Now I have a new problem: the resources and tgtd are started, but
although I set a "colocation constraint", Pacemaker always tries to
start tgtd on the other node as well.
How do I solve this? Thank you all in advance!
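As far as I can tell, a colocation set like the one above only ties ClusterIP, the target and the LUN to each other; it does not tell Pacemaker to keep the group on the node where the DRBD resource is Master, so Pacemaker remains free to try starting the group on the other node. A minimal sketch of the usual DRBD pattern, using the resource names from the `pcs status` output below (pcs 0.9 syntax as shipped with CentOS 7 is assumed; verify with `pcs constraint --help`):

```shell
# Keep the whole p_iSCSI group on the node where the DRBD master runs
pcs constraint colocation add p_iSCSI with master ipstor0Clone INFINITY

# Start the group only after DRBD has been promoted on that node
pcs constraint order promote ipstor0Clone then start p_iSCSI
```

With only the existing `start ipstor0Clone then start p_iSCSI` ordering, the group may start before promotion and on either node.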
Here is the output of "pcs status":
------------------------------------------------------------------------
[root at drbd0 /]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
partition with quorum
Last updated: Wed Oct 24 08:43:29 2018
Last change: Wed Oct 24 08:43:24 2018 by root via cibadmin on
drbd0-ha.s-ka.local
2 nodes configured
5 resources configured
Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
Full list of resources:
Master/Slave Set: ipstor0Clone [ipstor0]
Masters: [ drbd0-ha.s-ka.local ]
Slaves: [ drbd1-ha.s-ka.local ]
Resource Group: p_iSCSI
ClusterIP (ocf::heartbeat:IPaddr2): Started drbd0-ha.s-ka.local
p_iSCSITarget (ocf::heartbeat:iSCSITarget): Started
drbd0-ha.s-ka.local
p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Started
drbd0-ha.s-ka.local
Failed Actions:
* p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
call=32, status=complete, exitreason='',
last-rc-change='Wed Oct 24 08:37:25 2018', queued=0ms, exec=23ms
* p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
call=38, status=complete, exitreason='',
last-rc-change='Wed Oct 24 08:37:55 2018', queued=0ms, exec=28ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root at drbd0 /]
[root at drbd0 /]# pcs constraint show --full
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Resource Sets:
set ClusterIP p_iSCSITarget p_iSCSILogicalUnit
(id:pcs_rsc_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit) setoptions
score=INFINITY
(id:pcs_rsc_colocation_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit)
Ticket Constraints:
[root at drbd0 /]#
------------------------------------------------------------------------
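Note also that the `pcs config` output further down shows `migration-threshold: 1` in the resource defaults: a single failed start bans the resource from that node until the failure record is cleared. After fixing the constraints, the recorded failures would have to be cleaned up before Pacemaker re-evaluates placement. A sketch (standard pcs/Pacemaker commands on CentOS 7; resource names taken from the status output above):

```shell
# Clear the recorded start failures so placement is recalculated
pcs resource cleanup p_iSCSITarget
pcs resource cleanup p_iSCSILogicalUnit

# One-shot status check to confirm the Failed Actions are gone
crm_mon -1
```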
Best Regards
Lifeng
On 2018-10-19 06:02, Andrei Borzenkov wrote:
> On 16.10.2018 15:29, LiFeng Zhang wrote:
>> Hi, all dear friends,
>>
>> I need your help to enable hot switching of iSCSI under a
>> Pacemaker/Corosync cluster, which has an iSCSI device based on a
>> two-node DRBD replication.
>>
>> I've got the Pacemaker/Corosync cluster working, and the DRBD
>> replication is also working, but it is stuck at iSCSI. I can manually
>> start tgtd on one node, so the VCSA can recognize the iSCSI disk and
>> create a VMFS/storage object on it, and then I can create a test VM on
>> that VMFS.
>>
>> But when I switch the DRBD Primary/Secondary, the test VM keeps
>> running, but its underlying disk becomes read-only. As far as I know,
>> tgtd should be handled by Pacemaker so that it automatically starts on
>> the DRBD Primary node, but in my setup it sadly is NOT.
>>
> pacemaker only handles resources that were started by pacemaker.
> According to your output below, in all cases the resource was stopped
> from pacemaker's point of view, and all pacemaker attempts to start the
> resource failed. You should troubleshoot why they failed. This requires
> knowledge of the specific resource agent; sadly, I am not familiar with
> the iSCSI target one. The pacemaker logs may include more information
> from the resource agent than just "unknown error".
>
>> I've tried all kinds of resources/manuals/documents, but they are all
>> mixed with extra information, for other systems or other software versions.
>>
>> And one of my BEST references (the closest configuration to mine) is
>> this URL: https://nnc3.com/mags/LJ_1994-2014/LJ/217/11275.html
>>
>> The difference between my setup and this article, I think, is that I
>> don't have an LVM volume, only a raw iSCSI disk, and that I have to
>> translate the CRM commands into PCS commands.
>>
>> But after I "copied" the configuration from this article, my cluster
>> could not start anymore. I tried removing the LVM resource (which had
>> caused a "device not found" error), but the resource group still can't
>> start, and Pacemaker gives no explicit "reason".
>>
>>
>> 1. The whole configuration runs on a two-node ESXi 6.5 cluster, which
>> has a VCSA installed on one ESXi host.
>>
>> I have a simple diagram in the attachment, which may describe the
>> deployment better.
>>
>> 2. Starting point:
>>
>> The involved hosts are all mapped through local DNS, which also
>> includes the floating VIP; the local domain is s-ka.local:
>>
>> ------------------------------------------------------------------------
>>
>> firewall: fw01.s-ka.local. IN A 192.168.95.249
>>
>> vcsa: vc01.s-ka.local. IN A 192.168.95.30
>> esxi: esx01.s-ka.local. IN A 192.168.95.5
>> esxi: esx02.s-ka.local. IN A 192.168.95.7
>>
>> drbd: drbd0.s-ka.local. IN A 192.168.95.45
>> drbd: drbd1.s-ka.local. IN A 192.168.95.47
>> vip: ipstor0.s-ka.local. IN A 192.168.95.48
>>
>> heartbeat: drbd0-ha.s-ka.local. IN A 192.168.96.45
>> heartbeat: drbd1-ha.s-ka.local. IN A 192.168.96.47
>>
>> ------------------------------------------------------------------------
>>
>>
>> Both DRBD servers run CentOS 7.5; the installed packages are:
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 ~]# cat /etc/centos-release
>> CentOS Linux release 7.5.1804 (Core)
>>
>> [root at drbd0 ~]# uname -a
>> Linux drbd0.s-ka.local 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16
>> 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root at drbd1 ~]# yum list installed|grep pacemaker
>> pacemaker.x86_64 1.1.18-11.el7_5.3 @updates
>> pacemaker-cli.x86_64 1.1.18-11.el7_5.3 @updates
>> pacemaker-cluster-libs.x86_64 1.1.18-11.el7_5.3 @updates
>> pacemaker-libs.x86_64 1.1.18-11.el7_5.3 @updates
>>
>> [root at drbd1 ~]# yum list installed|grep coro
>> corosync.x86_64 2.4.3-2.el7_5.1 @updates
>> corosynclib.x86_64 2.4.3-2.el7_5.1 @updates
>>
>> [root at drbd1 ~]# yum list installed|grep drbd
>> drbd90-utils.x86_64 9.3.1-1.el7.elrepo @elrepo
>> kmod-drbd90.x86_64 9.0.14-1.el7_5.elrepo @elrepo
>>
>> [root at drbd1 ~]# yum list installed|grep -i scsi
>> lsscsi.x86_64 0.27-6.el7 @anaconda
>> scsi-target-utils.x86_64 1.0.55-4.el7 @epel
>>
>> ------------------------------------------------------------------------
>>
>>
>> 3. configurations
>>
>> 3.1 First, the DRBD configuration
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd1 ~]# cat /etc/drbd.conf
>> # You can find an example in /usr/share/doc/drbd.../drbd.conf.example
>>
>> include "drbd.d/global_common.conf";
>> include "drbd.d/*.res";
>>
>> [root at drbd1 ~]# cat /etc/drbd.d/r0.res
>> resource iscsivg01 {
>> protocol C;
>> device /dev/drbd0;
>> disk /dev/vg0/ipstor0;
>> flexible-meta-disk internal;
>> on drbd0.s-ka.local {
>> #volume 0 {
>> #device /dev/drbd0;
>> #disk /dev/vg0/ipstor0;
>> #flexible-meta-disk internal;
>> #}
>> address 192.168.96.45:7788;
>> }
>> on drbd1.s-ka.local {
>> #volume 0 {
>> #device /dev/drbd0;
>> #disk /dev/vg0/ipstor0;
>> #flexible-meta-disk internal;
>> #}
>> address 192.168.96.47:7788;
>> }
>> }
>>
>> ------------------------------------------------------------------------
>>
>> 3.2 then the drbd device
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd1 ~]# lsblk
>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> sda 8:0 0 25G 0 disk
>> ├─sda1 8:1 0 1G 0 part /boot
>> └─sda2 8:2 0 24G 0 part
>> ├─centos-root 253:0 0 22G 0 lvm /
>> └─centos-swap 253:1 0 2G 0 lvm [SWAP]
>> sdb 8:16 0 500G 0 disk
>> └─sdb1 8:17 0 500G 0 part
>> └─vg0-ipstor0 253:2 0 500G 0 lvm
>> └─drbd0 147:0 0 500G 1 disk
>> sr0 11:0 1 1024M 0 rom
>>
>> [root at drbd1 ~]# tree /dev/drbd
>> drbd/ drbd0
>> [root at drbd1 ~]# tree /dev/drbd
>> /dev/drbd
>> ├── by-disk
>> │ └── vg0
>> │ └── ipstor0 -> ../../../drbd0
>> └── by-res
>> └── iscsivg01
>> └── 0 -> ../../../drbd0
>>
>> 4 directories, 2 files
>>
>> ------------------------------------------------------------------------
>>
>> 3.3 DRBD status
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd1 ~]# drbdadm status
>> iscsivg01 role:Secondary
>> disk:UpToDate
>> drbd0.s-ka.local role:Primary
>> peer-disk:UpToDate
>>
>> [root at drbd0 ~]# drbdadm status
>> iscsivg01 role:Primary
>> disk:UpToDate
>> drbd1.s-ka.local role:Secondary
>> peer-disk:UpToDate
>>
>> [root at drbd0 ~]# cat /proc/drbd
>> version: 9.0.14-1 (api:2/proto:86-113)
>> GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by mockbuild@,
>> 2018-05-04 03:32:42
>> Transports (api:16): tcp (9.0.14-1)
>>
>> ------------------------------------------------------------------------
>>
>> 3.4 Corosync configuration
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# cat /etc/corosync/corosync.conf
>> totem {
>> version: 2
>> cluster_name: cluster1
>> secauth: off
>> transport: udpu
>> }
>>
>> nodelist {
>> node {
>> ring0_addr: drbd0-ha.s-ka.local
>> nodeid: 1
>> }
>>
>> node {
>> ring0_addr: drbd1-ha.s-ka.local
>> nodeid: 2
>> }
>> }
>>
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> }
>>
>> logging {
>> to_logfile: yes
>> logfile: /var/log/cluster/corosync.log
>> to_syslog: yes
>> }
>>
>> ------------------------------------------------------------------------
>>
>>
>> 3.5 Corosync status:
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# systemctl status corosync
>> ● corosync.service - Corosync Cluster Engine
>> Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled;
>> vendor preset: disabled)
>> Active: active (running) since Sun 2018-10-14 02:58:01 CEST; 2 days ago
>> Docs: man:corosync
>> man:corosync.conf
>> man:corosync_overview
>> Process: 1095 ExecStart=/usr/share/corosync/corosync start
>> (code=exited, status=0/SUCCESS)
>> Main PID: 1167 (corosync)
>> CGroup: /system.slice/corosync.service
>> └─1167 corosync
>>
>> Oct 14 02:58:00 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>> service synchronization, ready to provide service.
>> Oct 14 02:58:01 drbd0.s-ka.local corosync[1095]: Starting Corosync
>> Cluster Engine (corosync): [ OK ]
>> Oct 14 02:58:01 drbd0.s-ka.local systemd[1]: Started Corosync Cluster
>> Engine.
>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [TOTEM ] A new
>> membership (192.168.96.45:384) was formed. Members left: 2
>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [QUORUM] Members[1]: 1
>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>> service synchronization, ready to provide service.
>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [TOTEM ] A new
>> membership (192.168.96.45:388) was formed. Members joined: 2
>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [CPG ] downlist
>> left_list: 0 received in state 0
>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [QUORUM] Members[2]: 1 2
>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>> service synchronization, ready to provide service.
>>
>> ------------------------------------------------------------------------
>>
>> 3.6 tgtd configuration:
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# cat /etc/tgt/targets.conf
>> # This is a sample config file for tgt-admin.
>> #
>> # The "#" symbol disables the processing of a line.
>>
>> # Set the driver. If not specified, defaults to "iscsi".
>> default-driver iscsi
>>
>> # Set iSNS parameters, if needed
>> #iSNSServerIP 192.168.111.222
>> #iSNSServerPort 3205
>> #iSNSAccessControl On
>> #iSNS On
>>
>> # Continue if tgtadm exits with non-zero code (equivalent of
>> # --ignore-errors command line option)
>> #ignore-errors yes
>>
>>
>> <target iqn.2018-08.s-ka.local:disk.1>
>> lun 10
>> backing-store /dev/drbd0
>> initiator-address 192.168.96.0/24
>> initiator-address 192.168.95.0/24
>> target-address 192.168.95.48
>> </target>
>>
>> ------------------------------------------------------------------------
>>
>>
>> 3.7 tgtd is disabled on both servers and is only startable on the
>> current DRBD Primary node.
>>
>> ------------------------------------------------------------------------
>>
>> Secondary Node:
>>
>> [root at drbd1 ~]# systemctl status tgtd
>> ● tgtd.service - tgtd iSCSI target daemon
>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>> vendor preset: disabled)
>> Active: inactive (dead)
>> [root at drbd1 ~]# systemctl restart tgtd
>> Job for tgtd.service failed because the control process exited with
>> error code. See "systemctl status tgtd.service" and "journalctl -xe" for
>> details.
>>
>>
>> Primary Node:
>>
>> [root at drbd0 corosync]# systemctl status tgtd
>> ● tgtd.service - tgtd iSCSI target daemon
>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>> vendor preset: disabled)
>> Active: inactive (dead)
>> [root at drbd0 corosync]# systemctl restart tgtd
>> [root at drbd0 corosync]# systemctl status tgtd
>> ● tgtd.service - tgtd iSCSI target daemon
>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>> vendor preset: disabled)
>> Active: active (running) since Tue 2018-10-16 14:09:47 CEST; 2min 29s
>> ago
>> Process: 22300 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys
>> --name State -v ready (code=exited, status=0/SUCCESS)
>> Process: 22272 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG
>> (code=exited, status=0/SUCCESS)
>> Process: 22271 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys
>> --name State -v offline (code=exited, status=0/SUCCESS)
>> Process: 22270 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)
>> Main PID: 22269 (tgtd)
>> CGroup: /system.slice/tgtd.service
>> └─22269 /usr/sbin/tgtd -f
>>
>> Oct 16 14:09:42 drbd0.s-ka.local systemd[1]: Starting tgtd iSCSI target
>> daemon...
>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: iser_ib_init(3436)
>> Failed to initialize RDMA; load kernel modules?
>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:
>> work_timer_start(146) use timer_fd based scheduler
>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:
>> bs_init_signalfd(267) could not open backing-store module directory
>> /usr/lib64/tgt/backing-store
>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: bs_init(386) use
>> signalfd notification
>> Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: device_mgmt(246)
>> sz:16 params:path=/dev/drbd0
>> Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: bs_thread_open(408) 16
>> Oct 16 14:09:47 drbd0.s-ka.local systemd[1]: Started tgtd iSCSI target
>> daemon.
>>
>> ------------------------------------------------------------------------
>>
>> 3.8 Up to this point everything was working, but when I switched the
>> DRBD Primary node, it stopped working (the filesystem of the test VM
>> became read-only).
>>
>> So I changed the pcs configuration according to the previously
>> mentioned article:
>>
>> ------------------------------------------------------------------------
>>
>>> pcs resource create p_iscsivg01 ocf:heartbeat:LVM volgrpname="vg0" op
>> monitor interval="30"
>>
>>> pcs resource group add p_iSCSI p_iscsivg01 p_iSCSITarget
>> p_iSCSILogicalUnit ClusterIP
>>
>>> pcs constraint order start ipstor0Clone then start p_iSCSI then start
>> ipstor0Clone:Master
>>
>>
>> [root at drbd0 ~]# pcs status
>> Cluster name: cluster1
>> Stack: corosync
>> Current DC: drbd0-ha.s-ka.local (version
>> 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
>> Last updated: Sun Oct 14 01:38:18 2018
>> Last change: Sun Oct 14 01:37:58 2018 by root via cibadmin on
>> drbd0-ha.s-ka.local
>>
>> 2 nodes configured
>> 6 resources configured
>>
>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: ipstor0Clone [ipstor0]
>> Masters: [ drbd0-ha.s-ka.local ]
>> Slaves: [ drbd1-ha.s-ka.local ]
>> Resource Group: p_iSCSI
>> p_iscsivg01 (ocf::heartbeat:LVM): Stopped
>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>
>> Failed Actions:
>> * p_iSCSILogicalUnit_start_0 on drbd0-ha.s-ka.local 'unknown error'
>> (1): call=42, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=28ms
>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>> call=40, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=23ms
>> * p_iscsivg01_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>> call=48, status=complete, exitreason='Volume group [iscsivg01] does not
>> exist or contains error! Volume group "iscsivg01" not found',
>> last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=47ms
>> * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error'
>> (1): call=41, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=31ms
>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>> call=39, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=24ms
>> * p_iscsivg01_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>> call=47, status=complete, exitreason='Volume group [iscsivg01] does not
>> exist or contains error! Volume group "iscsivg01" not found',
>> last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=50ms
>>
>>
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>> [root at drbd0 ~]#
>>
>> ------------------------------------------------------------------------
>>
>>
>> 3.9 Because of the "device not found" error, I removed the LVM
>> resource; it now looks like this.
>>
>> I also switched between /dev/drbd/by-disk and /dev/drbd/by-res, with
>> no effect.
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# pcs status
>> Cluster name: cluster1
>> Stack: corosync
>> Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>> partition with quorum
>> Last updated: Tue Oct 16 14:18:09 2018
>> Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on
>> drbd0-ha.s-ka.local
>>
>> 2 nodes configured
>> 5 resources configured
>>
>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: ipstor0Clone [ipstor0]
>> Masters: [ drbd0-ha.s-ka.local ]
>> Slaves: [ drbd1-ha.s-ka.local ]
>> Resource Group: p_iSCSI
>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>
>> Failed Actions:
>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>> call=12, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>> call=12, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms
>>
>>
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>> [root at drbd0 corosync]#
>>
>> ------------------------------------------------------------------------
>>
>> 3.10 I tried "pcs resource debug-start xxx --full" on the DRBD
>> Primary node:
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSI --full
>> Error: unable to debug-start a group, try one of the group's resource(s)
>> (p_iSCSITarget,p_iSCSILogicalUnit,ClusterIP)
>>
>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSITarget --full
>> Operation start for p_iSCSITarget (ocf:heartbeat:iSCSITarget) returned:
>> 'ok' (0)
>> > stderr: DEBUG: p_iSCSITarget start : 0
>>
>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSILogicalUnit --full
>> Operation start for p_iSCSILogicalUnit (ocf:heartbeat:iSCSILogicalUnit)
>> returned: 'unknown error' (1)
>> > stderr: ERROR: tgtadm: this logical unit number already exists
>>
>> [root at drbd0 corosync]# pcs resource debug-start ClusterIP --full
>> Operation start for ClusterIP (ocf:heartbeat:IPaddr2) returned: 'ok' (0)
>> > stderr: INFO: Adding inet address 192.168.95.48/32 with broadcast
>> address 192.168.95.255 to device ens192
>> > stderr: INFO: Bringing device ens192 up
>> > stderr: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -p
>> /var/run/resource-agents/send_arp-192.168.95.48 -I ens192 -m auto
>> 192.168.95.48
>> [root at drbd0 corosync]#
>>
>> ------------------------------------------------------------------------
>>
>> 3.11 As you can see, there are errors, but "p_iSCSITarget" was
>> successfully started. Yet "pcs status" still shows it as "Stopped".
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# pcs status
>> Cluster name: cluster1
>> Stack: corosync
>> Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>> partition with quorum
>> Last updated: Tue Oct 16 14:22:38 2018
>> Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on
>> drbd0-ha.s-ka.local
>>
>> 2 nodes configured
>> 5 resources configured
>>
>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: ipstor0Clone [ipstor0]
>> Masters: [ drbd0-ha.s-ka.local ]
>> Slaves: [ drbd1-ha.s-ka.local ]
>> Resource Group: p_iSCSI
>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>
>> Failed Actions:
>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>> call=12, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>> call=12, status=complete, exitreason='',
>> last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms
>>
>>
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>> [root at drbd0 corosync]#
>>
>> ------------------------------------------------------------------------
>>
>> 3.12 The pcs config is:
>>
>> ------------------------------------------------------------------------
>>
>> [root at drbd0 corosync]# pcs config
>> Cluster Name: cluster1
>> Corosync Nodes:
>> drbd0-ha.s-ka.local drbd1-ha.s-ka.local
>> Pacemaker Nodes:
>> drbd0-ha.s-ka.local drbd1-ha.s-ka.local
>>
>> Resources:
>> Master: ipstor0Clone
>> Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1
>> clone-node-max=1
>> Resource: ipstor0 (class=ocf provider=linbit type=drbd)
>> Attributes: drbd_resource=iscsivg01
>> Operations: demote interval=0s timeout=90 (ipstor0-demote-interval-0s)
>> monitor interval=60s (ipstor0-monitor-interval-60s)
>> notify interval=0s timeout=90 (ipstor0-notify-interval-0s)
>> promote interval=0s timeout=90 (ipstor0-promote-interval-0s)
>> reload interval=0s timeout=30 (ipstor0-reload-interval-0s)
>> start interval=0s timeout=240 (ipstor0-start-interval-0s)
>> stop interval=0s timeout=100 (ipstor0-stop-interval-0s)
>> Group: p_iSCSI
>> Resource: p_iSCSITarget (class=ocf provider=heartbeat type=iSCSITarget)
>> Attributes: implementation=tgt iqn=iqn.2018-08.s-ka.local:disk.1 tid=1
>> Operations: monitor interval=30 timeout=60
>> (p_iSCSITarget-monitor-interval-30)
>> start interval=0 timeout=60 (p_iSCSITarget-start-interval-0)
>> stop interval=0 timeout=60 (p_iSCSITarget-stop-interval-0)
>> Resource: p_iSCSILogicalUnit (class=ocf provider=heartbeat
>> type=iSCSILogicalUnit)
>> Attributes: implementation=tgt lun=10
>> path=/dev/drbd/by-disk/vg0/ipstor0 target_iqn=iqn.2018-08.s-ka.local:disk.1
>> Operations: monitor interval=30 timeout=60
>> (p_iSCSILogicalUnit-monitor-interval-30)
>> start interval=0 timeout=60
>> (p_iSCSILogicalUnit-start-interval-0)
>> stop interval=0 timeout=60
>> (p_iSCSILogicalUnit-stop-interval-0)
>> Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>> Attributes: cidr_netmask=32 ip=192.168.95.48
>> Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>> start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>> stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>
>> Stonith Devices:
>> Fencing Levels:
>>
>> Location Constraints:
>> Ordering Constraints:
>> start ipstor0Clone then start p_iSCSI (kind:Mandatory)
>> Colocation Constraints:
>> Ticket Constraints:
>>
>> Alerts:
>> No alerts defined
>>
>> Resources Defaults:
>> migration-threshold: 1
>> Operations Defaults:
>> No defaults set
>>
>> Cluster Properties:
>> cluster-infrastructure: corosync
>> cluster-name: cluster1
>> dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
>> have-watchdog: false
>> last-lrm-refresh: 1539474248
>> no-quorum-policy: ignore
>> stonith-enabled: false
>>
>> Quorum:
>> Options:
>> [root at drbd0 corosync]#
>>
>> ------------------------------------------------------------------------
>>
>>
>> 4. So I am out of ideas and don't know what to do. Should I just dive
>> into Pacemaker's source code?
>>
>> I hope to get some feedback or tips from you; thank you very much in
>> advance :)
>>
>>
>> Best Regards
>>
>> Zhang
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>