[ClusterLabs] Need help to enable hot switch of iSCSI (tgtd) under two node Pacemaker + DRBD 9.0 under CentOS 7.5 in ESXi 6.5 Environment
LiFeng Zhang
zhang at linux-systeme.de
Wed Oct 24 02:59:01 EDT 2018
Hello Friends,
A further thought about this situation:
I want to run tgtd as a cluster service on top of a "primary/secondary" DRBD
resource. Since DRBD is only active on the primary node, tgtd cannot start
successfully on the secondary node.
So should I, and if so how do I, configure a primary/secondary tgtd service?
Or which Pacemaker feature should I use so that tgtd only starts on the one
node where DRBD is primary?
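I am guessing at something like the following; this is only a sketch, reusing
the resource names from my configuration below (ipstor0Clone for the DRBD
master/slave resource, p_iSCSI for the iSCSI group), so please correct me if
the approach itself is wrong:
------------------------------------------------------------------------
# colocate the iSCSI group with the DRBD Master role, so the group may
# only run on the node where DRBD is currently primary
pcs constraint colocation add p_iSCSI with master ipstor0Clone INFINITY
# and only start the group after the DRBD resource has been promoted
pcs constraint order promote ipstor0Clone then start p_iSCSI
------------------------------------------------------------------------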
Thank you very much in advance for any suggestions.
Best Regards
Lifeng
> Dear Andrei Borzenkov,
>
> Thank you very much for your answer. I've checked the logs all along, but
> there is nothing helpful in them, just a bunch of heartbeat messages.
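> (I mainly looked through the Corosync/Pacemaker logs, assuming the default
> CentOS 7 locations plus the logfile set in corosync.conf, with something
> like:
> ------------------------------------------------------------------------
> # search the cluster log for messages from the iSCSI resource agents
> grep -i -e iSCSITarget -e iSCSILogicalUnit /var/log/cluster/corosync.log
> # or look at the Pacemaker journal directly
> journalctl -u pacemaker | grep -i iscsi
> ------------------------------------------------------------------------
> but, as said, found nothing helpful there.)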
>
> Anyway, I've read the book "CentOS High Availability" (Packt, 2015), got
> some new ideas and tried them out; the situation is now somewhat different.
>
> ------------------------------------------------------------------------
> pcs resource create p_iSCSITarget ocf:heartbeat:iSCSITarget implementation="tgt" iqn="iqn.2018-08.s-ka.local:disk" tid="1"
> pcs resource create p_iSCSILogicalUnit ocf:heartbeat:iSCSILogicalUnit implementation="tgt" target_iqn="iqn.2018-08.s-ka.local:disk" lun="10" path="/dev/drbd/by-disk/vg0/ipstor0"
> pcs resource group add p_iSCSI ClusterIP p_iSCSITarget p_iSCSILogicalUnit
> pcs constraint colocation set ClusterIP p_iSCSITarget p_iSCSILogicalUnit
> ------------------------------------------------------------------------
>
>
> The difference from the previous version is here: I use the IQN
> "iqn.2018-08.s-ka.local:disk" instead of "iqn.2018-08.s-ka.local:disk.1";
> the trailing ".1" probably refers to the "tid".
>
> Now I have a new problem: the resources and tgtd do start, but although I
> set a "colocation constraint", Pacemaker keeps trying to start tgtd on the
> other node as well.
> How do I solve this? Thank you all in advance!
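> (My own guess, for what it is worth: the colocation set shown below only
> keeps ClusterIP, p_iSCSITarget and p_iSCSILogicalUnit together; it does not
> tie them to the DRBD Master role, so Pacemaker is presumably still free to
> try the other node. And after adding a constraint against the Master role I
> assume the recorded start failures have to be cleared as well, e.g.:
> ------------------------------------------------------------------------
> # clear the failed start actions so Pacemaker re-evaluates placement
> pcs resource cleanup p_iSCSITarget
> pcs resource cleanup p_iSCSILogicalUnit
> ------------------------------------------------------------------------
> Is that the right direction?)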
>
> Here is the output of "pcs status":
> ------------------------------------------------------------------------
> [root at drbd0 /]# pcs status
> Cluster name: cluster1
> Stack: corosync
> Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9)
> - partition with quorum
> Last updated: Wed Oct 24 08:43:29 2018
> Last change: Wed Oct 24 08:43:24 2018 by root via cibadmin on
> drbd0-ha.s-ka.local
>
> 2 nodes configured
> 5 resources configured
>
> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>
> Full list of resources:
>
> Master/Slave Set: ipstor0Clone [ipstor0]
> Masters: [ drbd0-ha.s-ka.local ]
> Slaves: [ drbd1-ha.s-ka.local ]
> Resource Group: p_iSCSI
> ClusterIP (ocf::heartbeat:IPaddr2): Started drbd0-ha.s-ka.local
> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Started
> drbd0-ha.s-ka.local
> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Started
> drbd0-ha.s-ka.local
>
> Failed Actions:
> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
> call=32, status=complete, exitreason='',
> last-rc-change='Wed Oct 24 08:37:25 2018', queued=0ms, exec=23ms
> * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error'
> (1): call=38, status=complete, exitreason='',
> last-rc-change='Wed Oct 24 08:37:55 2018', queued=0ms, exec=28ms
>
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
> [root at drbd0 /]
>
>
> [root at drbd0 /]# pcs constraint show --full
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
> Resource Sets:
> set ClusterIP p_iSCSITarget p_iSCSILogicalUnit
> (id:pcs_rsc_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit) setoptions
> score=INFINITY
> (id:pcs_rsc_colocation_set_ClusterIP_p_iSCSITarget_p_iSCSILogicalUnit)
> Ticket Constraints:
> [root at drbd0 /]#
> ------------------------------------------------------------------------
>
> Best Regards
> Lifeng
>
> On 2018/10/19 06:02, Andrei Borzenkov wrote:
>> On 16.10.2018 15:29, LiFeng Zhang wrote:
>>> Hi, all dear friends,
>>>
>>> I need your help to enable hot switchover of iSCSI under a
>>> Pacemaker/Corosync cluster, which exports an iSCSI device backed by a
>>> two-node DRBD replication.
>>>
>>> I've got the Pacemaker/Corosync cluster working and the DRBD replication
>>> working as well, but it is stuck at iSCSI. I can manually start tgtd on
>>> one node, so that the VCSA recognizes the iSCSI disk, creates a
>>> VMFS/storage object on it, and I can then create a test VM on that VMFS.
>>>
>>> But when I switch the DRBD Primary/Secondary roles, the test VM keeps
>>> running while its underlying disk becomes read-only. As far as I know,
>>> tgtd should be handled by Pacemaker so that it automatically starts on the
>>> current DRBD Primary, but in my setup it sadly is NOT.
>>>
>> Pacemaker only handles resources that were started by Pacemaker.
>> According to your output below, in all cases the resource was stopped from
>> Pacemaker's point of view, and all of Pacemaker's attempts to start the
>> resource failed. You should troubleshoot why they failed. This requires
>> knowledge of the specific resource agent; sadly I am not familiar with the
>> iSCSI target agents. The Pacemaker logs may include more information from
>> the resource agent than just "unknown error".
>>
>>> I've tried all kinds of resources/manuals/documents, but they are all
>>> mixed with extra information, other systems, or other software versions.
>>>
>>> One of my best references (the closest configuration to mine) is this
>>> URL: https://nnc3.com/mags/LJ_1994-2014/LJ/217/11275.html
>>>
>>> The difference between my setup and this article, I think, is that I don't
>>> have an LVM volume but only a raw iSCSI disk, and that I have to translate
>>> the CRM commands into PCS commands.
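>>> (A made-up example of the kind of translation I mean, assuming I read both
>>> tools correctly; "p_ip_example" is just an illustrative name:
>>> ------------------------------------------------------------------------
>>> # crm style, as used in such articles:
>>> #   crm configure primitive p_ip_example ocf:heartbeat:IPaddr2 \
>>> #       params ip=192.168.95.48 cidr_netmask=32
>>> # my rough pcs equivalent:
>>> pcs resource create p_ip_example ocf:heartbeat:IPaddr2 \
>>>     ip=192.168.95.48 cidr_netmask=32
>>> ------------------------------------------------------------------------
>>> I hope I translated the constraints correctly as well.)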
>>>
>>> But after I "copied" the configuration from this article, my cluster could
>>> not start anymore. I tried removing the LVM resource (which had caused a
>>> "device not found" error), but the resource group still cannot start, and
>>> Pacemaker gives no explicit "reason".
>>>
>>>
>>> *1*. The whole setup runs on a two-node ESXi 6.5 cluster, with a VCSA
>>> installed on one of the ESXi hosts.
>>>
>>> I have attached a simple diagram, which may describe the deployment
>>> better.
>>>
>>> 2. Starting point:
>>>
>>> The involved hosts are all mapped through local DNS, which also includes
>>> the floating VIP; the local domain is s-ka.local:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> firewall: fw01.s-ka.local. IN A 192.168.95.249
>>>
>>> vcsa: vc01.s-ka.local. IN A 192.168.95.30
>>> esxi: esx01.s-ka.local. IN A 192.168.95.5
>>> esxi: esx02.s-ka.local. IN A 192.168.95.7
>>>
>>> drbd: drbd0.s-ka.local. IN A 192.168.95.45
>>> drbd: drbd1.s-ka.local. IN A 192.168.95.47
>>> vip: ipstor0.s-ka.local. IN A 192.168.95.48
>>>
>>> heartbeat: drbd0-ha.s-ka.local. IN A 192.168.96.45
>>> heartbeat: drbd1-ha.s-ka.local. IN A 192.168.96.47
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> Both DRBD servers run CentOS 7.5; the installed packages are as follows:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 ~]# cat /etc/centos-release
>>> CentOS Linux release 7.5.1804 (Core)
>>>
>>> [root at drbd0 ~]# uname -a
>>> Linux drbd0.s-ka.local 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16
>>> 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> [root at drbd1 ~]# yum list installed|grep pacemaker
>>> pacemaker.x86_64 1.1.18-11.el7_5.3 @updates
>>> pacemaker-cli.x86_64 1.1.18-11.el7_5.3 @updates
>>> pacemaker-cluster-libs.x86_64 1.1.18-11.el7_5.3 @updates
>>> pacemaker-libs.x86_64 1.1.18-11.el7_5.3 @updates
>>>
>>> [root at drbd1 ~]# yum list installed|grep coro
>>> corosync.x86_64 2.4.3-2.el7_5.1 @updates
>>> corosynclib.x86_64 2.4.3-2.el7_5.1 @updates
>>>
>>> [root at drbd1 ~]# yum list installed|grep drbd
>>> drbd90-utils.x86_64 9.3.1-1.el7.elrepo @elrepo
>>> kmod-drbd90.x86_64 9.0.14-1.el7_5.elrepo @elrepo
>>>
>>> [root at drbd1 ~]# yum list installed|grep -i scsi
>>> lsscsi.x86_64 0.27-6.el7 @anaconda
>>> scsi-target-utils.x86_64 1.0.55-4.el7 @epel
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> 3. configurations
>>>
>>> 3.1 First, the DRBD configuration
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd1 ~]# cat /etc/drbd.conf
>>> # You can find an example in /usr/share/doc/drbd.../drbd.conf.example
>>>
>>> include "drbd.d/global_common.conf";
>>> include "drbd.d/*.res";
>>>
>>> [root at drbd1 ~]# cat /etc/drbd.d/r0.res
>>> resource iscsivg01 {
>>> protocol C;
>>> device /dev/drbd0;
>>> disk /dev/vg0/ipstor0;
>>> flexible-meta-disk internal;
>>> on drbd0.s-ka.local {
>>> #volume 0 {
>>> #device /dev/drbd0;
>>> #disk /dev/vg0/ipstor0;
>>> #flexible-meta-disk internal;
>>> #}
>>> address 192.168.96.45:7788;
>>> }
>>> on drbd1.s-ka.local {
>>> #volume 0 {
>>> #device /dev/drbd0;
>>> #disk /dev/vg0/ipstor0;
>>> #flexible-meta-disk internal;
>>> #}
>>> address 192.168.96.47:7788;
>>> }
>>> }
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.2 Then the DRBD device
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd1 ~]# lsblk
>>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>>> sda 8:0 0 25G 0 disk
>>> ├─sda1 8:1 0 1G 0 part /boot
>>> └─sda2 8:2 0 24G 0 part
>>> ├─centos-root 253:0 0 22G 0 lvm /
>>> └─centos-swap 253:1 0 2G 0 lvm [SWAP]
>>> sdb 8:16 0 500G 0 disk
>>> └─sdb1 8:17 0 500G 0 part
>>> └─vg0-ipstor0 253:2 0 500G 0 lvm
>>> └─drbd0 147:0 0 500G 1 disk
>>> sr0 11:0 1 1024M 0 rom
>>>
>>> [root at drbd1 ~]# tree /dev/drbd
>>> drbd/ drbd0
>>> [root at drbd1 ~]# tree /dev/drbd
>>> /dev/drbd
>>> ├── by-disk
>>> │ └── vg0
>>> │ └── ipstor0 -> ../../../drbd0
>>> └── by-res
>>> └── iscsivg01
>>> └── 0 -> ../../../drbd0
>>>
>>> 4 directories, 2 files
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.3 DRBD status
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd1 ~]# drbdadm status
>>> iscsivg01 role:Secondary
>>> disk:UpToDate
>>> drbd0.s-ka.local role:Primary
>>> peer-disk:UpToDate
>>>
>>> [root at drbd0 ~]# drbdadm status
>>> iscsivg01 role:Primary
>>> disk:UpToDate
>>> drbd1.s-ka.local role:Secondary
>>> peer-disk:UpToDate
>>>
>>> [root at drbd0 ~]# cat /proc/drbd
>>> version: 9.0.14-1 (api:2/proto:86-113)
>>> GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by mockbuild@,
>>> 2018-05-04 03:32:42
>>> Transports (api:16): tcp (9.0.14-1)
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.4 Corosync configuration
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# cat /etc/corosync/corosync.conf
>>> totem {
>>> version: 2
>>> cluster_name: cluster1
>>> secauth: off
>>> transport: udpu
>>> }
>>>
>>> nodelist {
>>> node {
>>> ring0_addr: drbd0-ha.s-ka.local
>>> nodeid: 1
>>> }
>>>
>>> node {
>>> ring0_addr: drbd1-ha.s-ka.local
>>> nodeid: 2
>>> }
>>> }
>>>
>>> quorum {
>>> provider: corosync_votequorum
>>> two_node: 1
>>> }
>>>
>>> logging {
>>> to_logfile: yes
>>> logfile: /var/log/cluster/corosync.log
>>> to_syslog: yes
>>> }
>>>
>>> ------------------------------------------------------------------------
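>>> (If it matters, ring and quorum state can also be checked with the
>>> standard tools, e.g.:
>>> ------------------------------------------------------------------------
>>> # state of the totem ring(s)
>>> corosync-cfgtool -s
>>> # quorum / vote information
>>> corosync-quorumtool -s
>>> ------------------------------------------------------------------------
>>> )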
>>>
>>>
>>> 3.5 Corosync status:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# systemctl status corosync
>>> ● corosync.service - Corosync Cluster Engine
>>> Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled;
>>> vendor preset: disabled)
>>> Active: active (running) since Sun 2018-10-14 02:58:01 CEST; 2 days ago
>>> Docs: man:corosync
>>> man:corosync.conf
>>> man:corosync_overview
>>> Process: 1095 ExecStart=/usr/share/corosync/corosync start
>>> (code=exited, status=0/SUCCESS)
>>> Main PID: 1167 (corosync)
>>> CGroup: /system.slice/corosync.service
>>> └─1167 corosync
>>>
>>> Oct 14 02:58:00 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>>> service synchronization, ready to provide service.
>>> Oct 14 02:58:01 drbd0.s-ka.local corosync[1095]: Starting Corosync
>>> Cluster Engine (corosync): [ OK ]
>>> Oct 14 02:58:01 drbd0.s-ka.local systemd[1]: Started Corosync Cluster
>>> Engine.
>>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [TOTEM ] A new
>>> membership (192.168.96.45:384) was formed. Members left: 2
>>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [QUORUM] Members[1]: 1
>>> Oct 14 10:46:03 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>>> service synchronization, ready to provide service.
>>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [TOTEM ] A new
>>> membership (192.168.96.45:388) was formed. Members joined: 2
>>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [CPG ] downlist
>>> left_list: 0 received in state 0
>>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [QUORUM] Members[2]: 1 2
>>> Oct 14 10:46:22 drbd0.s-ka.local corosync[1167]: [MAIN ] Completed
>>> service synchronization, ready to provide service.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.6 tgtd configuration:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# cat /etc/tgt/targets.conf
>>> # This is a sample config file for tgt-admin.
>>> #
>>> # The "#" symbol disables the processing of a line.
>>>
>>> # Set the driver. If not specified, defaults to "iscsi".
>>> default-driver iscsi
>>>
>>> # Set iSNS parameters, if needed
>>> #iSNSServerIP 192.168.111.222
>>> #iSNSServerPort 3205
>>> #iSNSAccessControl On
>>> #iSNS On
>>>
>>> # Continue if tgtadm exits with non-zero code (equivalent of
>>> # --ignore-errors command line option)
>>> #ignore-errors yes
>>>
>>>
>>> <target iqn.2018-08.s-ka.local:disk.1>
>>> lun 10
>>> backing-store /dev/drbd0
>>> initiator-address 192.168.96.0/24
>>> initiator-address 192.168.95.0/24
>>> target-address 192.168.95.48
>>> </target>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> 3.7 tgtd is disabled on both servers and can only be started successfully
>>> on the current DRBD Primary node.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Secondary Node:
>>>
>>> [root at drbd1 ~]# systemctl status tgtd
>>> ● tgtd.service - tgtd iSCSI target daemon
>>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>>> vendor preset: disabled)
>>> Active: inactive (dead)
>>> [root at drbd1 ~]# systemctl restart tgtd
>>> Job for tgtd.service failed because the control process exited with
>>> error code. See "systemctl status tgtd.service" and "journalctl -xe" for
>>> details.
>>>
>>>
>>> Primary Node:
>>>
>>> [root at drbd0 corosync]# systemctl status tgtd
>>> ● tgtd.service - tgtd iSCSI target daemon
>>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>>> vendor preset: disabled)
>>> Active: inactive (dead)
>>> [root at drbd0 corosync]# systemctl restart tgtd
>>> [root at drbd0 corosync]# systemctl status tgtd
>>> ● tgtd.service - tgtd iSCSI target daemon
>>> Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled;
>>> vendor preset: disabled)
>>> Active: active (running) since Tue 2018-10-16 14:09:47 CEST; 2min 29s
>>> ago
>>> Process: 22300 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys
>>> --name State -v ready (code=exited, status=0/SUCCESS)
>>> Process: 22272 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG
>>> (code=exited, status=0/SUCCESS)
>>> Process: 22271 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys
>>> --name State -v offline (code=exited, status=0/SUCCESS)
>>> Process: 22270 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)
>>> Main PID: 22269 (tgtd)
>>> CGroup: /system.slice/tgtd.service
>>> └─22269 /usr/sbin/tgtd -f
>>>
>>> Oct 16 14:09:42 drbd0.s-ka.local systemd[1]: Starting tgtd iSCSI target
>>> daemon...
>>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: iser_ib_init(3436)
>>> Failed to initialize RDMA; load kernel modules?
>>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:
>>> work_timer_start(146) use timer_fd based scheduler
>>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd:
>>> bs_init_signalfd(267) could not open backing-store module directory
>>> /usr/lib64/tgt/backing-store
>>> Oct 16 14:09:42 drbd0.s-ka.local tgtd[22269]: tgtd: bs_init(386) use
>>> signalfd notification
>>> Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: device_mgmt(246)
>>> sz:16 params:path=/dev/drbd0
>>> Oct 16 14:09:47 drbd0.s-ka.local tgtd[22269]: tgtd: bs_thread_open(408) 16
>>> Oct 16 14:09:47 drbd0.s-ka.local systemd[1]: Started tgtd iSCSI target
>>> daemon.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.8 Up to this point everything was working, but after I switched the
>>> DRBD Primary node it stopped working (the file system of the test VM
>>> became read-only).
>>>
>>> So I changed the pcs configuration according to the previously mentioned
>>> article:
>>>
>>> ------------------------------------------------------------------------
>>>
>>>> pcs resource create p_iscsivg01 ocf:heartbeat:LVM volgrpname="vg0" op monitor interval="30"
>>>
>>>> pcs resource group add p_iSCSI p_iscsivg01 p_iSCSITarget p_iSCSILogicalUnit ClusterIP
>>>
>>>> pcs constraint order start ipstor0Clone then start p_iSCSI then start ipstor0Clone:Master
>>>
>>>
>>> [root at drbd0 ~]# pcs status
>>> Cluster name: cluster1
>>> Stack: corosync
>>> Current DC: drbd0-ha.s-ka.local (version
>>> 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
>>> Last updated: Sun Oct 14 01:38:18 2018
>>> Last change: Sun Oct 14 01:37:58 2018 by root via cibadmin on
>>> drbd0-ha.s-ka.local
>>>
>>> 2 nodes configured
>>> 6 resources configured
>>>
>>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>>
>>> Full list of resources:
>>>
>>> Master/Slave Set: ipstor0Clone [ipstor0]
>>> Masters: [ drbd0-ha.s-ka.local ]
>>> Slaves: [ drbd1-ha.s-ka.local ]
>>> Resource Group: p_iSCSI
>>> p_iscsivg01 (ocf::heartbeat:LVM): Stopped
>>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>>
>>> Failed Actions:
>>> * p_iSCSILogicalUnit_start_0 on drbd0-ha.s-ka.local 'unknown error'
>>> (1): call=42, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=28ms
>>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>>> call=40, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=23ms
>>> * p_iscsivg01_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>>> call=48, status=complete, exitreason='Volume group [iscsivg01] does not
>>> exist or contains error! Volume group "iscsivg01" not found',
>>> last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=47ms
>>> * p_iSCSILogicalUnit_start_0 on drbd1-ha.s-ka.local 'unknown error'
>>> (1): call=41, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 01:20:38 2018', queued=0ms, exec=31ms
>>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>>> call=39, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 00:54:36 2018', queued=0ms, exec=24ms
>>> * p_iscsivg01_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>>> call=47, status=complete, exitreason='Volume group [iscsivg01] does not
>>> exist or contains error! Volume group "iscsivg01" not found',
>>> last-rc-change='Sun Oct 14 01:32:49 2018', queued=0ms, exec=50ms
>>>
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>> [root at drbd0 ~]#
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> 3.9 Because of the "device not found" error I removed the LVM resource;
>>> it now looks like this.
>>>
>>> I actually also switched the path between /dev/drbd/by-disk and
>>> /dev/drbd/by-res, but that had no effect.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# pcs status
>>> Cluster name: cluster1
>>> Stack: corosync
>>> Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>>> partition with quorum
>>> Last updated: Tue Oct 16 14:18:09 2018
>>> Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on
>>> drbd0-ha.s-ka.local
>>>
>>> 2 nodes configured
>>> 5 resources configured
>>>
>>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>>
>>> Full list of resources:
>>>
>>> Master/Slave Set: ipstor0Clone [ipstor0]
>>> Masters: [ drbd0-ha.s-ka.local ]
>>> Slaves: [ drbd1-ha.s-ka.local ]
>>> Resource Group: p_iSCSI
>>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>>
>>> Failed Actions:
>>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>>> call=12, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
>>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>>> call=12, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms
>>>
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>> [root at drbd0 corosync]#
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.10 I've tried "pcs resource debug-start xxx --full" on the DRBD Primary
>>> node:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSI --full
>>> Error: unable to debug-start a group, try one of the group's resource(s)
>>> (p_iSCSITarget,p_iSCSILogicalUnit,ClusterIP)
>>>
>>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSITarget --full
>>> Operation start for p_iSCSITarget (ocf:heartbeat:iSCSITarget) returned:
>>> 'ok' (0)
>>> > stderr: DEBUG: p_iSCSITarget start : 0
>>>
>>> [root at drbd0 corosync]# pcs resource debug-start p_iSCSILogicalUnit --full
>>> Operation start for p_iSCSILogicalUnit (ocf:heartbeat:iSCSILogicalUnit)
>>> returned: 'unknown error' (1)
>>> > stderr: ERROR: tgtadm: this logical unit number already exists
>>>
>>> [root at drbd0 corosync]# pcs resource debug-start ClusterIP --full
>>> Operation start for ClusterIP (ocf:heartbeat:IPaddr2) returned: 'ok' (0)
>>> > stderr: INFO: Adding inet address 192.168.95.48/32 with broadcast
>>> address 192.168.95.255 to device ens192
>>> > stderr: INFO: Bringing device ens192 up
>>> > stderr: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -p
>>> /var/run/resource-agents/send_arp-192.168.95.48 -I ens192 -m auto
>>> 192.168.95.48
>>> [root at drbd0 corosync]#
>>>
>>> ------------------------------------------------------------------------
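>>> (My suspicion: the "this logical unit number already exists" error comes
>>> from the LUN that tgtd already created from /etc/tgt/targets.conf when I
>>> started it by hand, so the resource agent cannot add LUN 10 a second time.
>>> Something like the following should show what the running tgtd currently
>>> exports:
>>> ------------------------------------------------------------------------
>>> # list all targets and LUNs known to the running tgtd
>>> tgtadm --lld iscsi --mode target --op show
>>> ------------------------------------------------------------------------
>>> )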
>>>
>>> 3.11 As you can see there are errors, but "p_iSCSITarget" was started
>>> successfully by debug-start. Still, "pcs status" shows it as "Stopped":
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# pcs status
>>> Cluster name: cluster1
>>> Stack: corosync
>>> Current DC: drbd0-ha.s-ka.local (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>>> partition with quorum
>>> Last updated: Tue Oct 16 14:22:38 2018
>>> Last change: Sun Oct 14 02:06:36 2018 by root via cibadmin on
>>> drbd0-ha.s-ka.local
>>>
>>> 2 nodes configured
>>> 5 resources configured
>>>
>>> Online: [ drbd0-ha.s-ka.local drbd1-ha.s-ka.local ]
>>>
>>> Full list of resources:
>>>
>>> Master/Slave Set: ipstor0Clone [ipstor0]
>>> Masters: [ drbd0-ha.s-ka.local ]
>>> Slaves: [ drbd1-ha.s-ka.local ]
>>> Resource Group: p_iSCSI
>>> p_iSCSITarget (ocf::heartbeat:iSCSITarget): Stopped
>>> p_iSCSILogicalUnit (ocf::heartbeat:iSCSILogicalUnit): Stopped
>>> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>>>
>>> Failed Actions:
>>> * p_iSCSITarget_start_0 on drbd0-ha.s-ka.local 'unknown error' (1):
>>> call=12, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 02:58:04 2018', queued=1ms, exec=58ms
>>> * p_iSCSITarget_start_0 on drbd1-ha.s-ka.local 'unknown error' (1):
>>> call=12, status=complete, exitreason='',
>>> last-rc-change='Sun Oct 14 10:47:06 2018', queued=0ms, exec=22ms
>>>
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>> [root at drbd0 corosync]#
>>>
>>> ------------------------------------------------------------------------
>>>
>>> 3.12 The pcs config is:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> [root at drbd0 corosync]# pcs config
>>> Cluster Name: cluster1
>>> Corosync Nodes:
>>> drbd0-ha.s-ka.local drbd1-ha.s-ka.local
>>> Pacemaker Nodes:
>>> drbd0-ha.s-ka.local drbd1-ha.s-ka.local
>>>
>>> Resources:
>>> Master: ipstor0Clone
>>> Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1
>>> clone-node-max=1
>>> Resource: ipstor0 (class=ocf provider=linbit type=drbd)
>>> Attributes: drbd_resource=iscsivg01
>>> Operations: demote interval=0s timeout=90 (ipstor0-demote-interval-0s)
>>> monitor interval=60s (ipstor0-monitor-interval-60s)
>>> notify interval=0s timeout=90 (ipstor0-notify-interval-0s)
>>> promote interval=0s timeout=90 (ipstor0-promote-interval-0s)
>>> reload interval=0s timeout=30 (ipstor0-reload-interval-0s)
>>> start interval=0s timeout=240 (ipstor0-start-interval-0s)
>>> stop interval=0s timeout=100 (ipstor0-stop-interval-0s)
>>> Group: p_iSCSI
>>> Resource: p_iSCSITarget (class=ocf provider=heartbeat type=iSCSITarget)
>>> Attributes: implementation=tgt iqn=iqn.2018-08.s-ka.local:disk.1 tid=1
>>> Operations: monitor interval=30 timeout=60
>>> (p_iSCSITarget-monitor-interval-30)
>>> start interval=0 timeout=60 (p_iSCSITarget-start-interval-0)
>>> stop interval=0 timeout=60 (p_iSCSITarget-stop-interval-0)
>>> Resource: p_iSCSILogicalUnit (class=ocf provider=heartbeat
>>> type=iSCSILogicalUnit)
>>> Attributes: implementation=tgt lun=10
>>> path=/dev/drbd/by-disk/vg0/ipstor0 target_iqn=iqn.2018-08.s-ka.local:disk.1
>>> Operations: monitor interval=30 timeout=60
>>> (p_iSCSILogicalUnit-monitor-interval-30)
>>> start interval=0 timeout=60
>>> (p_iSCSILogicalUnit-start-interval-0)
>>> stop interval=0 timeout=60
>>> (p_iSCSILogicalUnit-stop-interval-0)
>>> Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>> Attributes: cidr_netmask=32 ip=192.168.95.48
>>> Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>>> start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>>> stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>>
>>> Stonith Devices:
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>> start ipstor0Clone then start p_iSCSI (kind:Mandatory)
>>> Colocation Constraints:
>>> Ticket Constraints:
>>>
>>> Alerts:
>>> No alerts defined
>>>
>>> Resources Defaults:
>>> migration-threshold: 1
>>> Operations Defaults:
>>> No defaults set
>>>
>>> Cluster Properties:
>>> cluster-infrastructure: corosync
>>> cluster-name: cluster1
>>> dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
>>> have-watchdog: false
>>> last-lrm-refresh: 1539474248
>>> no-quorum-policy: ignore
>>> stonith-enabled: false
>>>
>>> Quorum:
>>> Options:
>>> [root at drbd0 corosync]#
>>>
>>> ------------------------------------------------------------------------
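>>> (Side note: I know stonith-enabled=false and no-quorum-policy=ignore are
>>> only meant for testing. I assume that for production I would need a fence
>>> agent such as fence_vmware_soap against the vCenter, roughly like the
>>> following, where the login, password and VM names are only placeholders:
>>> ------------------------------------------------------------------------
>>> # placeholder credentials and guessed VM names (drbd0/drbd1)
>>> pcs stonith create vmfence fence_vmware_soap \
>>>     ipaddr=vc01.s-ka.local login=<vcenter-user> passwd=<vcenter-password> \
>>>     ssl_insecure=1 \
>>>     pcmk_host_map="drbd0-ha.s-ka.local:drbd0;drbd1-ha.s-ka.local:drbd1"
>>> pcs property set stonith-enabled=true
>>> ------------------------------------------------------------------------
>>> Is that right?)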
>>>
>>>
>>> 4. So I am at a loss and don't know what to do next. Should I just dive
>>> into Pacemaker's source code??
>>>
>>> I hope to get feedback or tips from you. Thank you very much in
>>> advance :)
>>>
>>>
>>> Best Regards
>>>
>>> Zhang
>>>
>>>
>>>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org