[ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary node to Slave (always Stopped)

Thu Sep 17 23:42:01 UTC 2015

Ah yes, sorry.

clone-node-max
How many copies of the resource can be started on a single node; the
default value is 1.

So yes, a value of 1 here is correct.
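
To illustrate why 1 is enough on a two-node cluster, here is a rough toy model of how clone-max and clone-node-max interact. It is only a sketch: place_clones is a made-up helper, and the round-robin spreading is an assumption made for illustration, not Pacemaker's actual allocation code.

```python
def place_clones(nodes, clone_max, clone_node_max):
    """Toy model: spread up to clone_max instances over nodes,
    never putting more than clone_node_max copies on one node."""
    placement = {node: 0 for node in nodes}
    remaining = clone_max
    # Round-robin, so every node gets one copy before any node gets a second.
    for _ in range(clone_node_max):
        for node in nodes:
            if remaining == 0:
                return placement
            placement[node] += 1
            remaining -= 1
    return placement


nodes = ["fx201-1a.ams", "fx201-1b.ams"]
# Values from the ms_drbd_vmfs meta attributes quoted below.
print(place_clones(nodes, clone_max=2, clone_node_max=1))
# Raising clone-node-max to 2 gives the same placement in this model.
print(place_clones(nodes, clone_max=2, clone_node_max=2))
```

In this model both calls place one copy on each node, so clone-node-max only matters once there are more instances than nodes.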

Luke Pascoe

*E* luke at osnz.co.nz
*P* +64 (9) 296 2961
*M* +64 (27) 426 6649
*W* www.osnz.co.nz

24 Wellington St
Papakura
Auckland, 2110
New Zealand

On 18 September 2015 at 11:36, Jason Gress <jgress at accertify.com> wrote:

> I can't say whether you are right or wrong (you may be right!), but I
> followed the Clusters from Scratch tutorial closely, and it only had
> clone-node-max=1 there.  (Page 106 of the PDF, for the curious.)
>
> Thanks,
>
> Jason
>
> From: Luke Pascoe <luke at osnz.co.nz>
> Reply-To: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Date: Thursday, September 17, 2015 at 6:29 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed <
> users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
> node to Slave (always Stopped)
>
> I may be wrong, but shouldn't "clone-node-max" be 2 on the ms_drbd_vmfs
> resource?
>
> Luke Pascoe
>
> On 18 September 2015 at 11:02, Jason Gress <jgress at accertify.com> wrote:
>
>> I have a simple DRBD + filesystem + NFS configuration that works properly
>> when I manually start/stop DRBD, but will not start the DRBD slave resource
>> properly on failover or recovery.  I cannot ever get the Master/Slave set
>> to say anything but 'Stopped'.  I am running CentOS 7.1 with the latest
>> packages as of today:
>>
>> [root@fx201-1a log]# rpm -qa | grep -e pcs -e pacemaker -e drbd
>> pacemaker-cluster-libs-1.1.12-22.el7_1.4.x86_64
>> pacemaker-1.1.12-22.el7_1.4.x86_64
>> pcs-0.9.137-13.el7_1.4.x86_64
>> pacemaker-libs-1.1.12-22.el7_1.4.x86_64
>> drbd84-utils-8.9.3-1.1.el7.elrepo.x86_64
>> pacemaker-cli-1.1.12-22.el7_1.4.x86_64
>> kmod-drbd84-8.4.6-1.el7.elrepo.x86_64
>>
>> Here is my pcs config output:
>>
>> [root@fx201-1a log]# pcs config
>> Cluster Name: fx201-vmcl
>> Corosync Nodes:
>>  fx201-1a.ams fx201-1b.ams
>> Pacemaker Nodes:
>>  fx201-1a.ams fx201-1b.ams
>>
>> Resources:
>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>   Attributes: ip=10.XX.XX.XX cidr_netmask=24
>>   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>>               stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>>               monitor interval=15s (ClusterIP-monitor-interval-15s)
>>  Master: ms_drbd_vmfs
>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>   Resource: drbd_vmfs (class=ocf provider=linbit type=drbd)
>>    Attributes: drbd_resource=vmfs
>>    Operations: start interval=0s timeout=240 (drbd_vmfs-start-timeout-240)
>>                promote interval=0s timeout=90 (drbd_vmfs-promote-timeout-90)
>>                demote interval=0s timeout=90 (drbd_vmfs-demote-timeout-90)
>>                stop interval=0s timeout=100 (drbd_vmfs-stop-timeout-100)
>>                monitor interval=30s (drbd_vmfs-monitor-interval-30s)
>>  Resource: vmfsFS (class=ocf provider=heartbeat type=Filesystem)
>>   Attributes: device=/dev/drbd0 directory=/exports/vmfs fstype=xfs
>>   Operations: start interval=0s timeout=60 (vmfsFS-start-timeout-60)
>>               stop interval=0s timeout=60 (vmfsFS-stop-timeout-60)
>>               monitor interval=20 timeout=40 (vmfsFS-monitor-interval-20)
>>  Resource: nfs-server (class=systemd type=nfs-server)
>>   Operations: monitor interval=60s (nfs-server-monitor-interval-60s)
>>
>> Stonith Devices:
>> Fencing Levels:
>>
>> Location Constraints:
>> Ordering Constraints:
>>   promote ms_drbd_vmfs then start vmfsFS (kind:Mandatory) (id:order-ms_drbd_vmfs-vmfsFS-mandatory)
>>   start vmfsFS then start nfs-server (kind:Mandatory) (id:order-vmfsFS-nfs-server-mandatory)
>>   start ClusterIP then start nfs-server (kind:Mandatory) (id:order-ClusterIP-nfs-server-mandatory)
>> Colocation Constraints:
>>   ms_drbd_vmfs with ClusterIP (score:INFINITY) (id:colocation-ms_drbd_vmfs-ClusterIP-INFINITY)
>>   vmfsFS with ms_drbd_vmfs (score:INFINITY) (with-rsc-role:Master) (id:colocation-vmfsFS-ms_drbd_vmfs-INFINITY)
>>   nfs-server with vmfsFS (score:INFINITY) (id:colocation-nfs-server-vmfsFS-INFINITY)
>>
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  cluster-name: fx201-vmcl
>>  dc-version: 1.1.13-a14efad
>>  have-watchdog: false
>>  last-lrm-refresh: 1442528181
>>  stonith-enabled: false
>>
>> And status:
>>
>> [root@fx201-1a log]# pcs status --full
>> Cluster name: fx201-vmcl
>> Last updated: Thu Sep 17 17:55:56 2015 Last change: Thu Sep 17 17:18:10 2015 by root via crm_attribute on fx201-1b.ams
>> Stack: corosync
>> Current DC: fx201-1b.ams (2) (version 1.1.13-a14efad) - partition with quorum
>> 2 nodes and 5 resources configured
>>
>> Online: [ fx201-1a.ams (1) fx201-1b.ams (2) ]
>>
>> Full list of resources:
>>
>>  ClusterIP (ocf::heartbeat:IPaddr2):Started fx201-1a.ams
>>  Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>      drbd_vmfs (ocf::linbit:drbd):Master fx201-1a.ams
>>      drbd_vmfs (ocf::linbit:drbd):Stopped
>>      Masters: [ fx201-1a.ams ]
>>      Stopped: [ fx201-1b.ams ]
>>  vmfsFS (ocf::heartbeat:Filesystem):Started fx201-1a.ams
>>  nfs-server (systemd:nfs-server):Started fx201-1a.ams
>>
>> PCSD Status:
>>   fx201-1a.ams: Online
>>   fx201-1b.ams: Online
>>
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> If I do a failover, after manually confirming that the DRBD data is
>> synchronized completely, it does work, but then never reconnects the
>> secondary side, and in order to get the resource synchronized again, I have
>> to manually correct it, ad infinitum.  I have tried standby/unstandby, pcs
>> resource debug-start (with undesirable results), and so on.
>>
>> Here are some relevant log messages from pacemaker.log:
>>
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:     info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:     info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: process_pe_message: Input has not changed since last time, not saving to disk
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: determine_online_status: Node fx201-1b.ams is online
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: determine_online_status: Node fx201-1a.ams is online
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: determine_op_status: Operation monitor found resource drbd_vmfs:0 active in master mode on fx201-1b.ams
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: determine_op_status: Operation monitor found resource drbd_vmfs:0 active on fx201-1a.ams
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: native_print: ClusterIP (ocf::heartbeat:IPaddr2):Started fx201-1a.ams
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: clone_print: Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: short_print:    Masters: [ fx201-1a.ams ]
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: short_print:    Stopped: [ fx201-1b.ams ]
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: native_print: vmfsFS (ocf::heartbeat:Filesystem):Started fx201-1a.ams
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: native_print: nfs-server (systemd:nfs-server):Started fx201-1a.ams
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: native_color: Resource drbd_vmfs:1 cannot run anywhere
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: master_color: Promoting drbd_vmfs:0 (Master fx201-1a.ams)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: master_color: ms_drbd_vmfs: Promoted 1 instances of a possible 1 to master
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: LogActions: Leave   ClusterIP (Started fx201-1a.ams)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: LogActions: Leave   drbd_vmfs:0 (Master fx201-1a.ams)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: LogActions: Leave   drbd_vmfs:1 (Stopped)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: LogActions: Leave   vmfsFS (Started fx201-1a.ams)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:     info: LogActions: Leave   nfs-server (Started fx201-1a.ams)
>> Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net    pengine:   notice: process_pe_message: Calculated Transition 16: /var/lib/pacemaker/pengine/pe-input-61.bz2
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:     info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:     info: do_te_invoke: Processing graph 16 (ref=pe_calc-dc-1442530090-97) derived from /var/lib/pacemaker/pengine/pe-input-61.bz2
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:   notice: run_graph: Transition 16 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-61.bz2): Complete
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:     info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
>> Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net       crmd:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>
>> Thank you all for your help,
>>
>> Jason
>>
>> "This message and any attachments may contain confidential information. If you
>> have received this  message in error, any use or distribution is prohibited.
>> Please notify us by reply e-mail if you have mistakenly received this message,
>> and immediately and permanently delete it and any attachments. Thank you."
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>