[ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary node to Slave (always Stopped)

Jason Gress jgress at accertify.com
Mon Sep 21 15:23:59 UTC 2015


Yeah, I had problems, which I suspect may be firewall related.  At a
previous employer I had IPMI working great (though with heartbeat), so I
do have some experience with IPMI STONITH.

Example:

[root@fx201-1a ~]# fence_ipmilan -a 10.XX.XX.XX -l root -p calvin -o status
Failed: Unable to obtain correct plug status or plug is not available

(Don't worry, the default Dell password is left in for illustrative purposes only!)


When I tried to watch it with tcpdump, it seemed to use random ports, so I
couldn't really tell the firewall team what to open.  SSH into the DRAC
works fine, and IPMI over LAN is enabled.  If anyone has ideas on this,
they would be greatly appreciated.
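
One data point: as I understand it, RMCP (IPMI over LAN) always targets
UDP port 623 on the BMC; only the client's source port is ephemeral,
which may be what made the capture look random.  A retest with a port
filter, and with lanplus (-P, which newer iDRACs reportedly require),
might narrow it down:

tcpdump -ni any udp port 623
fence_ipmilan -a 10.XX.XX.XX -l root -p calvin -P -o status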


Thank you again,

Jason

On 9/21/15, 10:12 AM, "Digimer" <lists at alteeve.ca> wrote:

>IPMI fencing is a very common type, and it shouldn't be so hard to get
>working. Easiest is to test it out first on the command line, outside
>pacemaker. Run:
>
>fence_ipmilan -a <ipmi_ip> -l <ipmi_user> -p <password> -o status
>
>If that doesn't work, you may need to use lanplus or similar. See 'man
>fence_ipmilan'. Once you can use the above command to query the power
>status of the nodes, you're 95% of the way there.
>
>Fencing cannot be put on the back burner, as you've now seen. Without
>it, things can and will go very wrong.
>
>On 21/09/15 09:34 AM, Jason Gress wrote:
>> Thank you for the comment.  I attempted to use iDRAC/IPMI STONITH, and after
>> spending over a day, I had to put it on the backburner for timeline
>> reasons.  For whatever reason, I could not get IPMI to talk, and the
>> iDRAC5 plugin was not working either for reasons I don't understand.
>> 
>> Is that what you had in mind, or is there another method/configuration
>>for
>> fencing DRBD?
>> 
>> Thank you for your advice,
>> 
>> Jason
>> 
>> On 9/20/15, 9:40 PM, "Digimer" <lists at alteeve.ca> wrote:
>> 
>>> On 20/09/15 09:18 PM, Jason Gress wrote:
>>>> I seem to have caused a split-brain while attempting to repair this.
>>>>But that
>>>
>>> Use fencing! Voila, no more split-brains.
>>>
>>>> wasn't the issue.  You can't colocate the whole DRBD master/slave set
>>>> with another resource like this; that's what killed me.   This line did it:
>>>>
>>>>  ms_drbd_vmfs with ClusterIP (score:INFINITY)
>>>> (id:colocation-ms_drbd_vmfs-ClusterIP-INFINITY)
>>>>
>>>> Do NOT do this!
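>>>>
>>>> (For the archives: the relationship presumably wanted here is "the
>>>> IP runs where the DRBD Master runs", i.e. colocate the IP with the
>>>> Master role instead of pinning the whole master/slave set to the
>>>> IP's node.)  An untested sketch in standard pcs syntax:
>>>>
>>>>  pcs constraint remove colocation-ms_drbd_vmfs-ClusterIP-INFINITY
>>>>  pcs constraint colocation add ClusterIP with master ms_drbd_vmfs INFINITY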
>>>>
>>>> Jason
>>>>
>>>> From: Jason Gress <jgress at accertify.com <mailto:jgress at accertify.com>>
>>>> Reply-To: Cluster Labs - All topics related to open-source clustering
>>>> welcomed <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Date: Friday, September 18, 2015 at 3:03 PM
>>>> To: Cluster Labs - All topics related to open-source clustering
>>>>welcomed
>>>> <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
>>>> node to Slave (always Stopped)
>>>>
>>>> Well, it almost worked.  I was able to modify the existing cluster per
>>>> your command, and it worked great.
>>>>
>>>> Today, I made two more clusters via the exact same process (I
>>>> used/modified my notes as I was building and fixing the first one
>>>> yesterday) and now it's doing the same thing, despite having your
>>>> improved master slave rule.  Here's the config:
>>>>
>>>> [root@fx201-1a ~]# pcs config --full
>>>> Cluster Name: fx201-vmcl
>>>> Corosync Nodes:
>>>>  fx201-1a.zwo fx201-1b.zwo
>>>> Pacemaker Nodes:
>>>>  fx201-1a.zwo fx201-1b.zwo
>>>>
>>>> Resources:
>>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>   Attributes: ip=10.XX.XX.XX cidr_netmask=24
>>>>   Operations: start interval=0s timeout=20s
>>>> (ClusterIP-start-timeout-20s)
>>>>               stop interval=0s timeout=20s
>>>>(ClusterIP-stop-timeout-20s)
>>>>               monitor interval=15s (ClusterIP-monitor-interval-15s)
>>>>  Master: ms_drbd_vmfs
>>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>>> clone-node-max=1 notify=true
>>>>   Resource: drbd_vmfs (class=ocf provider=linbit type=drbd)
>>>>    Attributes: drbd_resource=vmfs
>>>>    Operations: start interval=0s timeout=240
>>>> (drbd_vmfs-start-timeout-240)
>>>>                promote interval=0s timeout=90
>>>> (drbd_vmfs-promote-timeout-90)
>>>>                demote interval=0s timeout=90
>>>> (drbd_vmfs-demote-timeout-90)
>>>>                stop interval=0s timeout=100
>>>>(drbd_vmfs-stop-timeout-100)
>>>>                monitor interval=29s role=Master
>>>> (drbd_vmfs-monitor-interval-29s-role-Master)
>>>>                monitor interval=31s role=Slave
>>>> (drbd_vmfs-monitor-interval-31s-role-Slave)
>>>>  Resource: vmfsFS (class=ocf provider=heartbeat type=Filesystem)
>>>>   Attributes: device=/dev/drbd0 directory=/exports/vmfs fstype=xfs
>>>>   Operations: start interval=0s timeout=60 (vmfsFS-start-timeout-60)
>>>>               stop interval=0s timeout=60 (vmfsFS-stop-timeout-60)
>>>>               monitor interval=20 timeout=40
>>>> (vmfsFS-monitor-interval-20)
>>>>  Resource: nfs-server (class=systemd type=nfs-server)
>>>>   Operations: monitor interval=60s (nfs-server-monitor-interval-60s)
>>>>
>>>> Stonith Devices:
>>>> Fencing Levels:
>>>>
>>>> Location Constraints:
>>>> Ordering Constraints:
>>>>   promote ms_drbd_vmfs then start vmfsFS (kind:Mandatory)
>>>> (id:order-ms_drbd_vmfs-vmfsFS-mandatory)
>>>>   start vmfsFS then start nfs-server (kind:Mandatory)
>>>> (id:order-vmfsFS-nfs-server-mandatory)
>>>>   start ClusterIP then start nfs-server (kind:Mandatory)
>>>> (id:order-ClusterIP-nfs-server-mandatory)
>>>> Colocation Constraints:
>>>>   ms_drbd_vmfs with ClusterIP (score:INFINITY)
>>>> (id:colocation-ms_drbd_vmfs-ClusterIP-INFINITY)
>>>>   vmfsFS with ms_drbd_vmfs (score:INFINITY) (with-rsc-role:Master)
>>>> (id:colocation-vmfsFS-ms_drbd_vmfs-INFINITY)
>>>>   nfs-server with vmfsFS (score:INFINITY)
>>>> (id:colocation-nfs-server-vmfsFS-INFINITY)
>>>>   nfs-server with ClusterIP (score:INFINITY)
>>>> (id:colocation-nfs-server-ClusterIP-INFINITY)
>>>>
>>>> Cluster Properties:
>>>>  cluster-infrastructure: corosync
>>>>  cluster-name: fx201-vmcl
>>>>  dc-version: 1.1.13-a14efad
>>>>  have-watchdog: false
>>>>  stonith-enabled: false
>>>>
>>>> [root@fx201-1a ~]# pcs status --full
>>>> Cluster name: fx201-vmcl
>>>> Last updated: Fri Sep 18 15:02:16 2015
>>>> Last change: Fri Sep 18 14:44:33 2015 by root via crm_attribute on fx201-1b.zwo
>>>> Stack: corosync
>>>> Current DC: fx201-1a.zwo (1) (version 1.1.13-a14efad) - partition with
>>>> quorum
>>>> 2 nodes and 5 resources configured
>>>>
>>>> Online: [ fx201-1a.zwo (1) fx201-1b.zwo (2) ]
>>>>
>>>> Full list of resources:
>>>>
>>>>  ClusterIP(ocf::heartbeat:IPaddr2):Started fx201-1a.zwo
>>>>  Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>>>      drbd_vmfs(ocf::linbit:drbd):Master fx201-1a.zwo
>>>>      drbd_vmfs(ocf::linbit:drbd):Stopped
>>>>      Masters: [ fx201-1a.zwo ]
>>>>      Stopped: [ fx201-1b.zwo ]
>>>>  vmfsFS(ocf::heartbeat:Filesystem):Started fx201-1a.zwo
>>>>  nfs-server(systemd:nfs-server):Started fx201-1a.zwo
>>>>
>>>> PCSD Status:
>>>>   fx201-1a.zwo: Online
>>>>   fx201-1b.zwo: Online
>>>>
>>>> Daemon Status:
>>>>   corosync: active/enabled
>>>>   pacemaker: active/enabled
>>>>   pcsd: active/enabled
>>>>
>>>> This is so strange... The master/slave rule fixed my other two
>>>>clusters,
>>>> but not this one.
>>>>
>>>> Thank you all for your advice,
>>>>
>>>> Jason
>>>>
>>>> From: Jason Gress <jgress at accertify.com <mailto:jgress at accertify.com>>
>>>> Reply-To: Cluster Labs - All topics related to open-source clustering
>>>> welcomed <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Date: Thursday, September 17, 2015 at 7:25 PM
>>>> To: Cluster Labs - All topics related to open-source clustering
>>>>welcomed
>>>> <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
>>>> node to Slave (always Stopped)
>>>>
>>>> That was exactly what I needed.  Thank you so much!
>>>>
>>>> Jason
>>>>
>>>> From: Luke Pascoe <luke at osnz.co.nz <mailto:luke at osnz.co.nz>>
>>>> Reply-To: Cluster Labs - All topics related to open-source clustering
>>>> welcomed <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Date: Thursday, September 17, 2015 at 7:08 PM
>>>> To: Cluster Labs - All topics related to open-source clustering
>>>>welcomed
>>>> <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>> Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting secondary
>>>> node to Slave (always Stopped)
>>>>
>>>> pcs resource create drbd_iscsivg0 ocf:linbit:drbd
>>>>drbd_resource=iscsivg0
>>>> op monitor interval="29s" role="Master" op monitor interval="31s"
>>>> role="Slave"
>>>>
>>>> Luke Pascoe
>>>>
>>>>
>>>>
>>>> *E* luke at osnz.co.nz <mailto:luke at osnz.co.nz>
>>>> *P* +64 (9) 296 2961
>>>> *M* +64 (27) 426 6649
>>>> *W* www.osnz.co.nz <http://www.osnz.co.nz/>
>>>>
>>>> 24 Wellington St
>>>> Papakura
>>>> Auckland, 2110
>>>> New Zealand
>>>>
>>>>
>>>> On 18 September 2015 at 12:02, Jason Gress <jgress at accertify.com
>>>> <mailto:jgress at accertify.com>> wrote:
>>>>
>>>>     That may very well be it.  Would you be so kind as to show me the
>>>>     pcs command to create that config?  I generated my configuration
>>>>     with these commands, and I'm not sure how to get the additional
>>>>     monitor options in there:
>>>>
>>>>     pcs resource create drbd_vmfs ocf:linbit:drbd drbd_resource=vmfs
>>>>op
>>>>     monitor interval=30s
>>>>     pcs resource master ms_drbd_vmfs drbd_vmfs master-max=1
>>>>     master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>>>
>>>>     Thank you very much for your help, and sorry for the newbie
>>>> question!
>>>>
>>>>     Jason
>>>>
>>>>     From: Luke Pascoe <luke at osnz.co.nz <mailto:luke at osnz.co.nz>>
>>>>     Reply-To: Cluster Labs - All topics related to open-source
>>>>     clustering welcomed <users at clusterlabs.org
>>>>     <mailto:users at clusterlabs.org>>
>>>>     Date: Thursday, September 17, 2015 at 6:54 PM
>>>>
>>>>     To: Cluster Labs - All topics related to open-source clustering
>>>>     welcomed <users at clusterlabs.org <mailto:users at clusterlabs.org>>
>>>>     Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting
>>>>     secondary node to Slave (always Stopped)
>>>>
>>>>     The only difference in the DRBD resource between yours and mine
>>>>that
>>>>     I can see is the monitoring parameters (mine works nicely, but is
>>>>     Centos 6). Here's mine:
>>>>
>>>>     Master: ms_drbd_iscsivg0
>>>>       Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>>>     clone-node-max=1 notify=true
>>>>       Resource: drbd_iscsivg0 (class=ocf provider=linbit type=drbd)
>>>>        Attributes: drbd_resource=iscsivg0
>>>>        Operations: start interval=0s timeout=240
>>>>     (drbd_iscsivg0-start-timeout-240)
>>>>                    promote interval=0s timeout=90
>>>>     (drbd_iscsivg0-promote-timeout-90)
>>>>                    demote interval=0s timeout=90
>>>>     (drbd_iscsivg0-demote-timeout-90)
>>>>                    stop interval=0s timeout=100
>>>>     (drbd_iscsivg0-stop-timeout-100)
>>>>                    monitor interval=29s role=Master
>>>>     (drbd_iscsivg0-monitor-interval-29s-role-Master)
>>>>                    monitor interval=31s role=Slave
>>>>     (drbd_iscsivg0-monitor-interval-31s-role-Slave)
>>>>
>>>>     What mechanism are you using to fail over? Check your constraints
>>>>     after you do it and make sure it hasn't added one which stops the
>>>>     slave clone from starting on the "failed" node.
>>>>
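>>>>     A generic way to spot those, for what it's worth: location
>>>>     constraints left behind by "pcs resource move" / crm_resource
>>>>     carry ids prefixed with "cli-", so they are easy to find and
>>>>     remove:
>>>>
>>>>      pcs constraint --full | grep cli-
>>>>      pcs constraint remove <constraint_id>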
>>>>
>>>>     Luke Pascoe
>>>>
>>>>
>>>>
>>>>     On 18 September 2015 at 11:40, Jason Gress <jgress at accertify.com
>>>>     <mailto:jgress at accertify.com>> wrote:
>>>>
>>>>         Looking more closely, according to page 64
>>>>         (http://clusterlabs.org/doc/Cluster_from_Scratch.pdf) it does
>>>>         indeed appear that 1 is the correct number.  (I just realized
>>>>         that it's page 64 of the "book", but page 76 of the pdf.)
>>>>
>>>>         Thank you again,
>>>>
>>>>         Jason
>>>>
>>>>         From: Jason Gress <jgress at accertify.com
>>>>         <mailto:jgress at accertify.com>>
>>>>         Reply-To: Cluster Labs - All topics related to open-source
>>>>         clustering welcomed <users at clusterlabs.org
>>>>         <mailto:users at clusterlabs.org>>
>>>>         Date: Thursday, September 17, 2015 at 6:36 PM
>>>>         To: Cluster Labs - All topics related to open-source
>>>>clustering
>>>>         welcomed <users at clusterlabs.org
>>>><mailto:users at clusterlabs.org>>
>>>>         Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting
>>>>         secondary node to Slave (always Stopped)
>>>>
>>>>         I can't say whether you are right or wrong (you may be
>>>>         right!), but I followed the Clusters from Scratch tutorial
>>>>         closely, and it only had clone-node-max=1 there.  (Page 106 of
>>>>         the PDF, for the curious.)
>>>>
>>>>         Thanks,
>>>>
>>>>         Jason
>>>>
>>>>         From: Luke Pascoe <luke at osnz.co.nz <mailto:luke at osnz.co.nz>>
>>>>         Reply-To: Cluster Labs - All topics related to open-source
>>>>         clustering welcomed <users at clusterlabs.org
>>>>         <mailto:users at clusterlabs.org>>
>>>>         Date: Thursday, September 17, 2015 at 6:29 PM
>>>>         To: Cluster Labs - All topics related to open-source
>>>>clustering
>>>>         welcomed <users at clusterlabs.org
>>>><mailto:users at clusterlabs.org>>
>>>>         Subject: Re: [ClusterLabs] Pacemaker/pcs & DRBD not demoting
>>>>         secondary node to Slave (always Stopped)
>>>>
>>>>         I may be wrong, but shouldn't "clone-node-max" be 2 on
>>>>         the ms_drbd_vmfs resource?
>>>>
>>>>         Luke Pascoe
>>>>
>>>>
>>>>
>>>>         On 18 September 2015 at 11:02, Jason Gress
>>>><jgress at accertify.com
>>>>         <mailto:jgress at accertify.com>> wrote:
>>>>
>>>>             I have a simple DRBD + filesystem + NFS configuration that
>>>>             works properly when I manually start/stop DRBD, but will
>>>>not
>>>>             start the DRBD slave resource properly on failover or
>>>>             recovery.  I cannot ever get the Master/Slave set to say
>>>>             anything but 'Stopped'.  I am running CentOS 7.1 with the
>>>>             latest packages as of today:
>>>>
>>>>             [root@fx201-1a log]# rpm -qa | grep -e pcs -e pacemaker -e drbd
>>>>             pacemaker-cluster-libs-1.1.12-22.el7_1.4.x86_64
>>>>             pacemaker-1.1.12-22.el7_1.4.x86_64
>>>>             pcs-0.9.137-13.el7_1.4.x86_64
>>>>             pacemaker-libs-1.1.12-22.el7_1.4.x86_64
>>>>             drbd84-utils-8.9.3-1.1.el7.elrepo.x86_64
>>>>             pacemaker-cli-1.1.12-22.el7_1.4.x86_64
>>>>             kmod-drbd84-8.4.6-1.el7.elrepo.x86_64
>>>>
>>>>             Here is my pcs config output:
>>>>
>>>>             [root@fx201-1a log]# pcs config
>>>>             Cluster Name: fx201-vmcl
>>>>             Corosync Nodes:
>>>>              fx201-1a.ams fx201-1b.ams
>>>>             Pacemaker Nodes:
>>>>              fx201-1a.ams fx201-1b.ams
>>>>
>>>>             Resources:
>>>>              Resource: ClusterIP (class=ocf provider=heartbeat
>>>> type=IPaddr2)
>>>>               Attributes: ip=10.XX.XX.XX cidr_netmask=24
>>>>               Operations: start interval=0s timeout=20s
>>>>             (ClusterIP-start-timeout-20s)
>>>>                           stop interval=0s timeout=20s
>>>>             (ClusterIP-stop-timeout-20s)
>>>>                           monitor interval=15s
>>>>             (ClusterIP-monitor-interval-15s)
>>>>              Master: ms_drbd_vmfs
>>>>               Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>>>             clone-node-max=1 notify=true
>>>>               Resource: drbd_vmfs (class=ocf provider=linbit
>>>>type=drbd)
>>>>                Attributes: drbd_resource=vmfs
>>>>                Operations: start interval=0s timeout=240
>>>>             (drbd_vmfs-start-timeout-240)
>>>>                            promote interval=0s timeout=90
>>>>             (drbd_vmfs-promote-timeout-90)
>>>>                            demote interval=0s timeout=90
>>>>             (drbd_vmfs-demote-timeout-90)
>>>>                            stop interval=0s timeout=100
>>>>             (drbd_vmfs-stop-timeout-100)
>>>>                            monitor interval=30s
>>>>             (drbd_vmfs-monitor-interval-30s)
>>>>              Resource: vmfsFS (class=ocf provider=heartbeat
>>>> type=Filesystem)
>>>>               Attributes: device=/dev/drbd0 directory=/exports/vmfs
>>>>             fstype=xfs
>>>>               Operations: start interval=0s timeout=60
>>>>             (vmfsFS-start-timeout-60)
>>>>                           stop interval=0s timeout=60
>>>>             (vmfsFS-stop-timeout-60)
>>>>                           monitor interval=20 timeout=40
>>>>             (vmfsFS-monitor-interval-20)
>>>>              Resource: nfs-server (class=systemd type=nfs-server)
>>>>               Operations: monitor interval=60s
>>>>             (nfs-server-monitor-interval-60s)
>>>>
>>>>             Stonith Devices:
>>>>             Fencing Levels:
>>>>
>>>>             Location Constraints:
>>>>             Ordering Constraints:
>>>>               promote ms_drbd_vmfs then start vmfsFS (kind:Mandatory)
>>>>             (id:order-ms_drbd_vmfs-vmfsFS-mandatory)
>>>>               start vmfsFS then start nfs-server (kind:Mandatory)
>>>>             (id:order-vmfsFS-nfs-server-mandatory)
>>>>               start ClusterIP then start nfs-server (kind:Mandatory)
>>>>             (id:order-ClusterIP-nfs-server-mandatory)
>>>>             Colocation Constraints:
>>>>               ms_drbd_vmfs with ClusterIP (score:INFINITY)
>>>>             (id:colocation-ms_drbd_vmfs-ClusterIP-INFINITY)
>>>>               vmfsFS with ms_drbd_vmfs (score:INFINITY)
>>>>             (with-rsc-role:Master)
>>>>             (id:colocation-vmfsFS-ms_drbd_vmfs-INFINITY)
>>>>               nfs-server with vmfsFS (score:INFINITY)
>>>>             (id:colocation-nfs-server-vmfsFS-INFINITY)
>>>>
>>>>             Cluster Properties:
>>>>              cluster-infrastructure: corosync
>>>>              cluster-name: fx201-vmcl
>>>>              dc-version: 1.1.13-a14efad
>>>>              have-watchdog: false
>>>>              last-lrm-refresh: 1442528181
>>>>              stonith-enabled: false
>>>>
>>>>             And status:
>>>>
>>>>             [root@fx201-1a log]# pcs status --full
>>>>             Cluster name: fx201-vmcl
>>>>             Last updated: Thu Sep 17 17:55:56 2015
>>>>             Last change: Thu Sep 17 17:18:10 2015 by root via crm_attribute on fx201-1b.ams
>>>>             Stack: corosync
>>>>             Current DC: fx201-1b.ams (2) (version 1.1.13-a14efad) -
>>>>             partition with quorum
>>>>             2 nodes and 5 resources configured
>>>>
>>>>             Online: [ fx201-1a.ams (1) fx201-1b.ams (2) ]
>>>>
>>>>             Full list of resources:
>>>>
>>>>              ClusterIP(ocf::heartbeat:IPaddr2):Started fx201-1a.ams
>>>>              Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>>>                  drbd_vmfs(ocf::linbit:drbd):Master fx201-1a.ams
>>>>                  drbd_vmfs(ocf::linbit:drbd):Stopped
>>>>                  Masters: [ fx201-1a.ams ]
>>>>                  Stopped: [ fx201-1b.ams ]
>>>>              vmfsFS(ocf::heartbeat:Filesystem):Started fx201-1a.ams
>>>>              nfs-server(systemd:nfs-server):Started fx201-1a.ams
>>>>
>>>>             PCSD Status:
>>>>               fx201-1a.ams: Online
>>>>               fx201-1b.ams: Online
>>>>
>>>>             Daemon Status:
>>>>               corosync: active/enabled
>>>>               pacemaker: active/enabled
>>>>               pcsd: active/enabled
>>>>
>>>>             If I do a failover, after manually confirming that the
>>>>DRBD
>>>>             data is synchronized completely, it does work, but then
>>>>             never reconnects the secondary side, and in order to get
>>>>the
>>>>             resource synchronized again, I have to manually correct
>>>>it,
>>>>             ad infinitum.  I have tried standby/unstandby, pcs
>>>>resource
>>>>             debug-start (with undesirable results), and so on.
>>>>
>>>>             Here are some relevant log messages from pacemaker.log:
>>>>
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:     info:
>>>>             crm_timer_popped:PEngine Recheck Timer (I_PE_CALC) just
>>>>             popped (900000ms)
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:   notice:
>>>>             do_state_transition:State transition S_IDLE ->
>>>>             S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
>>>>             origin=crm_timer_popped ]
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:     info:
>>>>             do_state_transition:Progressed to state S_POLICY_ENGINE
>>>>             after C_TIMER_POPPED
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             process_pe_message:Input has not changed since last time,
>>>>             not saving to disk
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             determine_online_status:Node fx201-1b.ams is online
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             determine_online_status:Node fx201-1a.ams is online
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             determine_op_status:Operation monitor found resource
>>>>             drbd_vmfs:0 active in master mode on fx201-1b.ams
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             determine_op_status:Operation monitor found resource
>>>>             drbd_vmfs:0 active on fx201-1a.ams
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             native_print:ClusterIP(ocf::heartbeat:IPaddr2):Started
>>>>             fx201-1a.ams
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             clone_print:Master/Slave Set: ms_drbd_vmfs [drbd_vmfs]
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             short_print:    Masters: [ fx201-1a.ams ]
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             short_print:    Stopped: [ fx201-1b.ams ]
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             native_print:vmfsFS(ocf::heartbeat:Filesystem):Started
>>>>             fx201-1a.ams
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             native_print:nfs-server(systemd:nfs-server):Started
>>>> fx201-1a.ams
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             native_color:Resource drbd_vmfs:1 cannot run anywhere
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             master_color:Promoting drbd_vmfs:0 (Master fx201-1a.ams)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             master_color:ms_drbd_vmfs: Promoted 1 instances of a
>>>>             possible 1 to master
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             LogActions:Leave   ClusterIP(Started fx201-1a.ams)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             LogActions:Leave   drbd_vmfs:0(Master fx201-1a.ams)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             LogActions:Leave   drbd_vmfs:1(Stopped)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             LogActions:Leave   vmfsFS(Started fx201-1a.ams)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:     info:
>>>>             LogActions:Leave   nfs-server(Started fx201-1a.ams)
>>>>             Sep 17 17:48:10 [5662] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>    pengine:   notice:
>>>>             process_pe_message:Calculated Transition 16:
>>>>             /var/lib/pacemaker/pengine/pe-input-61.bz2
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:     info:
>>>>             do_state_transition:State transition S_POLICY_ENGINE ->
>>>>             S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
>>>>cause=C_IPC_MESSAGE
>>>>             origin=handle_response ]
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:     info:
>>>>             do_te_invoke:Processing graph 16
>>>>             (ref=pe_calc-dc-1442530090-97) derived from
>>>>             /var/lib/pacemaker/pengine/pe-input-61.bz2
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:   notice:
>>>>             run_graph:Transition 16 (Complete=0, Pending=0, Fired=0,
>>>>             Skipped=0, Incomplete=0,
>>>>             Source=/var/lib/pacemaker/pengine/pe-input-61.bz2):
>>>>Complete
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:     info:
>>>>             do_log:FSA: Input I_TE_SUCCESS from notify_crmd() received
>>>>             in state S_TRANSITION_ENGINE
>>>>             Sep 17 17:48:10 [13954] fx201-1b.ams.accertify.net
>>>>             <http://fx201-1b.ams.accertify.net>       crmd:   notice:
>>>>             do_state_transition:State transition S_TRANSITION_ENGINE
>>>>->
>>>>             S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
>>>>             origin=notify_crmd ]
>>>>
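>>>>             Side note: when the policy engine decides an instance
>>>>             "cannot run anywhere", the allocation scores behind that
>>>>             decision can be dumped from the live CIB with
>>>>             crm_simulate; generic Pacemaker debugging, not specific
>>>>             to this config:
>>>>
>>>>             crm_simulate -sL
>>>>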
>>>>             Thank you all for your help,
>>>>
>>>>             Jason
>>>>
>>>>             "This message and any attachments may contain confidential
>>>> information. If you
>>>>             have received this  message in error, any use or
>>>> distribution is prohibited.
>>>>             Please notify us by reply e-mail if you have mistakenly
>>>> received this message,
>>>>             and immediately and permanently delete it and any
>>>> attachments. Thank you."
>>>>
>>>>
>>>>             _______________________________________________
>>>>             Users mailing list: Users at clusterlabs.org
>>>>             <mailto:Users at clusterlabs.org>
>>>>             http://clusterlabs.org/mailman/listinfo/users
>>>>
>>>>             Project Home: http://www.clusterlabs.org
>>>>             Getting started:
>>>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>             Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Digimer
>>> Papers and Projects: https://alteeve.ca/w/
>>> What if the cure for cancer is trapped in the mind of a person without
>>> access to education?
>>>
>> 
>> 
>> 
>> 
>> 
>> 
>
>
>







