[ClusterLabs] Can't do anything right; how do I start over?

Sat Oct 15 02:56:14 EDT 2016

Greetings,

Heh.  Well, the comment in corosync.conf makes sense to me now.
Thanks, I've fixed that.

Here's my corosync.conf
----------------------------------------
totem {
    version: 2

    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 10.1.0.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
    cluster_name: pecan
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
}
service {
    name: pacemaker
    ver: 1
}
nodelist {
  node {
        ring0_addr: smoking
        nodeid: 1
       }
  node {
        ring0_addr: mars
        nodeid: 2
       }
}
----------------------------------------

And a few things are behaving better than they did before.

At the moment my goal is to set up a partition as a DRBD device.
In the interest of bandwidth I will show the commands that
I use and the result I finally get.

----------------------------------------
pcs cluster auth smoking mars
pcs property set stonith-enabled=true
stonith_admin --metadata --agent fence_pcmk
cibadmin -C -o resources --xml-file stonith.xml
pcs resource create floating_ip IPaddr2 ip=10.1.2.101 cidr_netmask=32
pcs resource defaults resource-stickiness=100
----------------------------------------

And at this point, all appears well.  My pcs status output looks like
I think it should.

Now, of course, I admit that setting up the floating_ip is
not relevant to my goal of a drbd backed filesystem, but I've been
doing it as a sanity check.

On to drbd
----------------------------------------
modprobe drbd
systemctl start drbd.service
[root@smoking cluster]# cat /proc/drbd
version: 8.4.8-1 (api:1/proto:86-101)
GIT-hash: 22b4c802192646e433d3f7399d578ec7fecc6272 build by mockbuild@, 2016-10-13 19:58:26
 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Diskless C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:10574 dw:10574 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Secondary/Secondary ds:Diskless/Diskless C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
Again, this is stuff that hung around from the previous incarnation.
But it looks okay to me.  I'm planning to use the '1' device.
The above is run on the secondary machine, so Secondary/Primary is
correct.  And UpToDate/UpToDate looks right to me.
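
To check just those fields without eyeballing the whole file, something
like the following should work.  It is only a sketch: drbd_state is a
helper name I made up, and it assumes the 8.4-style /proc/drbd layout
shown above, where each device line starts with the minor number
followed by the cs:, ro: and ds: fields.

```shell
# drbd_state MINOR [FILE]
# Hypothetical helper: print the cs:, ro: and ds: fields for one DRBD minor.
# FILE defaults to /proc/drbd; pass a saved copy to inspect output offline.
drbd_state() {
    awk -v minor="$1" '
        # Device lines look like " 1: cs:Connected ro:... ds:... C r-----",
        # so the first field is the minor number followed by a colon.
        $1 == (minor ":") { print $2, $3, $4 }
    ' "${2:-/proc/drbd}"
}
```

On the secondary, drbd_state 1 should print
cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate for the output above.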

Now it goes south.  The mkfs.xfs appears to work, but that's not
relevant anyway, right?
----------------------------------------
pcs resource create BravoSpace \
  ocf:linbit:drbd drbd_resource=bravo \
  op monitor interval=60s

[root@smoking ~]# pcs status
Cluster name: pecan
Last updated: Sat Oct 15 01:33:37 2016        Last change: Sat Oct 15 01:18:56 2016 by root via cibadmin on mars
Stack: corosync
Current DC: mars (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Node mars: UNCLEAN (online)
Node smoking: UNCLEAN (online)

Full list of resources:

 Fencing    (stonith:fence_pcmk):    Started mars
 floating_ip    (ocf::heartbeat:IPaddr2):    Started mars
 BravoSpace    (ocf::linbit:drbd):    FAILED[ smoking mars ]

Failed Actions:
* BravoSpace_stop_0 on smoking 'not configured' (6): call=18, status=complete, exitreason='none',
    last-rc-change='Sat Oct 15 01:18:56 2016', queued=0ms, exec=63ms
* BravoSpace_stop_0 on mars 'not configured' (6): call=18, status=complete, exitreason='none',
    last-rc-change='Sat Oct 15 01:18:56 2016', queued=0ms, exec=60ms

PCSD Status:
  smoking: Online
  mars: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
----------------------------------------
I've looked in /var/log/cluster/corosync.log and it doesn't seem
happy, but I don't know what I'm looking at.  On the primary
machine it's 1800+ lines; on the secondary it's 600+ lines.
There are 337 lines that mention BravoSpace alone.
One of them says
drbd(BravoSpace)[3295]:    2016/10/15_01:18:56 ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
I tried adding clone-max=2, but the command barfed: that's not a legal
parameter.
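
In case it helps anyone reproduce this, here is roughly how those lines
can be pulled out of the log.  It is a sketch only: the two function
names are made up for illustration, and the log path is simply the
logfile setting from my corosync.conf above.

```shell
# Hypothetical helpers for digging through the cluster log.
# Example (path taken from the logging section of corosync.conf):
#   count_resource_lines BravoSpace /var/log/cluster/corosync.log
#   resource_errors BravoSpace /var/log/cluster/corosync.log

# Count the log lines that mention a resource by name.
count_resource_lines() {
    grep -c -- "$1" "$2"
}

# Print only the lines for that resource that also contain ERROR.
resource_errors() {
    grep -- "$1" "$2" | grep 'ERROR'
}
```

Filtering on ERROR first cuts the 337 BravoSpace lines down to the
handful that are worth reading closely.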

So, what's wrong?  (I'm a newbie, of course.)

I did a pcs resource cleanup.  That shut down fencing and the IP.
I tried pcs cluster start to get them back, but that didn't help.
I did pcs cluster standby smoking, and then unstandby smoking.
The IP started, but fencing has failed on BOTH machines.
I can't see what I'm doing wrong.

Thanks.  I realize I'm consuming your time on the cheap.

On Fri, Oct 14, 2016 at 3:33 PM, Dimitri Maziuk <dmaziuk at bmrb.wisc.edu>
wrote:

> On 10/14/2016 02:48 PM, Jay Scott wrote:
>
> > When I "start over" I stop all the services, delete the packages,
> > empty the configs and logs as best I know how.  But this doesn't
> > completely clear everything:  the drbd metadata is evidently still
> > on the partitions I've set aside for it.
>
> If it's small enough, dd if=/dev/zero of=/your/partition
>
> Get DRBD working and fully sync'ed outside of the cluster before you
> start adding it.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>