[ClusterLabs] Can't do anything right; how do I start over?
Jay Scott
bigcrater at gmail.com
Sat Oct 15 08:56:14 CEST 2016
Greetings,
Heh. Well, the comment in corosync.conf makes sense to me now.
Thanks, I've fixed that.
Here's my corosync.conf
----------------------------------------
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 10.1.0.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
    cluster_name: pecan
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
}
service {
    name: pacemaker
    ver: 1
}
nodelist {
    node {
        ring0_addr: smoking
        nodeid: 1
    }
    node {
        ring0_addr: mars
        nodeid: 2
    }
}
----------------------------------------
And a few things are behaving better than they did before.
At the moment my goal is to set up a partition as drbd.
In the interest of bandwidth I will show the commands that
I use and the result I finally get.
----------------------------------------
pcs cluster auth smoking mars
pcs property set stonith-enabled=true
stonith_admin --metadata --agent fence_pcmk
cibadmin -C -o resources --xml-file stonith.xml
pcs resource create floating_ip IPaddr2 ip=10.1.2.101 cidr_netmask=32
pcs resource defaults resource-stickiness=100
----------------------------------------
And at this point, all appears well. My pcs status output looks like
I think it should.
Now, of course, I admit that setting up the floating_ip is
not relevant to my goal of a drbd backed filesystem, but I've been
doing it as a sanity check.
On to drbd
----------------------------------------
modprobe drbd
systemctl start drbd.service
[root@smoking cluster]# cat /proc/drbd
version: 8.4.8-1 (api:1/proto:86-101)
GIT-hash: 22b4c802192646e433d3f7399d578ec7fecc6272 build by mockbuild@, 2016-10-13 19:58:26
 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Diskless C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:10574 dw:10574 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Secondary/Secondary ds:Diskless/Diskless C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
Again, this is stuff that hung around from the previous incarnation.
But it looks okay to me. I'm planning to use the '1' device.
The above is run on the secondary machine, so Secondary/Primary is
correct. And UpToDate/UpToDate looks right to me.
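For anyone who wants to check just one device without reading the whole file, here is a minimal sketch that pulls the cs/ro/ds fields for a single minor out of /proc/drbd. The function name drbd_state is made up for illustration, and it assumes the DRBD 8.4 /proc/drbd layout shown above.

```shell
#!/bin/sh
# Print the connection state (cs), roles (ro) and disk states (ds)
# for one DRBD minor number.
# Usage: drbd_state MINOR [FILE]   (FILE defaults to /proc/drbd)
# drbd_state is a hypothetical helper name, not part of DRBD.
drbd_state() {
    minor="$1"
    file="${2:-/proc/drbd}"
    awk -v m="$minor" '
        # Status lines start with the minor number followed by a colon,
        # e.g. " 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----"
        $1 == (m ":") {
            out = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(cs|ro|ds):/)
                    out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "$file"
}
```

Running drbd_state 1 on the secondary should then print only the three fields discussed above for device 1.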
Now it goes south. The mkfs.xfs appears to work, but that's not
relevant anyway, right?
----------------------------------------
pcs resource create BravoSpace \
ocf:linbit:drbd drbd_resource=bravo \
op monitor interval=60s
[root@smoking ~]# pcs status
Cluster name: pecan
Last updated: Sat Oct 15 01:33:37 2016 Last change: Sat Oct 15 01:18:56 2016 by root via cibadmin on mars
Stack: corosync
Current DC: mars (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured
Node mars: UNCLEAN (online)
Node smoking: UNCLEAN (online)
Full list of resources:
Fencing (stonith:fence_pcmk): Started mars
floating_ip (ocf::heartbeat:IPaddr2): Started mars
BravoSpace (ocf::linbit:drbd): FAILED[ smoking mars ]
Failed Actions:
* BravoSpace_stop_0 on smoking 'not configured' (6): call=18, status=complete, exitreason='none',
    last-rc-change='Sat Oct 15 01:18:56 2016', queued=0ms, exec=63ms
* BravoSpace_stop_0 on mars 'not configured' (6): call=18, status=complete, exitreason='none',
    last-rc-change='Sat Oct 15 01:18:56 2016', queued=0ms, exec=60ms
PCSD Status:
smoking: Online
mars: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
----------------------------------------
I've looked in /var/log/cluster/corosync.log and it doesn't seem
happy, but I don't know what I'm looking at. On the primary
machine it's 1800+ lines; on the secondary it's 600+ lines.
There are 337 lines with BravoSpace in them alone.
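A filter along these lines can cut a log like that down to the lines worth reading. It is only a sketch: the function name resource_errors is made up, and matching on "error" or "warning" in any case is an assumption about what the interesting lines contain.

```shell
#!/bin/sh
# Show only the error/warning lines that mention one resource.
# Usage: resource_errors RESOURCE [LOGFILE]
# resource_errors is a hypothetical helper name; the default log path
# is the logfile set in the corosync.conf above.
resource_errors() {
    resource="$1"
    log="${2:-/var/log/cluster/corosync.log}"
    # First keep lines naming the resource (fixed string, not a regex),
    # then keep those that look like errors or warnings.
    grep -F -- "$resource" "$log" | grep -Ei 'error|warning'
}
```

For example, resource_errors BravoSpace on each node.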
One of them says:
drbd(BravoSpace)[3295]: 2016/10/15_01:18:56 ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
I tried adding clone-max=2, but the command barfed: that's not a legal
parameter.
So, what's wrong? (I'm a newbie, of course.)
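For reference, the error text calls clone-max a meta parameter, which suggests it belongs on a master/slave wrapper around the DRBD primitive rather than on the primitive itself. Below is a sketch of that, assuming the pcs 0.9.x syntax shipped with EL7. The name BravoSpaceMS is made up, and the other values (master-max=1, master-node-max=1, clone-node-max=1, notify=true) are the usual choices for a two-node DRBD pair, not something taken from this cluster. By default the script only prints the command.

```shell
#!/bin/sh
# Sketch only: wrap an existing DRBD primitive in a master/slave
# resource so clone-max is supplied as a meta attribute.
# Assumes pcs 0.9.x ("pcs resource master"); BravoSpaceMS is a
# hypothetical id chosen for this example.
# RUN defaults to "echo" (dry run); set RUN to an empty string to
# really execute the pcs command.
RUN="${RUN-echo}"

make_drbd_master() {
    primitive="$1"
    master_id="$2"
    $RUN pcs resource master "$master_id" "$primitive" \
        master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
}

make_drbd_master BravoSpace BravoSpaceMS
```

Run as is, it prints the pcs command for review. Running it with RUN set to an empty string in the environment executes the command instead.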
I did a pcs resource cleanup. That shut down fencing and the IP.
I tried pcs cluster start to get them back, no help.
I did pcs cluster standby smoking, and then unstandby smoking.
The ip started, but fencing has failed on BOTH machines.
I can't see what I'm doing wrong.
Thanks. I realize I'm consuming your time on the cheap.
On Fri, Oct 14, 2016 at 3:33 PM, Dimitri Maziuk <dmaziuk at bmrb.wisc.edu>
wrote:
> On 10/14/2016 02:48 PM, Jay Scott wrote:
>
> > When I "start over" I stop all the services, delete the packages,
> > empty the configs and logs as best I know how. But this doesn't
> > completely clear everything: the drbd metadata is evidently still
> > on the partitions I've set aside for it.
>
> If it's small enough, dd if=/dev/zero of=/your/partition
>
> Get DRBD working and fully sync'ed outside of the cluster before you
> start adding it.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu