[ClusterLabs] Dual Primary DRBD + OCFS2 (elias)

Wed Nov 20 00:12:59 EST 2019

Thanks Roger!

I configured it according to the SUSE doc for OCFS2, but the DLM resource stops with error -107 (no interface found).
I think the OCFS2 cluster has to be configured manually, although the correct way would be to do it through a Pacemaker RA.

Ilya Nasonov
elias at po-mayak

From: users-request at clusterlabs.org
Sent: 19 November 2019, 19:32
To: users at clusterlabs.org
Subject: Users Digest, Vol 58, Issue 20

Today's Topics:

   1. Re: Antw: Re:  Pacemaker 2.0.3-rc3 now available
      (Jehan-Guillaume de Rorthais)
   2. corosync 3.0.1 on Debian/Buster reports some MTU errors
      (Jean-Francois Malouin)
   3. Dual Primary DRBD + OCFS2 (Ilya Nasonov)
   4. Re: Dual Primary DRBD + OCFS2 (Roger Zhou)
   5. Q: ldirectord and "checktype = external-perl" broken?
      (Ulrich Windl)
   6. Q: ocf:pacemaker:ping (Ulrich Windl)

----------------------------------------------------------------------

Message: 1
Date: Mon, 18 Nov 2019 18:13:57 +0100
From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
To: Ken Gaillot <kgaillot at redhat.com>
Cc: Cluster Labs - All topics related to open-source clustering
	welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Antw: Re:  Pacemaker 2.0.3-rc3 now
	available
Message-ID: <20191118181357.6899c051 at firost>
Content-Type: text/plain; charset=UTF-8

On Mon, 18 Nov 2019 10:45:25 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:

> On Fri, 2019-11-15 at 14:35 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 14 Nov 2019 11:09:57 -0600
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >   
> > > On Thu, 2019-11-14 at 15:22 +0100, Ulrich Windl wrote:  
> > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on
> > > > 14.11.2019 at 15:17 in message <20191114151719.6cbf4e38 at firost>:
> > > > > > On Wed, 13 Nov 2019 17:30:31 -0600
> > > > > Ken Gaillot <kgaillot at redhat.com> wrote:
> > > > > ...    
> > > > > > > A longstanding pain point in the logs has been improved. Whenever
> > > > > > > the scheduler processes resource history, it logs a warning for
> > > > > > > any failures it finds, regardless of whether they are new or old,
> > > > > > > which can confuse anyone reading the logs. Now, the log will
> > > > > > > contain the time of the failure, so it's obvious whether you're
> > > > > > > seeing the same event or not. The log will also contain the exit
> > > > > > > reason if one was provided by the resource agent, for easier
> > > > > > > troubleshooting.
> > > > > 
> > > > > > I've been hurt by this in the past and I was wondering what the
> > > > > > point was of warning again and again in the logs for past failures
> > > > > > during scheduling. What does this information bring to the
> > > > > > administrator?
> > > 
> > > The controller will log an event just once, when it happens.
> > > 
> > > The scheduler, on the other hand, uses the entire recorded resource
> > > history to determine the current resource state. Old failures (that
> > > haven't been cleaned) must be taken into account.  
> > 
> > OK, I wasn't aware of this. If you have a few minutes, I would be
> > interested to know why the full history is needed rather than just
> > the latest entry. Or maybe there are some comments in the source code
> > that already cover this question?
> 
> The full *recorded* history consists of the most recent operation that
> affects the state (like start/stop/promote/demote), the most recent
> failed operation, and the most recent results of any recurring
> monitors.
> 
> For example there may be a failed monitor, but whether the resource is
> considered failed or not would depend on whether there was a more
> recent successful stop or start. Even if the failed monitor has been
> superseded, it needs to stay in the history for display purposes until
> the user has cleaned it up.
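
The rule described above (a failed monitor only matters if no state-changing
operation succeeded after it) can be illustrated with a small Python sketch.
This is not Pacemaker's code: the OpRecord structure, its field names and the
resource_state() function are hypothetical and greatly simplified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpRecord:
    """One recorded operation result (hypothetical, simplified structure)."""
    action: str      # e.g. "start", "stop", "monitor"
    ok: bool         # True if the action returned the expected result
    timestamp: int   # seconds since the epoch when the result was recorded


def resource_state(last_state_op: Optional[OpRecord],
                   last_failure: Optional[OpRecord]) -> str:
    """Return "failed", "started", "stopped" or "unknown"."""
    if last_failure is not None:
        # A failure is superseded only by a later, successful
        # state-changing operation (start/stop/promote/demote).
        superseded = (last_state_op is not None
                      and last_state_op.ok
                      and last_state_op.timestamp > last_failure.timestamp)
        if not superseded:
            return "failed"
    if last_state_op is None:
        return "unknown"
    if not last_state_op.ok:
        return "failed"
    return "stopped" if last_state_op.action == "stop" else "started"
```

In this sketch a superseded failure no longer affects the computed state, but
the record itself is not deleted, which matches the point that it stays in the
history for display until it is cleaned up.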

OK, understood.

Maybe that's why "FAILED" appears briefly in crm_mon during a move of a
clean resource that has past failures? Maybe I should dig into this weird
behavior and write up a bug report if I can confirm it?

> > > Every run of the scheduler is completely independent, so it doesn't
> > > know about any earlier runs or what they logged. Think of it like
> > > Frosty the Snowman saying "Happy Birthday!" every time his hat is
> > > put on.
> > 
> > I don't have this ref :)  
> 
> I figured not everybody would, but it was too fun to pass up :)
> 
> The snowman comes to life every time his magic hat is put on, but to
> him each time feels like he's being born for the first time, so he says
> "Happy Birthday!"
> 
> https://www.youtube.com/watch?v=1PbWTEYoN8o

heh :)

> > > As far as each run is concerned, it is the first time it's seen the
> > > history. This is what allows the DC role to move from node to node,
> > > and the scheduler to be run as a simulation using a saved CIB file.
> > > 
> > > We could change the wording further if necessary. The previous
> > > version would log something like:
> > > 
> > > warning: Processing failed monitor of my-rsc on node1: not running
> > > 
> > > and this latest change will log it like:
> > > 
> > > warning: Unexpected result (not running: No process state file found)
> > > was recorded for monitor of my-rsc on node1 at Nov 12 19:19:02 2019
> > 
> > /result/state/ ?  
> 
> It's the result of a resource agent action, so it could be for example
> a timeout or a permissions issue.

ok
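
Because the new message carries the time of the failure, repeated warnings
about the same event can be recognised mechanically. The following Python
sketch shows one way to do that. It assumes the message sits on a single line
and looks exactly like the example quoted above; the regular expression is
inferred from that one example, not from Pacemaker's source, and the function
names are made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the single example message in this thread (assumption).
PATTERN = re.compile(
    r"Unexpected result \((?P<result>[^:)]+)(?:: (?P<reason>[^)]*))?\) "
    r"was recorded for (?P<action>\S+) of (?P<rsc>\S+) on (?P<node>\S+) "
    r"at (?P<when>\w{3} +\d+ \d\d:\d\d:\d\d \d{4})"
)


def parse_failure(line):
    """Return the fields of a failure warning as a dict, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # %b relies on English month names (C/English locale).
    fields["when"] = datetime.strptime(fields["when"], "%b %d %H:%M:%S %Y")
    return fields


def unique_failures(lines):
    """Keep one entry per (resource, node, action, time) combination."""
    seen = {}
    for line in lines:
        f = parse_failure(line)
        if f is not None:
            key = (f["rsc"], f["node"], f["action"], f["when"])
            seen.setdefault(key, f)
    return list(seen.values())
```

Feeding the same warning to unique_failures() several times yields a single
entry, which is the "same event or not" distinction the timestamp is meant to
make obvious.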

> > > I wanted to be explicit about the message being about processing
> > > resource history that may or may not be the first time it's been
> > > processed and logged, but everything I came up with seemed too long
> > > for a log line. Another possibility might be something like:
> > > 
> > > warning: Using my-rsc history to determine its current state on
> > > node1: Unexpected result (not running: No process state file found)
> > > was recorded for monitor at Nov 12 19:19:02 2019
> > 
> > I like the first one better.
> > 
> > However, it feels like implementation details exposed to the world,
> > doesn't it? How useful is this information for the end user? What can
> > the user do with this information? There's nothing to fix, and this is
> > not actually an error of the currently running process.
> > 
> > I still fail to understand why the scheduler doesn't process the
> > history silently, whatever it finds there, then warn about something
> > really important if the final result is not expected...
> 
> From the scheduler's point of view, it's all relevant information that
> goes into the decision making. Even an old failure can cause new
> actions, for example if quorum was not held at the time but has now
> been reached, or if there is a failure-timeout that just expired. So
> any failure history is important to understanding whatever the
> scheduler says needs to be done.
> 
> Also, the scheduler is run on the DC, which is not necessarily the node
> that executed the action. So it's useful for troubleshooting to present
> a picture of the whole cluster on the DC, rather than just what's the
> situation on the local node.

OK, kind of got it. The scheduler needs to summarize the chain of events to
define the state of a resource based on the last event.

> I could see an argument for lowering it from warning to notice, but
> it's a balance between what's most useful during normal operation and
> what's most useful during troubleshooting.

So in my humble opinion, the messages should definitely be at notice level.
Maybe they should even go to debug level. I have never had to troubleshoot a bad
decision from the scheduler caused by a bad state summary.
Moreover, if needed, the admin can still study the history from the CIB backed up
on disk, can't they?

The alternative would be to spit out the event chain in detail only if the result
of the summary is different from what the scheduler was expecting?

------------------------------

Message: 2
Date: Mon, 18 Nov 2019 16:31:34 -0500
From: Jean-Francois Malouin <Jean-Francois.Malouin at bic.mni.mcgill.ca>
To: The Pacemaker Cluster List <users at clusterlabs.org>
Subject: [ClusterLabs] corosync 3.0.1 on Debian/Buster reports some
	MTU errors
Message-ID: <20191118213134.huecj2xnbtrtdqmm at bic.mni.mcgill.ca>
Content-Type: text/plain; charset=us-ascii

Hi,

Maybe not directly a pacemaker question but maybe some of you have seen this
problem:

A 2 node pacemaker cluster running corosync-3.0.1 with dual communication ring
sometimes reports errors like this in the corosync log file:

[KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
[KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
[KNET  ] pmtud: Global data MTU changed to: 1366
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time

Those do not happen very frequently, once a week or so...

However the system log on the nodes reports those much more frequently, a few
times a day:

Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down
Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 0)
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
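
To judge how often a link really flaps, the "is down" events can be counted
per host and link. The Python sketch below does this for lines in the format
shown above; the regular expression is derived only from these four sample
lines, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the KNET link up/down messages as they appear in the excerpt above.
LINK_EVENT = re.compile(
    r"\[KNET\s*\] (?:link|rx): host: (\d+) link: (\d+) is (up|down)"
)


def count_link_downs(lines):
    """Count "is down" events, keyed by (host id, link number)."""
    downs = Counter()
    for line in lines:
        m = LINK_EVENT.search(line)
        if m and m.group(3) == "down":
            downs[(int(m.group(1)), int(m.group(2)))] += 1
    return downs
```

Running it over a day of system log lines gives a per-link count that can be
compared between the two rings.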

Are those to be dismissed or are they indicative of a network misconfig/problem?
I tried setting 'knet_transport: udpu' in the totem section (the default value)
but it didn't seem to make a difference... Hard-coding netmtu to 1500 and
allowing for a longer (10s) token timeout also didn't seem to affect the issue.

Corosync config follows:

/etc/corosync/corosync.conf

totem {
    version: 2
    cluster_name: bicha
    transport: knet
    link_mode: passive
    ip_version: ipv4
    token: 10000
    netmtu: 1500
    knet_transport: sctp
    crypto_model: openssl
    crypto_hash: sha256
    crypto_cipher: aes256
    keyfile: /etc/corosync/authkey
    interface {
        linknumber: 0
        knet_transport: udp
        knet_link_priority: 0
    }
    interface {
        linknumber: 1
        knet_transport: udp
        knet_link_priority: 1
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
#    expected_votes: 2
}
nodelist {
    node {
        ring0_addr: xxx.xxx.xxx.xxx
        ring1_addr: zzz.zzz.zzz.zzx
        name: node1
        nodeid: 1
    } 
    node {
        ring0_addr: xxx.xxx.xxx.xxy
        ring1_addr: zzz.zzz.zzz.zzy
        name: node2
        nodeid: 2
    } 
}
logging {
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync/corosync.log
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

------------------------------

Message: 3
Date: Tue, 19 Nov 2019 13:51:59 +0500
From: Ilya Nasonov <elias at po-mayak.ru>
To: "  users at clusterlabs.org" <users at clusterlabs.org>
Subject: [ClusterLabs] Dual Primary DRBD + OCFS2
Message-ID: <20191119085203.2771960014A at iwtm.local>
Content-Type: text/plain; charset="utf-8"

Hello!

Configured a cluster (2-node DRBD+DLM+GFS2) and it works.
I heard the opinion that the OCFS2 file system is better. Found an old cluster setup description: https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
but as I understand it, the o2cb service is not supported by Pacemaker on Debian.
Where can I get the latest information on setting up OCFS2?

Best regards,
Ilya Nasonov
elias at po-mayak

------------------------------

Message: 4
Date: Tue, 19 Nov 2019 10:01:01 +0000
From: Roger Zhou <ZZhou at suse.com>
To: "users at clusterlabs.org" <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Dual Primary DRBD + OCFS2
Message-ID: <572e29b1-4c05-a985-7419-462310d1c626 at suse.com>
Content-Type: text/plain; charset="utf-8"

On 11/19/19 4:51 PM, Ilya Nasonov wrote:
> Hello!
> 
> Configured a cluster (2-node DRBD+DLM+GFS2) and it works.
> 
> I heard the opinion that the OCFS2 file system is better. Found an old 
> cluster setup 
> description: https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
> 
> but as I understand it, the o2cb service is not supported by Pacemaker on Debian.
> 
> Where can I get the latest information on setting up OCFS2?

You can probably refer to the SUSE doc for OCFS2 with Pacemaker [1]. It should 
not be much different to adapt to Debian, I feel.

[1] 
https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-ocfs2.html

Cheers,
Roger

> 
> Best regards,
> Ilya Nasonov
> elias at po-mayak
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 

------------------------------

Message: 5
Date: Tue, 19 Nov 2019 14:58:08 +0100
From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
To: <users at clusterlabs.org>
Subject: [ClusterLabs] Q: ldirectord and "checktype = external-perl"
	broken?
Message-ID: <5DD3F4F0020000A10003544E at gwsmtp.uni-regensburg.de>
Content-Type: text/plain; charset=US-ASCII

Hi!

In SLES11 I developed a special check program for ldirectord 3.9.5 in Perl, but then I discovered that it won't work correctly with "checktype = external-perl". Changing to "checktype = external" made it work.
Today I played with it in SLES12 SP4 and ldirectord-4.3.018.a7fb5035-3.25.1.18557.0.PTF.1153889.x86_64, just to discover that it still does not work.

So I wonder: Has it really been broken all this time, or is there some special thing to consider that isn't written in the manual page?

The observable effect is that the weight is set to 0 right after starting with weight = 1. If it works, the weight is set to 1.

Regards,
Ulrich

------------------------------

Message: 6
Date: Tue, 19 Nov 2019 15:32:43 +0100
From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
To: <users at clusterlabs.org>
Subject: [ClusterLabs] Q: ocf:pacemaker:ping
Message-ID: <5DD3FD0B020000A100035452 at gwsmtp.uni-regensburg.de>
Content-Type: text/plain; charset=US-ASCII

Hi!

Seems today I'm digging out old stuff:
I can remember that in 2011 the documentation for ping's dampen was not very helpful. I think that is still the case:

(RA info)
node connectivity (ocf:pacemaker:ping)

Every time the monitor action is run, this resource agent records (in the CIB) the current number of nodes the host can connect to using the system fping (preferred) or ping tool.

Parameters (*: required, []: default):

pidfile (string, [/var/run/ping-ping]):
    PID file

dampen (integer, [5s]): Dampening interval
    The time to wait (dampening) further changes occur

name (string, [pingd]): Attribute name
    The name of the attributes to set.  This is the name to be used in the constraints.

multiplier (integer, [1]): Value multiplier
    The number by which to multiply the number of connected ping nodes by

host_list* (string): Host list
    A space separated list of ping nodes to count.

attempts (integer, [3]): no. of ping attempts
    Number of ping attempts, per host, before declaring it dead

timeout (integer, [2]): ping timeout in seconds
    How long, in seconds, to wait before declaring a ping lost

options (string): Extra Options
    A catch all for any other options that need to be passed to ping.

failure_score (integer):
    Resource is failed if the score is less than failure_score.
    Default never fails.

use_fping (boolean, [1]): Use fping if available
    Use fping rather than ping, if found. If set to 0, fping
    will not be used even if present.

debug (string, [false]): Verbose logging
    Enables to use default attrd_updater verbose logging on every call.

Operations' defaults (advisory minimum):

    start         timeout=60
    stop          timeout=20
    monitor       timeout=60 interval=10
---------

"The name of the attributes to set.": Why plural ("attributes")?
"The time to wait (dampening) further changes occur": Is this an English sentence at all?
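
For readers trying to make sense of the parameter descriptions, here is how
multiplier and failure_score relate according to the RA info text above,
written as a Python sketch. It only restates the documented wording; the
function names are made up, dampen is not modelled, and the agent's actual
implementation may differ in detail.

```python
def ping_score(reachable_hosts, multiplier=1):
    """Attribute value: number of reachable ping nodes times the multiplier."""
    return multiplier * reachable_hosts


def is_failed(score, failure_score=None):
    """Failed only if failure_score is set and the score is below it.

    With no failure_score the resource never fails, as the text says.
    """
    if failure_score is None:
        return False
    return score < failure_score
```

With three hosts in host_list and multiplier=1000, losing one host would drop
the attribute from 3000 to 2000 in this model.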

Regards,
Ulrich

------------------------------


End of Users Digest, Vol 58, Issue 20
*************************************
