[ClusterLabs] Antw: Re: Dual Primary DRBD + OCFS2 (elias)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Nov 20 06:29:57 EST 2019


Maybe show what you did. Did DLM start successfully?
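
For example, the output of the following would help (assuming crmsh, the dlm
tools and systemd are in use; the exact tools differ per distribution):

    crm configure show
    crm_mon -1
    dlm_tool status
    journalctl -b -u corosync -u pacemaker

That shows the resource definitions, the current cluster state, and whether
dlm_controld actually came up.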

>>> Ilya Nasonov <elias at po-mayak.ru> wrote on 20.11.2019 at 06:12 in
message <20191120051305.052936005F7 at iwtm.local>:
> Thanks Roger!
> 
> I configured it according to the SUSE doc for OCFS2, but the DLM resource
> stops with error -107 (no interface found).
> I think it is necessary to configure the OCFS2 cluster manually, but I would
> like to do it correctly through the Pacemaker RA.
> 
> Ilya Nasonov
> elias at po-mayak
> 
> From: users-request at clusterlabs.org 
> Sent: 19 November 2019, 19:32
> To: users at clusterlabs.org 
> Subject: Users Digest, Vol 58, Issue 20
> 
> Send Users mailing list submissions to
> 	users at clusterlabs.org 
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://lists.clusterlabs.org/mailman/listinfo/users 
> or, via email, send a message with subject or body 'help' to
> 	users-request at clusterlabs.org 
> 
> You can reach the person managing the list at
> 	users-owner at clusterlabs.org 
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Antw: Re:  Pacemaker 2.0.3-rc3 now available
>       (Jehan-Guillaume de Rorthais)
>    2. corosync 3.0.1 on Debian/Buster reports some MTU errors
>       (Jean-Francois Malouin)
>    3. Dual Primary DRBD + OCFS2 (Ilya Nasonov)
>    4. Re: Dual Primary DRBD + OCFS2 (Roger Zhou)
>    5. Q: ldirectord and "checktype = external-perl" broken?
>       (Ulrich Windl)
>    6. Q: ocf:pacemaker:ping (Ulrich Windl)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 18 Nov 2019 18:13:57 +0100
> From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> To: Ken Gaillot <kgaillot at redhat.com>
> Cc: Cluster Labs - All topics related to open-source clustering
> 	welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Antw: Re:  Pacemaker 2.0.3-rc3 now
> 	available
> Message-ID: <20191118181357.6899c051 at firost>
> Content-Type: text/plain; charset=UTF-8
> 
> On Mon, 18 Nov 2019 10:45:25 -0600
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On Fri, 2019-11-15 at 14:35 +0100, Jehan-Guillaume de Rorthais wrote:
>> > On Thu, 14 Nov 2019 11:09:57 -0600
>> > Ken Gaillot <kgaillot at redhat.com> wrote:
>> >   
>> > > On Thu, 2019-11-14 at 15:22 +0100, Ulrich Windl wrote:  
>> > > > > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> schrieb am
>> > > > > > > 14.11.2019 um    
>> > > > 
>> > > > 15:17 in
>> > > > Nachricht <20191114151719.6cbf4e38 at firost>:    
>> > > > > On Wed, 13 Nov 2019 17:30:31 -0600
>> > > > > Ken Gaillot <kgaillot at redhat.com> wrote:
>> > > > > ...    
>> > > > > > A longstanding pain point in the logs has been improved.
>> > > > > > Whenever
>> > > > > > the
>> > > > > > scheduler processes resource history, it logs a warning for
>> > > > > > any
>> > > > > > failures it finds, regardless of whether they are new or old,
>> > > > > > which can
>> > > > > > confuse anyone reading the logs. Now, the log will contain
>> > > > > > the
>> > > > > > time of
>> > > > > > the failure, so it's obvious whether you're seeing the same
>> > > > > > event
>> > > > > > or
>> > > > > > not. The log will also contain the exit reason if one was
>> > > > > > provided by
>> > > > > > the resource agent, for easier troubleshooting.    
>> > > > > 
>> > > > > I've been hurt by this in the past, and I was wondering: what is
>> > > > > the point of warning again and again in the logs about past
>> > > > > failures during scheduling? What does this information bring to
>> > > > > the administrator?
>> > > 
>> > > The controller will log an event just once, when it happens.
>> > > 
>> > > The scheduler, on the other hand, uses the entire recorded resource
>> > > history to determine the current resource state. Old failures (that
>> > > haven't been cleaned) must be taken into account.  
>> > 
>> > OK, I wasn't aware of this. If you have a few minutes, I would be
>> > interested to know why the full history is needed rather than just
>> > finding the latest entry there. Or maybe there are comments in the
>> > source code that already cover this question?  
>> 
>> The full *recorded* history consists of the most recent operation that
>> affects the state (like start/stop/promote/demote), the most recent
>> failed operation, and the most recent results of any recurring
>> monitors.
>> 
>> For example there may be a failed monitor, but whether the resource is
>> considered failed or not would depend on whether there was a more
>> recent successful stop or start. Even if the failed monitor has been
>> superseded, it needs to stay in the history for display purposes until
>> the user has cleaned it up.
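>> 
>> (As an illustration only: the recorded entries are the lrm_rsc_op elements
>> in the status section of the CIB, so something like
>> 
>>     cibadmin --query | grep lrm_rsc_op
>> 
>> on the DC shows roughly what history the scheduler has to work with.)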
> 
> OK, understood.
> 
> Maybe that's why "FAILED" appears briefly in crm_mon during a resource move
> on a clean resource that nevertheless has past failures? Maybe I should dig
> into this weird behavior and write up a bug report if I confirm it?
> 
>> > > Every run of the scheduler is completely independent, so it doesn't
>> > > know about any earlier runs or what they logged. Think of it like
>> > > Frosty the Snowman saying "Happy Birthday!" every time his hat is
>> > > put
>> > > on.  
>> > 
>> > I don't have this ref :)  
>> 
>> I figured not everybody would, but it was too fun to pass up :)
>> 
>> The snowman comes to life every time his magic hat is put on, but to
>> him each time feels like he's being born for the first time, so he says
>> "Happy Birthday!"
>> 
>> https://www.youtube.com/watch?v=1PbWTEYoN8o 
> 
> heh :)
> 
>> > > As far as each run is concerned, it is the first time it's seen the
>> > > history. This is what allows the DC role to move from node to node,
>> > > and
>> > > the scheduler to be run as a simulation using a saved CIB file.
>> > > 
>> > > We could change the wording further if necessary. The previous
>> > > version
>> > > would log something like:
>> > > 
>> > > warning: Processing failed monitor of my-rsc on node1: not running
>> > > 
>> > > and this latest change will log it like:
>> > > 
>> > > warning: Unexpected result (not running: No process state file
>> > > found)
>> > > was recorded for monitor of my-rsc on node1 at Nov 12 19:19:02 2019  
>> > 
>> > /result/state/ ?  
>> 
>> It's the result of a resource agent action, so it could be for example
>> a timeout or a permissions issue.
> 
> ok
> 
>> > > I wanted to be explicit about the message being about processing
>> > > resource history that may or may not be the first time it's been
>> > > processed and logged, but everything I came up with seemed too long
>> > > for
>> > > a log line. Another possibility might be something like:
>> > > 
>> > > warning: Using my-rsc history to determine its current state on
>> > > node1:
>> > > Unexpected result (not running: No process state file found) was
>> > > recorded for monitor at Nov 12 19:19:02 2019  
>> > 
>> > I like the first one better.
>> > 
>> > However, it feels like implementation details exposed to the world,
>> > doesn't it? How useful is this information for the end user? What can
>> > the user do with this information? There's nothing to fix, and this is
>> > not actually an error of the currently running process.
>> > 
>> > I still fail to understand why the scheduler doesn't process the
>> > history silently, whatever it finds there, and then warn about something
>> > really important if the final result is not expected...  
>> 
>> From the scheduler's point of view, it's all relevant information that
>> goes into the decision making. Even an old failure can cause new
>> actions, for example if quorum was not held at the time but has now
>> been reached, or if there is a failure-timeout that just expired. So
>> any failure history is important to understanding whatever the
>> scheduler says needs to be done.
>> 
>> Also, the scheduler is run on the DC, which is not necessarily the node
>> that executed the action. So it's useful for troubleshooting to present
>> a picture of the whole cluster on the DC, rather than just what's the
>> situation on the local node.
> 
> OK, kind of got it. The scheduler needs to summarize the chain of events to
> determine the state of a resource based on the last event.
> 
>> I could see an argument for lowering it from warning to notice, but
>> it's a balance between what's most useful during normal operation and
>> what's most useful during troubleshooting.
> 
> So in my humble opinion, the messages should definitely be at notice level.
> Maybe they should even go to debug level. I never had to troubleshoot a bad
> decision from the scheduler because of a bad state summary.
> Moreover, if needed, the admin can still study the history from the CIB backed
> up on disk, can't they?
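> 
> (For example, replaying one of the scheduler inputs saved on the DC should
> show the same decisions; if I remember the options correctly, something like
> 
>     crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-42.bz2
> 
> where the file name is of course just an example.)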
> 
> The alternative would be to spit out the event chain in detail only if the
> result of the summary is different from what the scheduler was expecting?
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 18 Nov 2019 16:31:34 -0500
> From: Jean-Francois Malouin <Jean-Francois.Malouin at bic.mni.mcgill.ca>
> To: The Pacemaker Cluster List <users at clusterlabs.org>
> Subject: [ClusterLabs] corosync 3.0.1 on Debian/Buster reports some
> 	MTU	errors
> Message-ID: <20191118213134.huecj2xnbtrtdqmm at bic.mni.mcgill.ca>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi,
> 
> Maybe not directly a pacemaker question, but maybe some of you have seen this
> problem:
> 
> A 2-node pacemaker cluster running corosync-3.0.1 with dual communication
> rings sometimes reports errors like this in the corosync log file:
> 
> [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
> [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
> [KNET  ] pmtud: Global data MTU changed to: 1366
> [CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
> [CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
> 
> Those do not happen very frequently, once a week or so...
> 
> However the system log on the nodes reports those much more frequently, a 
> few
> times a day:
> 
> Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is 
> down
> Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) 
> best link: 0 (pri: 0)
> Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
> Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) 
> best link: 1 (pri: 1)
> 
> Are those to be dismissed or are they indicative of a network
> misconfig/problem?
> I tried setting 'knet_transport: udpu' in the totem section (the default
> value), but it didn't seem to make a difference... Hard-coding netmtu to 1500
> and allowing for a longer (10s) token timeout also didn't seem to affect the
> issue.
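> 
> (For reference, the link state as knet currently sees it can be checked with
> 
>     corosync-cfgtool -s
> 
> and "corosync-cmapctl | grep -i mtu" may show which MTU values are actually
> in effect; both are standard corosync 3 tools.)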
> 
> 
> Corosync config follows:
> 
> /etc/corosync/corosync.conf
> 
> totem {
>     version: 2
>     cluster_name: bicha
>     transport: knet
>     link_mode: passive
>     ip_version: ipv4
>     token: 10000
>     netmtu: 1500
>     knet_transport: sctp
>     crypto_model: openssl
>     crypto_hash: sha256
>     crypto_cipher: aes256
>     keyfile: /etc/corosync/authkey
>     interface {
>         linknumber: 0
>         knet_transport: udp
>         knet_link_priority: 0
>     }
>     interface {
>         linknumber: 1
>         knet_transport: udp
>         knet_link_priority: 1
>     }
> }
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
> #    expected_votes: 2
> }
> nodelist {
>     node {
>         ring0_addr: xxx.xxx.xxx.xxx
>         ring1_addr: zzz.zzz.zzz.zzx
>         name: node1
>         nodeid: 1
>     } 
>     node {
>         ring0_addr: xxx.xxx.xxx.xxy
>         ring1_addr: zzz.zzz.zzz.zzy
>         name: node2
>         nodeid: 2
>     } 
> }
> logging {
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/corosync/corosync.log
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 19 Nov 2019 13:51:59 +0500
> From: Ilya Nasonov <elias at po-mayak.ru>
> To: "  users at clusterlabs.org" <users at clusterlabs.org>
> Subject: [ClusterLabs] Dual Primary DRBD + OCFS2
> Message-ID: <20191119085203.2771960014A at iwtm.local>
> Content-Type: text/plain; charset="utf-8"
> 
> Hello!
> 
> I configured a cluster (2-node DRBD+DLM+CFS2) and it works.
> I heard the opinion that the OCFS2 file system is better. I found an old
> cluster setup description:
> https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2 
> but as I understand it, the o2cb service is not supported by Pacemaker on Debian.
> Where can I get the latest information on setting up OCFS2?
> 
> Best regards,
> Ilya Nasonov
> elias at po-mayak
> 
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 
> <https://lists.clusterlabs.org/pipermail/users/attachments/20191119/95e4c791/attachment-0001.html>
> 
> ------------------------------
> 
> Message: 4
> Date: Tue, 19 Nov 2019 10:01:01 +0000
> From: Roger Zhou <ZZhou at suse.com>
> To: "users at clusterlabs.org" <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Dual Primary DRBD + OCFS2
> Message-ID: <572e29b1-4c05-a985-7419-462310d1c626 at suse.com>
> Content-Type: text/plain; charset="utf-8"
> 
> 
> On 11/19/19 4:51 PM, Ilya Nasonov wrote:
>> Hello!
>> 
>> I configured a cluster (2-node DRBD+DLM+CFS2) and it works.
>> 
>> I heard the opinion that the OCFS2 file system is better. I found an old
>> cluster setup description:
>> https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2 
>> 
>> but as I understand it, the o2cb service is not supported by Pacemaker on Debian.
>> 
>> Where can I get the latest information on setting up OCFS2?
> 
> Probably you can refer to the SUSE doc for OCFS2 with Pacemaker [1]. It should
> not be much different to adapt to Debian, I feel.
> 
> [1] 
> https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-ocfs2.html
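> 
> For what it's worth, if I remember that chapter correctly, the setup boils
> down to roughly the following (crm shell syntax; device, mount point and
> resource names are only placeholders, and you still need the usual
> order/colocation with the promoted dual-primary DRBD resource):
> 
>     primitive dlm ocf:pacemaker:controld \
>         op monitor interval=60 timeout=60
>     primitive ocfs2-fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/mnt/shared" fstype="ocfs2" \
>         op monitor interval=20 timeout=40
>     group g-storage dlm ocfs2-fs
>     clone cl-storage g-storage meta interleave=true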
> 
> Cheers,
> Roger
> 
> 
>> 
>> Best regards,
>> Ilya Nasonov
>> elias at po-mayak
>> 
>> 
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
>> 
> 
> ------------------------------
> 
> Message: 5
> Date: Tue, 19 Nov 2019 14:58:08 +0100
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: <users at clusterlabs.org>
> Subject: [ClusterLabs] Q: ldirectord and "checktype = external-perl"
> 	broken?
> Message-ID: <5DD3F4F0020000A10003544E at gwsmtp.uni-regensburg.de>
> Content-Type: text/plain; charset=US-ASCII
> 
> Hi!
> 
> In SLES11 I developed some special check program for ldirectord 3.9.5 in 
> Perl, but then I discovered that it won't work correctly with "checktype = 
> external-perl". Changing to "checktype = external" made it work.
> Today I played with it in SLES12 SP4 and 
> ldirectord-4.3.018.a7fb5035-3.25.1.18557.0.PTF.1153889.x86_64, just to 
> discover that it still does not work.
> 
> So I wonder: Is it really broken all the time, or is there some special 
> thing to consider that isn't written in the manual page?
> 
> The observable effect is that the weight is set to 0 right after starting
> with weight = 1. If it works, the weight is set to 1.
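> 
> For reference, the relevant part of the configuration looks roughly like
> this (addresses and the script path are placeholders; changing checktype to
> "external" makes the same setup work):
> 
>     virtual=192.168.1.10:80
>         real=192.168.1.11:80 masq 1
>         checktype=external-perl
>         checkcommand=/usr/local/bin/my-check.pl
>         checktimeout=5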
> 
> Regards,
> Ulrich
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Tue, 19 Nov 2019 15:32:43 +0100
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: <users at clusterlabs.org>
> Subject: [ClusterLabs] Q: ocf:pacemaker:ping
> Message-ID: <5DD3FD0B020000A100035452 at gwsmtp.uni-regensburg.de>
> Content-Type: text/plain; charset=US-ASCII
> 
> Hi!
> 
> Seems today I'm digging out old stuff:
> I can remember that in 2011 the documentation for ping's dampen was not very
> helpful, and I think it still is:
> 
> (RA info)
> node connectivity (ocf:pacemaker:ping)
> 
> Every time the monitor action is run, this resource agent records (in the 
> CIB) the current number of nodes the host can connect to using the system 
> fping (preferred) or ping tool.
> 
> Parameters (*: required, []: default):
> 
> pidfile (string, [/var/run/ping-ping]):
>     PID file
> 
> dampen (integer, [5s]): Dampening interval
>     The time to wait (dampening) further changes occur
> 
> name (string, [pingd]): Attribute name
>     The name of the attributes to set.  This is the name to be used in the 
> constraints.
> 
> multiplier (integer, [1]): Value multiplier
>     The number by which to multiply the number of connected ping nodes by
> 
> host_list* (string): Host list
>     A space separated list of ping nodes to count.
> 
> attempts (integer, [3]): no. of ping attempts
>     Number of ping attempts, per host, before declaring it dead
> 
> timeout (integer, [2]): ping timeout in seconds
>     How long, in seconds, to wait before declaring a ping lost
> 
> options (string): Extra Options
>     A catch all for any other options that need to be passed to ping.
> 
> failure_score (integer):
>     Resource is failed if the score is less than failure_score.
>     Default never fails.
> 
> use_fping (boolean, [1]): Use fping if available
>     Use fping rather than ping, if found. If set to 0, fping
>     will not be used even if present.
> 
> debug (string, [false]): Verbose logging
>     Enables to use default attrd_updater verbose logging on every call.
> 
> Operations' defaults (advisory minimum):
> 
>     start         timeout=60
>     stop          timeout=20
>     monitor       timeout=60 interval=10
> ---------
> 
> "The name of the attributes to set.": Why plural ("attributes")?
> "The time to wait (dampening) further changes occur": Is this an English 
> sentence at all?
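> 
> For context, the usual pattern (crm shell syntax; names and addresses are
> only examples) is a cloned ping resource plus a location rule on that
> attribute name:
> 
>     primitive p-ping ocf:pacemaker:ping \
>         params host_list="192.168.1.1 192.168.1.254" multiplier=1000 dampen=5s \
>         op monitor interval=10 timeout=60
>     clone cl-ping p-ping
>     location l-web-on-connectivity g-web \
>         rule -inf: not_defined pingd or pingd lte 0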
> 
> Regards,
> Ulrich
> 
> 
> 
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> ------------------------------
> 
> End of Users Digest, Vol 58, Issue 20
> *************************************




