<div dir="ltr">Thanks Ken.<div><br></div><div>Regards,</div><div>Ashutosh  <br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 10, 2017 at 6:57 AM,  <span dir="ltr"><<a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send Users mailing list submissions to<br>

        <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:users-request@clusterlabs.org">users-request@clusterlabs.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:users-owner@clusterlabs.org">users-owner@clusterlabs.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Users digest..."<br>

<br>

<br>

Today's Topics:<br>

<br>

   1. Re: issues with pacemaker daemonization (Ken Gaillot)<br>

   2. Re: Pacemaker 1.1.18 Release Candidate 4 (Ken Gaillot)<br>

   3. Re: Issue in starting Pacemaker Virtual IP in RHEL 7 (Jan Pokorný)<br>

   4. Re: One cluster with two groups of nodes (Alberto Mijares)<br>

   5. Pacemaker responsible of DRBD and a systemd resource<br>

      (Derek Wuelfrath)<br>

<br>

<br>

------------------------------<wbr>------------------------------<wbr>----------<br>

<br>

Message: 1<br>

Date: Thu, 09 Nov 2017 09:49:20 -0600<br>

From: Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>

To: Cluster Labs - All topics related to open-source clustering<br>

        welcomed        <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>><br>

Subject: Re: [ClusterLabs] issues with pacemaker daemonization<br>

Message-ID: <<a href="mailto:1510242560.5244.3.camel@redhat.com">1510242560.5244.3.camel@<wbr>redhat.com</a>><br>

Content-Type: text/plain; charset="UTF-8"<br>

<br>

On Thu, 2017-11-09 at 15:59 +0530, ashutosh tiwari wrote:<br>

> Hi,<br>

><br>

> We are observing that sometimes the pacemaker daemon gets the same<br>

> process group ID as the process/script calling "service pacemaker<br>

> start",<br>

> while the child processes of pacemaker (cib/crmd/pengine) have their<br>

> process group ID the same as their PID, which is how things should be for<br>

> a daemon, afaik.<br>

><br>

> Do we expect this to be managed by init.d (CentOS 6) or by the pacemaker<br>

> binary?<br>

><br>

> pacemaker version: pacemaker-1.1.14-8.el6_8.1.<wbr>x86_64<br>

><br>

><br>

> Thanks and Regards,<br>

> Ashutosh Tiwari<br>

<br>

When pacemakerd spawns a child (cib etc.), it calls setsid() in the<br>

child to start a new session, which will set the process group ID and<br>

session ID to the child's PID.<br>
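<br>
A minimal sketch of that behaviour, written in Python rather than taken from Pacemaker's C source: a forked child calls setsid() and reports its PID, process group ID and session ID back to the parent. The function name spawn_in_new_session is hypothetical and exists only for this illustration.<br>
<pre>
```python
import os


def spawn_in_new_session():
    """Fork a child that calls setsid() and reports its (pid, pgid, sid).

    Illustrative only: this mimics what the message describes for the
    spawned children, it is not Pacemaker's implementation.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: become leader of a new session and a new process group.
        os.close(read_fd)
        os.setsid()
        report = "%d %d %d" % (os.getpid(), os.getpgid(0), os.getsid(0))
        os.write(write_fd, report.encode())
        os._exit(0)
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, child_pgid, child_sid = (int(x) for x in data.decode().split())
    return pid, child_pid, child_pgid, child_sid
```
</pre>
On a POSIX system the three values reported by the child should be identical, whereas a process that never calls setsid() or setpgid() keeps the process group of whatever started it.<br>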

<br>

However, it doesn't do anything similar for itself. Possibly it should.<br>

It's a longstanding to-do item to make pacemaker daemonize itself more<br>

"properly", but no one's had the time to address it.<br>

--<br>

Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Thu, 09 Nov 2017 10:11:08 -0600<br>

From: Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>

To: Kristoffer Grönlund <<a href="mailto:kgronlund@suse.com">kgronlund@suse.com</a>>, Cluster Labs - All<br>

        topics related to open-source clustering welcomed<br>

        <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>><br>

Subject: Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4<br>

Message-ID: <<a href="mailto:1510243868.5244.5.camel@redhat.com">1510243868.5244.5.camel@<wbr>redhat.com</a>><br>

Content-Type: text/plain; charset="UTF-8"<br>

<br>

On Fri, 2017-11-03 at 08:24 +0100, Kristoffer Grönlund wrote:<br>

> Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> writes:<br>

><br>

> > I decided to do another release candidate, because we had a large<br>

> > number of changes since rc3. The fourth release candidate for<br>

> > Pacemaker version 1.1.18 is now available at:<br>

> ><br>

> > <a href="https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4" rel="noreferrer" target="_blank">https://github.com/<wbr>ClusterLabs/pacemaker/<wbr>releases/tag/Pacemaker-1.1.18-rc4</a><br>

> ><br>

> > The big changes are numerous scalability improvements and bundle<br>

> > fixes. We're starting to test Pacemaker with as many as 1,500 bundles<br>

> > (Docker containers) running on 20 guest nodes running on three 56-core<br>

> > physical cluster nodes.<br>

><br>

> Hi Ken,<br>

><br>

> That's really cool. What's the size of the CIB with that kind of<br>

> configuration? I guess it would compress pretty well, but still.<br>

<br>

The test cluster is gone now, so not sure ... Beekhof might know.<br>

<br>

I know it's big enough that the transition graph could get too big to<br>

send via IPC, and we had to re-enable pengine's ability to write it to<br>

disk instead, and have the crmd read it from disk.<br>

<br>

><br>

> Cheers,<br>

> Kristoffer<br>

><br>

> ><br>

> > For details on the changes in this release, see the ChangeLog.<br>

> ><br>

> > This is likely to be the last release candidate before the final<br>

> > release next week. Any testing you can do is very welcome.<br>

--<br>

Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Thu, 9 Nov 2017 20:18:26 +0100<br>

From: Jan Pokorný <<a href="mailto:jpokorny@redhat.com">jpokorny@redhat.com</a>><br>

To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

Subject: Re: [ClusterLabs] Issue in starting Pacemaker Virtual IP in<br>

        RHEL 7<br>

Message-ID: <<a href="mailto:20171109191826.GD10004@redhat.com">20171109191826.GD10004@<wbr>redhat.com</a>><br>

Content-Type: text/plain; charset="us-ascii"<br>

<br>

On 06/11/17 10:43 +0000, Somanath Jeeva wrote:<br>

> I am using a two node pacemaker cluster with teaming enabled. The cluster has<br>

><br>

> 1.       Two team interfaces with different subnets.<br>

><br>

> 2.       The team1 has a NFS VIP plumbed to it.<br>

><br>

> 3.       The VirtualIP from pacemaker is configured to plumb to team0 (Corosync ring number is 0)<br>

><br>

> In this case, corosync takes the NFS IP as its ring address and<br>

> checks for it in corosync.conf. Since the conf file has the team0<br>

> hostname, corosync fails to start.<br>

><br>

> Outputs:<br>

><br>

><br>

> $ip a output:<br>

><br>

> [...]<br>

> 10: team1: <BROADCAST,MULTICAST,UP,LOWER_<wbr>UP> mtu 1500 qdisc noqueue state UP qlen 1000<br>

>     link/ether 38:63:bb:3f:a4:ad brd ff:ff:ff:ff:ff:ff<br>

>     inet <a href="http://10.64.23.117/28" rel="noreferrer" target="_blank">10.64.23.117/28</a> brd 10.64.23.127 scope global team1<br>

>        valid_lft forever preferred_lft forever<br>

>     inet <a href="http://10.64.23.121/24" rel="noreferrer" target="_blank">10.64.23.121/24</a> scope global secondary team1:~m0<br>

>        valid_lft forever preferred_lft forever<br>

>     inet6 fe80::3a63:bbff:fe3f:a4ad/64 scope link<br>

>        valid_lft forever preferred_lft forever<br>

> 11: team0: <BROADCAST,MULTICAST,UP,LOWER_<wbr>UP> mtu 1500 qdisc noqueue state UP qlen 1000<br>

>     link/ether 38:63:bb:3f:a4:ac brd ff:ff:ff:ff:ff:ff<br>

>     inet <a href="http://10.64.23.103/28" rel="noreferrer" target="_blank">10.64.23.103/28</a> brd 10.64.23.111 scope global team0<br>

>        valid_lft forever preferred_lft forever<br>

>     inet6 fe80::3a63:bbff:fe3f:a4ac/64 scope link<br>

>        valid_lft forever preferred_lft forever<br>

><br>

> Corosync Conf File:<br>

><br>

> cat /etc/corosync/corosync.conf<br>

> totem {<br>

>     version: 2<br>

>     secauth: off<br>

>     cluster_name: DES<br>

>     transport: udp<br>

>     rrp_mode: passive<br>

><br>

>     interface {<br>

>         ringnumber: 0<br>

>         bindnetaddr: 10.64.23.96<br>

>         mcastaddr: 224.1.1.1<br>

>         mcastport: 6860<br>

>     }<br>

> }<br>

><br>

> nodelist {<br>

>     node {<br>

>         ring0_addr: dl380x4415<br>

>         nodeid: 1<br>

>     }<br>

><br>

>     node {<br>

>         ring0_addr: dl360x4405<br>

>         nodeid: 2<br>

>     }<br>

> }<br>

><br>

> quorum {<br>

>     provider: corosync_votequorum<br>

>     two_node: 1<br>

> }<br>

><br>

> logging {<br>

>     to_logfile: yes<br>

>     logfile: /var/log/cluster/corosync.log<br>

>     to_syslog: yes<br>

> }<br>

><br>

> /etc/hosts:<br>

><br>

> $ cat /etc/hosts<br>

> [...]<br>

> 10.64.23.103       dl380x4415<br>

> 10.64.23.105       dl360x4405<br>

> [...]<br>

><br>

> Logs:<br>

><br>

> [3029] dl380x4415 corosyncerror   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.<br>

> [19040] dl380x4415 corosyncnotice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.<br>

> [19040] dl380x4415 corosyncinfo    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow<br>

> [19040] dl380x4415 corosyncnotice  [TOTEM ] Initializing transport (UDP/IP Multicast).<br>

> [19040] dl380x4415 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none<br>

> [19040] dl380x4415 corosyncnotice  [TOTEM ] The network interface [10.64.23.121] is now up.<br>

> [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync configuration map access [0]<br>

> [19040] dl380x4415 corosyncinfo    [QB    ] server name: cmap<br>

> [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync configuration service [1]<br>

> [19040] dl380x4415 corosyncinfo    [QB    ] server name: cfg<br>

> [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]<br>

> [19040] dl380x4415 corosyncinfo    [QB    ] server name: cpg<br>

> [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync profile loading service [4]<br>

> [19040] dl380x4415 corosyncnotice  [QUORUM] Using quorum provider corosync_votequorum<br>

> [19040] dl380x4415 corosynccrit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.<br>

> [19040] dl380x4415 corosyncerror   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'<br>

<br>

I suspect whether teaming is involved or not is irrelevant here.<br>

<br>

You are not using the latest and greatest 2.4.3, so I'd suggest either<br>

upgrading or applying this patch (present in that version) to see if it helps:<br>

<br>

<a href="https://github.com/corosync/corosync/commit/95f9583a25007398e3792bdca2da262db18f658a" rel="noreferrer" target="_blank">https://github.com/corosync/<wbr>corosync/commit/<wbr>95f9583a25007398e3792bdca2da26<wbr>2db18f658a</a><br>

<br>

--<br>

Jan (Poki)<br>


<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Thu, 9 Nov 2017 17:34:35 -0400<br>

From: Alberto Mijares <<a href="mailto:amijaresp@gmail.com">amijaresp@gmail.com</a>><br>

To: Cluster Labs - All topics related to open-source clustering<br>

        welcomed        <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>><br>

Subject: Re: [ClusterLabs] One cluster with two groups of nodes<br>

Message-ID:<br>

        <<a href="mailto:CAGZBXN_Lv0pXUkVB_u_MWo_ZpHcFxVC3gYnS9xFYNxUZ46qTaA@mail.gmail.com">CAGZBXN_Lv0pXUkVB_u_MWo_<wbr>ZpHcFxVC3gYnS9xFYNxUZ46qTaA@<wbr>mail.gmail.com</a>><br>

Content-Type: text/plain; charset="UTF-8"<br>

<br>

><br>

> The first thing I'd mention is that a 6-node cluster can only survive<br>

> the loss of two nodes, as 3 nodes don't have quorum. You can tweak that<br>

> behavior with corosync quorum options, or you could add a quorum-only<br>

> node, or use corosync's new qdevice capability to have an arbiter node.<br>

><br>

> Coincidentally, I recently stumbled across a long-time Pacemaker<br>

> feature that I wasn't aware of, that can handle this type of situation.<br>

> It's not documented yet but will be when 1.1.18 is released soon.<br>

><br>

> Colocation constraints may take a "node-attribute" parameter, that<br>

> basically means, "Put this resource on a node of the same class as the<br>

> one running resource X".<br>

><br>

> In this case, you might set a "group" node attribute on all nodes, to<br>

> "1" on the three primary nodes and "2" on the three failover nodes.<br>

> Pick one resource as your base resource that everything else should go<br>

> along with. Configure colocation constraints for all the other<br>

> resources with that one, using "node-attribute=group". That means that<br>

> all the other resources must be on a node with the same "group"<br>

> attribute value as the node that the base resource is running on.<br>

><br>

> "node-attribute" defaults to "#uname" (node name), thus giving the<br>

> usual behavior of colocation constraints: put the resource only on a<br>

> node with the same name, i.e. the same node.<br>

><br>

> The remaining question is, how do you want the base resource to fail<br>

> over? If the base resource can fail over to any other node, whether in<br>

> the same group or not, then you're done. If the base resource can only<br>

> run on one node in each group, ban it from the other nodes using<br>

> -INFINITY location constraints. If the base resource should only fail<br>

> over to the opposite group, that's trickier, but something roughly<br>

> similar would be to prefer one node in each group with an equal<br>

> positive score location constraint, and migration-threshold=1.<br>

> --<br>

> Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
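<br>
The quorum arithmetic in the quoted message (a 6-node cluster surviving the loss of only two nodes) can be sketched as below. This assumes one vote per node and a plain majority rule with no special quorum options, quorum-only node or qdevice; the function names are illustrative and not part of corosync.<br>
<pre>
```python
def quorum_threshold(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def max_tolerated_failures(total_votes):
    """How many single-vote nodes can be lost while the rest stay quorate."""
    return total_votes - quorum_threshold(total_votes)


def has_quorum(live_votes, total_votes):
    """True if the surviving partition holds a strict majority of votes."""
    return live_votes >= quorum_threshold(total_votes)
```
</pre>
With six votes the threshold is four, so a partition of three nodes is not quorate, which is why an even split between the two groups leaves neither side with quorum.<br>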

<br>

<br>

Thank you very very much for this. I'm starting some tests in my lab tonight.<br>

<br>

I'll let you know my results, and I hope I can count on you if I get<br>

lost along the way.<br>

<br>

BTW, every resource is supposed to run only on its designated node<br>

within a group. For example, if nginx normally runs on A1, it MUST<br>

fail over to B1. The same goes for every resource.<br>

<br>

Best regards,<br>

<br>

<br>

Alberto Mijares<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Thu, 9 Nov 2017 20:27:40 -0500<br>

From: Derek Wuelfrath <<a href="mailto:dwuelfrath@inverse.ca">dwuelfrath@inverse.ca</a>><br>

To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

Subject: [ClusterLabs] Pacemaker responsible of DRBD and a systemd<br>

        resource<br>

Message-ID: <<a href="mailto:57EF4B1D-42A5-4B20-95C7-3A3C95F47803@inverse.ca">57EF4B1D-42A5-4B20-95C7-<wbr>3A3C95F47803@inverse.ca</a>><br>

Content-Type: text/plain; charset="utf-8"<br>

<br>

Hello there,<br>

<br>

First post here, but I have been following the list for a while!<br>

<br>

Here's my issue:<br>

We have been setting up and running this type of cluster for a while and have never really encountered this kind of problem.<br>

<br>

I recently set up a Corosync / Pacemaker / PCS cluster to manage DRBD along with various other resources. Some of these resources are systemd resources... this is the part where things are "breaking".<br>

<br>

A two-server cluster running only DRBD, or DRBD with an OCF ipaddr2 resource (cluster IP, for instance), works just fine. I can easily move from one node to the other without any issue.<br>

As soon as I add a systemd resource to the resource group, things break. Moving from one node to the other using standby mode works just fine, but as soon as a Corosync / Pacemaker restart involves polling a systemd resource, it seems to try to start the whole resource group and therefore creates a split-brain of the DRBD resource.<br>

<br>

That is the best explanation / description of the situation that I can give. If it needs any clarification, examples, ... I am more than happy to share them.<br>

<br>

Any guidance would be appreciated :)<br>

<br>

Here's the output of a "pcs config":<br>

<br>

<a href="https://pastebin.com/1TUvZ4X9" rel="noreferrer" target="_blank">https://pastebin.com/1TUvZ4X9</a><br>

<br>

Cheers!<br>

-dw<br>

<br>

--<br>

Derek Wuelfrath<br>

<a href="mailto:dwuelfrath@inverse.ca">dwuelfrath@inverse.ca</a> :: +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)<br>

Inverse inc. :: Leaders behind SOGo (<a href="http://www.sogo.nu" rel="noreferrer" target="_blank">www.sogo.nu</a>), PacketFence (<a href="http://www.packetfence.org" rel="noreferrer" target="_blank">www.packetfence.org</a>) and Fingerbank (<a href="http://www.fingerbank.org" rel="noreferrer" target="_blank">www.fingerbank.org</a>)<br>

<br>


<br>

------------------------------<br>

<br>


End of Users Digest, Vol 34, Issue 18<br>

******************************<wbr>*******<br>

</blockquote></div><br></div></div></div>