Thanks Ken.

Regards,
Ashutosh

On Fri, Nov 10, 2017 at 6:57 AM, <users-request@clusterlabs.org> wrote:
Send Users mailing list submissions to
	users@clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.clusterlabs.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
	users-request@clusterlabs.org

You can reach the person managing the list at
	users-owner@clusterlabs.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Users digest..."


Today's Topics:

   1. Re: issues with pacemaker daemonization (Ken Gaillot)
   2. Re: Pacemaker 1.1.18 Release Candidate 4 (Ken Gaillot)
   3. Re: Issue in starting Pacemaker Virtual IP in RHEL 7 (Jan Pokorný)
   4. Re: One cluster with two groups of nodes (Alberto Mijares)
   5. Pacemaker responsible of DRBD and a systemd resource
      (Derek Wuelfrath)


----------------------------------------------------------------------

Message: 1
Date: Thu, 09 Nov 2017 09:49:20 -0600
From: Ken Gaillot <kgaillot@redhat.com>
To: Cluster Labs - All topics related to open-source clustering
	welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] issues with pacemaker daemonization
Message-ID: <1510242560.5244.3.camel@redhat.com>
Content-Type: text/plain; charset="UTF-8"

On Thu, 2017-11-09 at 15:59 +0530, ashutosh tiwari wrote:
> Hi,
>
> We are observing that sometimes the pacemaker daemon gets the same
> process group ID as the process/script calling "service pacemaker
> start", while the child processes of pacemaker (cib/crmd/pengine) have
> their process group ID equal to their PID, which is how things should
> be for a daemon, AFAIK.
>
> Do we expect this to be handled by the init.d script (CentOS 6) or by
> the pacemaker binary?
>
> pacemaker version: pacemaker-1.1.14-8.el6_8.1.x86_64
>
>
> Thanks and Regards,
> Ashutosh Tiwari

When pacemakerd spawns a child (cib etc.), it calls setsid() in the
child to start a new session, which sets the process group ID and
session ID to the child's PID.

However, it doesn't do anything similar for itself. Possibly it should.
It's a longstanding to-do item to make pacemaker daemonize itself more
"properly", but no one's had the time to address it.
--
Ken Gaillot <kgaillot@redhat.com>

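[Not part of the original thread: a small sketch of how one might verify the
behaviour Ashutosh describes and, if needed, work around it. The process names
and the use of the setsid(1) utility are assumptions, not anything Ken or
Ashutosh recommended.]

    # Show session and process group IDs for pacemakerd and its children;
    # for a fully detached daemon, PGID and SID equal the daemon's own PID.
    ps -o pid,ppid,pgid,sid,comm -C pacemakerd,cib,stonithd,lrmd,attrd,pengine,crmd

    # Possible (untested) workaround if the inherited process group is a
    # problem: start the init script in a fresh session so pacemakerd
    # cannot inherit the caller's process group.
    setsid service pacemaker start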


------------------------------

Message: 2
Date: Thu, 09 Nov 2017 10:11:08 -0600
From: Ken Gaillot <kgaillot@redhat.com>
To: Kristoffer Grönlund <kgronlund@suse.com>, Cluster Labs - All
	topics related to open-source clustering welcomed
	<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4
Message-ID: <1510243868.5244.5.camel@redhat.com>
Content-Type: text/plain; charset="UTF-8"

On Fri, 2017-11-03 at 08:24 +0100, Kristoffer Grönlund wrote:
> Ken Gaillot <kgaillot@redhat.com> writes:
>
> > I decided to do another release candidate, because we had a large
> > number of changes since rc3. The fourth release candidate for
> > Pacemaker version 1.1.18 is now available at:
> >
> > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4
> >
> > The big changes are numerous scalability improvements and bundle
> > fixes. We're starting to test Pacemaker with as many as 1,500
> > bundles (Docker containers) running on 20 guest nodes running on
> > three 56-core physical cluster nodes.
>
> Hi Ken,
>
> That's really cool. What's the size of the CIB with that kind of
> configuration? I guess it would compress pretty well, but still.

The test cluster is gone now, so not sure ... Beekhof might know.

I know it's big enough that the transition graph could get too big to
send via IPC, and we had to re-enable pengine's ability to write it to
disk instead, and have the crmd read it from disk.

>
> Cheers,
> Kristoffer
>
> >
> > For details on the changes in this release, see the ChangeLog.
> >
> > This is likely to be the last release candidate before the final
> > release next week. Any testing you can do is very welcome.
--
Ken Gaillot <kgaillot@redhat.com>



------------------------------

Message: 3
Date: Thu, 9 Nov 2017 20:18:26 +0100
From: Jan Pokorný <jpokorny@redhat.com>
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Issue in starting Pacemaker Virtual IP in
	RHEL 7
Message-ID: <20171109191826.GD10004@redhat.com>
Content-Type: text/plain; charset="us-ascii"

On 06/11/17 10:43 +0000, Somanath Jeeva wrote:
> I am using a two-node pacemaker cluster with teaming enabled. The cluster has:
>
> 1. Two team interfaces with different subnets.
>
> 2. team1 has an NFS VIP plumbed to it.
>
> 3. The VirtualIP from pacemaker is configured to plumb to team0 (corosync ring number 0).
>
> In this case corosync takes the NFS IP as its ring address and
> looks for it in corosync.conf. Since the conf file has the team0
> hostname, corosync fails to start.
>
> Outputs:
>
>
> $ ip a output:
>
> [...]
> 10: team1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
>     link/ether 38:63:bb:3f:a4:ad brd ff:ff:ff:ff:ff:ff
>     inet 10.64.23.117/28 brd 10.64.23.127 scope global team1
>        valid_lft forever preferred_lft forever
>     inet 10.64.23.121/24 scope global secondary team1:~m0
>        valid_lft forever preferred_lft forever
>     inet6 fe80::3a63:bbff:fe3f:a4ad/64 scope link
>        valid_lft forever preferred_lft forever
> 11: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
>     link/ether 38:63:bb:3f:a4:ac brd ff:ff:ff:ff:ff:ff
>     inet 10.64.23.103/28 brd 10.64.23.111 scope global team0
>        valid_lft forever preferred_lft forever
>     inet6 fe80::3a63:bbff:fe3f:a4ac/64 scope link
>        valid_lft forever preferred_lft forever
>
> Corosync Conf File:
>
> cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     secauth: off
>     cluster_name: DES
>     transport: udp
>     rrp_mode: passive
>
>     interface {
>         ringnumber: 0
>         bindnetaddr: 10.64.23.96
>         mcastaddr: 224.1.1.1
>         mcastport: 6860
>     }
> }
>
> nodelist {
>     node {
>         ring0_addr: dl380x4415
>         nodeid: 1
>     }
>
>     node {
>         ring0_addr: dl360x4405
>         nodeid: 2
>     }
> }
>
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
> }
>
> logging {
>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>     to_syslog: yes
> }
>
> /etc/hosts:
>
> $ cat /etc/hosts
> [...]
> 10.64.23.103    dl380x4415
> 10.64.23.105    dl360x4405
> [...]
>
> Logs:
>
> [3029] dl380x4415 corosyncerror   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
> [19040] dl380x4415 corosyncnotice [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
> [19040] dl380x4415 corosyncinfo   [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow
> [19040] dl380x4415 corosyncnotice [TOTEM ] Initializing transport (UDP/IP Multicast).
> [19040] dl380x4415 corosyncnotice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> [19040] dl380x4415 corosyncnotice [TOTEM ] The network interface [10.64.23.121] is now up.
> [19040] dl380x4415 corosyncnotice [SERV  ] Service engine loaded: corosync configuration map access [0]
> [19040] dl380x4415 corosyncinfo   [QB    ] server name: cmap
> [19040] dl380x4415 corosyncnotice [SERV  ] Service engine loaded: corosync configuration service [1]
> [19040] dl380x4415 corosyncinfo   [QB    ] server name: cfg
> [19040] dl380x4415 corosyncnotice [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> [19040] dl380x4415 corosyncinfo   [QB    ] server name: cpg
> [19040] dl380x4415 corosyncnotice [SERV  ] Service engine loaded: corosync profile loading service [4]
> [19040] dl380x4415 corosyncnotice [QUORUM] Using quorum provider corosync_votequorum
> [19040] dl380x4415 corosynccrit   [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
> [19040] dl380x4415 corosyncerror  [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'

I suspect that whether teaming is involved or not is irrelevant here.

You are not using the latest and greatest 2.4.3, so I'd suggest either
upgrading or applying this patch (present in that version) to see if
that helps:

https://github.com/corosync/corosync/commit/95f9583a25007398e3792bdca2da262db18f658a

--
Jan (Poki)
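[Not part of the original thread: if upgrading or patching is not immediately
possible, one workaround that is sometimes tried -- an untested sketch, not
something Jan recommended -- is to put the fixed team0 addresses from
/etc/hosts directly into the nodelist as ring addresses instead of hostnames,
so corosync does not end up selecting the NFS VIP on team1:]

    nodelist {
        node {
            ring0_addr: 10.64.23.103    # dl380x4415, team0
            nodeid: 1
        }

        node {
            ring0_addr: 10.64.23.105    # dl360x4405, team0
            nodeid: 2
        }
    }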
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20171109/3847e1e8/attachment-0001.sig>

------------------------------

Message: 4
Date: Thu, 9 Nov 2017 17:34:35 -0400
From: Alberto Mijares <amijaresp@gmail.com>
To: Cluster Labs - All topics related to open-source clustering
	welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] One cluster with two groups of nodes
Message-ID:
	<CAGZBXN_Lv0pXUkVB_u_MWo_ZpHcFxVC3gYnS9xFYNxUZ46qTaA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

>
> The first thing I'd mention is that a 6-node cluster can only survive
> the loss of two nodes, as 3 nodes don't have quorum. You can tweak that
> behavior with corosync quorum options, or you could add a quorum-only
> node, or use corosync's new qdevice capability to have an arbiter node.
>
> Coincidentally, I recently stumbled across a long-time Pacemaker
> feature that I wasn't aware of, which can handle this type of situation.
> It's not documented yet but will be when 1.1.18 is released soon.
>
> Colocation constraints may take a "node-attribute" parameter, which
> basically means, "Put this resource on a node of the same class as the
> one running resource X".
>
> In this case, you might set a "group" node attribute on all nodes, to
> "1" on the three primary nodes and "2" on the three failover nodes.
> Pick one resource as your base resource that everything else should go
> along with. Configure colocation constraints for all the other
> resources with that one, using "node-attribute=group". That means that
> all the other resources must be on a node with the same "group"
> attribute value as the node that the base resource is running on.
>
> "node-attribute" defaults to "#uname" (node name), thus giving the
> usual behavior of colocation constraints: put the resource only on a
> node with the same name, i.e. the same node.
>
> The remaining question is, how do you want the base resource to fail
> over? If the base resource can fail over to any other node, whether in
> the same group or not, then you're done. If the base resource can only
> run on one node in each group, ban it from the other nodes using
> -INFINITY location constraints. If the base resource should only fail
> over to the opposite group, that's trickier, but something roughly
> similar would be to prefer one node in each group with an equal
> positive score location constraint, and migration-threshold=1.
> --
> Ken Gaillot <kgaillot@redhat.com>

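[Not part of the original thread: a rough sketch of how Ken's node-attribute
suggestion above might look in practice. The node and resource names (a1-a3,
b1-b3, base-rsc, nginx) are made up for illustration, and exact pcs option
handling may vary by version -- treat this as an assumption-laden example,
not a tested recipe.]

    # Tag the primary and failover nodes with a "group" attribute:
    crm_attribute --type nodes --node a1 --name group --update 1
    crm_attribute --type nodes --node a2 --name group --update 1
    crm_attribute --type nodes --node a3 --name group --update 1
    crm_attribute --type nodes --node b1 --name group --update 2
    crm_attribute --type nodes --node b2 --name group --update 2
    crm_attribute --type nodes --node b3 --name group --update 2

    # Colocate the other resources with the chosen base resource by
    # "group" membership rather than by node name:
    pcs constraint colocation add nginx with base-rsc INFINITY node-attribute=group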

Thank you very very much for this. I'm starting some tests in my lab tonight.

I'll let you know my results, and I hope I can count on you if I get
lost along the way.

BTW, every resource is supposed to run only on its designated node
within a group. For example: nginx normally runs on A1 and it MUST
fail over to B1. The same goes for every resource.

Best regards,


Alberto Mijares



------------------------------

Message: 5
Date: Thu, 9 Nov 2017 20:27:40 -0500
From: Derek Wuelfrath <dwuelfrath@inverse.ca>
To: users@clusterlabs.org
Subject: [ClusterLabs] Pacemaker responsible of DRBD and a systemd
	resource
Message-ID: <57EF4B1D-42A5-4B20-95C7-3A3C95F47803@inverse.ca>
Content-Type: text/plain; charset="utf-8"

Hello there,

First post here, but I've been following for a while!

Here's my issue:
we have been putting in place and running this type of cluster for a while and have never really encountered this kind of problem.

I recently set up a Corosync / Pacemaker / PCS cluster to manage DRBD along with various other resources. Some of these resources are systemd resources… this is the part where things are "breaking".

Having a two-server cluster running only DRBD, or DRBD with an OCF IPaddr2 resource (the cluster IP in this instance), works just fine. I can easily move from one node to the other without any issue.
As soon as I add a systemd resource to the resource group, things break. Moving from one node to the other using standby mode works just fine, but as soon as a Corosync / Pacemaker restart involves polling of a systemd resource, it seems to try to start the whole resource group and therefore creates a split-brain of the DRBD resource.
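[Not part of the original thread: Derek's actual configuration is in the
pastebin below; the following is only a generic sketch of this kind of
DRBD-plus-group layout in pcs syntax, with made-up names (r0, drbd_data, vip,
app), added to make the description above concrete. Exact syntax depends on
the pcs version.]

    # DRBD resource and its master/slave wrapper (pcs 0.9-style syntax):
    pcs resource create drbd_data ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
    pcs resource master drbd_data_ms drbd_data master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true

    # Cluster IP plus a systemd-managed service, grouped together:
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s
    pcs resource create app systemd:myservice
    pcs resource group add services vip app

    # Keep the group on the DRBD master, and only start it after promotion:
    pcs constraint colocation add services with drbd_data_ms INFINITY with-rsc-role=Master
    pcs constraint order promote drbd_data_ms then start services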

This is the best explanation/description of the situation that I can give. If it needs any clarification, examples, … I am more than open to sharing them.

Any guidance would be appreciated :)

Here's the output of a "pcs config":

https://pastebin.com/1TUvZ4X9

Cheers!
-dw

--
Derek Wuelfrath
dwuelfrath@inverse.ca :: +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)
Inverse inc. :: Leaders behind SOGo (https://www.sogo.nu/), PacketFence (https://www.packetfence.org/) and Fingerbank (https://www.fingerbank.org/)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20171109/9be1798b/attachment.html>

------------------------------

_______________________________________________
Users mailing list
Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users


End of Users Digest, Vol 34, Issue 18
*************************************