[ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP

Reid Wahl nwahl at redhat.com
Wed Dec 21 05:25:34 EST 2022


On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
> Hi!
>
> I wonder: Could the error message be triggered by adding an exclusive mandatory
> lock on the ip binary?
> If that triggers the bug, I'm rather sure that the error message is bad.
> Shouldn't that be EWOULDBLOCK then?

I did some cursory reading earlier today, and it seems that ETXTBSY is
becoming less common: https://lwn.net/Articles/866493/

Either way, that would be a question for kernel maintainers.
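
For illustration, the classic way to get "Text file busy" on Linux is to
exec a file while something still has it open for writing. Below is a
minimal Python sketch of that situation. It assumes a Linux host and a
temp directory that is not mounted noexec; the script name demo.sh and
the helper function are made up for this example and merely stand in for
the `ip` binary.

```python
import errno
import os
import stat
import subprocess
import tempfile


def exec_while_open_for_write():
    """Return (errno seen while a writer holds the file, exit code after close)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sh")  # hypothetical stand-in for `ip`
        with open(path, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        os.chmod(path, stat.S_IRWXU)

        writer = open(path, "r+b")  # simulate a process holding a write handle
        try:
            subprocess.run([path], check=False)
            err_while_open = None
        except OSError as exc:
            err_while_open = exc.errno
        finally:
            writer.close()

        # With no writer left, the same exec is expected to succeed.
        rc_after_close = subprocess.run([path], check=False).returncode
        return err_while_open, rc_after_close


err, rc = exec_while_open_for_write()
name = errno.errorcode.get(err, "none") if err is not None else "none"
print(f"while open for write: {name}; after close: exit {rc}")
```

If an AV scanner (or anything else) opened the binary read-write during
the scan, this is the kind of failure a concurrent monitor would see.
Whether Sophos actually does that is not known here.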

> (I have no idea how Sophos AV works, though. If they open the files to check
> in write-mode, it's really stupid then IMHO)
>
> Regards,
> Ulrich
>
>
> >>> Reid Wahl <nwahl at redhat.com> wrote on 21.12.2022 at 10:19 in message
> <CAPiuu9-FiSqXAPf123ErsRWMWraKe2nnK6Pgwwfq4FmfSNxeYQ at mail.gmail.com>:
> > On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS <tcas at ikoula.com> wrote:
> >>
> >> Ken,
> >>
> >> Antivirus (sophos-av) is running but not in "real time access scanning";
> >> the scheduled scan, however, is at 9pm every day.
> >> 7 minutes later, we got these alerts.
> >> The antivirus may indeed be the cause.
> >
> > I see. That does seem fairly likely. At least, there's no other
> > obvious candidate for the cause.
> >
> > I used to work on a customer-facing support team for the ClusterLabs
> > suite, and we received a fair number of cases where bizarre issues
> > (such as hangs and access errors) were apparently caused by an
> > antivirus. In those cases, all other usual lines of investigation were
> > exhausted, and when we asked the customer to disable their AV, the
> > issue disappeared. This happened with several different AV products.
> >
> > I can't say with any certainty that the AV is causing your issue, and
> > I know it's frustrating that you won't know whether any given
> > intervention worked, since this only happens once every few months.
> >
> > You may want to either exclude certain files from the scan, or write a
> > short script to place the cluster in maintenance mode before the scan
> > and take it out of maintenance after the scan is complete.
> >
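
The maintenance-mode wrapper suggested above could be sketched roughly
like this in Python. It assumes `crm_attribute` from Pacemaker is on the
PATH and that toggling the `maintenance-mode` cluster property this way
suits the setup. The scan command is left to the caller, and the `run`
parameter exists only so the logic can be exercised without a cluster.
The function names are made up for this example.

```python
import subprocess


def set_maintenance(enabled, run=subprocess.run):
    """Set the cluster-wide maintenance-mode property via crm_attribute."""
    value = "true" if enabled else "false"
    run(
        ["crm_attribute", "--type", "crm_config",
         "--name", "maintenance-mode", "--update", value],
        check=True,
    )


def scan_in_maintenance(scan_cmd, run=subprocess.run):
    """Run scan_cmd with the cluster in maintenance mode, then restore it."""
    set_maintenance(True, run=run)
    try:
        # check=False: a non-zero scanner exit must not skip the cleanup
        return run(scan_cmd, check=False)
    finally:
        set_maintenance(False, run=run)
```

A cron job at the scan time would then call scan_in_maintenance() with
whatever command starts the scheduled scan. The try/finally is there so
the cluster leaves maintenance mode even if the scan itself fails.
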
> >>
> >> I had the case on December 13 (with systemctl here):
> >>
> >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
> >> [5082] (process_lrm_event)  notice: wd-websqlng01-NGINX_monitor_15000:454 [
> >> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
> >> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> >> /bin/systemctl: Text file busy\n ]
> >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
> >> [5082] (process_lrm_event)  notice: wd-websqlng01-NGINX_monitor_15000:454 [
> >> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
> >> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> >> /bin/systemctl: Text file busy\n ]
> >>
> >> Otherwise this happens rarely; we also had a case in August:
> >>
> >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
> >> [3718] (process_lrm_event)  notice:
> >> wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [
> >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> >> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
> >> [3718] (process_lrm_event)  notice:
> >> wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [
> >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> >> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> >>
> >> It's always around 9:00-9:07 pm,
> >> I'll move the virus scan to 10pm and see.
> >
> > That also sounds like a good plan to confirm the cause :) It might
> > take a while to find out though.
> >
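
To check whether past occurrences line up with the scan window, the
rotated logs can be searched for "Text file busy" and grouped by time of
day. This is a rough Python sketch, assuming the syslog-style
"Mon DD HH:MM:SS" timestamp seen in the excerpts above. The function name
is made up for this example.

```python
import re
from collections import Counter

# Matches the "Dec 13 21:07:53" style timestamp used in the log excerpts.
STAMP = re.compile(r"\b([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}):(\d{2}):\d{2}\b")


def busy_events_by_minute(lines):
    """Count 'Text file busy' log lines per (month, day, 'HH:MM')."""
    counts = Counter()
    for line in lines:
        if "Text file busy" not in line:
            continue
        match = STAMP.search(line)
        if match:
            month, day, hour, minute = match.groups()
            counts[(month, int(day), f"{hour}:{minute}")] += 1
    return counts
```

For the compressed logs, the lines can be fed in with something like
gzip.open(path, "rt", errors="replace"). If every bucket falls a few
minutes after the scan start, before and after the schedule change, that
would support the AV theory.
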
> >>
> >> Thanks,
> >> Best regards,
> >>
> >> Thomas Cas  |  Technicien du support infogérance
> >> PHONE : +33 3 51 25 23 26       WEB : www.ikoula.com/en
> >> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> >> Before printing this letter, think about the impact on the environment!
> >>
> >> -----Original Message-----
> >> From: Reid Wahl <nwahl at redhat.com>
> >> Sent: Tuesday, 20 December 2022 20:34
> >> To: Cluster Labs - All topics related to open-source clustering welcomed
> >> <users at clusterlabs.org>
> >> Cc: Ken Gaillot <kgaillot at redhat.com>; Service Infogérance
> >> <infogerance at ikoula.com>
> >> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> >>
> >> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <tcas at ikoula.com> wrote:
> >> >
> >> > Hello Ken,
> >> >
> >> > Thanks for your answer.
> >> > There was no update running at the time of the bug, which is why I
> >> > thought that having too many IPs caused this type of error.
> >> > The /usr/sbin/ip executable was not being modified either.
> >> >
> >> > We have many clusters, and only this one has so many IPs and this
> >> > problem.
> >>
> >> How often does this happen, and is it reliably reproducible under any
> >> circumstances? Any antivirus software running? It'd be nice to check
> >> something like lsof or strace while it's happening, but that may not be
> >> feasible if it's sporadic; running those at every monitor would generate
> >> lots of logs.
> >>
> >> AFAICT, having multiple processes execute (or read) the `ip` binary
> >> simultaneously *shouldn't* cause problems, as long as nothing opens it for
> >> write.
> >>
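
As a lighter-weight alternative to running lsof at every monitor,
something like the following could be run only when the error is seen.
It is a sketch, assuming Linux with /proc mounted and enough privileges
to read other processes' fd tables. It walks /proc/<pid>/fd and reads
the open flags from /proc/<pid>/fdinfo. The function name is
hypothetical.

```python
import os


def writers_of(path):
    """Return sorted PIDs that hold `path` open with write access (Linux /proc)."""
    target = os.path.realpath(path)
    pids = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    fields = dict(
                        line.split(":", 1) for line in info if ":" in line
                    )
                flags = int(fields["flags"].strip(), 8)  # flags are octal
            except (OSError, KeyError, ValueError):
                continue
            # Anything other than read-only access counts as a writer.
            if (flags & os.O_ACCMODE) != os.O_RDONLY:
                pids.add(int(pid))
    return sorted(pids)
```

Calling writers_of("/usr/sbin/ip") from a hook that fires on the error
would list any process holding the binary open for write at that moment.
It is a snapshot, so a short-lived writer can still be missed.
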
> >> >
> >> > Best regards,
> >> >
> >> > Thomas Cas  |  Technicien du support infogérance
> >> >
> >> > -----Original Message-----
> >> > From: Ken Gaillot <kgaillot at redhat.com>
> >> > Sent: Monday, 19 December 2022 22:08
> >> > To: Cluster Labs - All topics related to open-source clustering welcomed
> >> > <users at clusterlabs.org>
> >> > Cc: Service Infogérance <infogerance at ikoula.com>
> >> > Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> >> >
> >> > On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
> >> > > Hello Clusterlabs,
> >> > >
> >> > > I would like to report a bug on Pacemaker with the "IPaddr2"
> >> > > resource:
> >> > >
> >> > > OS: Debian 10
> >> > > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
> >> > > (2021-09-29) x86_64 GNU/Linux
> >> > > Pacemaker version: 2.0.1-5+deb10u2
> >> > >
> >> > > You will find the configuration of our cluster with 2 nodes attached.
> >> > >
> >> > > Bug :
> >> > >
> >> > > We have several IPs (12) configured in the cluster.
> >> > > Sometimes the cluster is unstable with the following errors in the
> >> > > pacemaker logs:
> >> > >
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 232_monitor_10000:28835:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> >
> >> > This doesn't sound like a bug in the agent; "Text file busy" suggests
> >> > that the system "ip" command is being modified while the command is
> >> > running. Is a software update happening when the problem occurs?
> >> >
> >> > I'm not sure whether there's some other situation that could cause that
> >> > error, but simply executing the command a bunch of times simultaneously
> >> > shouldn't cause it as far as I know.
> >> >
> >> > If simultaneous monitors are somehow causing the problem, you should be
> >> > able to work around it by using different intervals for different
> >> > monitors.
> >> >
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 239_monitor_10000:28877:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 239_monitor_10000:28877:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 234_monitor_10000:28830:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 231_monitor_10000:28900:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 231_monitor_10000:28900:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 235_monitor_10000:28905:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 235_monitor_10000:28905:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 237_monitor_10000:28890:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 237_monitor_10000:28890:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 238_monitor_10000:28876:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP-
> >> > > 238_monitor_10000:28876:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP_monitor_10000:28880:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079]
> >> > > (operation_finished)   notice: NGINX-VIP_monitor_10000:28880:stderr [
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > >
> >> > > The reason is that there are a lot of IPs configured and if the
> >> > > monitors take place at the same time it causes this type of error.
> >> > >
> >> > > Best regards,
> >> > >
> >> > >  Thomas Cas  |  Technicien du support infogérance
> >> > >
> >> > > _______________________________________________
> >> > > Manage your subscription:
> >> > > https://lists.clusterlabs.org/mailman/listinfo/users
> >> > >
> >> > > ClusterLabs home: https://www.clusterlabs.org/
> >> > --
> >> > Ken Gaillot <kgaillot at redhat.com>
> >> >
> >>
> >>
> >> --
> >> Regards,
> >>
> >> Reid Wahl (He/Him)
> >> Senior Software Engineer, Red Hat
> >> RHEL High Availability - Pacemaker
> >>
>



-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker


