[ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP
Klaus Wenninger
kwenning at redhat.com
Wed Dec 21 10:44:06 EST 2022
On Wed, Dec 21, 2022 at 11:26 AM Reid Wahl <nwahl at redhat.com> wrote:
> On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >
> > Hi!
> >
> > I wonder: Could the error message be triggered by adding an exclusive
> > mandatory lock on the ip binary?
> > If that triggers the bug, I'm rather sure that the error message is bad.
> > Shouldn't that be EWOULDBLOCK then?
>
> I did some cursory reading earlier today, and it seems that ETXTBSY is
> becoming less common: https://lwn.net/Articles/866493/
>
> Either way, that would be a question for kernel maintainers.
>
Maybe the network-stack folks there, or somebody with deeper insight into
how the ip tool currently interfaces with the kernel.
Without knowing any details: certain things might be handled by calling
BPF binaries, and since ip is the userspace application, the error might
still be reported against ip even if it was actually about a BPF binary
to be executed. Thinking of race conditions on that front ...
>
> > (I have no idea how Sophos AV works, though. If they open the files to
> > check in write-mode, it's really stupid then IMHO)
> >
> > Regards,
> > Ulrich
> >
> >
> > >>> Reid Wahl <nwahl at redhat.com> wrote on 21.12.2022 at 10:19 in message
> > <CAPiuu9-FiSqXAPf123ErsRWMWraKe2nnK6Pgwwfq4FmfSNxeYQ at mail.gmail.com>:
> > > On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS <tcas at ikoula.com> wrote:
> > >>
> > >> Ken,
> > >>
> > >> Antivirus (sophos-av) is running, but not in "real time access
> > >> scanning" mode; the scheduled scan, however, runs at 9pm every day.
> > >> 7 minutes later, we got these alerts.
> > >> The antivirus may indeed be the cause.
> > >
> > > I see. That does seem fairly likely. At least, there's no other
> > > obvious candidate for the cause.
> > >
> > > I used to work on a customer-facing support team for the ClusterLabs
> > > suite, and we received a fair number of cases where bizarre issues
> > > (such as hangs and access errors) were apparently caused by an
> > > antivirus. In those cases, all other usual lines of investigation were
> > > exhausted, and when we asked the customer to disable their AV, the
> > > issue disappeared. This happened with several different AV products.
> > >
> > > I can't say with any certainty that the AV is causing your issue, and
> > > I know it's frustrating that you won't know whether any given
> > > intervention worked, since this only happens once every few months.
> > >
> > > You may want to either exclude certain files from the scan, or write a
> > > short script to place the cluster in maintenance mode before the scan
> > > and take it out of maintenance after the scan is complete.
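A minimal sketch of such a wrapper, in case it helps. It assumes
crm_attribute is on the PATH and that the scheduled scan can be started
from a command line; the scan command shown in the usage note is a
placeholder, not something confirmed in this thread. The `run` parameter
exists only so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: maintenance-mode is toggled as a cluster property via crm_attribute.
_BASE = ["crm_attribute", "--type", "crm_config", "--name", "maintenance-mode"]
MAINT_ON = _BASE + ["--update", "true"]
MAINT_OFF = _BASE + ["--update", "false"]


def run_scan_in_maintenance(scan_cmd: Sequence[str],
                            run: Callable = subprocess.run):
    """Enter maintenance mode, run the scan, and always leave maintenance mode."""
    run(MAINT_ON, check=True)
    try:
        # The scan's own exit status is returned to the caller, not raised.
        return run(list(scan_cmd), check=False)
    finally:
        # Runs even if the scan command could not be started at all.
        run(MAINT_OFF, check=True)
```

It could be called from cron in place of the scanner's own schedule, for
example `run_scan_in_maintenance(["savscan", "/"])`, where the scan command
and its arguments are hypothetical and site-specific.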
> > >
> > >>
> > >> I had the case on December 13 (with systemctl here):
> > >>
> > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
> > >> [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [
> > >> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
> > >> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> > >> /bin/systemctl: Text file busy\n ]
> > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
> > >> [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [
> > >> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
> > >> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> > >> /bin/systemctl: Text file busy\n ]
> > >>
> > >> Apart from that, this happens rarely; we had a case in August:
> > >>
> > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
> > >> [3718] (process_lrm_event) notice:
> > >> wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [
> > >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> > >> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
> > >> [3718] (process_lrm_event) notice:
> > >> wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [
> > >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> > >> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> > >>
> > >> It's always around 9:00-9:07 pm,
> > >> I'll move the virus scan to 10pm and see.
> > >
> > > That also sounds like a good plan to confirm the cause :) It might
> > > take a while to find out though.
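To check whether the errors keep following the scan schedule after it is
moved, something like the following could tally "Text file busy" lines by
time of day. This is only a sketch: it assumes each log entry sits on one
line and starts with a syslog-style "Mon DD HH:MM:SS" stamp as in the
excerpts above (possibly behind a "file:" prefix from zgrep), and the
function name is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches stamps such as "Dec 13 21:07:53"; captures the hour and the minute.
STAMP = re.compile(r"[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):(\d{2}):\d{2}")


def busy_by_minute(lines: Iterable[str]) -> Counter:
    """Count 'Text file busy' log lines per HH:MM, ignoring the date."""
    counts: Counter = Counter()
    for line in lines:
        if "Text file busy" not in line:
            continue
        match = STAMP.search(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts
```

If the counts move from around 21:07 to shortly after the new scan time,
that would support the antivirus theory; if they stay at 21:07, it would
point elsewhere.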
> > >
> > >>
> > >> Thanks,
> > >> Best regards,
> > >>
> > >> Thomas Cas | Technicien du support infogérance
> > >> PHONE : +33 3 51 25 23 26 WEB : www.ikoula.com/en
> > >> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> > >> Before printing this letter, think about the impact on the environment!
> > >>
> > >> -----Original Message-----
> > >> From: Reid Wahl <nwahl at redhat.com>
> > >> Sent: Tuesday, 20 December 2022 20:34
> > >> To: Cluster Labs - All topics related to open-source clustering welcomed
> > >> <users at clusterlabs.org>
> > >> Cc: Ken Gaillot <kgaillot at redhat.com>; Service Infogérance
> > >> <infogerance at ikoula.com>
> > >> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> > >>
> > >> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <tcas at ikoula.com> wrote:
> > >> >
> > >> > Hello Ken,
> > >> >
> > >> > Thanks for your answer.
> > >> > There was no update running at the time of the bug, which is why I
> > >> > thought that having too many IPs caused this type of error.
> > >> > The /usr/sbin/ip executable was not being modified either.
> > >> >
> > >> > We have many clusters, and only this one has so many IPs and this
> > >> > problem.
> > >>
> > >> How often does this happen, and is it reliably reproducible under any
> > >> circumstances? Any antivirus software running? It'd be nice to check
> > >> something like lsof or strace while it's happening, but that may not be
> > >> feasible if it's sporadic; running those at every monitor would generate
> > >> lots of logs.
> > >>
> > >> AFAICT, having multiple processes execute (or read) the `ip` binary
> > >> simultaneously *shouldn't* cause problems, as long as nothing opens it for
> > >> write.
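As a lighter-weight stand-in for running lsof at every monitor, the check
for "who has this file open for write" can be sketched against /proc
directly. This assumes a Linux /proc layout where /proc/PID/fd/N is a
symlink to the open file and /proc/PID/fdinfo/N carries a "flags:" line in
octal; the function names are invented for this example, and it needs
enough privileges to read other processes' fd directories.

```python
import os
from typing import Set


def opened_for_write(flags_octal: str) -> bool:
    """True if the octal open(2) flags include O_WRONLY or O_RDWR."""
    return (int(flags_octal, 8) & os.O_ACCMODE) != os.O_RDONLY


def writers_of(path: str, proc: str = "/proc") -> Set[int]:
    """Return PIDs that currently hold `path` open in a write mode."""
    target = os.path.realpath(path)
    pids: Set[int] = set()
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) != target:
                    continue
                with open(os.path.join(proc, pid, "fdinfo", fd)) as info:
                    for line in info:
                        if line.startswith("flags:") and opened_for_write(line.split()[1]):
                            pids.add(int(pid))
            except OSError:
                continue  # descriptor closed while we were looking
    return pids
```

Calling `writers_of("/usr/sbin/ip")` in a loop during the scan window would
show whether any process holds the binary open for write at that moment. It
is a point-in-time snapshot, so a short-lived open can still be missed.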
> > >>
> > >> >
> > >> > Best regards,
> > >> >
> > >> > Thomas Cas | Technicien du support infogérance
> > >> > PHONE : +33 3 51 25 23 26 WEB : http://www.ikoula.com/en
> > >> > IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE Before
> > >> > printing this letter, think about the impact on the environment!
> > >> >
> > >> > -----Original Message-----
> > >> > From: Ken Gaillot <kgaillot at redhat.com>
> > >> > Sent: Monday, 19 December 2022 22:08
> > >> > To: Cluster Labs - All topics related to open-source clustering
> > >> > welcomed <users at clusterlabs.org>
> > >> > Cc: Service Infogérance <infogerance at ikoula.com>
> > >> > Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> > >> >
> > >> > On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
> > >> > > Hello Clusterlabs,
> > >> > >
> > >> > > I would like to report a bug on Pacemaker with the "IPaddr2"
> > >> > > resource:
> > >> > >
> > >> > > OS: Debian 10
> > >> > > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian
> > >> > > 4.19.208-1 (2021-09-29) x86_64 GNU/Linux
> > >> > > Pacemaker version: 2.0.1-5+deb10u2
> > >> > >
> > >> > > You will find the configuration of our cluster with 2 nodes
> > >> > > attached.
> > >> > >
> > >> > > Bug :
> > >> > >
> > >> > > We have several IPs (12) configured in the cluster configuration.
> > >> > > Sometimes the cluster is unstable with the following errors in the
> > >> > > pacemaker logs:
> > >> > >
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-232_monitor_10000:28835:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> >
> > >> > This doesn't sound like a bug in the agent; "Text file busy" suggests
> > >> > that the system "ip" command is being modified while the command is
> > >> > running. Is a software update happening when the problem occurs?
> > >> >
> > >> > I'm not sure whether there's some other situation that could cause that
> > >> > error, but simply executing the command a bunch of times simultaneously
> > >> > shouldn't cause it as far as I know.
> > >> >
> > >> > If simultaneous monitors are somehow causing the problem, you should be
> > >> > able to work around it by using different intervals for different monitors.
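For illustration, the different-intervals workaround could be generated
rather than typed by hand. This sketch assumes the cluster is managed with
pcs and that each VIP has a single monitor operation; on a crmsh-managed
cluster the emitted commands would look different. The helper names are
invented here, and the base of 10 seconds matches the `monitor_10000`
operations in the logs.

```python
from typing import Dict, List, Sequence


def staggered_intervals(resources: Sequence[str],
                        base: int = 10, step: int = 1) -> Dict[str, int]:
    """Give each resource its own monitor interval: base, base+step, ..."""
    return {name: base + i * step for i, name in enumerate(resources)}


def pcs_commands(intervals: Dict[str, int]) -> List[str]:
    """Render one 'pcs resource update' command per resource (not executed)."""
    return [
        f"pcs resource update {name} op monitor interval={secs}s"
        for name, secs in intervals.items()
    ]
```

Distinct intervals only make coinciding monitors rarer, they do not rule
them out: monitors at 10s and 11s still line up every 110 seconds.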
> > >> >
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-239_monitor_10000:28877:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-239_monitor_10000:28877:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-234_monitor_10000:28830:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-231_monitor_10000:28900:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-231_monitor_10000:28900:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-235_monitor_10000:28905:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-235_monitor_10000:28905:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-237_monitor_10000:28890:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-237_monitor_10000:28890:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-238_monitor_10000:28876:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP-238_monitor_10000:28876:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
> > >> > > (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > >> > >
> > >> > > The reason is that there are a lot of IPs configured and if the
> > >> > > monitors take place at the same time it causes this type of error.
> > >> > >
> > >> > > Best regards,
> > >> > >
> > >> > > Thomas Cas | Technicien du support infogérance
> > >> > > PHONE : +33 3 51 25 23 26 WEB : http://www.ikoula.com/en
> > >> > > IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE Before
> > >> > > printing this letter, think about the impact on the environment!
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > _______________________________________________
> > >> > > Manage your subscription:
> > >> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >> > >
> > >> > > ClusterLabs home: https://www.clusterlabs.org/
> > >> > --
> > >> > Ken Gaillot <kgaillot at redhat.com>
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Reid Wahl (He/Him)
> > >> Senior Software Engineer, Red Hat
> > >> RHEL High Availability - Pacemaker
> > >>
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Reid Wahl (He/Him)
> > > Senior Software Engineer, Red Hat
> > > RHEL High Availability - Pacemaker
> > >
> >
> >
> >
>
>
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
>
>