[ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Dec 21 05:15:26 EST 2022
Hi!
I wonder: Could the error message be triggered by placing an exclusive mandatory
lock on the ip binary?
If that triggers the bug, I'm rather sure that the error message is bad.
Shouldn't that be EWOULDBLOCK then?
(I have no idea how Sophos AV works, though. If they open the files to check
in write-mode, it's really stupid then IMHO)
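
For reference, the usual source of "Text file busy" on Linux is execve()
failing with ETXTBSY because some process has the file open for writing. Below
is a minimal sketch that reproduces this. It assumes Linux and a temporary
directory that is not mounted noexec; the function name is made up for
illustration, and it says nothing about what Sophos actually does.

```python
import errno
import os
import stat
import subprocess
import tempfile


def exec_while_open_for_write():
    """Return the errno raised when executing a file that is still open for writing."""
    fd, path = tempfile.mkstemp()  # mkstemp opens the file read/write
    try:
        os.write(fd, b"#!/bin/sh\nexit 0\n")
        os.chmod(path, stat.S_IRWXU)
        # fd is deliberately left open here, so a writer exists during exec.
        try:
            subprocess.run([path], check=False)
        except OSError as exc:
            return exc.errno
        return None  # exec succeeded, no error observed
    finally:
        os.close(fd)
        os.unlink(path)
```

Printing errno.errorcode and os.strerror() for the returned value shows the
symbolic name and the message text that ends up in the agent's stderr.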
Regards,
Ulrich
>>> Reid Wahl <nwahl at redhat.com> wrote on 21.12.2022 at 10:19 in message
<CAPiuu9-FiSqXAPf123ErsRWMWraKe2nnK6Pgwwfq4FmfSNxeYQ at mail.gmail.com>:
> On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS <tcas at ikoula.com> wrote:
>>
>> Ken,
>>
>> Antivirus (sophos-av) is running, but not in "real time access scanning" mode;
>> the scheduled scan, however, runs at 9pm every day.
>> 7 minutes later, we got these alerts.
>> The anti virus may indeed be the cause.
>
> I see. That does seem fairly likely. At least, there's no other
> obvious candidate for the cause.
>
> I used to work on a customer-facing support team for the ClusterLabs
> suite, and we received a fair number of cases where bizarre issues
> (such as hangs and access errors) were apparently caused by an
> antivirus. In those cases, all other usual lines of investigation were
> exhausted, and when we asked the customer to disable their AV, the
> issue disappeared. This happened with several different AV products.
>
> I can't say with any certainty that the AV is causing your issue, and
> I know it's frustrating that you won't know whether any given
> intervention worked, since this only happens once every few months.
>
> You may want to either exclude certain files from the scan, or write a
> short script to place the cluster in maintenance mode before the scan
> and take it out of maintenance after the scan is complete.
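
A rough sketch of such a wrapper, assuming the scan can be started as an
ordinary command (the scan command itself is a placeholder, not a known Sophos
invocation) and that crm_attribute is used to set the maintenance-mode cluster
property:

```python
import subprocess

# Cluster-wide maintenance mode via the maintenance-mode cluster property.
MAINT_ON = ["crm_attribute", "--type", "crm_config",
            "--name", "maintenance-mode", "--update", "true"]
MAINT_OFF = ["crm_attribute", "--type", "crm_config",
             "--name", "maintenance-mode", "--update", "false"]


def scan_in_maintenance(scan_cmd, run=subprocess.run):
    """Enable maintenance mode, run scan_cmd, and always disable it afterwards."""
    run(MAINT_ON, check=True)
    try:
        run(scan_cmd, check=True)
    finally:
        # Runs even if the scan fails, so the cluster is not left unmanaged.
        run(MAINT_OFF, check=True)
```

It would be called from cron in place of the scheduled scan, for example
scan_in_maintenance(["/path/to/scan-command"]), where the path is hypothetical.
The run parameter exists only so the sequence can be tested without a cluster.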
>
>>
>> I had the case on December 13 (with systemctl here):
>>
>> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
>> [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [
>> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
>> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
>> /bin/systemctl: Text file busy\n ]
>> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld
>> [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [
>> /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text
>> file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
>> /bin/systemctl: Text file busy\n ]
>>
>> Otherwise this happens rarely; we had a case in August:
>>
>> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
>> [3718] (process_lrm_event) notice:
>> wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [
>> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
>> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
>> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld
>> [3718] (process_lrm_event) notice:
>> wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [
>> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
>> busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
>>
>> It's always around 9:00-9:07 pm,
>> I'll move the virus scan to 10pm and see.
>
> That also sounds like a good plan to confirm the cause :) It might
> take a while to find out though.
>
>>
>> Thanks,
>> Best regards,
>>
>> Thomas Cas | Technicien du support infogérance
>> PHONE : +33 3 51 25 23 26 WEB : www.ikoula.com/en
>> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
>> Before printing this letter, think about the impact on the environment!
>>
>> -----Original Message-----
>> From: Reid Wahl <nwahl at redhat.com>
>> Sent: Tuesday, 20 December 2022 20:34
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc: Ken Gaillot <kgaillot at redhat.com>; Service Infogérance <infogerance at ikoula.com>
>> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
>>
>>
>> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <tcas at ikoula.com> wrote:
>> >
>> > Hello Ken,
>> >
>> > Thanks for your answer.
>> > There was no update running at the time of the bug, which is why I thought
>> > that having too many IPs caused this type of error.
>> > The /usr/sbin/ip executable was not being modified either.
>> >
>> > We have many clusters, and only this one has so many IPs and this problem.
>>
>> How often does this happen, and is it reliably reproducible under any
>> circumstances? Any antivirus software running? It'd be nice to check
>> something like lsof or strace while it's happening, but that may not be
>> feasible if it's sporadic; running those at every monitor would generate lots
>> of logs.
>>
>> AFAICT, having multiple processes execute (or read) the `ip` binary
>> simultaneously *shouldn't* cause problems, as long as nothing opens it for
>> write.
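
A lighter-weight check than running lsof at every monitor might be a small
/proc scan that reports only processes holding one specific file open for
writing. This is a sketch under assumptions: Linux only, run as root so that
other users' processes are visible, and the function name is invented here.

```python
import os


def writers_of(target):
    """Return sorted PIDs that hold `target` open for writing (Linux /proc only)."""
    target = os.path.realpath(target)
    pids = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    fields = dict(line.split(":", 1) for line in info if ":" in line)
            except OSError:
                continue  # descriptor was closed while we were looking
            flags = int(fields["flags"].strip(), 8)  # fdinfo prints flags in octal
            if (flags & os.O_ACCMODE) in (os.O_WRONLY, os.O_RDWR):
                pids.add(int(pid))
    return sorted(pids)
```

Calling writers_of("/usr/sbin/ip") in a loop around 9pm and logging only
non-empty results would keep the output small. A scanner that opens the file
only briefly could still be missed between two polls.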
>>
>> >
>> > Best regards,
>> >
>> > Thomas Cas | Technicien du support infogérance
>> >
>> > -----Original Message-----
>> > From: Ken Gaillot <kgaillot at redhat.com>
>> > Sent: Monday, 19 December 2022 22:08
>> > To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> > Cc: Service Infogérance <infogerance at ikoula.com>
>> > Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
>> >
>> >
>> > On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
>> > > Hello Clusterlabs,
>> > >
>> > > I would like to report a bug on Pacemaker with the "IPaddr2"
>> > > resource:
>> > >
>> > > OS: Debian 10
>> > > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
>> > > (2021-09-29) x86_64 GNU/Linux
>> > > Pacemaker version: 2.0.1-5+deb10u2
>> > >
>> > > You will find the configuration of our cluster with 2 nodes attached.
>> > >
>> > > Bug :
>> > >
>> > > We have several IPs configured in the cluster configuration (12).
>> > > Sometimes the cluster is unstable with the following errors in the
>> > > pacemaker logs:
>> > >
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 232_monitor_10000:28835:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> >
>> > This doesn't sound like a bug in the agent; "Text file busy" suggests that
>> > the system "ip" command is being modified while the command is running. Is a
>> > software update happening when the problem occurs?
>> >
>> > I'm not sure whether there's some other situation that could cause that
>> > error, but simply executing the command a bunch of times simultaneously
>> > shouldn't cause it as far as I know.
>> >
>> > If simultaneous monitors are somehow causing the problem, you should be able
>> > to work around it by using different intervals for different monitors.
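
One simple way to pick different intervals is to give each resource its own
value, one step apart. The helper below is only an illustration: the function
name, the 10-second base, and the 1-second step are assumptions, and the
resource names are the ones visible in the logs.

```python
def staggered_intervals(resources, base_s=10, step_s=1):
    """Map each resource name to its own monitor interval: '10s', '11s', ..."""
    return {name: f"{base_s + i * step_s}s" for i, name in enumerate(resources)}


# Resource names as they appear in the log excerpts of this thread.
VIPS = ["NGINX-VIP"] + [f"NGINX-VIP-{n}" for n in (231, 232, 234, 235, 237, 238, 239)]
```

staggered_intervals(VIPS) assigns 10s to NGINX-VIP, 11s to NGINX-VIP-231, and
so on. Monitors with different intervals still coincide now and then (10s and
11s line up every 110 seconds), so this reduces overlap rather than removing it.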
>> >
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 239_monitor_10000:28877:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 239_monitor_10000:28877:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 234_monitor_10000:28830:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 231_monitor_10000:28900:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 231_monitor_10000:28900:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 235_monitor_10000:28905:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 235_monitor_10000:28905:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 237_monitor_10000:28890:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 237_monitor_10000:28890:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 238_monitor_10000:28876:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP-
>> > > 238_monitor_10000:28876:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079]
>> > > (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
>> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > >
>> > > The reason is that there are a lot of IPs configured and if the
>> > > monitors take place at the same time it causes this type of error.
>> > >
>> > > Best regards,
>> > >
>> > > Thomas Cas | Technicien du support infogérance
>> > >
>> > --
>> > Ken Gaillot <kgaillot at redhat.com>
>> >
>> >
>>
>>
>> --
>> Regards,
>>
>> Reid Wahl (He/Him)
>> Senior Software Engineer, Red Hat
>> RHEL High Availability - Pacemaker
>>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/