[ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Dec 22 02:37:41 EST 2022


You could also try something like "watch fuser $(which ip)" or (if you can)
write a program using inotify and IN_OPEN to see which processes are opening the
binary.
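Ulrich's inotify idea can be sketched in Python through ctypes, because the standard library has no inotify binding. This is a minimal sketch, not a finished tool: the helper names are hypothetical, and the default path /usr/sbin/ip is the one Thomas mentions later in the thread. One limitation matters here. inotify reports that the file was opened, modified, or closed after writing, but not which process did it (fanotify can report the PID, but needs CAP_SYS_ADMIN). Since "Text file busy" on exec means the file was open for writing at that moment, IN_MODIFY and IN_CLOSE_WRITE are the events worth watching for.

```python
import ctypes
import ctypes.util
import os
import select
import struct
import time

# Event bits and flags from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CLOSE_WRITE = 0x00000008
IN_OPEN = 0x00000020
IN_CLOEXEC = 0o2000000

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]

# struct inotify_event header: int wd; uint32 mask, cookie, len; then `len` name bytes.
_EVENT = struct.Struct("iIII")


def watch(path, mask=IN_OPEN | IN_MODIFY | IN_CLOSE_WRITE):
    """Return an inotify descriptor watching `path` for the given events."""
    fd = _libc.inotify_init1(IN_CLOEXEC)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err), path)
    return fd


def read_events(fd, timeout=1.0):
    """Return the masks of the events that arrive within `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return []
    buf = os.read(fd, 4096)
    masks, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _EVENT.unpack_from(buf, offset)
        masks.append(mask)
        offset += _EVENT.size + name_len
    return masks


def monitor(path="/usr/sbin/ip"):  # assumed path, taken from the thread
    """Print a timestamped line per event on `path`, flagging write activity."""
    fd = watch(path)
    while True:
        for mask in read_events(fd, timeout=60):
            # IN_OPEN alone cannot tell a read from a write; IN_MODIFY or
            # IN_CLOSE_WRITE means someone had the file open for writing.
            kind = "WRITE" if mask & (IN_MODIFY | IN_CLOSE_WRITE) else "open"
            print(time.strftime("%Y-%m-%d %H:%M:%S"), kind, hex(mask), flush=True)
```

Running monitor() as root across the 9 pm scan window would show whether anything writes to the binary while the scan runs.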

>>> Thomas CAS <tcas at ikoula.com> wrote on 21.12.2022 at 09:24 in message
<PR3P193MB08297BA8173A31E40E59F3A4A5EB9 at PR3P193MB0829.EURP193.PROD.OUTLOOK.COM>:

> Ken,
> 
> Antivirus (sophos-av) is running, but not in "real time access scanning"
> mode; however, the scheduled scan runs at 9 pm every day, and we got these
> alerts 7 minutes later. The antivirus may indeed be the cause.
> 
> I had the same case on December 13 (with systemctl this time):
> 
> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld [5082] (process_lrm_event)  notice: wd-websqlng01-NGINX_monitor_15000:454 [ /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: /bin/systemctl: Text file busy\n ]
> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld [5082] (process_lrm_event)  notice: wd-websqlng01-NGINX_monitor_15000:454 [ /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: /bin/systemctl: Text file busy\n ]
> 
> Otherwise this happens rarely; we had the case in August:
> 
> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld [3718] (process_lrm_event)  notice: wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld [3718] (process_lrm_event)  notice: wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> 
> It's always around 9:00-9:07 pm.
> I'll move the virus scan to 10 pm and see.
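To see whether the errors follow the scan once it moves to 10 pm, the matching lines from the pacemaker.log files can be bucketed by hour. A minimal sketch, assuming the syslog-style timestamps ("Dec 13 21:07:53") shown in the excerpts above; the function name is hypothetical.

```python
import re
from collections import Counter

# Syslog-style timestamp as used in pacemaker.log, e.g. "Dec 13 21:07:53".
_STAMP = re.compile(r"\b([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})\b")


def busy_errors_by_hour(lines):
    """Count 'Text file busy' log lines per hour of the day (0-23)."""
    counts = Counter()
    for line in lines:
        if "Text file busy" not in line:
            continue
        match = _STAMP.search(line)  # first timestamp on the line
        if match:
            counts[int(match.group(3))] += 1
    return counts
```

Feeding it the output of something like `zgrep 'Text file busy' pacemaker.log*` before and after the change should show the counts moving from hour 21 to hour 22 if the scan is responsible.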
> 
> Thanks,
> Best regards,
> 
> Thomas Cas  |  Managed services support technician
> PHONE : +33 3 51 25 23 26       WEB : www.ikoula.com/en 
> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> Before printing this letter, think about the impact on the environment!
> 
> -----Original Message-----
> From: Reid Wahl <nwahl at redhat.com> 
> Sent: Tuesday, 20 December 2022 20:34
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> <users at clusterlabs.org>
> Cc: Ken Gaillot <kgaillot at redhat.com>; Service Infogérance 
> <infogerance at ikoula.com>
> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> 
> 
> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <tcas at ikoula.com> wrote:
>>
>> Hello Ken,
>>
>> Thanks for your answer.
>> There was no update running at the time of the bug, which is why I thought
>> that having too many IPs caused this type of error.
>> The /usr/sbin/ip executable was not being modified either.
>>
>> We have many clusters, and only this one has so many IPs and this problem.
> 
> How often does this happen, and is it reliably reproducible under any 
> circumstances? Any antivirus software running? It'd be nice to check 
> something like lsof or strace while it's happening, but that may not be 
> feasible if it's sporadic; running those at every monitor would generate
> lots of logs.
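A cheaper alternative to running lsof or strace at every monitor is to scan /proc for descriptors that refer to the binary and report only those opened for writing, since a writer is what makes exec fail with ETXTBSY. A minimal sketch, assuming Linux /proc semantics and permission to read other processes' fd directories (normally root); the function name is hypothetical.

```python
import os

O_ACCMODE = 0o3  # access-mode bits of the open flags on Linux


def writers_of(path):
    """Return (pid, fd, command) for every descriptor that has `path` open for writing."""
    target = os.path.realpath(path)
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or no permission to inspect it
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                # fdinfo contains a line like "flags:\t0100001" (octal open flags).
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    flags = next(int(line.split()[1], 8)
                                 for line in info if line.startswith("flags:"))
                with open(f"/proc/{pid}/comm") as comm:
                    command = comm.read().strip()
            except (OSError, StopIteration):  # fd closed or process gone meanwhile
                continue
            if (flags & O_ACCMODE) in (os.O_WRONLY, os.O_RDWR):
                hits.append((int(pid), int(fd), command))
    return hits
```

Calling it every few seconds from a loop or timer and logging only non-empty results keeps the log volume low; a very short-lived writer can still slip between two scans.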
> 
> AFAICT, having multiple processes execute (or read) the `ip` binary 
> simultaneously *shouldn't* cause problems, as long as nothing opens it for 
> write.
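Reid's point can be checked directly: many concurrent executions of the same binary succeed, while a single descriptor open for writing makes execve() fail with ETXTBSY ("Text file busy"). A minimal sketch, assuming Linux and a temporary directory that allows execution; it runs a private copy of `true` instead of the real `ip`, and the helper names and the file name fake-ip are hypothetical.

```python
import errno
import os
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def exec_errno(path):
    """Execute `path` once; return the errno if execve() fails, else None."""
    try:
        subprocess.run([path], check=False)
        return None
    except OSError as exc:
        return exc.errno


def demo(parallel=12):
    """Return (errnos of `parallel` concurrent runs, errno of a run during a write)."""
    workdir = tempfile.mkdtemp()
    binary = os.path.join(workdir, "fake-ip")  # hypothetical stand-in for ip
    shutil.copy(shutil.which("true"), binary)
    os.chmod(binary, 0o755)
    try:
        # Many simultaneous executions, like 12 IPaddr2 monitors firing at once.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            concurrent = list(pool.map(exec_errno, [binary] * parallel))
        # Hold the binary open for writing, as an updater or scanner might.
        with open(binary, "r+b"):
            while_open_for_write = exec_errno(binary)
        return concurrent, while_open_for_write
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    runs, busy = demo()
    print("concurrent runs:", runs)
    print("while open for write:", errno.errorcode.get(busy, busy))
```

On a typical Linux host the concurrent runs all return None and the run made while the file is held open for writing returns errno.ETXTBSY, which matches Reid's statement that simultaneous execution alone is harmless.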
> 
>>
>> Best regards,
>>
>> -----Original Message-----
>> From: Ken Gaillot <kgaillot at redhat.com>
>> Sent: Monday, 19 December 2022 22:08
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc: Service Infogérance <infogerance at ikoula.com>
>> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
>>
>>
>> On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
>> > Hello Clusterlabs,
>> >
>> > I would like to report a bug on Pacemaker with the "IPaddr2"
>> > resource:
>> >
>> > OS: Debian 10
>> > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
>> > (2021-09-29) x86_64 GNU/Linux
>> > Pacemaker version: 2.0.1-5+deb10u2
>> >
>> > The configuration of our two-node cluster is attached.
>> >
>> > Bug:
>> >
>> > We have 12 IP addresses configured in the cluster. Sometimes the cluster
>> > is unstable, with the following errors in the Pacemaker logs:
>> >
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-232_monitor_10000:28835:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>>
>> This doesn't sound like a bug in the agent; "Text file busy" suggests that
>> the system "ip" command is being modified while the command is running. Is a
>> software update happening when the problem occurs?
>>
>> I'm not sure whether there's some other situation that could cause that
>> error, but simply executing the command a bunch of times simultaneously
>> shouldn't cause it as far as I know.
>>
>> If simultaneous monitors are somehow causing the problem, you should be able
>> to work around it by using different intervals for different monitors.
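Ken's workaround can be checked with a little arithmetic: if two recurring monitors with intervals of a and b seconds fire together once, they coincide again after lcm(a, b) seconds, so intervals that share no common factor stay apart much longer than a shared 10 s interval does. A minimal sketch, assuming timers that start at the same instant and never drift; the helper names are hypothetical.

```python
from itertools import combinations
from math import gcd


def staggered_intervals(resources, base=10):
    """Give each resource the next integer interval >= base coprime to all earlier ones."""
    chosen, interval = {}, base
    for name in resources:
        while any(gcd(interval, other) != 1 for other in chosen.values()):
            interval += 1
        chosen[name] = interval
        interval += 1
    return chosen


def soonest_collision(intervals):
    """Seconds until any two monitors fire together again, if all start at t=0."""
    return min(a * b // gcd(a, b) for a, b in combinations(intervals.values(), 2))
```

For 12 resources starting at 10 s this yields 10, 11, 13, 17, 19, 21, 23, 29, 31, 37, 41 and 43 s, so no two monitors meet again for 110 s, against every 10 s when all share one interval. The trade-off is slower failure detection for the longest intervals, so a narrower spread of distinct intervals may be the better compromise in practice.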
>>
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_10000:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_10000:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-234_monitor_10000:28830:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_10000:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_10000:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_10000:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_10000:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-237_monitor_10000:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-237_monitor_10000:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-238_monitor_10000:28876:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP-238_monitor_10000:28876:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP_monitor_10000:28880:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd     [5079] (operation_finished)   notice: NGINX-VIP_monitor_10000:28880:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>> >
>> > The reason is that there are a lot of IPs configured, and if the
>> > monitors run at the same time, it causes this type of error.
>> >
>> > Best regards,
>> >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> --
>> Ken Gaillot <kgaillot at redhat.com>
>>
> 
> 
> --
> Regards,
> 
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker




