[ClusterLabs] [rabbitmq] Maximum Number of Sessions (8192) Reached

Tue Aug 27 12:22:04 UTC 2024

Hi,

I'm newly subscribed to the list and hoping for some pointers. I
can't find much about rabbitmq and systemd-logind, so I wanted to ask
whether anyone has encountered the same issue and, if so, how they
dealt with it.

We're supporting an OpenStack Victoria cluster (installed with our own
deployment method) that is mostly controlled by pacemaker. On two of
the three control nodes I see this warning constantly:

---snip---
2024-07-29T14:09:23.552576+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:24.450657+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:24.500356+02:00 control01 su: (to rabbitmq) root on none
2024-07-29T14:09:24.502370+02:00 control01 su: pam_systemd(su:session): Failed to create session: Maximum number of sessions (8192) reached, refusing further sessions.
2024-07-29T14:09:24.502681+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:25.565203+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:25.609613+02:00 control01 su: (to rabbitmq) root on none
---snip---
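
To gauge how often logind refuses a session, the refusals can be
counted per day. This is only a minimal sketch: it assumes each log
entry is a single unwrapped line starting with an ISO 8601 timestamp,
as in the excerpt above. The function name refusals_per_day and the
log path in the usage note are hypothetical.

```shell
# Count pam_systemd session refusals per calendar day.
# Reads log lines on stdin; assumes field 1 is an ISO 8601 timestamp.
refusals_per_day() {
  awk '
    index($0, "pam_systemd(su:session): Failed to create session") {
      days[substr($1, 1, 10)]++      # first 10 characters: YYYY-MM-DD
    }
    END { for (d in days) print d, days[d] }
  ' | sort
}
```

It could be fed from wherever these messages end up on the node, for
example "refusals_per_day < /var/log/messages" (path assumed).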

This is evidently initiated by pacemaker (I just grabbed newer logs):

Aug 27 13:16:06 control03 lrmd[297534]: INFO: rabbitmq[296363]: su_rabbit_cmd(): the invoked command exited 0: /usr/sbin/rabbitmqctl node_health_check -t 128
Aug 27 13:16:06 control03 lrmd[297542]: INFO: rabbitmq[296363]: get_monitor(): get_monitor function ready to return 0

Looking at the output of loginctl list-sessions, almost all sessions
belong to rabbitmq and have very old timestamps (2023). I'm aware
that older systemd versions can't handle closing sessions correctly
[0], but we already run a version newer than the one required
according to [0]. I increased SessionsMax to 16384 on one of the
nodes, and again rabbitmq uses almost all available sessions:

control03:~ # loginctl list-sessions | grep -c rabbit
16325
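
For a breakdown by user rather than a single grep count, something
like the following could be used. It is a sketch under the assumption
that the user name is the third column of the list-sessions output
(SESSION UID USER ...); the function name count_sessions_by_user is
hypothetical.

```shell
# Count logind sessions per user, highest count first.
# Reads `loginctl list-sessions --no-legend` output on stdin;
# assumes the user name is the third whitespace-separated column.
count_sessions_by_user() {
  awk 'NF >= 3 { n[$3]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# Usage on a live node:
#   loginctl list-sessions --no-legend | count_sessions_by_user
```

For a single old session, "loginctl show-session <ID> -p State -p
Timestamp" may help tell whether it is still active or stuck in the
closing state.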

But everything seems to be working okay; apparently it's just filling
up the logs. And it seems as if all new sessions are closed properly:

control03:~ # journalctl --since 2024-08-14 | grep -c "session opened for user rabbitmq"
7679
control03:~ # journalctl --since 2024-08-14 | grep -c "session closed for user rabbitmq"
7679
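
The two grep counts above can be folded into one pass over the
journal. A minimal sketch, with the hypothetical function name
session_balance. One caveat: these are pam_unix messages, and in the
log excerpt earlier pam_unix reports "session opened" even right after
pam_systemd refused to create a session, so matching counts may say
little about whether logind sessions are released.

```shell
# Compare pam_unix "opened" and "closed" messages for one user.
# Reads log lines on stdin; $1 is the user name to match.
session_balance() {
  awk -v user="$1" '
    index($0, "session opened for user " user) { opened++ }
    index($0, "session closed for user " user) { closed++ }
    END {
      printf "opened=%d closed=%d diff=%d\n", opened, closed, opened - closed
    }
  '
}

# Usage:
#   journalctl --since 2024-08-14 | session_balance rabbitmq
```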

What I'm wondering is why only two of the three control nodes reach
the SessionsMax limit, while the third (which joined the cluster
later) has only 2 rabbitmq sessions. I seem to be overlooking
something, but I don't know what yet. I'm also curious whether this
is working "as designed". This is a cluster with 3 control nodes and
36 compute nodes. What do other operators see in their HA clouds
regarding rabbitmq?

Or could this be a rabbitmq issue, since the OCF HA resource agent
comes from the rabbitmq-server package?

rpm -qf /usr/lib/ocf/resource.d/rabbitmq/rabbitmq-server-ha
rabbitmq-server-3.8.3-lp152.2.3.1.x86_64

Thanks for any pointers!
Eugen

[0] https://www.suse.com/support/kb/doc/?id=000020549