[ClusterLabs] [rabbitmq] Maximum Number of Sessions (8192) Reached
Eugen Block
eblock at nde.ag
Tue Aug 27 12:22:04 UTC 2024
Hi,
I'm newly subscribed to the list and hoping to find some pointers. I
can't seem to find much about rabbitmq and logind, so I wanted to ask
whether anyone has encountered the same issue and, if so, how they
dealt with it.
We're supporting an OpenStack Victoria cluster (installed with our own
deployment method) that is mostly controlled by pacemaker. On two of
the three control nodes I constantly see this warning:
---snip---
2024-07-29T14:09:23.552576+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:24.450657+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:24.500356+02:00 control01 su: (to rabbitmq) root on none
2024-07-29T14:09:24.502370+02:00 control01 su: pam_systemd(su:session): Failed to create session: Maximum number of sessions (8192) reached, refusing further sessions.
2024-07-29T14:09:24.502681+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:25.565203+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:25.609613+02:00 control01 su: (to rabbitmq) root on none
---snip---
This is obviously initiated by pacemaker (just grabbed newer logs):
Aug 27 13:16:06 control03 lrmd[297534]: INFO: rabbitmq[296363]: su_rabbit_cmd(): the invoked command exited 0: /usr/sbin/rabbitmqctl node_health_check -t 128
Aug 27 13:16:06 control03 lrmd[297542]: INFO: rabbitmq[296363]: get_monitor(): get_monitor function ready to return 0
In the output of loginctl list-sessions, almost all sessions belong to
rabbitmq, and they have very old timestamps (2023). I'm aware that
older systemd versions can't handle closing sessions correctly [0],
but we already use a version newer than the one required according to
[0]. I increased SessionsMax to 16384 on one of the nodes, and again
rabbitmq uses almost all available sessions:
control03:~ # loginctl list-sessions | grep -c rabbit
16325
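
A minimal sketch for breaking the session count down per user rather than grepping for a single name. It assumes the usual loginctl list-sessions column order (SESSION, UID, USER, ...), so that the user name is the third field; the function name count_sessions_by_user is only illustrative.

```shell
# Tally logind sessions per user.
# Expects rows in the layout printed by `loginctl list-sessions --no-legend`
# on stdin: SESSION UID USER [SEAT] [TTY] ...
count_sessions_by_user() {
    # Count occurrences of the third field (the user name), then sort
    # by count (descending) and by name to get a stable order.
    awk 'NF >= 3 { n[$3]++ } END { for (u in n) print n[u], u }' |
        sort -k1,1nr -k2,2
}

# Intended use on a control node (not executed here):
#   loginctl list-sessions --no-legend | count_sessions_by_user
```

If one user dominates the list, the individual sessions can then be inspected with loginctl show-session to check their state and timestamp.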
But everything seems to be working okay; apparently it's just filling
up the logs. And it seems as if all new sessions are closed properly:
control03:~ # journalctl --since 2024-08-14 | grep -c "session opened for user rabbitmq"
7679
control03:~ # journalctl --since 2024-08-14 | grep -c "session closed for user rabbitmq"
7679
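
The two counts can also be compared in a single pass over the log. The following is only a sketch: the helper name count_session_events is made up for illustration, and it matches the same pam_unix message text as the grep commands above (so a user name that merely starts with the given string would be counted as well).

```shell
# Count pam_unix "session opened" / "session closed" messages for one user.
# Log text is read from stdin; the user name is the first argument.
count_session_events() {
    user=$1
    awk -v u="$user" '
        index($0, "session opened for user " u) { opened++ }
        index($0, "session closed for user " u) { closed++ }
        END {
            # diff > 0 means more sessions were opened than closed
            printf "opened=%d closed=%d diff=%d\n", opened, closed, opened - closed
        }
    '
}

# Intended use on a control node (not executed here):
#   journalctl --since 2024-08-14 | count_session_events rabbitmq
```

A diff of 0 only shows that pam_unix logged a close for every open. It does not by itself prove that logind released the corresponding session.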
What I'm wondering about is why only two out of three control nodes
reach the SessionsMax limit while the third (which joined the cluster
later) only has 2 rabbitmq sessions. I seem to be overlooking
something, but I don't know what it is yet. And I'm curious whether
this is working "as designed". This is a cluster with 3 control nodes and 36 compute
nodes. What do other operators see in their HA clouds regarding
rabbitmq?
Or could this be a rabbitmq issue, since the OCF HA resource agent
comes from the rabbitmq-server package?
rpm -qf /usr/lib/ocf/resource.d/rabbitmq/rabbitmq-server-ha
rabbitmq-server-3.8.3-lp152.2.3.1.x86_64
Thanks for any pointers!
Eugen
[0] https://www.suse.com/support/kb/doc/?id=000020549