[ClusterLabs] Q: repeating message " cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN"
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Nov 12 02:46:10 EST 2018
Hi!
While analyzing an odd cluster problem on SLES11 SP4, I found this message repeating very frequently (several times per second), always with the same text:
[...more...]
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
[...many more...]
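To put a number on "several times per second", a minimal Python sketch along these lines could count the messages per second, host, and retry number. It assumes the syslog line format shown in the excerpt above; the function name count_retries and reading the log from standard input are illustrative choices of mine, not anything provided by cmirrord or corosync.

```python
import re
import sys
from collections import Counter

# Matches syslog lines of the form quoted above, e.g.
# "Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"cmirrord\[(?P<pid>\d+)\]:\s+\[(?P<tag>[^\]]+)\]\s+"
    r"Retry #(?P<retry>\d+) of cpg_mcast_joined: (?P<err>\S+)"
)


def count_retries(lines):
    """Count retry messages per (timestamp, host, retry number).

    Hypothetical helper: lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            key = (m.group("ts"), m.group("host"), int(m.group("retry")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Read syslog text from standard input and print one summary line per key.
    for (ts, host, retry), n in sorted(count_retries(sys.stdin).items()):
        print(f"{ts} {host} retry#{retry}: {n} message(s)")
```

Fed with the node's syslog on standard input, this would show whether any retry number other than 1 ever appears, and how many messages fall within each second.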
I wonder: Shouldn't the retry number be incremented? Or do these lines belong to different operations, each on its first retry? If so, where is that visible?
The situation I'm analyzing is one where a node should have been fenced but somehow wasn't; it simply stopped working (it seemed frozen). The last message, logged a few minutes(!) before the other nodes complained, was:
Nov 10 22:04:18 h01 crmd[16596]: notice: throttle_mode: High CIB load detected: 1.246333
(After this the node seemed dead/frozen).
Regards,
Ulrich