[ClusterLabs] Q: repeating message " cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN"
Gang He
ghe at suse.com
Mon Nov 12 03:25:11 EST 2018
Hello Ulrich,
Can you reproduce this issue reliably? If so, please share your steps.
We have also encountered a similar issue: it looks like cmirrord cannot join the CPG (closed process group, a corosync concept), so the resource times out and the node is fenced.
Thanks
Gang
>>> On 2018/11/12 at 15:46, in message
<5BE92FC2020000A10002E056 at gwsmtp1.uni-regensburg.de>, "Ulrich Windl"
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
> Hi!
>
> While analyzing an odd cluster problem on SLES11 SP4, I found this message
> repeating very often (several times per second), always with the same text:
>
> [...more...]
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> [...many more...]
>
> I wonder: shouldn't the retry number be incremented? Or are these different
> retries? If so, where is that visible?
>
> The situation I'm analyzing is one where a node should have been fenced but
> somehow wasn't; instead, it just stopped working (it seemed frozen). The
> last message, logged a few minutes(!) before the other nodes complained, was:
>
> Nov 10 22:04:18 h01 crmd[16596]: notice: throttle_mode: High CIB load detected: 1.246333
> (After this the node seemed dead/frozen).
>
> Regards,
> Ulrich
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
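Regarding the question of whether the retry number should increment: below is a minimal Python sketch, not taken from the cmirrord source, of a send loop whose retry counter is local to each message. `make_flaky_send`, `Sender`, `send_with_retry` and the status constants are hypothetical stand-ins for a call like `cpg_mcast_joined`. Under that assumption, many different messages that each succeed on their second attempt produce a stream of identical "Retry #1" lines, while a single message that keeps failing produces #1, #2, #3 and so on. Whether cmirrord's counter actually works this way would have to be checked in its source.

```python
"""Hypothetical sketch: a per-message retry loop around a send call that can
report TRY_AGAIN. Names and behavior are illustrative, not cmirrord's code."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Status strings borrowed from the log text; here they are just labels.
TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"
OK = "CS_OK"


def make_flaky_send(failures_per_message: int) -> Callable[[str], str]:
    """Hypothetical transport standing in for cpg_mcast_joined: it answers
    TRY_AGAIN `failures_per_message` times for each payload, then OK."""
    attempts: Dict[str, int] = {}

    def send(payload: str) -> str:
        n = attempts.get(payload, 0)
        attempts[payload] = n + 1
        return TRY_AGAIN if n < failures_per_message else OK

    return send


@dataclass
class Sender:
    send: Callable[[str], str]
    max_retries: int = 10
    log: List[str] = field(default_factory=list)

    def send_with_retry(self, tag: str, payload: str) -> bool:
        """Send one message. `tag` is a hypothetical per-resource id (like the
        bracketed id in the log), so it does not change between messages."""
        retry = 0  # local counter: restarts at 0 for every message
        while True:
            status = self.send(payload)
            if status != TRY_AGAIN:
                return status == OK
            retry += 1
            self.log.append(f"[{tag}] Retry #{retry} of cpg_mcast_joined: {status}")
            if retry >= self.max_retries:
                return False  # give up after max_retries TRY_AGAIN answers


if __name__ == "__main__":
    # Three different messages, each needing exactly one retry.
    many = Sender(make_flaky_send(1))
    for i in range(3):
        many.send_with_retry("yEa32lLX", f"request-{i}")
    print("\n".join(many.log))  # three identical "Retry #1" lines

    # One message needing three retries: here the counter does increment.
    one = Sender(make_flaky_send(3))
    one.send_with_retry("yEa32lLX", "request-x")
    print("\n".join(one.log))  # Retry #1, #2, #3
```

In this model, identical "Retry #1" lines with the same bracketed id would point to many separate sends that each hit TRY_AGAIN once, rather than one send stuck in a loop.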