[ClusterLabs] wireshark cannot recognize corosync packets

cys chaoys155 at 163.com
Thu Mar 16 11:34:27 UTC 2017


I have checked that all the config files are the same, except for bindnetaddr.
So I'm sending only the logs.






On 2017-03-16 at 15:54, "Jan Friesse" <jfriesse at redhat.com> wrote:

> corosync.conf and debug logs are in attachment.

Thanks for them. They look really interesting. As can be seen from

Mar 14 11:37:28 [57827] node-132.acloud.vt corosync debug   [TOTEM ] timer_function_orf_token_timeout The token was lost in the OPERATIONAL state.

corosync correctly detected the token loss. Also

Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 11(merge during join).

says it correctly detected the merge. But from then on it gets weird.

Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
...
Mar 14 12:55:47 [154709] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout)

So even after the two other nodes merged, there is still something that
prevents corosync from reaching consensus.
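
To see how often corosync re-enters GATHER and why, the debug log can be tallied by reason. The following is only a minimal sketch: it assumes each log entry sits on a single line in the format quoted above, and the names `GATHER_RE` and `count_gather_reasons` are made up for this illustration, not part of corosync.

```python
import re
from collections import Counter

# Matches the memb_state_gather_enter debug entries quoted above.
# Assumption: one log entry per line (no mail-client line wrapping).
GATHER_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\[(?P<pid>\d+)\]\s+(?P<host>\S+)"
    r"\s+corosync\s+debug\s+\[TOTEM\s*\]\s+"
    r"memb_state_gather_enter entering GATHER state from "
    r"(?P<code>\d+)\((?P<reason>[^)]*)\)"
)


def count_gather_reasons(lines):
    """Count GATHER state entries per reason string."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            counts[match.group("reason")] += 1
    return counts


# Sample input: three of the entries from this thread, joined onto single lines.
sample = [
    "Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 11(merge during join).",
    "Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
    "Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
]

counts = count_gather_reasons(sample)
for reason, n in counts.most_common():
    print(f"{n:5d}  {reason}")
```

On a real log, pass the open file object instead of `sample`. A long run of "consensus timeout" entries with nothing else in between is the pattern described above.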

Would it be possible to also attach the other nodes' logs/configs?

For now I guess the reason can be one of:
- ifdown on one of the other nodes, which broke the whole membership
- a different node list in the config between nodes
- a "forgotten" node whose node list contains one of the 200.201.162.x nodes

Regards,
  Honza
>
> And two messages from kernel:
>
> 2017-03-14 11:37:20.097233 - info  e1000: eth0 NIC Link is Down
>
> 2017-03-14 11:44:41.032121 - info  e1000: eth0 NIC Link is Up 1000 Mbps
> Full Duplex, Flow Control: RX
>
>
> Thanks.
>
>
> On 2017/3/15 16:29, Jan Friesse wrote:
>>> Yesterday I found that corosync took almost one hour to form a cluster (a
>>> failed node came back online).
>>
>> This for sure shouldn't happen (at least with default timeout settings).
>>
>>>
>>> So I captured some corosync packets, and opened the pcap file in
>>> wireshark.
>>>
>>> But wireshark only displayed raw udp, no totem.
>>>
>>> Wireshark version is 2.2.5. I'm sure it supports corosync totem.
>>>
>>> corosync is 2.4.0.
>>
>> Wireshark has a corosync dissector, but only for version 1.x. 2.x is not
>> supported yet.
>>
>>>
>>> And if corosync takes too long to form a cluster, how can I diagnose it?
>>>
>>> I read the logs, but could not figure it out.
>>
>> Logs, especially when debug is enabled, usually have enough info. Can you
>> paste your config + logs?
>>
>> Regards,
>>   Honza
>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-133.log.tgz
Type: application/x-gzip
Size: 1519739 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170316/70cc8030/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-135.log.tgz
Type: application/x-gzip
Size: 2013352 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170316/70cc8030/attachment-0001.bin>

