[ClusterLabs] wireshark cannot recognize corosync packets

Jan Friesse jfriesse at redhat.com
Thu Mar 16 08:54:10 CET 2017


> corosync.conf and debug logs are in attachment.

Thanks for them. They look really interesting. As can be seen from

Mar 14 11:37:28 [57827] node-132.acloud.vt corosync debug   [TOTEM ] timer_function_orf_token_timeout The token was lost in the OPERATIONAL state.

corosync correctly detected the lost token. Also

Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 11(merge during join).

says it correctly detected the merge. But from then on it gets weird:
Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
...
Mar 14 12:55:47 [154709] node-132.acloud.vt corosync debug   [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout)

So even after the two other nodes merged, something still prevents
corosync from reaching consensus.
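
To see how often this repeats over a long log, the GATHER entries can be counted per reason. The following is only a minimal sketch: the regular expression is derived from the debug lines quoted above, the function name is made up for illustration, and it assumes each log entry sits on a single line in the log file.

```python
import re
from collections import Counter

# Pattern derived from the debug lines quoted in this mail; the exact
# message layout on other corosync versions is an assumption.
GATHER_RE = re.compile(r"entering GATHER state from (\d+)\(([^)]*)\)")


def count_gather_reasons(lines):
    """Count how often each GATHER reason appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Three of the lines quoted above, each joined back onto one line.
sample = [
    "Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 11(merge during join).",
    "Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
    "Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug   [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
]

for reason, n in count_gather_reasons(sample).most_common():
    print(n, reason)
```

A reason that keeps growing for the whole period, as "consensus timeout" does here, points at membership never settling rather than at a single lost token.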

Would it be possible to attach the logs/configs of the other nodes as well?

For now my guess is that the reason is one of:
- ifdown on one of the other nodes, which broke the whole membership
- different node lists in the config between nodes
- a "forgotten" node with a node list containing one of the 200.201.162.x nodes
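
The second possibility can be checked mechanically once the corosync.conf from every node is collected in one place. The following is a rough sketch, not a real parser: it only collects ring0_addr values with a regular expression, the function names are invented for illustration, and the addresses in the sample are placeholders from the documentation range, not the ones from this cluster.

```python
import re

# Matches uncommented "ring0_addr: <address>" lines anywhere in the file.
RING0_RE = re.compile(r"^\s*ring0_addr:\s*(\S+)", re.MULTILINE)


def ring0_addrs(conf_text):
    """Return the set of ring0_addr values found in corosync.conf text."""
    return set(RING0_RE.findall(conf_text))


def nodelist_mismatches(confs):
    """Map each config label to the addresses other configs have but it lacks.

    confs maps a label (e.g. a host name) to that node's corosync.conf text.
    Labels whose node list is complete are left out of the result.
    """
    addrs = {name: ring0_addrs(text) for name, text in confs.items()}
    union = set().union(*addrs.values())
    return {
        name: sorted(union - found)
        for name, found in addrs.items()
        if union - found
    }


# Hypothetical configs; only the nodelist section matters for this check.
conf_a = """
nodelist {
    node {
        ring0_addr: 192.0.2.1
        nodeid: 1
    }
    node {
        ring0_addr: 192.0.2.2
        nodeid: 2
    }
}
"""
conf_b = """
nodelist {
    node {
        ring0_addr: 192.0.2.1
        nodeid: 1
    }
    node {
        ring0_addr: 192.0.2.3
        nodeid: 3
    }
}
"""

print(nodelist_mismatches({"node-a": conf_a, "node-b": conf_b}))
```

An empty result means every config lists the same ring0 addresses; it says nothing about nodeid values or other rings, which would need their own comparison.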

Regards,
   Honza
>
> And two messages from kernel:
>
> 2017-03-14 11:37:20.097233 - info  e1000: eth0 NIC Link is Down
>
> 2017-03-14 11:44:41.032121 - info  e1000: eth0 NIC Link is Up 1000 Mbps
> Full Duplex, Flow Control: RX
>
>
> Thanks.
>
>
> On 2017/3/15 16:29, Jan Friesse wrote:
>>> Yesterday I found corosync took almost one hour to form a cluster (a
>>> failed node came back online).
>>
>> This for sure shouldn't happen (at least with default timeout settings).
>>
>>>
>>> So I captured some corosync packets, and opened the pcap file in
>>> wireshark.
>>>
>>> But wireshark only displayed raw udp, no totem.
>>>
>>> Wireshark version is 2.2.5. I'm sure it supports corosync totem.
>>>
>>> corosync is 2.4.0.
>>
>> Wireshark has a corosync dissector, but only for version 1.x. 2.x is not
>> supported yet.
>>
>>>
>>> And if corosync takes too long to form a cluster, how to diagnose it?
>>>
>>> I read the logs, but could not figure it out.
>>
>> Logs, especially when debug is enabled, usually have enough info. Can you
>> paste your config + logs?
>>
>> Regards,
>>   Honza
>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org