[Pacemaker] errors in corosync.log

Mon Jan 18 11:20:51 EST 2010

Hi,

I'm seeing the following messages in corosync.log:
=============
Jan 18 09:50:41 corosync [pcmk  ] ERROR: check_message_sanity: Message payload is corrupted: expected 1929 bytes, got 669
Jan 18 09:50:41 corosync [pcmk  ] ERROR: check_message_sanity: Child 28857 spawned to record non-fatal assertion failure line 1286: sane
Jan 18 09:50:41 corosync [pcmk  ] ERROR: check_message_sanity: Invalid message 70: (dest=local:cib, from=node1.itactics.com:cib.22575, compressed=0, size=1929, total=2521)
......
========
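A shortfall that is the same every time points at systematic truncation rather than random corruption. Below is a minimal Python sketch that pulls the expected and received sizes out of these lines. It assumes each entry sits on a single line, as in the real corosync.log. The regex is written against the message text above, not against any documented log format.

```python
import re

# Matches the pcmk "payload is corrupted" message quoted above (assumed format).
CORRUPT_RE = re.compile(
    r"check_message_sanity: Message payload is corrupted: "
    r"expected (?P<expected>\d+) bytes, got (?P<got>\d+)"
)


def corrupted_payloads(lines):
    """Yield (expected, got, missing) for every corrupted-payload line."""
    for line in lines:
        m = CORRUPT_RE.search(line)
        if m:
            expected = int(m.group("expected"))
            got = int(m.group("got"))
            yield expected, got, expected - got


if __name__ == "__main__":
    sample = [
        "Jan 18 09:50:41 corosync [pcmk  ] ERROR: check_message_sanity: "
        "Message payload is corrupted: expected 1929 bytes, got 669",
    ]
    for expected, got, missing in corrupted_payloads(sample):
        print(f"expected={expected} got={got} missing={missing}")
```

Feeding it the whole log (for example `corrupted_payloads(open("/tmp/corosync.log"))`) and comparing the `missing` column across entries shows whether the loss is constant.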

I'm not entirely sure what's causing them.

Thanks
Shravan

On Mon, Jan 18, 2010 at 9:03 AM, Shravan Mishra
<shravan.mishra at gmail.com> wrote:
> Hi,
>
> Since the interfaces on the two nodes are connected via a crossover
> cable, there is no chance of that happening. Also, I'm using
> rrp_mode: passive, which I assume means the other ring (ring 1) only
> comes into play when ring 0 fails. I mention this because the ring 1
> interface is on the network.
>
>
> One interesting thing I observed is that libtomcrypt is being used
> for crypto because I have secauth: on.
>
> But I couldn't find that library on my machine.
>
> I'm wondering if it's because of that.
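One quick way to check whether the dynamic linker can see a standalone libtomcrypt is Python's `ctypes.util.find_library`. Treat the result with care. Corosync may carry its own copy of the crypto code instead of linking against a system library, so a `None` here would not by itself show that secauth is broken. That is an assumption to confirm against the corosync build, for example with `ldd` on the corosync binary.

```python
import ctypes.util


def find_tomcrypt():
    """Return the library name the dynamic linker resolves for libtomcrypt,
    or None if no standalone shared copy is installed."""
    # find_library() takes the bare name, without the "lib" prefix or ".so".
    return ctypes.util.find_library("tomcrypt")


if __name__ == "__main__":
    name = find_tomcrypt()
    if name is None:
        print("no standalone libtomcrypt found by the dynamic linker")
    else:
        print(f"found: {name}")
```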
>
> Basically we are using 3 interfaces eth0, eth1 and eth2.
>
> eth0 and eth2 are for ring 0 and ring 1 respectively. eth1 is the
> primary interface.
>
> This is what my drbd.conf looks like:
>
>
> ==================
> # please have a look at the example configuration file in
> # /usr/share/doc/drbd82/drbd.conf
> #
> global {
>     usage-count no;
> }
> common {
>     protocol C;
>     startup {
>         wfc-timeout 120;
>         degr-wfc-timeout 120;
>     }
> }
> resource var_nsm {
>     syncer {
>         rate 333M;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>     net {
>         after-sb-1pri discard-secondary;
>     }
>     on node1.itactics.com {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 172.20.20.1:7791;
>         meta-disk internal;
>     }
>     on node2.itactics.com {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 172.20.20.2:7791;
>         meta-disk internal;
>     }
> }
> =================
>
>
> The eth0 interfaces of the two nodes are connected via a crossover
> cable, as I mentioned, and eth1 and eth2 are on the network.
>
> I'm not a networking expert, but is it possible that a broadcast sent
> by, let's say, a node that is not in my cluster could still reach my
> nodes through the other interfaces that are attached to the network?
>
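With `broadcast: yes`, a ring's totem traffic should go to the subnet broadcast address for its bindnetaddr, so any host on the same broadcast segment listening on the same UDP port will receive it. Broadcasts do not cross routers, so what matters is whether the other nodes share that segment. The sketch below uses Python's `ipaddress` module to show which broadcast address each ring would use and whether a given host falls inside that subnet. The /24 prefix lengths and the host 192.168.2.50 are assumptions, since the thread does not give netmasks.

```python
import ipaddress


def broadcast_for(bindnetaddr: str, prefixlen: int):
    """Subnet broadcast address for a ring's bindnetaddr, given an assumed
    prefix length (the real netmasks are not stated in the thread)."""
    net = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}", strict=False)
    return net.broadcast_address


def same_subnet(bindnetaddr: str, prefixlen: int, host: str) -> bool:
    """True if `host` lies inside the ring's subnet, i.e. it would see that
    ring's broadcasts if it is also on the same L2 segment."""
    net = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}", strict=False)
    return ipaddress.ip_address(host) in net


if __name__ == "__main__":
    print(broadcast_for("192.168.2.0", 24))                # 192.168.2.255
    print(broadcast_for("172.20.20.0", 24))                # 172.20.20.255
    # Hypothetical node belonging to some other cluster:
    print(same_subnet("192.168.2.0", 24, "192.168.2.50"))  # True
```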
>
> Those of us in dev and the QA guys are testing this in parallel.
>
> Let's say there is a two-node QA cluster and a two-node dev cluster.
>
> The interfaces for both clusters are hooked up as described above, and
> the corosync.conf for both clusters has "bindnetaddr: 192.168.2.0".
>
> Is it possible that one cluster is causing bad messages in the other?
>
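If the dev and QA clusters share a segment, a bindnetaddr and a port, each will receive the other's totem packets. With `secauth: on` and different authkeys, those packets would plausibly fail the digest check and be logged as "invalid digest". The sketch below flags ring pairs from two clusters that overlap in that way. The corosync.conf(5) man page notes that corosync also uses mcastport - 1, so the sketch treats ports fewer than 2 apart as a clash. The `Ring` type is hypothetical, and the QA cluster is assumed to use the same ring settings as the corosync.conf quoted further down.

```python
from typing import List, NamedTuple, Tuple


class Ring(NamedTuple):
    """Hypothetical summary of one totem interface block."""
    ringnumber: int
    bindnetaddr: str
    mcastport: int
    broadcast: bool


def ports_clash(a: int, b: int) -> bool:
    """corosync uses mcastport and mcastport - 1, so two clusters'
    ports should differ by at least 2."""
    return abs(a - b) < 2


def shared_rings(cluster_a: List[Ring], cluster_b: List[Ring]) -> List[Tuple[Ring, Ring]]:
    """Return broadcast ring pairs on the same network with clashing ports,
    i.e. rings that would see each other's packets on a shared segment.
    Multicast rings are not handled here."""
    clashes = []
    for ra in cluster_a:
        for rb in cluster_b:
            if (ra.broadcast and rb.broadcast
                    and ra.bindnetaddr == rb.bindnetaddr
                    and ports_clash(ra.mcastport, rb.mcastport)):
                clashes.append((ra, rb))
    return clashes


if __name__ == "__main__":
    # Ring settings from the corosync.conf in this thread; QA assumed identical.
    dev = [Ring(0, "192.168.2.0", 5405, True), Ring(1, "172.20.20.0", 5405, True)]
    qa = [Ring(0, "192.168.2.0", 5405, True), Ring(1, "172.20.20.0", 5405, True)]
    for a, b in shared_rings(dev, qa):
        print(f"dev ring {a.ringnumber} / qa ring {b.ringnumber}: "
              f"{a.bindnetaddr} port {a.mcastport}")
```

Moving one cluster to, say, mcastport 5407 makes `shared_rings` return an empty list for these settings.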
>
> We are in the final leg of the testing and this came up.
>
> Thanks for the help.
>
>
> Shravan
>
> On Mon, Jan 18, 2010 at 2:58 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
>> <shravan.mishra at gmail.com> wrote:
>>> Hi Guys,
>>>
>>> I'm running the following versions of pacemaker and corosync:
>>> corosync=1.1.1-1-2
>>> pacemaker=1.0.9-2-1
>>>
>>> Everything had been running fine for quite some time, but then I
>>> started seeing the following errors in the corosync logs:
>>>
>>>
>>> =========
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> ========
>>>
>>> I can run all the crm shell commands and so on, but it's
>>> troubling that the above is happening.
>>>
>>> My crm_mon output looks good.
>>>
>>>
>>> I also checked the authkey by running md5sum on both nodes; the checksums match.
>>>
>>> Then I stopped corosync, regenerated the authkey with
>>> corosync-keygen, and copied it to the other machine, but I still get
>>> the above message in the corosync log.
>>
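The md5sum comparison can be scripted as below. Matching digests rule out a key mismatch between the two nodes, but not packets from a third sender with a different key, which is what the question that follows is getting at. The `/etc/corosync/authkey` path in the usage note is assumed to be the corosync-keygen default, and `authkey.node2` is a hypothetical local copy of the peer's key.

```python
import hashlib
from pathlib import Path


def file_digest(path, algo="md5"):
    """Hex digest of a file's bytes (md5 to mirror the md5sum check)."""
    h = hashlib.new(algo)
    h.update(Path(path).read_bytes())
    return h.hexdigest()


def same_key(path_a, path_b):
    """True if two copies of the authkey are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, after copying the peer's key locally, `same_key("/etc/corosync/authkey", "authkey.node2")` should return True when both nodes hold the same key.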
>> Are you sure there's not a third node somewhere broadcasting on that
>> mcast and port combination?
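One way to answer that directly is to stop corosync on one node and listen on the totem port for a while, recording every source address. The sketch below does that with a plain UDP socket bound to all interfaces, which on Linux also receives broadcast datagrams. The port 5405 comes from the corosync.conf below; the member addresses in the usage note are hypothetical placeholders. A packet capture tool would give the same information in more detail.

```python
import socket
import time


def collect_senders(port, duration=10.0, bind_addr=""):
    """Listen on a UDP port for `duration` seconds and return the set of
    source IPs seen. Run it while corosync is stopped on this node so the
    port is free."""
    senders = set()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    deadline = time.monotonic() + duration
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _data, (src_ip, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            senders.add(src_ip)
    finally:
        sock.close()
    return senders
```

For example, `collect_senders(5405, duration=30) - {"192.168.2.1", "192.168.2.2"}` (substituting the real ring addresses of both nodes) would leave only senders that are not cluster members.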
>>
>>>
>>> Is there any other authkey that I should look into?
>>>
>>>
>>> corosync.conf
>>>
>>> ============
>>>
>>> # Please read the corosync.conf.5 manual page
>>> compatibility: whitetank
>>>
>>> totem {
>>>        version: 2
>>>        token: 3000
>>>        token_retransmits_before_loss_const: 10
>>>        join: 60
>>>        consensus: 1500
>>>        vsftype: none
>>>        max_messages: 20
>>>        clear_node_high_bit: yes
>>>        secauth: on
>>>        threads: 0
>>>        rrp_mode: passive
>>>
>>>        interface {
>>>                ringnumber: 0
>>>                bindnetaddr: 192.168.2.0
>>>                #mcastaddr: 226.94.1.1
>>>                broadcast: yes
>>>                mcastport: 5405
>>>        }
>>>        interface {
>>>                ringnumber: 1
>>>                bindnetaddr: 172.20.20.0
>>>                #mcastaddr: 226.94.1.1
>>>                broadcast: yes
>>>                mcastport: 5405
>>>        }
>>> }
>>>
>>>
>>> logging {
>>>        fileline: off
>>>        to_stderr: yes
>>>        to_logfile: yes
>>>        to_syslog: yes
>>>        logfile: /tmp/corosync.log
>>>        debug: off
>>>        timestamp: on
>>>        logger_subsys {
>>>                subsys: AMF
>>>                debug: off
>>>        }
>>> }
>>>
>>> service {
>>>        name: pacemaker
>>>        ver: 0
>>> }
>>>
>>> aisexec {
>>>        user:root
>>>        group: root
>>> }
>>>
>>> amf {
>>>        mode: disabled
>>> }
>>>
>>>
>>> ===============
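To compare the ring settings of the dev and QA clusters without eyeballing both files, a rough reader for the interface blocks can help. This is a simplified sketch, not corosync's own parser. It assumes one `key: value` pair per line and no nested braces inside an interface block, which holds for the file above. `SAMPLE` is an excerpt of the totem section quoted above.

```python
import re

# One "key: value" pair per line (simplifying assumption).
KV_RE = re.compile(r"^\s*(\w+)\s*:\s*(.*?)\s*$")

SAMPLE = """
totem {
       interface {
               ringnumber: 0
               bindnetaddr: 192.168.2.0
               #mcastaddr: 226.94.1.1
               broadcast: yes
               mcastport: 5405
       }
       interface {
               ringnumber: 1
               bindnetaddr: 172.20.20.0
               #mcastaddr: 226.94.1.1
               broadcast: yes
               mcastport: 5405
       }
}
"""


def interface_blocks(conf_text):
    """Return one dict per 'interface { ... }' block with its key/value
    pairs. Text after '#' is treated as a comment and ignored."""
    blocks = []
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("interface") and line.endswith("{"):
            current = {}
            continue
        if line == "}" and current is not None:
            blocks.append(current)
            current = None
            continue
        if current is not None:
            m = KV_RE.match(line)
            if m:
                current[m.group(1)] = m.group(2)
    return blocks


if __name__ == "__main__":
    for b in interface_blocks(SAMPLE):
        print(b["ringnumber"], b["bindnetaddr"], b["mcastport"], b.get("broadcast"))
```

Running it against each cluster's corosync.conf and diffing the output makes a shared bindnetaddr/mcastport combination easy to spot.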
>>>
>>>
>>> Thanks
>>> Shravan
>>>
>>> _______________________________________________
>>> Pacemaker mailing list
>>> Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>
>