[Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

Mon Jan 9 04:38:23 EST 2012

Hello everybody,

Last week I installed and configured a second network interface in each cluster node.
Since I changed corosync.conf to use both rings, the passive node stops the primitive ping (three ping targets). This is the configuration:

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 5000
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: active
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.138.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 220.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5415
        }
}
amf {
        mode: disabled
}
service {
        ver:       0
        name:      pacemaker
}
aisexec {
        user:   root
        group:  root
}
logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync.log
        to_syslog: no
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
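
One thing that can be checked offline is whether each ring's bindnetaddr is really the network address of the NIC it is meant to bind to. The following is only a minimal sketch of that check: the two interface blocks are copied from the configuration above, but the host addresses and prefix lengths in LOCAL_NICS are hypothetical, because they are not given in this post, and the helper names are made up for illustration.

```python
import ipaddress
import re

# The two interface blocks from the totem section above.
TOTEM_INTERFACES = """
interface {
        ringnumber: 0
        bindnetaddr: 192.168.138.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
}
interface {
        ringnumber: 1
        bindnetaddr: 220.0.0.0
        mcastaddr: 226.94.1.2
        mcastport: 5415
}
"""

# ASSUMPTION: hypothetical NIC addresses of one node, keyed by ring number.
# Replace them with the real output of "ip addr" on each node.
LOCAL_NICS = {"0": "192.168.138.15/24", "1": "220.0.0.5/24"}


def parse_interfaces(text):
    """Return one dict of key/value strings per 'interface { ... }' block."""
    rings = []
    for block in re.findall(r"interface\s*\{(.*?)\}", text, flags=re.S):
        rings.append(dict(re.findall(r"(\w+):\s*(\S+)", block)))
    return rings


def bindnetaddr_matches(bindnetaddr, nic_cidr):
    """True if bindnetaddr equals the network address of the NIC's subnet."""
    nic = ipaddress.ip_interface(nic_cidr)
    return nic.network.network_address == ipaddress.ip_address(bindnetaddr)


rings = parse_interfaces(TOTEM_INTERFACES)
for ring in rings:
    nic = LOCAL_NICS[ring["ringnumber"]]
    ok = bindnetaddr_matches(ring["bindnetaddr"], nic)
    print("ring", ring["ringnumber"], "bindnetaddr", ring["bindnetaddr"],
          "NIC", nic, "match" if ok else "MISMATCH")
```

With a netmask other than /24 the expected bindnetaddr changes, so the result only means something once the real addresses and prefix lengths are filled in.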

The following errors appear in corosync.log:

Jan 09 10:12:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 09 10:12:28 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Jan 09 10:12:30 corosync [TOTEM ] ring 1 active with no faults
Jan 09 10:12:37 lxds05 crmd: [1347]: info: process_lrm_event: LRM operation pri_ping:1_start_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
Jan 09 10:12:42 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (3000)
Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: Resource update 17 failed: (rc=-41) Remote node did not respond
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 22: master-pri_drbd_omd:0=10000
Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:22:08 lxds05 cib: [1343]: info: cib_stats: Processed 67 operations (1044.00us average, 0% utilization) in the last 10min
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 24: master-pri_drbd_omd:0=10000
Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 26: master-pri_drbd_omd:0=10000
Jan 09 10:29:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 26 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:32:08 lxds05 cib: [1343]: info: cib_stats: Processed 6 operations (1666.00us average, 0% utilization) in the last 10min
Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 28: master-pri_drbd_omd:0=10000
Jan 09 10:34:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 28 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
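
To make the pattern in these attrd lines easier to see, here is a small sketch that pairs every "Sent update N" line with the matching "Update N ... failed" line and computes the delay between them. The log lines are copied from the excerpt above; the year 2012 is taken from the date of this post, the function name failure_delays is made up for illustration, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

# attrd lines copied from the log excerpt above.
LOG = """\
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 22: master-pri_drbd_omd:0=10000
Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 24: master-pri_drbd_omd:0=10000
Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 26: master-pri_drbd_omd:0=10000
Jan 09 10:29:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 26 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 28: master-pri_drbd_omd:0=10000
Jan 09 10:34:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 28 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
"""

STAMP = r"(\w{3} \d{2} \d{2}:\d{2}:\d{2})"
SENT = re.compile(r"^" + STAMP + r" .*Sent update (\d+):")
FAILED = re.compile(r"^" + STAMP + r" .*Update (\d+) for .* failed:")


def parse_stamp(stamp, year=2012):
    """Parse a 'Jan 09 10:17:25' stamp; the log itself carries no year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def failure_delays(log):
    """Map update id -> seconds between 'Sent update' and its failure line."""
    sent = {}
    delays = {}
    for line in log.splitlines():
        match = SENT.match(line)
        if match:
            sent[int(match.group(2))] = parse_stamp(match.group(1))
            continue
        match = FAILED.match(line)
        if match and int(match.group(2)) in sent:
            update_id = int(match.group(2))
            delta = parse_stamp(match.group(1)) - sent[update_id]
            delays[update_id] = delta.total_seconds()
    return delays


for update_id, seconds in failure_delays(LOG).items():
    print(f"update {update_id}: failed after {seconds:.0f} s")
```

For the lines shown, every failure is reported 120 seconds after the update was sent. That looks like a fixed timeout expiring rather than an immediate rejection, but this is only an observation from the excerpt.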

The check with corosync-cfgtool -s runs without errors on both nodes.

I do not know what is wrong, because the ping targets used in the crm configuration can be pinged successfully.
Can someone help me, please? Thanks in advance.

Regards
Stefan
