[Pacemaker] CoroSync's UDPu transport for public IP addresses?

Tue Dec 30 07:21:15 EST 2014

Oh, it seems I've found the solution! There were at least two mistakes in my
corosync.conf (BTW, the logs did not report any errors, so my conclusion is
based on my experiments only).

1. nodelist.node MUST contain only IP addresses. No hostnames! In my tests
they simply do not work: "crm status" shows no nodes, and the logs contain no
warnings about this.
2. quorum {} MUST NOT be empty (in the sample config it IS empty). In my
case, the following fixed the problem together with (1):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

So, below is my final corosync.conf. Now "crm status" shows
"Online: [ node1 node2 ]", the UDPu transport is used, and there is no
virtual network at all (only public IP addresses are specified in
corosync.conf).
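
To check the node state from a script instead of reading "crm status" by eye,
something like the sketch below could be used. It assumes the "Online: [ ... ]"
and "OFFLINE: [ ... ]" line format quoted in this thread (other crm versions
may print it differently), and the function name node_states is hypothetical.

```python
import re


def node_states(status_text):
    """Map node name -> 'online' or 'offline' from "crm status" output."""
    states = {}
    pattern = r"^\s*(Online|OFFLINE):\s*\[([^\]]*)\]"
    for label, names in re.findall(pattern, status_text, re.MULTILINE):
        for name in names.split():
            states[name] = label.lower()
    return states


if __name__ == "__main__":
    import subprocess

    # Assumes the crm shell is installed and on PATH.
    output = subprocess.run(
        ["crm", "status"], capture_output=True, text=True
    ).stdout
    print(node_states(output))
```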

========================

# This seems to be a really WORKING configuration.
# Ubuntu 14.04, corosync 2.3.3, pacemaker 1.1.10
totem {
    version: 2
    cluster_name: cluster
    crypto_cipher: none
    crypto_hash: none
    clear_node_high_bit: yes
    interface {
        ringnumber: 0
        bindnetaddr: <public-ip-address-of-the-current-machine>
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    heartbeat_failures_allowed: 3
}
logging {
    fileline: off
    to_logfile: no
    to_syslog: yes
    debug: on
    timestamp: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
nodelist {
  node {
    ring0_addr: <public-ip-address-of-the-first-machine>
  }
  node {
    ring0_addr: <public-ip-address-of-the-second-machine>
  }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

=========================
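
Because bindnetaddr differs on every machine, the file has to be produced per
node. A minimal sketch of doing that, assuming Python 3: the function name
render_corosync_conf is hypothetical, the template is an abbreviated version
of the configuration above (the logging section and several totem options are
left out), and two_node is written only when exactly two nodes are listed.

```python
import ipaddress


def render_corosync_conf(local_ip, node_ips):
    """Render an abbreviated corosync.conf for the machine owning local_ip."""
    for ip in [local_ip] + list(node_ips):
        ipaddress.ip_address(ip)  # raises ValueError for hostnames
    if local_ip not in node_ips:
        raise ValueError("local_ip must be one of node_ips")
    nodes = "".join(
        "  node {\n    ring0_addr: %s\n  }\n" % ip for ip in node_ips
    )
    # two_node is only meaningful for a cluster of exactly two nodes.
    two_node = "    two_node: 1\n" if len(node_ips) == 2 else ""
    return (
        "totem {\n"
        "    version: 2\n"
        "    cluster_name: cluster\n"
        "    transport: udpu\n"
        "    interface {\n"
        "        ringnumber: 0\n"
        "        bindnetaddr: %s\n"
        "        mcastport: 5405\n"
        "    }\n"
        "}\n"
        "nodelist {\n%s}\n"
        "quorum {\n"
        "    provider: corosync_votequorum\n%s"
        "}\n"
    ) % (local_ip, nodes, two_node)


if __name__ == "__main__":
    # Documentation-range addresses used as placeholders for the public IPs.
    print(render_corosync_conf("192.0.2.10", ["192.0.2.10", "198.51.100.20"]))
```

Rejecting hostnames up front keeps the generated nodelist in line with
point (1) above.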

On Tue, Dec 30, 2014 at 12:34 PM, Dmitry Koterov <dmitry.koterov at gmail.com>
wrote:

> On Mon, Dec 29, 2014 at 1:50 PM, Dejan Muhamedagic <dejanmm at fastmail.fm>
> wrote:
>
>> On Mon, Dec 29, 2014 at 06:11:49AM +0300, Dmitry Koterov wrote:
>>> Hello.
>>>
>>> I have a geographically distributed cluster, all machines have public IP
>>> addresses. No virtual IP subnet exists, so no multicast is available.
>>>
>>> I thought that UDPu transport can work in such environment, doesn't it?
>>>
>>> To test everything in advance, I've set up a corosync+pacemaker on Ubuntu
>>> 14.04 with the following corosync.conf:
>>>
>>> totem {
>>>   transport: udpu
>>>   interface {
>>>         ringnumber: 0
>>>         bindnetaddr: ip-address-of-the-current-machine
>>>         mcastport: 5405
>>>   }
>>>   ...
>>> }
>>> nodelist {
>>>   node {
>>>     ring0_addr: node1
>>>   }
>>>   node {
>>>     ring0_addr: node2
>>>   }
>>> }
>>>
>>> root@node1:/etc/corosync# crm status | grep node
>>> OFFLINE: [ node1 node2 ]
>>> and "crm node online" (as all other attempts to make crm to do
>>> something) are timed out with "communication error".
>>
>> Dmitry, which version do you have?
>
> root@node1:~# corosync -v
> Corosync Cluster Engine, version '2.3.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> - so nodelist is definitely enough, and totem->interface->member is
> deprecated.
>
> So, am I at least right that the configuration with UDPu SHOULD work with
> geo-distributed nodes with only public IP addresses and no private/virtual
> subnetwork? If yes, how could I debug it?
>
> Here's some more info (x.x.x.x is the public IP associated with node1):
>
> root@node1:~# netstat -nap|grep coro
> udp        0      0  x.x.x.x:41083     0.0.0.0:*                     7037/corosync
> udp        0      0  x.x.x.x:49299     0.0.0.0:*                     7037/corosync
> udp        0      0  x.x.x.x:5405      0.0.0.0:*                     7037/corosync
> unix  2      [ ACC ]     STREAM     LISTENING     52458    7037/corosync    @quorum
> unix  2      [ ACC ]     STREAM     LISTENING     52455    7037/corosync    @cmap
> unix  2      [ ACC ]     STREAM     LISTENING     52456    7037/corosync    @cfg
> unix  2      [ ACC ]     STREAM     LISTENING     52457    7037/corosync    @cpg
> unix  3      [ ]         STREAM     CONNECTED     52512    7037/corosync    @cpg
> unix  3      [ ]         STREAM     CONNECTED     52625    7037/corosync    @cpg
> unix  3      [ ]         STREAM     CONNECTED     52504    7037/corosync    @cfg
> unix  3      [ ]         STREAM     CONNECTED     52520    7037/corosync    @quorum
> unix  2      [ ]         DGRAM                    52420    7037/corosync
> unix  3      [ ]         STREAM     CONNECTED     52643    7037/corosync    @quorum
> unix  3      [ ]         STREAM     CONNECTED     52568    7037/corosync    @cpg
> unix  3      [ ]         STREAM     CONNECTED     52588    7037/corosync    @cpg
> unix  3      [ ]         STREAM     CONNECTED     52554    7037/corosync    @cpg
>
> root@node1:~# crm status
> Last updated: Tue Dec 30 04:33:40 2014
> Last change: Sun Dec 28 21:40:41 2014 via crmd on node2
> Stack: corosync
> Current DC: NONE
> 2 Nodes configured
> 0 Resources configured
> OFFLINE: [ node1 node2 ]
>
> root@node1:~# crm node online
> Error setting standby=off (section=nodes, set=nodes-1084751873):
> Communication error on send
> Error performing operation: Communication error on send
>