[Pacemaker] Impossible to add a 4th node to a cluster

Thu Oct 28 12:30:07 EDT 2010

On 28/10/2010 17:55, Pavlos Parissis wrote:
> On 28 October 2010 16:09, Guillaume Chanaud
> <guillaume.chanaud at connecting-nature.com>  wrote:
>>   Hello,
>>
>> I have a cluster of two master/slave DRBD servers (filer1 and filer2)
>> running in a VLAN (the machines are dedicated servers).
>> I added a third node (server1, a "blank node" for the moment) to the
>> cluster without problems.
>> When I add a 4th node (server2, which is a "mirror" of server1) to the
>> cluster, this node starts as standalone. Here is the message.log:
>>
>> Oct 28 15:59:27 ns209045 corosync[16543]:   [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] notice:
>> pcmk_peer_update: Transitional membership event on ring 945392: memb=1,
>> new=0, lost=0
>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update:
>> memb: server2 16820416
>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] notice:
>> pcmk_peer_update: Stable membership event on ring 945392: memb=1, new=0,
>> lost=0
>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update:
>> MEMB: server2 16820416
>> Oct 28 15:59:28 ns209045 corosync[16543]:   [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] notice:
>> pcmk_peer_update: Transitional membership event on ring 945416: memb=1,
>> new=0, lost=0
>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update:
>> memb: server2 16820416
>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] notice:
>> pcmk_peer_update: Stable membership event on ring 945416: memb=1, new=0,
>> lost=0
>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update:
>> MEMB: server2 16820416
>>
>> [...] These messages repeat many, many times.
>>
>> Now, if I stop server1 and start server2, server2 starts correctly
>> and is added to the cluster. But when I then try to start server1, the
>> same thing happens (the roles are inverted but the result is the same:
>> when one of the serverX nodes is started, the other can't start).
>>
>> My corosync.conf is configured for broadcast, not multicast. I have lots of
>> problems with multicast because many bridged VMs on the VLAN
>> don't see the multicast packets, or don't join the multicast group
>> correctly.
>>
>> Any hints on this?
> Are the corosync and auth files the same on server2?
>

Yes, of course :D (copied with scp). As I said, server1 can join when
server2 is offline, and server2 can join when server1 is offline, but
if one is online, the other can't join and logs the messages above in a loop.

In fact, I have loooooooottttttssssss of problems with
corosync/pacemaker: multicast/broadcast between physical and
virtual servers, lots of different shit everywhere, and the error logs are
always different depending on what I try.

The strange thing is that filer1, filer2, server1 and server2 are all
running the same distro (Gentoo) with the same tools and are on the same
VLAN (which works fine for lots of services, like NFS).
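
One detail that can be read out of the quoted log is the node ID that server2 reports for itself (16820416). The following is only a sketch under an assumption that nothing in this thread confirms: that nodeid is not set explicitly in corosync.conf, so corosync derived it from the IPv4 address of the interface it bound to. The helper name nodeid_to_ipv4 is hypothetical, and the byte order is a guess, so both readings are shown. If the assumption holds, decoding the ID from each node's log and comparing the results would show whether two nodes end up with the same ID.

```python
import ipaddress


def nodeid_to_ipv4(nodeid: int, byteorder: str = "little") -> str:
    """Interpret a 32-bit node ID as an IPv4 address (hypothetical helper).

    Assumption: the ID is the raw 4-byte address read as an integer.
    Which byte order applies depends on the host, so it is a parameter.
    """
    raw = nodeid.to_bytes(4, byteorder)
    return str(ipaddress.IPv4Address(raw))


if __name__ == "__main__":
    node_id = 16820416  # value logged for server2 in the messages above
    print("little-endian reading:", nodeid_to_ipv4(node_id, "little"))
    print("big-endian reading:   ", nodeid_to_ipv4(node_id, "big"))
```

The little-endian reading gives 192.168.0.1 and the big-endian reading gives 1.0.168.192. Only the first looks like a plausible private address, but whether it is really the address server2 bound to would have to be checked on the machine itself.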