[Pacemaker] Unable to bring up cluster after crash and update

Jonathan jdccdevel at gmail.com
Mon Jul 27 12:43:34 EDT 2009


Not yet. I reverted to corosync/openais 1.0.0 and the issue went away,
so I am continuing my configuration and testing with those versions for now.
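
For anyone comparing logs between versions, the node-id mismatch in the
debug log quoted below can be spotted mechanically. This is only a minimal
sketch: the regex and the name final_states are my own, based purely on the
"Node <id>/<name> is now: <state>" lines in that excerpt, not on anything
in the Pacemaker source. It joins lines first so that mail-client wrapping
does not break a match.

```python
import re

# Matches the update_member state lines, e.g. "Node 0/Aries is now: member".
# The pattern is an assumption derived from the quoted log excerpt only.
STATE_RE = re.compile(r"Node (\d+)/(\S+) is now: (\w+)")


def final_states(log_text):
    """Return {node_id: (name, state)}, keeping the last state seen per node."""
    # Join lines so an entry split across two lines still matches.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    states = {}
    for node_id, name, state in STATE_RE.findall(flat):
        states[int(node_id)] = (name, state)
    return states


# The three state lines from the excerpt, with line breaks inside entries.
sample = """\
utils.c:287 info: update_member: Node
0/Aries is now: member
utils.c:287 info: update_member: Node
9001001/unknown is now: member
utils.c:287 info: update_member: Node
0/Aries is now: lost
"""

states = final_states(sample)
for node_id, (name, state) in sorted(states.items()):
    print(node_id, name, state)
```

On that excerpt the hostname ends up attached to node id 0 (now lost), while
the configured id 9001001 is a member with no name, which is the same
mismatch described in the quoted message.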


Andrew Beekhof wrote:
> Have you had a chance to try with Steve's latest corosync changes?
> He thinks it should now be working.
>
> On Fri, Jul 24, 2009 at 6:39 PM, Jonathan<jdccdevel at gmail.com> wrote:
>   
>> Sorry to reply to myself, but I wanted to post an update.
>>
>> I modified my corosync.conf to set nodeid manually, and I am getting the
>> same error.
>>
>> I have noticed something interesting in the debug logs, which may
>> provide an indication of what is going on:
>>
>> The host name is Aries, and the nodeid in the config is 9001001.
>>
>> Just before a bunch of forking happens we have the following:
>>
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:419 info: pcmk_plugin_init: Local node id: 0
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:420 info: pcmk_plugin_init: Local hostname: Aries
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:234 info: update_member: Creating entry for node 0 born on 0
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:261 info: update_member: 0x818be68 Node 0 now known as Aries (was: (null))
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:277 info: update_member: Node Aries now has 1 quorum votes (was 0)
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:287 info: update_member: Node 0/Aries is now: member
>>
>> Then, just before the segfault, we have:
>>
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:633 notice: pcmk_peer_update: Stable membership event on ring 780: memb=1, new=1, lost=0
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:234 info: update_member: Creating entry for node 9001001 born on 780
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:287 info: update_member: Node 9001001/unknown is now: member
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:661 info: pcmk_peer_update: NEW:  .pending. 9001001
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:667 debug: pcmk_peer_update: Node 9001001 has address r(0) ip(172.29.1.1)
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:679 info: pcmk_peer_update: MEMB: .pending. 9001001
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:596 info: ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
>> Jul 24 10:20:07 corosync [pcmk  ] utils.c:287 info: update_member: Node 0/Aries is now: lost
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:712 debug: pcmk_peer_update: 2 nodes changed
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:1187 info: send_member_notification: Sending membership update 780 to 0 children
>> Jul 24 10:20:07 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: Assertion failure line 1439: local_nodeid != 0
>>
>> This is a change event for 2 nodes: node 0/Aries and node 9001001/unknown.
>>
>> There is only one node in the cluster at this point, so shouldn't the
>> node be 9001001/Aries?
>>
>>
>> Jonathan wrote:
>>     
>>> I am not setting nodeid: in corosync.conf.
>>>
>>> My understanding is that, in that case, the nodeid should be automatically
>>> generated from the IP address of ring0.
>>> Is that not the case?
>>>
>>> I have included a copy of my corosync.conf.
>>>
>>> The only difference between the corosync.conf on the different nodes is
>>> the IP address.
>>>
>>> Jonathan
>>>
>>> Andrew Beekhof wrote:
>>>
>>>       
>>>> On Thu, Jul 23, 2009 at 5:36 AM, Jonathan deBoer<jono.deboer at gmail.com> wrote:
>>>>
>>>> I replied to this on the openais list
>>>>
>>>>
>>>>
>>>>         
>>>>> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: Assertion failure line 1439: local_nodeid != 0
>>>>>
>>>>>
>>>>>           
>>>> What value of nodeid: is set in corosync.conf?
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list
>>>> Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>>
>>>>
>>>>         
>>>
>> --
>> J. deBoer Computer Consulting.
>> 42 Birchmont Dr.
>> Leduc, AB.
>> T9E-8S4
>> cell: 780-717-0669
>>
>>


-- 
J. deBoer Computer Consulting.
42 Birchmont Dr.
Leduc, AB.
T9E-8S4
cell: 780-717-0669 




