<br><br><div class="gmail_quote">On Mon, Oct 12, 2009 at 8:40 PM, Andrew Beekhof <span dir="ltr"><<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
The crmd process looks to have stalled.<br>
Can you re-run with debug turned on in openais.conf?<br>
<div><div></div><div class="h5"><br>
On Mon, Oct 12, 2009 at 6:09 PM, Stratos Zolotas <<a href="mailto:strzol@gmail.com">strzol@gmail.com</a>> wrote:<br>
><br>
><br>
> On Mon, Oct 12, 2009 at 5:57 PM, Dejan Muhamedagic <<a href="mailto:dejanmm@fastmail.fm">dejanmm@fastmail.fm</a>><br>
> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> On Mon, Oct 12, 2009 at 03:32:15PM +0300, Stratos Zolotas wrote:<br>
>> > On Mon, Oct 12, 2009 at 3:10 PM, Dejan Muhamedagic<br>
>> > <<a href="mailto:dejanmm@fastmail.fm">dejanmm@fastmail.fm</a>>wrote:<br>
>> ><br>
>> > > On Mon, Oct 12, 2009 at 02:57:29PM +0300, Stratos Zolotas wrote:<br>
>> > > > On Mon, Oct 12, 2009 at 2:51 PM, Dejan Muhamedagic<br>
>> > > > <<a href="mailto:dejanmm@fastmail.fm">dejanmm@fastmail.fm</a><br>
>> > > >wrote:<br>
>> > > ><br>
>> > > > > Hi,<br>
>> > > > ><br>
>> > > > > On Mon, Oct 12, 2009 at 02:42:25PM +0300, Stratos Zolotas wrote:<br>
>> > > > > > Hello to the list!!!<br>
>> > > > > ><br>
>> > > > > > This is my first question to the list and my first attempt to<br>
>> > > > > > build a two-node cluster on openSUSE 11.1 with pacemaker 1.0.5 and<br>
>> > > > > > openais 0.80.5, so please forgive my lack of knowledge.<br>
>> > > > > ><br>
>> > > > > > I'm trying to build an Active/Passive setup, but I see the<br>
>> > > > > > following on both nodes:<br>
>> > > > > ><br>
>> > > > > > Oct 12 14:05:57 alpha kernel: crmd[30704]: segfault at 18 ip<br>
>> > > > > > 00007f7770526eee sp 00007fffc7379810 error 4 in<br>
>> > > > > > libplumb.so.2.0.0[7f777050a000+30000]<br>
>> > > > ><br>
>> > > > > It'd be excellent to see the backtrace, providing that there are<br>
>> > > > > core files. Please enable core file generation if there are none.<br>
>> > > > > If you don't know about backtraces, just use hb_report to capture<br>
>> > > > > it.<br>
>> > > > ><br>
>> > > > > > As a result, I'm getting the following:<br>
>> > > > ><br>
>> > > > > That's not the consequence of the previous problem.<br>
>> > > > ><br>
>> > > > > > alpha:/etc/ais # crm_mon --one-shot -V<br>
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: No STONITH resources have been defined<br>
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option<br>
>> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity<br>
>> > > > ><br>
>> > > > > Thanks,<br>
>> > > > ><br>
>> > > > > Dejan<br>
>> > > > ><br>
>> > > > > ><br>
>> > > > > > ============<br>
>> > > > > > Last updated: Mon Oct 12 14:39:00 2009<br>
>> > > > > > Current DC: NONE<br>
>> > > > > > 0 Nodes configured, unknown expected votes<br>
>> > > > > > 0 Resources configured.<br>
>> > > > > > ============<br>
>> > > > > ><br>
>> > > > > > The errors concern the configuration (I have searched for them),<br>
>> > > > > > which I cannot fix at the moment because "crm configure" cannot<br>
>> > > > > > connect to the cluster.<br>
>> > > > > ><br>
>> > > > > > Both nodes are running openSUSE 11.1 x86_64 with the latest updates<br>
>> > > > > > and the versions mentioned above.<br>
>> > > > > ><br>
>> > > > > > Any help is appreciated, and again, please forgive my lack of<br>
>> > > > > > knowledge.<br>
>> > > > > ><br>
>> > > > > > Thank you in advance.<br>
>> > > > > ><br>
>> > > > > > Stratos.<br>
>> > > > ><br>
>> > > > > > _______________________________________________<br>
>> > > > > > Pacemaker mailing list<br>
>> > > > > > <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
>> > > > > > <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
>> > > > ><br>
>> > > > ><br>
>> > > > ><br>
>> > > ><br>
>> > > ><br>
>> > > > Thank you for the immediate response. I know about the errors (I<br>
>> > > > have to disable STONITH in the config), but I cannot configure<br>
>> > > > anything with crm. After a commit I get something like "node did<br>
>> > > > not respond".<br>
>> > > ><br>
>> > > > The problem is that there are no nodes, as you can see after the<br>
>> > > > errors.<br>
>> > > ><br>
>> > > > I want to help eliminate the problem, but I'm not a programmer. So<br>
>> > > > could you please guide me so I can run hb_report and provide the<br>
>> > > > necessary logs? When should I run hb_report, and with what<br>
>> > > > parameters?<br>
>> > ><br>
>> > > First check if you have core dumps:<br>
>> > ><br>
>> > > # ls -lR /var/lib/heartbeat/cores<br>
>> > ><br>
>> > > Then run<br>
>> > ><br>
>> > > # hb_report -f <time> -A -n "<nodes>" /tmp/problem-1<br>
>> > ><br>
>> > > Replace <time> with the time at which you started the cluster (say<br>
>> > > 13:00), and <nodes> with a space-separated list of nodes.<br>
>> > ><br>
>> > > Thanks,<br>
>> > ><br>
>> > > Dejan<br>
>> > ><br>
>> > > > Again, please forgive my lack of knowledge (it is my first time with<br>
>> > > > clusters).<br>
>> > > ><br>
>> > > > Thanks again.<br>
>> > > ><br>
>> > > > Stratos.<br>
>> > ><br>
>> > ><br>
>> > ><br>
>> > ><br>
>> ><br>
>> > I don't think there are any core dumps. The three folders returned by<br>
>> > the command are empty.<br>
>> ><br>
>> > alpha:~ # ls -IR /var/lib/heartbeat/cores/<br>
>> > hacluster nobody root<br>
>> > alpha:~ #<br>
>> ><br>
>> > hb_report -f 15:27 -A -n "alpha bravo" -u root /root/problem-3<br>
>> ><br>
>> > returns<br>
>><br>
>> The magic is:<br>
>><br>
>> # ulimit -c unlimited<br>
>><br>
>> You should put it somewhere so that it is run on boot. For now,<br>
>> just run it before /etc/init.d/openais start.<br>
>><br>
>> > Password:<br>
>> > alpha: WARN: could not find the log file on alpha<br>
>> > Password: /etc/ha.d/shellfuncs: line 211: maketempdir: command not found<br>
>> > alpha: WARN: sorry, can't create temoary file for find_files<br>
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found<br>
>> > alpha: WARN: sorry, can't create temoary file for find_files<br>
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found<br>
>> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found<br>
>> > alpha: ERROR: cannot create temporary files<br>
>><br>
>> This looks funny. Can you please show the package versions? And<br>
>> where did the packages come from?<br>
>><br>
>> Thanks,<br>
>><br>
>> Dejan<br>
>><br>
>> > I have attached the generated folder as a zip file, but at a quick<br>
>> > look I don't think it contains anything useful. Maybe it's better to<br>
>> > guide me on how to produce core dump files.<br>
>> ><br>
>> > I have also tried without the -u option<br>
>> ><br>
>> > Thanks<br>
>> ><br>
>> > Stratos<br>
>> ><br>
>> ><br>
>> ><br>
>> > --<br>
>> > Kernel IT Solutions Ltd<br>
>> > <a href="http://www.kernelit.gr" target="_blank">http://www.kernelit.gr</a><br>
>> ><br>
>> > Cyclades Wireless Network<br>
>> > <a href="http://www.cywn.gr" target="_blank">http://www.cywn.gr</a><br>
>><br>
>><br>
>><br>
>><br>
><br>
> After reinstalling all the packages, I have been running for about half an<br>
> hour without a segfault.<br>
><br>
> crm_mon still reports:<br>
> ============<br>
> Last updated: Mon Oct 12 19:02:43 2009<br>
> Current DC: NONE<br>
> 0 Nodes configured, unknown expected votes<br>
> 0 Resources configured.<br>
> ============<br>
><br>
> and when I try to "commit" a configuration (through crm configure) I get<br>
> "Remote node did not respond".<br>
><br>
> What do I have to do to make the nodes appear? (at least until a segfault<br>
> occurs and we have a core dump)<br>
><br>
> I'm attaching my /var/log/messages from the first node after the last run of<br>
> openais.<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
<br>
</div></div></blockquote></div><br>After restarting with debug enabled, I got a segfault.<br><br>I'm attaching a core file found in /var/lib/heartbeat/cores/hacluster.<br><br>Hope it helps.<br clear="all"><br>-- <br>Kernel IT Solutions Ltd<br>
<a href="http://www.kernelit.gr">http://www.kernelit.gr</a><br><br>Cyclades Wireless Network<br><a href="http://www.cywn.gr">http://www.cywn.gr</a><br>
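<br>For anyone following the thread, here is a minimal sketch of the core-file check discussed above. It only wraps the directory listing in a small helper; the function name list_cores and the 'core*' file-name pattern are assumptions for illustration, while the default directory and the ulimit/openais commands are the ones mentioned in the thread.<br>
<pre>
```shell
#!/bin/sh
# Sketch: list core files under the cluster core directory.
# Assumptions: the helper name list_cores and the 'core*' name pattern
# are illustrative; the default path is the one used in this thread.
list_cores() {
    dir="${1:-/var/lib/heartbeat/cores}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir"
        return 1
    fi
    # Print every regular file whose name starts with "core".
    find "$dir" -type f -name 'core*'
}

# Usage, as suggested in the thread (raise the limit before starting
# openais so that a crashing process is allowed to leave a core):
#   ulimit -c unlimited
#   /etc/init.d/openais start
#   list_cores
```
</pre>
An empty result means no core has been written yet; per the thread, the usual cause is that the core size limit was not raised before openais was started.<br>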