[Pacemaker] pacemaker segfault

Wed Dec 8 06:10:59 EST 2010

I built pacemaker from the latest source and the problem is gone.
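
For anyone who hits the same crash on the packaged 1.0.8 build: the kernel
segfault lines quoted below already say roughly where libpengine crashed, even
without a core dump. Here is a minimal sketch in Python that extracts the
faulting address and the offset of the instruction pointer inside the library
mapping. The name decode_segfault and the regular expression are illustrative
assumptions, not part of pacemaker, and the pattern only covers the x86 kernel
log format seen in the quoted messages.

```python
import re

# Pattern for the x86 kernel "segfault" log line, as it appears in the quoted
# /var/log/messages excerpt. All numeric fields are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the fields of one segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    error = int(m.group("error"), 16)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the crashing instruction from the start of the mapping.
        "offset": ip - base,
        # x86 page-fault error code bits: 1 = page present (protection fault),
        # 2 = write access, 4 = fault raised in user mode.
        "page_present": bool(error & 1),
        "access": "write" if error & 2 else "read",
        "mode": "user" if error & 4 else "kernel",
    }


if __name__ == "__main__":
    line = ("Dec  6 14:49:35 storage0 kernel: [ 5048.618562] pengine[8584]: "
            "segfault at 8 ip b76ad094 sp bf8261d0 error 4 in "
            "libpengine.so.3.0.0[b76a2000+32000]")
    info = decode_segfault(line)
    print(info["library"], hex(info["offset"]), hex(info["fault_address"]))
```

For the three quoted lines this gives offsets 0xb094, 0x10ef3 and 0xb094, with
fault addresses 0x8, 0x0 and 0x8. Error 4 is a user-mode read of an unmapped
page, so the pattern is consistent with a NULL pointer dereference. With debug
symbols installed, such an offset can usually be passed to addr2line -e with
the path to the library, assuming the mapping starts at the beginning of the
library file. A gdb backtrace from a core dump is still the more reliable way
to get what the developers ask for.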

2010/12/6 Dejan Muhamedagic <dejanmm at fastmail.fm>

> Hi,
>
> On Mon, Dec 06, 2010 at 03:11:03PM +0300, ruslan usifov wrote:
> > hello
> >
> > I run pacemaker on Ubuntu (Ubuntu 10.04.1 LTS) with corosync. I installed it
> > from apt, and my pacemaker version is:
> >
> > root@storage0:/var/log# dpkg -l | grep 'pacemaker'
> > ii  pacemaker                           1.0.8+hg15494-2ubuntu2  HA cluster resource manager
> >
> >
> > and I have the following problem with pacemaker, with the following configuration:
> > root@storage0:/var/log# crm configure show
> > node storage0
> > node storage1
> > primitive drbd_web ocf:linbit:drbd \
> >         params drbd_resource="web" \
> >         op monitor interval="10s" timeout="60s"
> > primitive iscsi_ip ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.17.19" nic="eth1:1" cidr_netmask="24" \
> >         op monitor interval="10s" \
> >         meta target-role="Started"
> > primitive iscsi_web_target ocf:heartbeat:iSCSITarget \
> >         params iqn="iqn.2010-06.playrix.local:san.web" implementation="iet" \
> >         op monitor interval="10s" timeout="30s" depth="0" \
> >         meta target-role="Started"
> > primitive iscsi_web_target_lun1 ocf:heartbeat:iSCSILogicalUnit \
> >         params lun="1" path="/dev/drbd1" target_iqn="iqn.2010-06.playrix.local:san.web" implementation="iet" \
> >         op monitor interval="10s" timeout="30s"
> > group iscsi iscsi_ip iscsi_web_target iscsi_web_target_lun1
> > ms ms_drbd_web drbd_web \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation iscsi_on_drbd inf: ms_drbd_web:Master iscsi
> > order iscsi_target_after_drbd inf: ms_drbd_web:promote iscsi_web_target
> > order iscsi_target_lun_after_iscsi_target inf: iscsi_web_target iscsi_web_target_lun1
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> >
> > When I shut down node storage1, node storage0 doesn't take over the DRBD
> > Master role, and the output from crm_mon -1 looks like this:
> > ============
> > Last updated: Mon Dec  6 15:04:18 2010
> > Stack: openais
> > Current DC: storage0 - partition WITHOUT quorum
> > Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> > 2 Nodes configured, 2 expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ storage0 ]
> > OFFLINE: [ storage1 ]
> >
> >  Master/Slave Set: ms_drbd_web
> >      Slaves: [ storage0 ]
> >      Stopped: [ drbd_web:1 ]
> >  Resource Group: iscsi
> >      iscsi_ip   (ocf::heartbeat:IPaddr2):       Started storage0
> >      iscsi_web_target   (ocf::heartbeat:iSCSITarget):   Started storage0
> >      iscsi_web_target_lun1      (ocf::heartbeat:iSCSILogicalUnit):      Started storage0 FAILED
> >
> > Failed actions:
> >     iscsi_web_target_lun1_start_0 (node=storage0, call=91, rc=1, status=complete): unknown error
> >
> >
> > and when I try to promote the node I get the following error:
> > crm(live)resource# promote ms_drbd_web
> > Error performing operation: Remote node did not respond
> >
> >
> > and periodically in /var/log/messages I see the following errors:
> > Dec  6 14:49:35 storage0 kernel: [ 5048.618562] pengine[8584]: segfault at 8 ip b76ad094 sp bf8261d0 error 4 in libpengine.so.3.0.0[b76a2000+32000]
> > Dec  6 14:50:37 storage0 kernel: [ 5111.505491] pengine[8681]: segfault at 0 ip b7831ef3 sp bfd28b30 error 4 in libpengine.so.3.0.0[b7821000+32000]
> > Dec  6 14:51:41 storage0 kernel: [ 5174.746349] pengine[8770]: segfault at 8 ip b7751094 sp bfe1ccb0 error 4 in libpengine.so.3.0.0[b7746000+32000]
> >
> >
> >
> > Why doesn't pacemaker switch the role of the live node to master? And why
> > does the segfault happen?
>
> Looks like you ran into problems because of segfaults. I suspect
> that the segfault has been fixed in the meantime, but hard to
> say unless you show the backtrace. Best to open a bugzilla with
> your vendor.
>
> Thanks,
>
> Dejan
>
>
> > Please help
>
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>