[Pacemaker] stonith pacemaker problem

Vladislav Bogdanov bubble at hoster-ok.com
Tue Oct 12 03:27:06 EDT 2010


12.10.2010 07:25, Andrew Beekhof wrote:
> On Mon, Oct 11, 2010 at 9:51 PM, Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
>> 11.10.2010 09:14, Andrew Beekhof wrote:
>>> strictly speaking you don't.
>>> but at least on fedora, the policy is that $x-libs always requires $x
>>> so just building against heartbeat-libs means that yum will suck in
>>> the main heartbeat package :-(
>>
>> And this seems to be a slightly incorrect statement, btw:
> 
> no, you're wrong sorry.
> 
> From: http://fedoraproject.org/wiki/Packaging/ReviewGuidelines
> 
> "SHOULD: Usually, subpackages other than devel should require the base
> package using a fully versioned dependency."
> 
> http://fedoraproject.org/wiki/Packaging/Guidelines#RequiringBasePackage
> 
> In any case, it's not something Pacemaker has control over.

I think this is where the packaging guidelines are incomplete.
Frankly speaking, "-libs" is not a "subpackage" in the general sense,
but rather a "superpackage": the base package almost always requires "-libs".

That makes "-libs" a special case. The guidelines are valid
for subpackages like modules, "-data", "-docs", "-servers", "-clients",
and so on.
You can run:

rpm -qa | grep -- "-libs" | grep -v -- "-devel" |
  while read -r rpm; do echo "$rpm:"; rpm -q --requires "$rpm"; done |
  grep -Ev "^(lib|rpmlib|rtld|/)"

And you'll see that only a very small number of "-libs" packages actually
require the base package.
So the majority of Fedora packagers either do not follow the guidelines or
realize that "-libs" is a different story.
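
To get a plain yes/no per package instead of reading the dependency lists
by eye, something like the following could be used. This is only a rough
sketch: the helper name requires_base is made up here, and it assumes the
base package is named exactly like the "-libs" package minus the suffix,
which is not true for every package.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm): reads a dependency list on stdin,
# one dependency per line as printed by "rpm -q --requires", and succeeds
# if one of them is the base package of the given -libs package.
# $1 is the -libs package name without version, e.g. "heartbeat-libs".
requires_base() {
    base=${1%-libs}
    awk -v base="$base" '
        # Take the first word and drop any "(...)" suffix such as "(x86-64)".
        { name = $1; sub(/\(.*\)$/, "", name) }
        name == base { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Only query the RPM database when rpm is actually available.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa --qf '%{NAME}\n' | grep -- '-libs$' | while read -r pkg; do
        if rpm -q --requires "$pkg" | requires_base "$pkg"; then
            echo "$pkg requires ${pkg%-libs}"
        fi
    done
fi
```

Each line printed is a "-libs" package that pulls in its base package;
packages that do not are simply skipped.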

Actually, what is the point of splitting a package into "base" and
"-libs" if "-libs" depends on "base"?

The main idea of such a split is to provide a way to have the shared
libraries installed where the main package (together with all its
dependencies) is not needed.

And I agree, this is for another mailing list anyway.
Falling silent...

> 
>> usually an application
>> (binary) requires some libraries, and some of those libraries are
>> provided by the -libs package which is built together with the binary. But
>> the libraries themselves very rarely require something from the main
>> package. Those rare cases are configuration files which are read from
>> inside the libraries without an explicit request from the application. And
>> even in that case those configuration files are (should be) provided by
>> a -common subpackage (which -libs can depend on).
>> The only point of such requirements is the license files, which are usually
>> included in main packages. But from my point of view nothing prevents
>> a packager from including the license file in the %doc stanza for -libs too, so
>> any 'reverse' dependencies could be easily avoided, leaving only
>> 'straight' ones - what the libraries actually depend on.
>> This is what surprised me about corosync, openais and pacemaker - I need
>> to install the corosync and openais packages on a development host only
>> because I need the corresponding -libs and -devel packages. This is actually
>> not usual for Fedora, and it is really not needed. The main idea of
>> -libs is to provide DSOs which can be used by other applications
>> without the need to install the 'main' package (together with all daemons,
>> initscripts and dependencies on other libs). The same goes for -devel - it
>> really needs -libs because it provides the .so symlinks to the libs for ld, but
>> it shouldn't depend on the main application.
>>
>> Best,
>> Vladislav
>>
>>>
>>> glad you found a path forward though
>>>
>>>>  understand that /usr/lib/ocf/resource.d/heartbeat has ocf scripts
>>>> provided by heartbeat but that can be part of the "Reusable cluster
>>>> agents" subsystem.
>>>>
>>>> Frankly I thought the way I had installed the system by erasing and
>>>> installing the fresh packages it should have worked.
>>>>
>>>> But all said and done I learned a lot of cluster code by gdbing it.
>>>> I'll be having a peaceful thanksgiving.
>>>>
>>>> Thanks and happy thanks giving.
>>>> Shravan
>>>>
>>>> On Sun, Oct 10, 2010 at 2:46 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>> Not enough information.
>>>>> We'd need more than just the lrmd's logs, they only show what happened not why.
>>>>>
>>>>> On Thu, Oct 7, 2010 at 11:02 PM, Shravan Mishra
>>>>> <shravan.mishra at gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Description of my environment:
>>>>>>   corosync=1.2.8
>>>>>>   pacemaker=1.1.3
>>>>>>   Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP
>>>>>>
>>>>>>
>>>>>> We are having a problem with our pacemaker which is continuously
>>>>>> canceling the monitoring operation of our stonith devices.
>>>>>>
>>>>>> We ran:
>>>>>>
>>>>>> stonith -d -t external/safe/ipmi hostname=ha2.itactics.com
>>>>>> ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S
>>>>>>
>>>>>> its output is attached as stonith.output.
>>>>>>
>>>>>> We have been trying to debug this issue for a few days now with no success.
>>>>>> We are hoping that someone can help us as we are under immense
>>>>>> pressure to move to RCS unless we can solve this issue in a day or two,
>>>>>> which I personally don't want to because we like the product.
>>>>>>
>>>>>> Any help will be greatly appreciated.
>>>>>>
>>>>>>
>>>>>> Here is an excerpt from the /var/log/messages:
>>>>>> =========================
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11155: start
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11156: monitor
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11156] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11158: start
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11159: monitor
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11159] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11160: stop
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11161: start
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11162: monitor
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11162] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11163: stop
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11164: start
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11165: monitor
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11165] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11166: stop
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11167: start
>>>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11168: monitor
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11168] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11169: stop
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11170: start
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: stonithRA plugin: got
>>>>>> metadata: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM
>>>>>> "ra-api-1.dtd"> <resource-agent name="external/safe/ipmi">
>>>>>> <version>1.0</version>   <longdesc lang="en"> ipmitool based power
>>>>>> management. Apparently, the power off method of ipmitool is
>>>>>> intercepted by ACPI which then makes a regular shutdown. If case of a
>>>>>> split brain on a two-node it may happen that no node survives. For
>>>>>> two-node clusters use only the reset method.    </longdesc>
>>>>>> <shortdesc lang="en">IPMI STONITH external device </shortdesc>
>>>>>> <parameters> <parameter name="hostname" unique="1"> <content
>>>>>> type="string" /> <shortdesc lang="en"> Hostname </shortdesc> <longdesc
>>>>>> lang="en"> The name of the host to be managed by this STONITH device.
>>>>>> </longdesc> </parameter>  <parameter name="ipaddr" unique="1">
>>>>>> <content type="string" /> <shortdesc lang="en"> IP Address
>>>>>> </shortdesc> <longdesc lang="en"> The IP address of the STONITH
>>>>>> device. </longdesc> </parameter>  <parameter name="userid" unique="1">
>>>>>> <content type="string" /> <shortdesc lang="en"> Login </shortdesc>
>>>>>> <longdesc lang="en"> The username used for logging in to the STONITH
>>>>>> device. </longdesc> </parameter>  <parameter name="passwd" unique="1">
>>>>>> <content type="string" /> <shortdesc lang="en"> Password </shortdesc>
>>>>>> <longdesc lang="en"> The password used for logging in to the STONITH
>>>>>> device. </longdesc> </parameter>  <parameter name="interface"
>>>>>> unique="1"> <content type="string" default="lan"/> <shortdesc
>>>>>> lang="en"> IPMI interface </shortdesc> <longdesc lang="en"> IPMI
>>>>>> interface to use, such as "lan" or "lanplus". </longdesc> </parameter>
>>>>>>  </parameters>    <actions>     <action name="start"   timeout="15" />
>>>>>>    <action name="stop"    timeout="15" />     <action name="status"
>>>>>> timeout="15" />     <action name="monitor" timeout="15" interval="15"
>>>>>> start-delay="15" />     <action name="meta-data"  timeout="15" />
>>>>>> </actions>   <special tag="heartbeat">     <version>2.0</version>
>>>>>> </special> </resource-agent>
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11171: monitor
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
>>>>>> monitor[11171] on
>>>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>>>> userid=[safe_ipmi_admin]  cancelled
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11172: stop
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11173: start
>>>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>>>> rsc:ha2.itactics.com-stonith:11174: monitor
>>>>>>
>>>>>> ==========================
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Shravan
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker