<br><br><div class="gmail_quote">On Fri, Nov 12, 2010 at 6:46 AM, jiaju liu <span dir="ltr"><<a href="mailto:liujiaju86@yahoo.com.cn">liujiaju86@yahoo.com.cn</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<table cellspacing="0" cellpadding="0" border="0"><tbody><tr><td valign="top" style="font:inherit"><br>

<blockquote style="padding-left:5px;margin-left:5px;border-left:rgb(16,16,255) 2px solid">

<div><div class="im"><br><br>> Hi<br>> I reboot my node, and it appears<br>> node2 pingd: [3932]: info: stand_alone_ping: Node 192.168.10.100 is<br>> unreachable (read)<br>> and the node could not start<br>

><br>>  192.168.10.100  is ib network I will start ib after the node start, so do<br>> you have any idea let the node start first?Thanks very much.:-)<br>><br>><br><br>Don't use IP resources as ping nodes.<br>

You should use the IP of something outside of your cluster, like an external<br>router.<br><br></div>stand_alone_ping is started automatically; I have never started it by hand, so how do I set it to ping an external router?</div></blockquote>

</td></tr></tbody></table></blockquote><div><br></div><div>Find where you set "<span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; color: rgb(80, 0, 80); ">192.168.10.100" as the ping node and change it to an address outside the cluster, such as an external router.</span></div>
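<div>A minimal sketch of what that change might look like, assuming the ping node is defined through the host_list parameter of a pingd resource in the CIB (it may instead be set in ha.cf, in which case that file is the place to edit). The resource name prmPingd and the address 192.168.1.1 are placeholders, not values taken from this thread:</div>

```shell
# Inspect the current configuration and look for the host_list parameter
crm configure show

# prmPingd is a hypothetical resource name; 192.168.1.1 stands in for an
# external router that stays reachable before the IB network is started
crm resource param prmPingd set host_list "192.168.1.1"
```

<div>With a target that does not depend on the IB interface, pingd should no longer report the node as unreachable at boot.</div>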

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; color: rgb(80, 0, 80); "><br></span></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<table cellspacing="0" cellpadding="0" border="0"><tbody><tr><td valign="top" style="font:inherit"><blockquote style="padding-left:5px;margin-left:5px;border-left:rgb(16,16,255) 2px solid">

<div>Thanks <br><div class="im">><br>><br>> _______________________________________________<br>> Pacemaker mailing list: <a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>

> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>><br>> Project Home: <a href="http://www.clusterlabs.org/" target="_blank">http://www.clusterlabs.org</a><br>

> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>> Bugs:<br>> <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>

><br>><br></div>

<br>------------------------------<br><br>Message: 3<br>Date: Thu, 11 Nov 2010 11:38:24 +0100<br>From: Simon Jansen <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=simon.jansen1@googlemail.com" target="_blank">simon.jansen1@googlemail.com</a>><br>

To: <a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a><br>Subject: Re: [Pacemaker] Multistate Resources is not promoted<br>    automatically<br>

Message-ID:<br>    <AANLkTikwgMy4nutZ4807vv2x=nN_sMj+<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=E8Y1PRu6X1eT@mail.gmail.com" target="_blank">E8Y1PRu6X1eT@mail.gmail.com</a>><br>Content-Type: text/plain; charset="iso-8859-1"<br>

<br>Hi Andrew,<br><br>thank you for your answer.<br><br>Does the ocf:heartbeat:Rsyslog script call crm_master?<br>> It needs to to tell pacemaker which instance to promote.<br>><br>Yes it does. But I forgot to call crm_master with the option -D in the stop<br>

action. I think that this was the error. After correcting this issue the ra<br>starts as expected.<br><br>Two questions though...<br>> 1) Why use master/slave for rsyslog?<br>><br>In the master role the rsyslog daemon should function as central log server<br>

and write the entries received on UDP port 514 into a MySQL database.<br>On the passive node the rsyslog service should be started with the standard<br>config.<br>Do you think there is a better solution to solve this

 requirement?<br><br><br>> 2) Is this an upstream RA? If not, you shouldn't be using the<br>> ocf:heartbeat namespace.<br>><br>Ok thank you for the advice. Should I use the pacemaker class instead or<br>should I define a custom namespace?<br>

<br>--<br><br>Regards,<br><br>Simon Jansen<br><br><br>---------------------------<br>Simon Jansen<br>64291 Darmstadt<br>

<br>------------------------------<br><br>Message: 4<br>Date: Thu, 11 Nov 2010 11:44:47 +0100<br>From: Andrew Beekhof <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>><br>

To: The

 Pacemaker cluster resource manager<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>Subject: Re: [Pacemaker] Infinite fail-count and migration-threshold<br>

    after node fail-back<br>Message-ID:<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=AANLkTimmLWZMhKxSZCHu95x0d2WGnJcujN-B7eowbFXE@mail.gmail.com" target="_blank">AANLkTimmLWZMhKxSZCHu95x0d2WGnJcujN-B7eowbFXE@mail.gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1<br><br>On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=dfrincu@streamwide.ro" target="_blank">dfrincu@streamwide.ro</a>> wrote:<br>

> Hi all,<br>><br>> I've managed to make this

 setup work, basically the issue with a<br>> symmetric-cluster="false" and specifying the resources' location manually<br>> means that the resources will always obey the location constraint, and (as<br>

> far as I could see) disregard the rsc_defaults resource-stickiness values.<br><br>This definitely should not be the case.<br>Possibly your stickiness setting is being eclipsed by the combination<br>of the location constraint scores.<br>

Try INFINITY instead.<br><br>> This behavior is not the expected one, in theory, setting<br>> symmetric-cluster="false" should affect whether resources are allowed to run<br>> anywhere by default and the resource-stickiness should lock in place the<br>

> resources so they don't bounce from node to node. Again, this didn't happen,<br>> but by setting symmetric-cluster="true", using the same ordering and<br>> collocation constraints and the resource-stickiness, the behavior is the<br>

> expected

 one.<br>><br>> I don't remember seeing anywhere in the docs from <a href="http://clusterlabs.org" target="_blank">clusterlabs.org</a> being<br>> mentioned that the resource-stickiness only works on<br>> symmetric-cluster="true", so for anyone that also stumbles upon this issue,<br>

> I hope this helps.<br>><br>> Regards,<br>><br>> Dan<br>><br>> Dan Frincu wrote:<br>>><br>>> Hi,<br>>><br>>> Since it was brought to my attention that I should upgrade from<br>

>> openais-0.80 to a more recent version of corosync, I've done just that,<br>>> however I'm experiencing a strange behavior on the cluster.<br>>><br>>> The same setup was used with the below packages:<br>

>><br>>> # rpm -qa | grep -i "(openais|cluster|heartbeat|pacemaker|resource)"<br>>> openais-0.80.5-15.2<br>>> cluster-glue-1.0-12.2<br>>> pacemaker-1.0.5-4.2<br>>> cluster-glue-libs-1.0-12.2<br>

>> resource-agents-1.0-31.5<br>>>

 pacemaker-libs-1.0.5-4.2<br>>> pacemaker-mgmt-1.99.2-7.2<br>>> libopenais2-0.80.5-15.2<br>>> heartbeat-3.0.0-33.3<br>>> pacemaker-mgmt-client-1.99.2-7.2<br>>><br>>> Now I've migrated to the most recent stable packages I could find (on the<br>

>> <a href="http://clusterlabs.org" target="_blank">clusterlabs.org</a> website) for RHEL5:<br>>><br>>> # rpm -qa | grep -i "(openais|cluster|heartbeat|pacemaker|resource)"<br>>> cluster-glue-1.0.6-1.6.el5<br>

>> pacemaker-libs-1.0.9.1-1.el5<br>>> pacemaker-1.0.9.1-1.el5<br>>> heartbeat-libs-3.0.3-2.el5<br>>> heartbeat-3.0.3-2.el5<br>>> openaislib-1.1.3-1.6.el5<br>>> resource-agents-1.0.3-2.el5<br>

>> cluster-glue-libs-1.0.6-1.6.el5<br>>> openais-1.1.3-1.6.el5<br>>><br>>> Expected behavior:<br>>> - all the resources in the group should go (based on location preference)<br>>> to bench1<br>

>> - if bench1 goes down, resources migrate to

 bench2<br>>> - if bench1 comes back up, resources stay on bench2, unless manually told<br>>> otherwise.<br>>><br>>> On the previous incantation, this worked, by using the new packages, not<br>>> so much. Now if bench1 goes down (crm node standby `uname -n`), failover<br>

>> occurs, but when bench1 comes back up, resources migrate back, even if<br>>> default-resource-stickiness is set, and more than that, 2 drbd block devices<br>>> reach infinite metrics, most notably because they try to promote the<br>

>> resources to a Master state on bench1, but fail to do so due to the resource<br>>> being held open (by some process, I could not identify it).<br>>><br>>> Strangely enough, the resources (drbd) fail to be promoted to a Master<br>

>> status on bench1, so they fail back to bench2, where they are mounted<br>>> (functional), but crm_mon shows:<br>>><br>>>

 Migration summary:<br>>> * Node <a href="http://bench2.streamwide.ro" target="_blank">bench2.streamwide.ro</a>:<br>>>  drbd_mysql:1: migration-threshold=1000000 fail-count=1000000<br>>>  drbd_home:1: migration-threshold=1000000 fail-count=1000000<br>

>> * Node <a href="http://bench1.streamwide.ro" target="_blank">bench1.streamwide.ro</a>:<br>>><br>>> .... infinite metrics on bench2, while the drbd resources are available<br>>><br>>> version: 8.3.2 (api:88/proto:86-90)<br>

>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by<br>>> <a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=mockbuild@v20z-x86-64.home.local" target="_blank">mockbuild@v20z-x86-64.home.local</a>, 2009-08-29 14:07:55<br>

>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<br>>>   ns:1632 nr:1864 dw:3512 dr:3933 al:11 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1<br>>> wo:b oos:0<br>>> 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<br>

>>   ns:4

 nr:24 dw:28 dr:25 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0<br>>> 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<br>>>   ns:4 nr:24 dw:28 dr:85 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0<br>

>><br>>> and mounted<br>>><br>>> /dev/drbd1 on /home type ext3 (rw,noatime,nodiratime)<br>>> /dev/drbd0 on /mysql type ext3 (rw,noatime,nodiratime)<br>>> /dev/drbd2 on /storage type ext3 (rw,noatime,nodiratime)<br>

>><br>>> Attached is the hb_report.<br>>><br>>> Thank you in advance.<br>>><br>>> Best regards<br>>><br>><br>> --<br>> Dan FRINCU<br>> Systems Engineer<br>> CCNA, RHCE<br>

> Streamwide Romania<div class="im"><br>><br><br></div>------------------------------<br><br>Message: 5<br>Date: Thu, 11 Nov 2010 11:46:42 +0100<br>From: Andrew Beekhof <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>><br>

To: The Pacemaker cluster resource manager<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>Subject: Re: [Pacemaker] Multistate Resources is not promoted<br>

    automatically<br>Message-ID:<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=AANLkTinAqC-vNWHYCDRrRhgyh5JUtcJux5E9YBvrXZc6@mail.gmail.com" target="_blank">AANLkTinAqC-vNWHYCDRrRhgyh5JUtcJux5E9YBvrXZc6@mail.gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1<br><br>On Thu, Nov 11, 2010 at 11:38 AM, Simon Jansen<br><<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=simon.jansen1@googlemail.com" target="_blank">simon.jansen1@googlemail.com</a>> wrote:<br>

> Hi Andrew,<br>><br>> thank you for your answer.<br>><br>>> Does the ocf:heartbeat:Rsyslog script call crm_master?<br>>> It needs to to tell pacemaker which instance to promote.<br>><br>> Yes it does. But I forgot to call crm_master with the option -D in the stop<br>

> action. I think that this was the error. After correcting this issue the ra<br>> starts as expected.<br>><br>>> Two questions though...<br>>> 1) Why use master/slave for rsyslog?<br>><br>> In the master role the rsyslog daemon should function as central log server<br>

> and write the entries received on UDP port 514 into a MySQL database.<br>> On the passive node the rsyslog service should be started with the standard<br>> config.<br><br>Interesting<br><br>> Do you think there is a better solution to solve this

 requirement?<br><br>No, I'd just never heard rsyslog being used in this way.<br><br>>><br>>> 2) Is this an upstream RA? If not, you shouldn't be using the<br>>> ocf:heartbeat namespace.<br>><br>

> Ok thank you for the advice. Should I use the pacemaker class instead or<br>> should I define a custom namespace?<br><br>Custom.<br><br>><br>> --<br>><br>> Regards,<br>><br>> Simon Jansen<br>><br>

><br>> ---------------------------<br>> Simon Jansen<br>> 64291 Darmstadt<div class="im"><br>><br><br></div>------------------------------<br><br>Message: 6<br>Date: Thu, 11 Nov 2010 11:47:35 +0100<br>From: Andrew Beekhof <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>><br>

To: The Pacemaker cluster resource manager<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>Subject: Re: [Pacemaker] start error because "not installed" - stop<br>

    fails with "not installed" - stonith<br>Message-ID:<br>    <AANLkTikXwe6wS2F-LtLF3dvKjEt1gvPZ=<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=5BSVNj1eZ2q@mail.gmail.com" target="_blank">5BSVNj1eZ2q@mail.gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1<br><br>On Sat, Oct 9, 2010 at 12:36 AM, Andreas Kurz <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=andreas.kurz@linbit.com" target="_blank">andreas.kurz@linbit.com</a>> wrote:<br>

> Hello,<br>><br>> if a resource encounters a start error with rc=5 "not installed", the<br>> stop action is not skipped before a restart is tried.<br><br>I'd not expect a stop action at all.  What version?<br>

<br>><br>>

 Typically in such a situation the stop will also fail with the same<br>> error and the node will be fenced ... even worse, there is a good chance<br>> this happens on all remaining nodes, e.g. if there is a typo in a parameter.<br>

><br>> I would expect the cluster to skip the stop action after a "not<br>> installed" start failure followed by a start retry on a different node.<br>><br>> So ... is this a feature or a bug? ;-)<br>

><br>> Regards,<br>> Andreas<div class="im"><br>><br><br></div>------------------------------<br><br>Message: 7<br>Date: Thu, 11 Nov 2010 11:48:59 +0100<br>From: Andrew Beekhof <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>><br>

To: The Pacemaker cluster resource manager<br>    <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>Subject: Re: [Pacemaker] [Problem]Number of

 times control of the<br>    fail-count    is late.<br>Message-ID:<br>    <AANLkTinMfWBqmW_jcA8a+ic7zmfb6HMiEfBD1_SuEe=<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=G@mail.gmail.com" target="_blank">G@mail.gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1<br><br>On Wed, Nov 10, 2010 at 5:20 AM,  <<a href="http://cn.mc157.mail.yahoo.com/mc/compose?to=renayama19661014@ybb.ne.jp" target="_blank">renayama19661014@ybb.ne.jp</a>> wrote:<br>

> Hi,<br>><br>> We built a two-node cluster.<br>> migration-threshold is set to 2.<br>><br>> We confirmed the problem with the following procedure.<br>><br>> Step1) Start two nodes and send config5.crm. (The clnDiskd resource is an original one.)<br>

><br>> ============<br>> Last updated: Tue Nov  9 21:10:49 2010<br>> Stack: Heartbeat<br>> Current

 DC: srv02 (8c93dc22-a27e-409b-8112-4073de622daf) - partition with quorum<br>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438<br>> 2 Nodes configured, unknown expected votes<br>> 5 Resources configured.<br>

> ============<br>><br>> Online: [ srv01 srv02 ]<br>><br>>  vip    (ocf::heartbeat:IPaddr2):       Started srv01<br>>  Clone Set: clnDiskd<br>>      Started: [ srv01 srv02 ]<br>>  Clone Set: clnDummy2<br>

>      Started: [ srv01 srv02 ]<br>>  Clone Set: clnPingd1<br>>      Started: [ srv01 srv02 ]<br>><br>> Node Attributes:<br>> * Node srv01:<br>>    + default_ping_set1                 : 100<br>>    + diskcheck_status_internal         : normal<br>

> * Node srv02:<br>>    + default_ping_set1                 : 100<br>>    + diskcheck_status_internal         : normal<br>><br>> Migration summary:<br>> * Node srv02:<br>> * Node srv01:<br>><br>><br>

> Step2) We edit the

 clnDummy2 resource agent so that start times out (add a sleep).<br>><br>>  dummy_start() {<br>>    sleep 180 ----> add sleep<br>>    dummy_monitor<br>>    if [ $? =  $OCF_SUCCESS ]; then<br>><br>><br>

> Step3) We cause a monitor error in the clnDummy2 resource.<br>><br>>  # rm -rf /var/run/Dummy-Dummy2.state<br>><br>> Step4) The restart of clnDummy2 times out.<br>><br>> However, the log shows that clnDummy2 is started again after that time-out.<br>

> The reason is that pengine does not yet know that fail-count has become INFINITY:<br>><br>> fail-count is not yet INFINITY in pe-input-2001.bz2.<br>> In pe-input-2002.bz2, fail-count becomes INFINITY.<br>

><br>> (snip)<br>> Nov  9 21:12:35 srv02 crmd: [5896]: WARN: status_from_rc: Action 25 (Dummy2:0_start_0) on srv01 failed<br>> (target: 0 vs. rc: -2): Error<br>> Nov  9 21:12:35 srv02

 crmd: [5896]: WARN: update_failcount: Updating failcount for Dummy2:0 on srv01<br>> after failed start: rc=-2 (update=INFINITY, time=1289304755)<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: abort_transition_graph: match_graph_event:272 - Triggered<br>

> transition abort (complete=0, tag=lrm_rsc_op, id=Dummy2:0_start_0,<br>> magic=2:-2;25:5:0:275da7f9-7f43-43a2-8308-41d0ab78346e, cib=0.9.39) : Event failed<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_start_0 (25) confirmed on<br>

> srv01 (rc=4)<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 29 fired and confirmed<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: run_graph:<br>> ====================================================<br>

> Nov  9 21:12:35 srv02 crmd: [5896]: notice: run_graph: Transition 5 (Complete=7, Pending=0, Fired=0,<br>> Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-2000.bz2):

 Stopped<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_graph_trigger: Transition 5 is now complete<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_TRANSITION_ENGINE -><br>

> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: All 2 cluster nodes are eligible to run<br>> resources.<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke: Query 72: Requesting the current CIB:<br>

> S_POLICY_ENGINE<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke_callback: Invoking the PE: query=72,<br>> ref=pe_calc-dc-1289304755-58, seq=2, quorate=1<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: unpack_config: On loss of CCM Quorum: Ignore<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' =<br>> 0, 'green' = 0<br>> Nov  9

 21:12:35 srv02 pengine: [7208]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes<br>> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv02 is online<br>> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv01 is online<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op<br>> Dummy2:0_monitor_15000 on srv01: not running (7)<br>> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op Dummy2:0_start_0 on<br>

> srv01: unknown exec error (-2)<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: Dummy      (ocf::pacemaker:Dummy): Started<br>> srv01<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: vip        (ocf::heartbeat:IPaddr2):       Started<br>

> srv01<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDiskd<br>> Nov  9 21:12:35 srv02

 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDummy2<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print:      Dummy2:0      (ocf::pacemaker:Dummy2):<br>

> Started srv01 FAILED<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv02 ]<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnPingd1<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more<br>> times on srv01 before being forced off<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01<br>> Nov  9

 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more<br>> times on srv01 before being forced off<br>> Nov  9 21:12:35 srv02 pengine: [7208]: ERROR: unpack_operation: Specifying on_fail=fence and<br>

> stonith-enabled=false makes no sense<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: RecurringOp:  Start recurring monitor (15s) for<br>> Dummy2:0 on srv01<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy (Started srv01)<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource vip   (Started srv01)<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:0    (Started srv01)<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:1    (Started srv02)<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Recover resource Dummy2:0    (Started srv01)<br>>

 Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy2:1      (Started srv02)<br>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:0   (Started srv01)<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:1   (Started srv02)<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_POLICY_ENGINE -><br>

> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: unpack_graph: Unpacked transition 6: 8 actions in 8 synapses<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_te_invoke: Processing graph 6<br>

> (ref=pe_calc-dc-1289304755-58) derived from /var/lib/pengine/pe-input-2001.bz2<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 30 fired and confirmed<br>> Nov  9 21:12:35 srv02 crmd: [5896]: info:

 te_rsc_command: Initiating action 5: stop Dummy2:0_stop_0 on<br>> srv01<br>> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Transition 6: PEngine Input stored<br>> in: /var/lib/pengine/pe-input-2001.bz2<br>

> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Configuration ERRORs found during PE<br>> processing.  Please run "crm_verify -L" to identify issues.<br>> Nov  9 21:12:37 srv02 attrd: [5895]: info: attrd_ha_callback: flush message from srv01<br>

> Nov  9 21:12:37 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_stop_0 (5) confirmed on<br>> srv01 (rc=0)<br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 31 fired and confirmed<br>

> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 8 fired and confirmed<br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 28 fired and confirmed<br>> Nov  9

 21:12:37 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 24: start Dummy2:0_start_0<br>> on srv01<br>><br>>  -----> Must not carry out this start.<br>><br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered<br>

> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,<br>> magic=NA, cib=0.9.41) : Transient attribute: update<br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort priority upgraded from 0 to<br>

> 1000000<br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort action done superceeded by<br>> restart<br>> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered<br>

> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,<br>> magic=NA, cib=0.9.42) : Transient attribute:

 update<br>> (snip)<br>><br>> The problem seems to be that the update of fail-count arrived late.<br>> It also appears to depend on timing.<br>><br>> Because fail-count is not applied the correct number of times, the fail-over time of the<br>

> resource is affected.<br>><br>> Has this problem already been discussed?<br><br>Not that I know of<br><br>> Isn't the delay in the fail-count update, which goes through attrd, a problem?<br><br>Indeed.<br><br>><br>>  * I attached the log and some pe files to Bugzilla.<br>

>  * <a href="http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520" target="_blank">http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520</a><br><br>Ok, I'll follow up there.<br><br>><br>> Best Regards,<br>

> Hideo Yamauchi.<div class="im"><br>><br></div>------------------------------<div class="im"><br><br></div>End of Pacemaker Digest, Vol 36, Issue 34<br>*****************************************<br>

</div></blockquote></td></tr></tbody></table><br>



<br></blockquote></div><br>