[Pacemaker] stand_alone_ping stop Node start

Andrew Beekhof andrew at beekhof.net
Fri Nov 12 02:30:37 EST 2010


On Fri, Nov 12, 2010 at 6:46 AM, jiaju liu <liujiaju86 at yahoo.com.cn> wrote:

>
>
>
> > Hi,
> > I rebooted my node, and it reports:
> > node2 pingd: [3932]: info: stand_alone_ping: Node 192.168.10.100 is unreachable (read)
> > and the node will not start.
> >
> > 192.168.10.100 is on the IB (InfiniBand) network, which I only bring up
> > after the node starts, so do you have any idea how to let the node start
> > first? Thanks very much. :-)
> >
> >
>
> Don't use IP resources as ping nodes.
> You should use the IP of something outside of your cluster, like an external
> router.
>
> stand_alone_ping starts automatically; I have never started it by hand. So
> how do I set it to ping an external router?
>
>
See where you set "192.168.10.100" and set it to something else.
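
For example, a minimal sketch in crm shell syntax (assuming the
ocf:pacemaker:pingd agent in use here; 192.168.1.254 stands in for the
address of your external router):

    # ping an external router instead of a cluster-managed IP
    primitive p_pingd ocf:pacemaker:pingd \
        params host_list="192.168.1.254" multiplier="100" \
        op monitor interval="15s"
    clone cln_pingd p_pingd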



> Thanks
>
> ------------------------------
>
> Message: 3
> Date: Thu, 11 Nov 2010 11:38:24 +0100
> From: Simon Jansen <simon.jansen1 at googlemail.com>
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Multistate Resources is not promoted
>     automatically
> Message-ID:
>     <AANLkTikwgMy4nutZ4807vv2x=nN_sMj+E8Y1PRu6X1eT at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Andrew,
>
> thank you for your answer.
>
> Does the ocf:heartbeat:Rsyslog script call crm_master?
> > It needs to, to tell Pacemaker which instance to promote.
> >
> Yes, it does. But I forgot to call crm_master with the option -D in the stop
> action. I think that was the error. After correcting this issue the RA
> starts as expected.
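>
> (For anyone hitting the same thing, a minimal sketch of the pattern in a
> shell RA; the start_rsyslog/stop_rsyslog helpers are illustrative:)
>
>     rsyslog_start() {
>         start_rsyslog || return $OCF_ERR_GENERIC
>         # advertise this instance as a promotion candidate
>         crm_master -l reboot -v 100
>         return $OCF_SUCCESS
>     }
>
>     rsyslog_stop() {
>         stop_rsyslog
>         # drop the promotion score; without this a stopped instance
>         # can still look promotable to pacemaker
>         crm_master -l reboot -D
>         return $OCF_SUCCESS
>     }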
>
> Two questions though...
> > 1) Why use master/slave for rsyslog?
> >
> In the master role the rsyslog daemon should function as a central log
> server and write the entries received on UDP port 514 into a MySQL database.
> On the passive node the rsyslog service should be started with the standard
> config.
> Do you think there is a better solution to solve this requirement?
>
>
> > 2) Is this an upstream RA? If not, you shouldn't be using the
> > ocf:heartbeat namespace.
> >
> OK, thank you for the advice. Should I use the pacemaker class instead, or
> should I define a custom namespace?
>
> --
>
> Regards,
>
> Simon Jansen
>
>
> ---------------------------
> Simon Jansen
> 64291 Darmstadt
>
> ------------------------------
>
> Message: 4
> Date: Thu, 11 Nov 2010 11:44:47 +0100
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager
>     <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Infinite fail-count and migration-threshold
>     after node fail-back
> Message-ID:
>     <AANLkTimmLWZMhKxSZCHu95x0d2WGnJcujN-B7eowbFXE at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu <dfrincu at streamwide.ro> wrote:
> > Hi all,
> >
> > I've managed to make this setup work. Basically, with
> > symmetric-cluster="false" and the resources' locations specified manually,
> > the resources always obey the location constraints and (as far as I could
> > see) disregard the rsc_defaults resource-stickiness values.
>
> This definitely should not be the case.
> Possibly your stickiness setting is being eclipsed by the combination
> of the location constraint scores.
> Try INFINITY instead.
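>
> For example, a minimal sketch in crm shell syntax (the scores and the "grp"
> name are illustrative):
>
>     # a finite stickiness (e.g. 100) can be outweighed by a
>     # location constraint with a higher score:
>     location loc_grp grp 200: bench1
>     # an INFINITY stickiness cannot:
>     rsc_defaults resource-stickiness=INFINITY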
>
> > This is not the expected behavior: in theory, symmetric-cluster="false"
> > should only affect whether resources are allowed to run anywhere by
> > default, and resource-stickiness should lock the resources in place so
> > they don't bounce from node to node. Again, this didn't happen, but with
> > symmetric-cluster="true", the same ordering and collocation constraints,
> > and the resource-stickiness, the behavior is the expected one.
> >
> > I don't remember the docs on clusterlabs.org mentioning anywhere that
> > resource-stickiness only works with symmetric-cluster="true", so I hope
> > this helps anyone else who stumbles upon this issue.
> >
> > Regards,
> >
> > Dan
> >
> > Dan Frincu wrote:
> >>
> >> Hi,
> >>
> >> Since it was brought to my attention that I should upgrade from
> >> openais-0.80 to a more recent version of corosync, I've done just that;
> >> however, I'm experiencing strange behavior on the cluster.
> >>
> >> The same setup was used with the below packages:
> >>
> >> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
> >> openais-0.80.5-15.2
> >> cluster-glue-1.0-12.2
> >> pacemaker-1.0.5-4.2
> >> cluster-glue-libs-1.0-12.2
> >> resource-agents-1.0-31.5
> >> pacemaker-libs-1.0.5-4.2
> >> pacemaker-mgmt-1.99.2-7.2
> >> libopenais2-0.80.5-15.2
> >> heartbeat-3.0.0-33.3
> >> pacemaker-mgmt-client-1.99.2-7.2
> >>
> >> Now I've migrated to the most recent stable packages I could find (on the
> >> clusterlabs.org website) for RHEL5:
> >>
> >> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
> >> cluster-glue-1.0.6-1.6.el5
> >> pacemaker-libs-1.0.9.1-1.el5
> >> pacemaker-1.0.9.1-1.el5
> >> heartbeat-libs-3.0.3-2.el5
> >> heartbeat-3.0.3-2.el5
> >> openaislib-1.1.3-1.6.el5
> >> resource-agents-1.0.3-2.el5
> >> cluster-glue-libs-1.0.6-1.6.el5
> >> openais-1.1.3-1.6.el5
> >>
> >> Expected behavior:
> >> - all the resources in the group should go (based on location preference)
> >> to bench1
> >> - if bench1 goes down, resources migrate to bench2
> >> - if bench1 comes back up, resources stay on bench2, unless manually told
> >> otherwise.
> >>
> >> With the previous incantation this worked; with the new packages, not so
> >> much. Now if bench1 goes down (crm node standby `uname -n`), failover
> >> occurs, but when bench1 comes back up, resources migrate back even though
> >> default-resource-stickiness is set. More than that, 2 drbd block devices
> >> reach infinite fail-counts, most notably because the cluster tries to
> >> promote the resources to the Master state on bench1, but fails to do so
> >> because the resource is held open (by some process I could not identify).
> >>
> >> Strangely enough, the resources (drbd) fail to be promoted to Master
> >> status on bench1, so they fail back to bench2, where they are mounted
> >> (functional), but crm_mon shows:
> >>
> >> Migration summary:
> >> * Node bench2.streamwide.ro:
> >>    drbd_mysql:1: migration-threshold=1000000 fail-count=1000000
> >>    drbd_home:1: migration-threshold=1000000 fail-count=1000000
> >> * Node bench1.streamwide.ro:
> >>
> >> .... infinite fail-counts on bench2, while the drbd resources are available
> >>
> >> version: 8.3.2 (api:88/proto:86-90)
> >> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild at v20z-x86-64.home.local, 2009-08-29 14:07:55
> >> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> >>    ns:1632 nr:1864 dw:3512 dr:3933 al:11 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> >> 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> >>    ns:4 nr:24 dw:28 dr:25 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> >> 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> >>    ns:4 nr:24 dw:28 dr:85 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> >>
> >> and mounted
> >>
> >> /dev/drbd1 on /home type ext3 (rw,noatime,nodiratime)
> >> /dev/drbd0 on /mysql type ext3 (rw,noatime,nodiratime)
> >> /dev/drbd2 on /storage type ext3 (rw,noatime,nodiratime)
> >>
> >> Attached is the hb_report.
> >>
> >> Thank you in advance.
> >>
> >> Best regards
> >>
> >
> > --
> > Dan FRINCU
> > Systems Engineer
> > CCNA, RHCE
> > Streamwide Romania
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 11 Nov 2010 11:46:42 +0100
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager
>     <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Multistate Resources is not promoted
>     automatically
> Message-ID:
>     <AANLkTinAqC-vNWHYCDRrRhgyh5JUtcJux5E9YBvrXZc6 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Thu, Nov 11, 2010 at 11:38 AM, Simon Jansen
> <simon.jansen1 at googlemail.com> wrote:
> > Hi Andrew,
> >
> > thank you for your answer.
> >
> >> Does the ocf:heartbeat:Rsyslog script call crm_master?
> >> It needs to, to tell Pacemaker which instance to promote.
> >
> > Yes, it does. But I forgot to call crm_master with the option -D in the
> > stop action. I think that was the error. After correcting this issue the
> > RA starts as expected.
> >
> >> Two questions though...
> >> 1) Why use master/slave for rsyslog?
> >
> > In the master role the rsyslog daemon should function as a central log
> > server and write the entries received on UDP port 514 into a MySQL
> > database. On the passive node the rsyslog service should be started with
> > the standard config.
>
> Interesting
>
> > Do you think there is a better solution to solve this requirement?
>
> No, I'd just never heard rsyslog being used in this way.
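>
> (For the curious, a minimal sketch of such a master-side config, assuming
> rsyslog's imudp and ommysql modules; the database name and credentials are
> illustrative:)
>
>     # /etc/rsyslog.conf on the active (master) node
>     $ModLoad imudp
>     $UDPServerRun 514
>     $ModLoad ommysql
>     *.* :ommysql:127.0.0.1,Syslog,rsyslog,password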
>
> >>
> >> 2) Is this an upstream RA? If not, you shouldn't be using the
> >> ocf:heartbeat namespace.
> >
> > OK, thank you for the advice. Should I use the pacemaker class instead,
> > or should I define a custom namespace?
>
> Custom.
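>
> (A minimal sketch, assuming the standard OCF layout; "mycompany" is an
> arbitrary provider name:)
>
>     # install the agent under a custom provider directory
>     mkdir -p /usr/lib/ocf/resource.d/mycompany
>     cp Rsyslog /usr/lib/ocf/resource.d/mycompany/Rsyslog
>     chmod 755 /usr/lib/ocf/resource.d/mycompany/Rsyslog
>     # the resource is then referenced as ocf:mycompany:Rsyslog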
>
> >
> > --
> >
> > Regards,
> >
> > Simon Jansen
> >
> >
> > ---------------------------
> > Simon Jansen
> > 64291 Darmstadt
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 11 Nov 2010 11:47:35 +0100
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager
>     <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] start error because "not installed" - stop
>     fails with "not installed" - stonith
> Message-ID:
>     <AANLkTikXwe6wS2F-LtLF3dvKjEt1gvPZ=5BSVNj1eZ2q at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sat, Oct 9, 2010 at 12:36 AM, Andreas Kurz <andreas.kurz at linbit.com>
> wrote:
> > Hello,
> >
> > if a resource encounters a start error with rc=5 "not installed", the
> > stop action is not skipped before a restart is tried.
>
> I'd not expect a stop action at all.  What version?
>
> >
> > Typically in such a situation the stop will also fail with the same
> > error and the node will be fenced... even worse, there is a good chance
> > this happens on all remaining nodes, e.g. if there is a typo in a
> > parameter.
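> >
> > (A minimal sketch of how an agent signals "not installed", assuming a
> > shell RA; rsyslogd is just an example binary:)
> >
> >     agent_validate() {
> >         # OCF_ERR_INSTALLED (rc=5) tells the cluster the software
> >         # is missing on this node
> >         which rsyslogd >/dev/null 2>&1 || return $OCF_ERR_INSTALLED
> >         return $OCF_SUCCESS
> >     }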
> >
> > I would expect the cluster to skip the stop action after a "not
> > installed" start failure and retry the start on a different node.
> >
> > So ... is this a feature or a bug? ;-)
> >
> > Regards,
> > Andreas
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 11 Nov 2010 11:48:59 +0100
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager
>     <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] [Problem]Number of times control of the
>     fail-count    is late.
> Message-ID:
>     <AANLkTinMfWBqmW_jcA8a+ic7zmfb6HMiEfBD1_SuEe=G at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Wed, Nov 10, 2010 at 5:20 AM, <renayama19661014 at ybb.ne.jp> wrote:
> > Hi,
> >
> > We set up a cluster of two nodes.
> > The migration-threshold is set to 2.
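> >
> > (In crm shell syntax, a minimal sketch of that setting:)
> >
> >     rsc_defaults migration-threshold=2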
> >
> > We confirmed the phenomenon with the following procedure.
> >
> > Step 1) Start both nodes and load config5.crm. (The clnDiskd resources
> > are our own.)
> >
> > ============
> > Last updated: Tue Nov  9 21:10:49 2010
> > Stack: Heartbeat
> > Current DC: srv02 (8c93dc22-a27e-409b-8112-4073de622daf) - partition with quorum
> > Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> > 2 Nodes configured, unknown expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ srv01 srv02 ]
> >
> >  vip        (ocf::heartbeat:IPaddr2):       Started srv01
> >  Clone Set: clnDiskd
> >      Started: [ srv01 srv02 ]
> >  Clone Set: clnDummy2
> >      Started: [ srv01 srv02 ]
> >  Clone Set: clnPingd1
> >      Started: [ srv01 srv02 ]
> >
> > Node Attributes:
> > * Node srv01:
> >    + default_ping_set1                 : 100
> >    + diskcheck_status_internal         : normal
> > * Node srv02:
> >    + default_ping_set1                 : 100
> >    + diskcheck_status_internal         : normal
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >
> >
> > Step 2) We edit the clnDummy2 resource agent so that start times out
> > (add a sleep).
> >
> >  dummy_start() {
> >     sleep 180        # ----> added sleep
> >     dummy_monitor
> >     if [ $? = $OCF_SUCCESS ]; then
> >
> >
> > Step 3) Cause a monitor error on the clnDummy2 resource:
> >
> >  # rm -rf /var/run/Dummy-Dummy2.state
> >
> > Step 4) The restart of clnDummy2 then times out.
> >
> > But the log shows clnDummy2 being started again after that timeout. The
> > reason is that pengine does not yet know that the fail-count became
> > INFINITY: in pe-input-2001.bz2 the fail-count has not yet reached
> > INFINITY; in pe-input-2002.bz2 it has.
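> >
> > (The fail-count can be watched directly while reproducing this; a minimal
> > sketch, assuming the pacemaker 1.0 tools:)
> >
> >     # query the fail-count of Dummy2:0 on srv01
> >     crm_failcount -G -U srv01 -r Dummy2:0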
> >
> > (snip)
> > Nov  9 21:12:35 srv02 crmd: [5896]: WARN: status_from_rc: Action 25 (Dummy2:0_start_0) on srv01 failed (target: 0 vs. rc: -2): Error
> > Nov  9 21:12:35 srv02 crmd: [5896]: WARN: update_failcount: Updating failcount for Dummy2:0 on srv01 after failed start: rc=-2 (update=INFINITY, time=1289304755)
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=Dummy2:0_start_0, magic=2:-2;25:5:0:275da7f9-7f43-43a2-8308-41d0ab78346e, cib=0.9.39) : Event failed
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_start_0 (25) confirmed on srv01 (rc=4)
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 29 fired and confirmed
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: run_graph: ====================================================
> > Nov  9 21:12:35 srv02 crmd: [5896]: notice: run_graph: Transition 5 (Complete=7, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-2000.bz2): Stopped
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: te_graph_trigger: Transition 5 is now complete
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke: Query 72: Requesting the current CIB: S_POLICY_ENGINE
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke_callback: Invoking the PE: query=72, ref=pe_calc-dc-1289304755-58, seq=2, quorate=1
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv02 is online
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv01 is online
> > Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op Dummy2:0_monitor_15000 on srv01: not running (7)
> > Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op Dummy2:0_start_0 on srv01: unknown exec error (-2)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: Dummy      (ocf::pacemaker:Dummy): Started srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: vip        (ocf::heartbeat:IPaddr2):       Started srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDiskd
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDummy2
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print:      Dummy2:0      (ocf::pacemaker:Dummy2): Started srv01 FAILED
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv02 ]
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnPingd1
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more times on srv01 before being forced off
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more times on srv01 before being forced off
> > Nov  9 21:12:35 srv02 pengine: [7208]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: RecurringOp:  Start recurring monitor (15s) for Dummy2:0 on srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy (Started srv01)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource vip   (Started srv01)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:0    (Started srv01)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:1    (Started srv02)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Recover resource Dummy2:0    (Started srv01)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy2:1      (Started srv02)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:0   (Started srv01)
> > Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:1   (Started srv02)
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: unpack_graph: Unpacked transition 6: 8 actions in 8 synapses
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: do_te_invoke: Processing graph 6 (ref=pe_calc-dc-1289304755-58) derived from /var/lib/pengine/pe-input-2001.bz2
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 30 fired and confirmed
> > Nov  9 21:12:35 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 5: stop Dummy2:0_stop_0 on srv01
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Transition 6: PEngine Input stored in: /var/lib/pengine/pe-input-2001.bz2
> > Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
> > Nov  9 21:12:37 srv02 attrd: [5895]: info: attrd_ha_callback: flush message from srv01
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_stop_0 (5) confirmed on srv01 (rc=0)
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 31 fired and confirmed
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 8 fired and confirmed
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 28 fired and confirmed
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 24: start Dummy2:0_start_0 on srv01
> >
> >  -----> This start must not be carried out.
> >
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7, magic=NA, cib=0.9.41) : Transient attribute: update
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort action done superceeded by restart
> > Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7, magic=NA, cib=0.9.42) : Transient attribute: update
> > (snip)
> >
> > The problem seems to be that the update of the fail-count arrived late,
> > and it appears to depend on timing.
> >
> > When the fail-count bookkeeping is wrong like this, it affects the
> > failover time of the resource.
> >
> > Has this problem already been discussed?
>
> Not that I know of.
>
> > Isn't the delay in the fail-count update, which travels by way of attrd,
> > a problem?
>
> Indeed.
>
> >
> >  * I attached a log and some pe-files to Bugzilla:
> >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520
>
> Ok, I'll follow up there.
>
> >
> > Best Regards,
> > Hideo Yamauchi.
>
>
>
>
> ------------------------------
>
>
>
>
> End of Pacemaker Digest, Vol 36, Issue 34
> *****************************************
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>