[Pacemaker] stand_alone_ping stop Node start

jiaju liu liujiaju86 at yahoo.com.cn
Fri Nov 12 00:46:47 EST 2010






> Hi
> I rebooted my node, and it reports
> node2 pingd: [3932]: info: stand_alone_ping: Node 192.168.10.100 is
> unreachable (read)
> and the node could not start.
>
> 192.168.10.100 is on the IB network, which I only bring up after the node has
> started. Do you have any idea how to let the node start first? Thanks very much. :-)
>
>

> Don't use IP resources as ping nodes.
> You should use the IP of something outside of your cluster, like an external
> router.

stand_alone_ping is started automatically; I have never started it by hand. So how do I set it to ping an external router?
Thanks
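
For reference, the usual approach is to run a ping/pingd clone whose host_list points at an address outside the cluster and to tie resources to the resulting connectivity attribute. A minimal sketch in crm shell syntax (the resource, clone and constraint names, the router address 192.168.1.254 and the group grp_services are only placeholders):

    primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.1.254" multiplier="100" dampen="5s" \
        op monitor interval="10s" timeout="60s"
    clone cl_ping p_ping
    location loc_need_connectivity grp_services \
        rule -inf: not_defined pingd or pingd lte 0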

------------------------------

Message: 3
Date: Thu, 11 Nov 2010 11:38:24 +0100
From: Simon Jansen <simon.jansen1 at googlemail.com>
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Multistate Resources is not promoted
    automatically
Message-ID:
    <AANLkTikwgMy4nutZ4807vv2x=nN_sMj+E8Y1PRu6X1eT at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Andrew,

thank you for your answer.

> Does the ocf:heartbeat:Rsyslog script call crm_master?
> It needs to, in order to tell Pacemaker which instance to promote.
>
Yes, it does. But I forgot to call crm_master with the -D option in the stop
action; I think that was the error. After correcting this, the RA starts as
expected.
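
For anyone else writing a promotable agent, the usual pattern is roughly the following (a sketch only; rsyslog_start/rsyslog_stop and the daemon helpers are hypothetical, and the OCF shell functions are assumed to be sourced):

    rsyslog_start() {
        start_rsyslog_daemon || return $OCF_ERR_GENERIC   # hypothetical helper
        # advertise this instance as a promotion candidate
        crm_master -l reboot -v 100
        return $OCF_SUCCESS
    }

    rsyslog_stop() {
        stop_rsyslog_daemon                               # hypothetical helper
        # drop the promotion score, otherwise the PE may still prefer this node
        crm_master -l reboot -D
        return $OCF_SUCCESS
    }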

> Two questions though...
> 1) Why use master/slave for rsyslog?
>
In the master role the rsyslog daemon should function as a central log server
and write the entries received on UDP port 514 into a MySQL database.
On the passive node the rsyslog service should be started with the standard
configuration.
Do you think there is a better way to meet this requirement?
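
For context, such an agent would typically be wired up as a master/slave resource along these lines (a sketch only; the primitive name, the "custom" provider and the timeouts are assumptions):

    primitive p_rsyslog ocf:custom:Rsyslog \
        op monitor interval="10s" role="Slave" timeout="30s" \
        op monitor interval="11s" role="Master" timeout="30s"
    ms ms_rsyslog p_rsyslog \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"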


> 2) Is this an upstream RA? If not, you shouldn't be using the
> ocf:heartbeat namespace.
>
OK, thank you for the advice. Should I use the pacemaker provider instead, or
should I define a custom namespace?

--

Regards,

Simon Jansen


---------------------------
Simon Jansen
64291 Darmstadt

------------------------------

Message: 4
Date: Thu, 11 Nov 2010 11:44:47 +0100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
    <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Infinite fail-count and migration-threshold
    after node fail-back
Message-ID:
    <AANLkTimmLWZMhKxSZCHu95x0d2WGnJcujN-B7eowbFXE at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu <dfrincu at streamwide.ro> wrote:
> Hi all,
>
> I've managed to make this setup work. Basically, with symmetric-cluster="false"
> and the resources' locations specified manually, the resources will always obey
> the location constraints and (as far as I could see) disregard the rsc_defaults
> resource-stickiness values.

This definitely should not be the case.
Possibly your stickiness setting is being eclipsed by the combination
of the location constraint scores.
Try INFINITY instead.
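
To make that concrete, a sketch of the interaction in crm shell syntax (scores, the group name my_group and the constraint name are only examples):

    # if the location score outweighs the (accumulated) stickiness,
    # the group moves back when bench1 returns:
    location loc_grp_bench1 my_group 200: bench1
    rsc_defaults resource-stickiness="100"

    # resource-stickiness=INFINITY cannot be outweighed, so the group stays put:
    rsc_defaults resource-stickiness="INFINITY"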

> This is not the expected behavior: in theory, symmetric-cluster="false" should
> only affect whether resources are allowed to run anywhere by default, while
> resource-stickiness should lock the resources in place so they don't bounce
> from node to node. Again, this didn't happen; but with symmetric-cluster="true",
> the same ordering and colocation constraints, and the same resource-stickiness,
> the behavior is the expected one.
>
> I don't remember the docs on clusterlabs.org mentioning anywhere that
> resource-stickiness only works with symmetric-cluster="true", so for anyone
> who also stumbles upon this issue, I hope this helps.
>
> Regards,
>
> Dan
>
> Dan Frincu wrote:
>>
>> Hi,
>>
>> Since it was brought to my attention that I should upgrade from
>> openais-0.80 to a more recent version of corosync, I've done just that;
>> however, I'm now experiencing strange behavior on the cluster.
>>
>> The same setup was used with the below packages:
>>
>> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
>> openais-0.80.5-15.2
>> cluster-glue-1.0-12.2
>> pacemaker-1.0.5-4.2
>> cluster-glue-libs-1.0-12.2
>> resource-agents-1.0-31.5
>> pacemaker-libs-1.0.5-4.2
>> pacemaker-mgmt-1.99.2-7.2
>> libopenais2-0.80.5-15.2
>> heartbeat-3.0.0-33.3
>> pacemaker-mgmt-client-1.99.2-7.2
>>
>> Now I've migrated to the most recent stable packages I could find (on the
>> clusterlabs.org website) for RHEL5:
>>
>> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
>> cluster-glue-1.0.6-1.6.el5
>> pacemaker-libs-1.0.9.1-1.el5
>> pacemaker-1.0.9.1-1.el5
>> heartbeat-libs-3.0.3-2.el5
>> heartbeat-3.0.3-2.el5
>> openaislib-1.1.3-1.6.el5
>> resource-agents-1.0.3-2.el5
>> cluster-glue-libs-1.0.6-1.6.el5
>> openais-1.1.3-1.6.el5
>>
>> Expected behavior:
>> - all the resources in the group should go (based on location preference)
>> to bench1
>> - if bench1 goes down, resources migrate to bench2
>> - if bench1 comes back up, resources stay on bench2, unless manually told
>> otherwise.
>>
>> With the previous setup this worked; with the new packages, not so much. Now
>> if bench1 goes down (crm node standby `uname -n`), failover occurs, but when
>> bench1 comes back up, the resources migrate back even though
>> default-resource-stickiness is set. More than that, 2 drbd block devices
>> reach infinite fail-counts, most notably because the cluster tries to promote
>> the resources to Master on bench1 but fails to do so because the resource is
>> held open (by some process I could not identify).
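
As a side note, when a DRBD device refuses to promote or demote because something still holds it open, a quick check is usually along these lines (device names assumed from the mounts shown below):

    fuser -vm /dev/drbd0      # processes using the mounted filesystem / device
    lsof /dev/drbd0           # open file handles on the device itself
    cat /proc/drbd            # current role and connection state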
>>
>> Strangely enough, the resources (drbd) fail to be promoted to Master on
>> bench1, so they fail back to bench2, where they are mounted and functional,
>> but crm_mon shows:
>>
>> Migration summary:
>> * Node bench2.streamwide.ro:
>>    drbd_mysql:1: migration-threshold=1000000 fail-count=1000000
>>    drbd_home:1: migration-threshold=1000000 fail-count=1000000
>> * Node bench1.streamwide.ro:
>>
>> .... infinite fail-counts on bench2, while the drbd resources are available
>>
>> version: 8.3.2 (api:88/proto:86-90)
>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
>> mockbuild at v20z-x86-64.home.local, 2009-08-29 14:07:55
>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>    ns:1632 nr:1864 dw:3512 dr:3933 al:11 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1
>> wo:b oos:0
>> 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>    ns:4 nr:24 dw:28 dr:25 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>> 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>    ns:4 nr:24 dw:28 dr:85 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>
>> and mounted
>>
>> /dev/drbd1 on /home type ext3 (rw,noatime,nodiratime)
>> /dev/drbd0 on /mysql type ext3 (rw,noatime,nodiratime)
>> /dev/drbd2 on /storage type ext3 (rw,noatime,nodiratime)
>>
>> Attached is the hb_report.
>>
>> Thank you in advance.
>>
>> Best regards
>>
>
> --
> Dan FRINCU
> Systems Engineer
> CCNA, RHCE
> Streamwide Romania
>
>



------------------------------

Message: 5
Date: Thu, 11 Nov 2010 11:46:42 +0100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
    <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Multistate Resources is not promoted
    automatically
Message-ID:
    <AANLkTinAqC-vNWHYCDRrRhgyh5JUtcJux5E9YBvrXZc6 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Nov 11, 2010 at 11:38 AM, Simon Jansen
<simon.jansen1 at googlemail.com> wrote:
> Hi Andrew,
>
> thank you for your answer.
>
>> Does the ocf:heartbeat:Rsyslog script call crm_master?
>> It needs to, in order to tell Pacemaker which instance to promote.
>
> Yes, it does. But I forgot to call crm_master with the -D option in the stop
> action; I think that was the error. After correcting this, the RA starts as
> expected.
>
>> Two questions though...
>> 1) Why use master/slave for rsyslog?
>
> In the master role the rsyslog daemon should function as a central log server
> and write the entries received on UDP port 514 into a MySQL database.
> On the passive node the rsyslog service should be started with the standard
> configuration.

Interesting

> Do you think there is a better way to meet this requirement?

No, I'd just never heard rsyslog being used in this way.

>>
>> 2) Is this an upstream RA? If not, you shouldn't be using the
>> ocf:heartbeat namespace.
>
> OK, thank you for the advice. Should I use the pacemaker provider instead, or
> should I define a custom namespace?

Custom.
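
In other words, a custom provider is just a directory under the OCF root; a sketch of what that looks like (the provider name "custom" and the paths are assumptions, /usr/lib/ocf being the usual OCF root):

    # install the agent under your own provider directory
    mkdir -p /usr/lib/ocf/resource.d/custom
    cp Rsyslog /usr/lib/ocf/resource.d/custom/Rsyslog
    chmod 755 /usr/lib/ocf/resource.d/custom/Rsyslog

    # the resource is then referenced as ocf:custom:Rsyslog, e.g.
    crm ra info ocf:custom:Rsyslog     # quick sanity check of the metadata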

>
> --
>
> Regards,
>
> Simon Jansen
>
>
> ---------------------------
> Simon Jansen
> 64291 Darmstadt
>



------------------------------

Message: 6
Date: Thu, 11 Nov 2010 11:47:35 +0100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
    <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] start error because "not installed" - stop
    fails with "not installed" - stonith
Message-ID:
    <AANLkTikXwe6wS2F-LtLF3dvKjEt1gvPZ=5BSVNj1eZ2q at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sat, Oct 9, 2010 at 12:36 AM, Andreas Kurz <andreas.kurz at linbit.com> wrote:
> Hello,
>
> if a resource encounters a start error with rc=5 ("not installed"), the
> stop action is not skipped before a restart is attempted.

I'd not expect a stop action at all.  What version?
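
For reference, rc=5 is OCF_ERR_INSTALLED, which an agent normally returns when a required binary or package is missing; a sketch of how that is usually produced (the agent function and binary names are made up, and the shellfuncs path reflects the resource-agents layout of that era):

    #!/bin/sh
    : ${OCF_ROOT:=/usr/lib/ocf}
    . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

    myagent_validate() {
        # check_binary exits with $OCF_ERR_INSTALLED (5) if the binary is absent
        check_binary mydaemon
        [ -n "$OCF_RESKEY_config" ] || return $OCF_ERR_CONFIGURED
        return $OCF_SUCCESS
    }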

>
> Typically in such a situation the stop will also fail with the same
> error and the node will be fenced ... even worse, there is a good chance
> this happens on all remaining nodes, e.g. if there is a typo in a parameter.
>
> I would expect the cluster to skip the stop action after a "not
> installed" start failure and instead retry the start on a different node.
>
> So ... is this a feature or a bug? ;-)
>
> Regards,
> Andreas
>



------------------------------

Message: 7
Date: Thu, 11 Nov 2010 11:48:59 +0100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
    <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] [Problem]Number of times control of the
    fail-count    is late.
Message-ID:
    <AANLkTinMfWBqmW_jcA8a+ic7zmfb6HMiEfBD1_SuEe=G at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Nov 10, 2010 at 5:20 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi,
>
> We set up a cluster consisting of two nodes.
> The migration-threshold is set to 2.
>
> We observed the phenomenon with the following procedure.
>
> Step 1) Start both nodes and load config5.crm. (The clnDiskd resource is our own.)
>
> ============
> Last updated: Tue Nov  9 21:10:49 2010
> Stack: Heartbeat
> Current DC: srv02 (8c93dc22-a27e-409b-8112-4073de622daf) - partition with quorum
> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ srv01 srv02 ]
>
>  vip    (ocf::heartbeat:IPaddr2):       Started srv01
>  Clone Set: clnDiskd
>      Started: [ srv01 srv02 ]
>  Clone Set: clnDummy2
>      Started: [ srv01 srv02 ]
>  Clone Set: clnPingd1
>      Started: [ srv01 srv02 ]
>
> Node Attributes:
> * Node srv01:
>     + default_ping_set1                 : 100
>     + diskcheck_status_internal         : normal
> * Node srv02:
>     + default_ping_set1                 : 100
>     + diskcheck_status_internal         : normal
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>
>
> Step 2) We edit the clnDummy2 resource agent so that start times out. (add a sleep)
>
>  dummy_start() {
>      sleep 180        # ----> added sleep to force a start timeout
>      dummy_monitor
>      if [ $? = $OCF_SUCCESS ]; then
>
>
> Step 3) We cause a monitor error on the clnDummy2 resource.
>
>  # rm -rf /var/run/Dummy-Dummy2.state
>
> Step 4) The restart of clnDummy2 times out.
>
> However, the log shows that clnDummy2 is started again after it has already timed out once.
> The reason is that the pengine does not know that the fail-count has become INFINITY.
>
> In pe-input-2001.bz2 the fail-count has not yet reached INFINITY;
> in pe-input-2002.bz2 it finally becomes INFINITY.
>
> (snip)
> Nov  9 21:12:35 srv02 crmd: [5896]: WARN: status_from_rc: Action 25 (Dummy2:0_start_0) on srv01 failed
> (target: 0 vs. rc: -2): Error
> Nov  9 21:12:35 srv02 crmd: [5896]: WARN: update_failcount: Updating failcount for Dummy2:0 on srv01
> after failed start: rc=-2 (update=INFINITY, time=1289304755)
> Nov  9 21:12:35 srv02 crmd: [5896]: info: abort_transition_graph: match_graph_event:272 - Triggered
> transition abort (complete=0, tag=lrm_rsc_op, id=Dummy2:0_start_0,
> magic=2:-2;25:5:0:275da7f9-7f43-43a2-8308-41d0ab78346e, cib=0.9.39) : Event failed
> Nov  9 21:12:35 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_start_0 (25) confirmed on
> srv01 (rc=4)
> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 29 fired and confirmed
> Nov  9 21:12:35 srv02 crmd: [5896]: info: run_graph:
> ====================================================
> Nov  9 21:12:35 srv02 crmd: [5896]: notice: run_graph: Transition 5 (Complete=7, Pending=0, Fired=0,
> Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-2000.bz2): Stopped
> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_graph_trigger: Transition 5 is now complete
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_TRANSITION_ENGINE ->
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: All 2 cluster nodes are eligible to run
> resources.
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke: Query 72: Requesting the current CIB:
> S_POLICY_ENGINE
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke_callback: Invoking the PE: query=72,
> ref=pe_calc-dc-1289304755-58, seq=2, quorate=1
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Nov  9 21:12:35 srv02 pengine: [7208]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' =
> 0, 'green' = 0
> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv02 is online
> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv01 is online
> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op
> Dummy2:0_monitor_15000 on srv01: not running (7)
> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op Dummy2:0_start_0 on
> srv01: unknown exec error (-2)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: Dummy      (ocf::pacemaker:Dummy): Started
> srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: vip        (ocf::heartbeat:IPaddr2):       Started
> srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDiskd
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDummy2
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print:      Dummy2:0      (ocf::pacemaker:Dummy2):
> Started srv01 FAILED
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv02 ]
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnPingd1
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]
> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more
> times on srv01 before being forced off
> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more
> times on srv01 before being forced off
> Nov  9 21:12:35 srv02 pengine: [7208]: ERROR: unpack_operation: Specifying on_fail=fence and
> stonith-enabled=false makes no sense
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: RecurringOp:  Start recurring monitor (15s) for
> Dummy2:0 on srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy (Started srv01)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource vip   (Started srv01)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:0    (Started srv01)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:1    (Started srv02)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Recover resource Dummy2:0    (Started srv01)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy2:1      (Started srv02)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:0   (Started srv01)
> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:1   (Started srv02)
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Nov  9 21:12:35 srv02 crmd: [5896]: info: unpack_graph: Unpacked transition 6: 8 actions in 8 synapses
> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_te_invoke: Processing graph 6
> (ref=pe_calc-dc-1289304755-58) derived from /var/lib/pengine/pe-input-2001.bz2
> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 30 fired and confirmed
> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 5: stop Dummy2:0_stop_0 on
> srv01
> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Transition 6: PEngine Input stored
> in: /var/lib/pengine/pe-input-2001.bz2
> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Configuration ERRORs found during PE
> processing.  Please run "crm_verify -L" to identify issues.
> Nov  9 21:12:37 srv02 attrd: [5895]: info: attrd_ha_callback: flush message from srv01
> Nov  9 21:12:37 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_stop_0 (5) confirmed on
> srv01 (rc=0)
> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 31 fired and confirmed
> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 8 fired and confirmed
> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 28 fired and confirmed
> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 24: start Dummy2:0_start_0
> on srv01
>
>  -----> This start should not have been carried out.
>
> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered
> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,
> magic=NA, cib=0.9.41) : Transient attribute: update
> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort priority upgraded from 0 to
> 1000000
> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort action done superceeded by
> restart
> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered
> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,
> magic=NA, cib=0.9.42) : Transient attribute: update
> (snip)
>
> The problem seems to be that the update of the fail-count was late.
> However, this problem seems to depend on timing.
>
> Because the fail-count is counted the wrong number of times, it affects the
> failover time of the resource.
>
> Has this problem already been discussed?

Not that I know of

> Isn't the delay in updating the fail-count, which goes through attrd, a problem?

Indeed.

>
>  * I have attached the log and some pe-files to Bugzilla.
> ?* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520

Ok, I'll follow up there.

>
> Best Regards,
> Hideo Yamauchi.
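
For readers hitting the same symptom, the fail-counts involved can be inspected and cleared with the standard tools (resource and node names are taken from the output above; option names may differ slightly between versions):

    crm_mon -1 -f                           # one-shot status including fail-counts
    crm_failcount -G -r Dummy2:0 -U srv01   # query the fail-count attribute directly
    crm resource cleanup clnDummy2          # clear fail-count and failed ops once fixed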



------------------------------



End of Pacemaker Digest, Vol 36, Issue 34
*****************************************



      