[Pacemaker] stopped resource was judged to be active
    Kazunori INOUE 
    kazunori.inoue3 at gmail.com
       
    Mon Feb 17 23:46:03 EST 2014
    
    
  
2014-02-18 10:43 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>
> On 10 Feb 2014, at 5:28 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>
>> Hi,
>>
>> Pacemaker stopped normally on one node, but the other node judged
>> that a resource was still active there.
>> I have uploaded the crm_report here:
>> https://drive.google.com/file/d/0B9eNn1AWfKD4S29JWk1ldUJJNGs/edit?usp=sharing
>>
>> [Steps to reproduce]
>> 1) start up the cluster
>>
>> Stack: corosync
>> Current DC: bl460g1n7 (3232261593) - partition with quorum
>> Version: 1.1.10-21de3a0
>> 2 Nodes configured
>> 34 Resources configured
>>
>>
>> Online: [ bl460g1n6 bl460g1n7 ]
>>
>> Full list of resources:
>> ...snip...
>>
>>
>> * election-attrd is on bl460g1n7.
>> Feb  4 14:06:38 bl460g1n7 attrd[28811]:     info: election_complete:
>> Election election-attrd complete
>>
>>
>> 2) move election-attrd away from the DC node
>> I suppose the condition is that the DC and election-attrd are on
>> different nodes.
>>
>> [bl460g1n7]$ pkill -9 attrd
>> Feb  4 14:07:15 bl460g1n6 attrd[16927]:     info: election_complete:
>> Election election-attrd complete
>>
>>
>> 3) stop the DC (after making a resource fail)
>> [bl460g1n7]$ stop pacemaker.combined
>> Feb  4 14:09:39 bl460g1n7 crmd[28813]:   notice: process_lrm_event:
>> LRM operation prmClone9_stop_0 (call=150, rc=0, cib-update=98,
>> confirmed=true) ok
>
> There are cases when <= .11 could lose resource updates like this.
> The subsequent behaviour by pacemaker (fencing the node) is correct but clearly suboptimal.
>
> Happily the same code that improves the CIB's performance also makes this impossible.
> So you should find this problem gone if you try with the current git master.
>
OK, I'll try.
Thanks.
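
When retesting, the symptom can be checked mechanically from the merged
logs of both nodes. The following is only a minimal sketch, assuming the
message formats quoted in this thread and one message per line (not
wrapped as in this mail). The name find_lost_stops and the two patterns
are hypothetical helpers for illustration, not part of Pacemaker.

```python
import re

# Patterns modelled on the log excerpts quoted in this thread; other
# Pacemaker versions may format these messages differently (assumption).
STOP_OK = re.compile(
    r"(?P<node>\S+) crmd\[\d+\]:\s+notice: process_lrm_event:\s+"
    r"LRM operation (?P<rsc>\S+)_stop_0 \(.*?rc=0,.*?confirmed=true\) ok"
)
FENCE = re.compile(
    r"pe_fence_node:\s+Node (?P<node>\S+) will be fenced because "
    r"(?P<rsc>\S+) is thought to be\s+active there"
)


def find_lost_stops(lines):
    """Return (node, resource) pairs that were fenced as "active"
    although a successful stop was logged earlier on that node.

    `lines` must be in chronological order, one log message per line.
    """
    stopped = set()
    suspicious = []
    for line in lines:
        m = STOP_OK.search(line)
        if m:
            stopped.add((m.group("node"), m.group("rsc")))
            continue
        m = FENCE.search(line)
        if m:
            # Clone instances are logged as "name:N"; drop the suffix.
            rsc = m.group("rsc").split(":")[0]
            key = (m.group("node"), rsc)
            if key in stopped:
                suspicious.append(key)
    return suspicious
```

Fed the two messages from the report (stop of prmClone9 on bl460g1n7,
then the pe_fence_node warning on bl460g1n6), it should flag
bl460g1n7 / prmClone9. An empty result after the same steps on git
master would suggest the update is no longer lost.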
>> :
>> Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]:     info: main: Exiting pacemakerd
>> Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]:     info:
>> crm_xml_cleanup: Cleaning up memory from libxml2
>>
>> * Pacemaker on bl460g1n7 stopped normally, but bl460g1n6 judged that
>>  a resource was still active there.
>> Feb  4 14:09:41 bl460g1n6 pengine[16928]:  warning: pe_fence_node:
>> Node bl460g1n7 will be fenced because prmClone9:0 is thought to be
>> active there
>>
>>
>> Best regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>