[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

Andrew Beekhof andrew at beekhof.net
Tue Mar 11 17:43:28 EDT 2014

On 12 Mar 2014, at 8:40 am, Andrew Beekhof <andrew at beekhof.net> wrote:

> On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>> 07.03.2014 10:30, Vladislav Bogdanov wrote:
>>> 07.03.2014 05:43, Andrew Beekhof wrote:
>>>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>>> Hi, all
>>>>>>> I am measuring the performance of Pacemaker with the following combination:
>>>>>>> Pacemaker-1.1.11.rc1
>>>>>>> libqb-0.16.0
>>>>>>> corosync-2.3.2
>>>>>>> All nodes are KVM virtual machines.
>>>>>>> After starting 14 nodes, I forcibly stopped node vm01.
>>>>>>> "virsh destroy vm01" was used for the stop.
>>>>>>> Then, in addition to the forcibly stopped node, other nodes were separated from the cluster.
>>>>>>> corosync then logged "Retransmit List:" messages in large quantities.
>>>>>> Probably best to poke the corosync guys about this.
>>>>>> However, Pacemaker <= 1.1.11 is known to cause significant CPU usage with that many nodes.
>>>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>>> Andrew, current git master (ee094a2) almost works; the only issue is
>>>>> that crm_diff calculates an incorrect diff digest. If I replace the digest in
>>>>> the diff by hand with what cib calculates as "expected", it applies
>>>>> correctly. Otherwise it fails with -206.
>>>> More details?
>>> Hmmm...
>>> seems to be crmsh-specific,
>>> Cannot reproduce with pure-XML editing.
>>> Kristoffer, does 
>>> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?
>> The problem seems to be caused by the fact that crmsh does not provide
>> the <status> section in either the orig or the new XML to crm_diff, and digest
>> generation seems to rely on that, so crm_diff and the cib daemon produce
>> different digests.
>> Attached are two sets of XML files: one (orig.xml, new.xml, patch.xml)
>> relates to the full CIB operation (with the status section included),
>> the other (orig-edited.xml, new-edited.xml, patch-edited.xml) has that
>> section removed, as crmsh does.
>> The resulting diffs differ only by digest, and that seems to be the exact issue.
> This should help.  As long as crmsh isn't passing -c to crm_diff, the digest will no longer be present.
>  https://github.com/beekhof/pacemaker/commit/c8d443d

GitHub seems to be doing something weird at the moment... here's the raw patch:

commit c8d443d8d1604dde2727cf716951231ed05926e4
Author: Andrew Beekhof <andrew at beekhof.net>
Date:   Wed Mar 12 08:38:58 2014 +1100

    Fix: crm_diff: Allow the generation of xml patchsets without digests

diff --git a/tools/xml_diff.c b/tools/xml_diff.c
index c8673b9..b98859e 100644
--- a/tools/xml_diff.c
+++ b/tools/xml_diff.c
@@ -199,7 +199,7 @@ main(int argc, char **argv)
         xml_calculate_changes(object_1, object_2);
         crm_log_xml_debug(object_2, xml_file_2?xml_file_2:"target");
-        output = xml_create_patchset(0, object_1, object_2, NULL, FALSE, TRUE);
+        output = xml_create_patchset(0, object_1, object_2, NULL, FALSE, as_cib);
         if(as_cib && output) {
             int add[] = { 0, 0, 0 };
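
For anyone following along, here is a rough Python sketch of why the digest mismatch appears and why leaving the digest out of the patch avoids it. It is only an illustration under assumptions: the MD5-over-serialized-XML digest is a stand-in rather than Pacemaker's actual digest routine, and strip_status, make_patch and apply_check are hypothetical helpers, not part of crm_diff or the cib daemon.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(root):
    """Stand-in digest: MD5 over the serialized tree (an assumption, not Pacemaker's routine)."""
    return hashlib.md5(ET.tostring(root)).hexdigest()


def strip_status(root):
    """Return a copy of the tree without its <status> child, like the *-edited.xml files."""
    copy = ET.fromstring(ET.tostring(root))
    for child in list(copy):
        if child.tag == "status":
            copy.remove(child)
    return copy


def make_patch(new_root, with_digest):
    """Hypothetical patch record; the digest is attached only when requested."""
    patch = {"target": ET.tostring(new_root, encoding="unicode")}
    if with_digest:
        patch["digest"] = digest(new_root)
    return patch


def apply_check(patch, live_root):
    """Accept a patch that has no digest, or whose digest matches the receiver's own."""
    return "digest" not in patch or patch["digest"] == digest(live_root)


# A toy CIB: the receiving side always sees the full document, status included.
FULL = ET.fromstring(
    '<cib><configuration><nodes/></configuration>'
    '<status><node_state id="vm01"/></status></cib>'
)

if __name__ == "__main__":
    edited = strip_status(FULL)
    # Digest computed without <status> does not match the full document.
    print(apply_check(make_patch(edited, with_digest=True), FULL))
    # Without a digest there is nothing to mismatch.
    print(apply_check(make_patch(edited, with_digest=False), FULL))
```

In this toy model the sender's digest covers a document without <status> while the receiver's covers the full one, so the two can only agree when the sender either works from the full document or sends no digest at all.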
