[Pacemaker] Question on fix for Bug lf#2433

Andrew Beekhof andrew at beekhof.net
Fri Aug 27 09:56:21 EDT 2010


On Tue, Aug 10, 2010 at 6:57 PM, Stepan, Troy <troy.stepan at unisys.com> wrote:
> Hi,
>
> I applied the changeset for Bug lf#2433 (No services should be stopped until probes finish) to pacemaker 1.0.7-4.1.

The PE is sufficiently complex that it's quite normal for backports
like this not to have the intended result.
It's quite possible that this fix builds on another one from 1.0.8 or 1.0.9.

If the problem persists with 1.0.9, please let me know.

> Either I misinterpreted the bugfix or it's not working the way I thought it would. While both of my dummy resources are running, issuing a cleanup on dummy0 stops dummy1 (dummy1 is ordered after dummy0). It looks like the stop is issued to dummy1 without waiting for the monitor of dummy0 to return.
>
> CIB:
>
> node $id="281aeabe-f895-4499-8e45-b380b3e82e0b" qpr1
> node $id="c2493a06-ff09-40cb-b47d-04dae5a00802" qpr2
> primitive dummy0 ocf:heartbeat:Dummy \
>        op monitor interval="60s" timeout="120s"
> primitive dummy1 ocf:heartbeat:Dummy \
>        op monitor interval="60s" timeout="120s"
> colocation col-dummy0_dummy1 inf: dummy0 dummy1
> order order-dummy0_dummy1 inf: dummy0:start dummy1:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782" \
>        cluster-infrastructure="Heartbeat" \
>        stonith-enabled="false" \
>
> Syslog:
>
> Aug 10 07:49:30 qpr1 crm_shadow: [27517]: info: Invoked: crm_shadow
> Aug 10 07:49:30 qpr1 cibadmin: [27518]: info: Invoked: cibadmin -Ql -o nodes
> Aug 10 07:49:30 qpr1 cibadmin: [27519]: info: Invoked: cibadmin -Ql -o resources
> Aug 10 07:49:30 qpr1 crm_resource: [27520]: info: Invoked: crm_resource -C -r dummy0 -H qpr1
> Aug 10 07:49:30 qpr1 crmd: [27131]: info: do_lrm_invoke: Removing resource dummy0 from the LRM
> Aug 10 07:49:30 qpr1 crmd: [27131]: info: send_direct_ack: ACK'ing resource op dummy0_delete_60000 from 0:0:crm-resource-27520: lrm_invoke-lrmd-1281440970-14
> Aug 10 07:49:30 qpr1 crmd: [27131]: info: lrm_remove_deleted_op: Removing op dummy0_monitor_60000:17 for deleted resource dummy0
> Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=5:14:7:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_0 )
> Aug 10 07:49:31 qpr1 lrmd: [27128]: info: rsc:dummy0:20: monitor
> Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:14:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_stop_0 )
> Aug 10 07:49:31 qpr1 lrmd: [27128]: info: rsc:dummy1:21: stop
> Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=19, status=1, cib-update=0, confirmed=true) Cancelled
> Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_0 (call=20, rc=0, cib-update=44, confirmed=true) ok
> Aug 10 07:49:31 qpr1 crm_resource: [27526]: info: Invoked: crm_resource -C -r dummy0 -H qpr2
> Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_stop_0 (call=21, rc=0, cib-update=45, confirmed=true) ok
> Aug 10 07:49:31 qpr1 cib: [27525]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-11.raw
> Aug 10 07:49:31 qpr1 cib: [27525]: info: write_cib_contents: Wrote version 0.17.0 of the CIB to disk (digest: 4ab7e93936f66e0ac5bd95aeae3afcbe)
> Aug 10 07:49:31 qpr1 cib: [27525]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.09ENqU (digest: /var/lib/heartbeat/crm/cib.dIFAa2)
> Aug 10 07:49:32 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=8:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_60000 )
> Aug 10 07:49:32 qpr1 lrmd: [27128]: info: rsc:dummy0:22: monitor
> Aug 10 07:49:32 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_start_0 )
> Aug 10 07:49:32 qpr1 lrmd: [27128]: info: rsc:dummy1:23: start
> Aug 10 07:49:32 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_start_0 (call=23, rc=0, cib-update=46, confirmed=true) ok
> Aug 10 07:49:32 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_60000 (call=22, rc=0, cib-update=47, confirmed=false) ok
> Aug 10 07:49:32 qpr1 cib: [27536]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-12.raw
> Aug 10 07:49:32 qpr1 cib: [27536]: info: write_cib_contents: Wrote version 0.18.0 of the CIB to disk (digest: 1da0b5df8907098f77d8819e16380257)
> Aug 10 07:49:32 qpr1 cib: [27536]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.G4maW1 (digest: /var/lib/heartbeat/crm/cib.kfLsWb)
> Aug 10 07:49:34 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=10:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_monitor_60000 )
> Aug 10 07:49:34 qpr1 lrmd: [27128]: info: rsc:dummy1:24: monitor
> Aug 10 07:49:34 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=24, rc=0, cib-update=48, confirmed=false) ok
>
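The ordering described above can be checked mechanically against the quoted syslog. The following is a minimal sketch, not a Pacemaker tool: the helper names (op_events, issued_before_finished) are made up for illustration, the excerpt is copied from the crmd lines above, and because the timestamps only have one-second resolution it assumes that line order in the log reflects the order in which crmd handled the events.

```python
import re

# Excerpt of the crmd lines quoted above (node qpr1, transition 14).
LOG = """\
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=5:14:7:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_0 )
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:14:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_stop_0 )
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=19, status=1, cib-update=0, confirmed=true) Cancelled
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_0 (call=20, rc=0, cib-update=44, confirmed=true) ok
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_stop_0 (call=21, rc=0, cib-update=45, confirmed=true) ok
"""

# crmd logs one line when it hands an operation to the LRM and another
# when the result comes back; these patterns match the two shapes above.
ISSUED = re.compile(r"do_lrm_rsc_op: Performing key=\S+ op=(\S+)")
FINISHED = re.compile(r"process_lrm_event: LRM operation (\S+) \(")


def op_events(log_text):
    """Return ("issued" | "finished", op_name) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        match = ISSUED.search(line)
        if match:
            events.append(("issued", match.group(1)))
            continue
        match = FINISHED.search(line)
        if match:
            events.append(("finished", match.group(1)))
    return events


def issued_before_finished(events, later_op, earlier_op):
    """True if later_op was issued before earlier_op reported a result."""
    issued_at = events.index(("issued", later_op))
    finished_at = events.index(("finished", earlier_op))
    return issued_at < finished_at


events = op_events(LOG)
print(issued_before_finished(events, "dummy1_stop_0", "dummy0_monitor_0"))
```

For this excerpt the check prints True: the stop of dummy1 is handed to the LRM before the probe of dummy0 (dummy0_monitor_0) reports its result, which matches the behaviour described in the report. It only shows the order of log lines on one node and says nothing about why the PE scheduled the actions that way.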
> I also patched and tried pacemaker 1.0.6-1 as a sanity check (same result).  The cib files were deleted, the systems were rebooted and resources were recreated when switching versions.
>
> Regards,
> Troy



