[Pacemaker] Question on fix for Bug lf#2433

Stepan, Troy troy.stepan at unisys.com
Tue Aug 10 12:57:03 EDT 2010


Hi,

I applied the changeset for Bug lf#2433 (No services should be stopped until probes finish) to pacemaker 1.0.7-4.1.  Either I misinterpreted the bugfix or it's not working the way I thought it would.  While both of my dummy resources are running, issuing a cleanup on dummy0 stops dummy1 (dummy1 is ordered after dummy0).  It looks like the stop is issued to dummy1 without waiting for the probe (dummy0_monitor_0) of dummy0 to return.

CIB:

node $id="281aeabe-f895-4499-8e45-b380b3e82e0b" qpr1
node $id="c2493a06-ff09-40cb-b47d-04dae5a00802" qpr2
primitive dummy0 ocf:heartbeat:Dummy \
        op monitor interval="60s" timeout="120s"
primitive dummy1 ocf:heartbeat:Dummy \
        op monitor interval="60s" timeout="120s"
colocation col-dummy0_dummy1 inf: dummy0 dummy1
order order-dummy0_dummy1 inf: dummy0:start dummy1:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \

Syslog:

Aug 10 07:49:30 qpr1 crm_shadow: [27517]: info: Invoked: crm_shadow
Aug 10 07:49:30 qpr1 cibadmin: [27518]: info: Invoked: cibadmin -Ql -o nodes
Aug 10 07:49:30 qpr1 cibadmin: [27519]: info: Invoked: cibadmin -Ql -o resources
Aug 10 07:49:30 qpr1 crm_resource: [27520]: info: Invoked: crm_resource -C -r dummy0 -H qpr1
Aug 10 07:49:30 qpr1 crmd: [27131]: info: do_lrm_invoke: Removing resource dummy0 from the LRM
Aug 10 07:49:30 qpr1 crmd: [27131]: info: send_direct_ack: ACK'ing resource op dummy0_delete_60000 from 0:0:crm-resource-27520: lrm_invoke-lrmd-1281440970-14
Aug 10 07:49:30 qpr1 crmd: [27131]: info: lrm_remove_deleted_op: Removing op dummy0_monitor_60000:17 for deleted resource dummy0
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=5:14:7:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_0 )
Aug 10 07:49:31 qpr1 lrmd: [27128]: info: rsc:dummy0:20: monitor
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:14:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_stop_0 )
Aug 10 07:49:31 qpr1 lrmd: [27128]: info: rsc:dummy1:21: stop
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=19, status=1, cib-update=0, confirmed=true) Cancelled
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_0 (call=20, rc=0, cib-update=44, confirmed=true) ok
Aug 10 07:49:31 qpr1 crm_resource: [27526]: info: Invoked: crm_resource -C -r dummy0 -H qpr2
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_stop_0 (call=21, rc=0, cib-update=45, confirmed=true) ok
Aug 10 07:49:31 qpr1 cib: [27525]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-11.raw
Aug 10 07:49:31 qpr1 cib: [27525]: info: write_cib_contents: Wrote version 0.17.0 of the CIB to disk (digest: 4ab7e93936f66e0ac5bd95aeae3afcbe)
Aug 10 07:49:31 qpr1 cib: [27525]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.09ENqU (digest: /var/lib/heartbeat/crm/cib.dIFAa2)
Aug 10 07:49:32 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=8:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_60000 )
Aug 10 07:49:32 qpr1 lrmd: [27128]: info: rsc:dummy0:22: monitor
Aug 10 07:49:32 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_start_0 )
Aug 10 07:49:32 qpr1 lrmd: [27128]: info: rsc:dummy1:23: start
Aug 10 07:49:32 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_start_0 (call=23, rc=0, cib-update=46, confirmed=true) ok
Aug 10 07:49:32 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_60000 (call=22, rc=0, cib-update=47, confirmed=false) ok
Aug 10 07:49:32 qpr1 cib: [27536]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-12.raw
Aug 10 07:49:32 qpr1 cib: [27536]: info: write_cib_contents: Wrote version 0.18.0 of the CIB to disk (digest: 1da0b5df8907098f77d8819e16380257)
Aug 10 07:49:32 qpr1 cib: [27536]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.G4maW1 (digest: /var/lib/heartbeat/crm/cib.kfLsWb)
Aug 10 07:49:34 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=10:15:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_monitor_60000 )
Aug 10 07:49:34 qpr1 lrmd: [27128]: info: rsc:dummy1:24: monitor
Aug 10 07:49:34 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=24, rc=0, cib-update=48, confirmed=false) ok
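
To make the ordering in the log above easier to see, here is a rough Python sketch that pulls the crmd "issued" and "result" events out of the syslog and checks whether dummy1_stop_0 was issued before the dummy0_monitor_0 result was logged. The regular expressions only assume the crmd line format shown in this excerpt, the helper names (lrm_timeline, issued_before_result) are made up for illustration, and it relies on syslog line order, since most of these lines share the same one-second timestamp.

```python
import re

# Patterns assume the crmd log format seen in the excerpt above.
ISSUED = re.compile(r"do_lrm_rsc_op: Performing key=\S+ op=(\S+)")
RESULT = re.compile(r"process_lrm_event: LRM operation (\S+) \(")


def lrm_timeline(lines):
    """Return (kind, op) tuples in log order; kind is 'issued' or 'result'."""
    events = []
    for line in lines:
        m = ISSUED.search(line)
        if m:
            events.append(("issued", m.group(1)))
            continue
        m = RESULT.search(line)
        if m:
            events.append(("result", m.group(1)))
    return events


def issued_before_result(events, op, probe):
    """True if `op` was issued before the result of `probe` was logged."""
    return events.index(("issued", op)) < events.index(("result", probe))


# The five crmd lines from 07:49:31, copied from the syslog above.
LOG = """\
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=5:14:7:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy0_monitor_0 )
Aug 10 07:49:31 qpr1 crmd: [27131]: info: do_lrm_rsc_op: Performing key=9:14:0:78c6bc14-cfa7-4516-b291-610bf2ee22eb op=dummy1_stop_0 )
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_monitor_60000 (call=19, status=1, cib-update=0, confirmed=true) Cancelled
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy0_monitor_0 (call=20, rc=0, cib-update=44, confirmed=true) ok
Aug 10 07:49:31 qpr1 crmd: [27131]: info: process_lrm_event: LRM operation dummy1_stop_0 (call=21, rc=0, cib-update=45, confirmed=true) ok
"""

events = lrm_timeline(LOG.splitlines())
for kind, op in events:
    print(f"{kind:7} {op}")

# Was the stop of dummy1 issued before the probe of dummy0 reported back?
print(issued_before_result(events, "dummy1_stop_0", "dummy0_monitor_0"))
```

On the lines above, the stop of dummy1 shows up as issued before the dummy0 probe result is logged, which is the behaviour I am asking about.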

I also patched and tried pacemaker 1.0.6-1 as a sanity check, with the same result.  When switching versions, the CIB files were deleted, the systems were rebooted, and the resources were recreated.

Regards,
Troy




