[Pacemaker] crm_resource -L not trustable right after restart
Andrew Beekhof
andrew at beekhof.net
Tue Feb 18 00:55:23 UTC 2014
On 22 Jan 2014, at 10:54 am, Brian J. Murrell (brian) <brian at interlinx.bc.ca> wrote:
> On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
>>
>> What crm_mon are you looking at?
>> I see stuff like:
>>
>> virt-fencing (stonith:fence_xvm): Started rhos4-node3
>> Resource Group: mysql-group
>>     mysql-vip (ocf::heartbeat:IPaddr2): Started rhos4-node3
>>     mysql-fs (ocf::heartbeat:Filesystem): Started rhos4-node3
>>     mysql-db (ocf::heartbeat:mysql): Started rhos4-node3
>
> Yes, you are right. I couldn't see the forest for the trees.
>
> I was initially optimistic that crm_mon would be more truthful than
> crm_resource, but it turns out it is not.
It can't be; they both obtain their data from the same place (the CIB).
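(One way to see exactly what both of them are reading is to dump the
status section of the CIB directly; for example:)

    # The status section of the CIB: the raw data that crm_mon and
    # crm_resource both interpret when reporting where a resource runs.
    cibadmin -Q -o status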
>
> Take for example these commands to set a constraint and start a resource
> (which has already been defined at this point):
>
> [21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>'
> [21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>'
> [21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
>
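(As a quick cross-check after a sequence like that, you can ask where the
cluster currently records res1 and what placement the policy engine would
compute from the live CIB; a rough sketch using standard tools:)

    # Where is res1 currently recorded as running?
    crm_resource --locate -r res1

    # Recompute placement from the live CIB, showing the allocation
    # scores contributed by the two new location constraints.
    crm_simulate -sL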
> and then these repeated calls to crm_mon -1 on node5:
>
> [21/Jan/2014:13:46:42] crm_mon -1
> Last updated: Tue Jan 21 13:46:42 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
>
> Online: [ node5 node6 ]
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node6
>
> [21/Jan/2014:13:46:42] crm_mon -1
> Last updated: Tue Jan 21 13:46:42 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
>
> Online: [ node5 node6 ]
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node6
>
> [21/Jan/2014:13:46:49] crm_mon -1 -r
> Last updated: Tue Jan 21 13:46:49 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
>
> Online: [ node5 node6 ]
>
> Full list of resources:
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node5
>
> The first two are not correct, showing the resource started on node6
> when it was actually started on node5.
Was it running there to begin with?
Answering my own question... yes. It was:
> Jan 21 13:46:41 node5 crmd[8695]: warning: status_from_rc: Action 6 (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error
(target: 7 is OCF_NOT_RUNNING, the result the cluster expects from a probe
on a node where the resource should not be active; rc: 0 is OCF_SUCCESS,
meaning the agent on node6 reported it as running.)

and then we try to stop it:
> Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action 7: stop res1_stop_0 on node6
So you are correct that something is wrong, but it isn't pacemaker.
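(The operation history behind those log entries can also be read back from
the cluster itself; with a 1.1.x crm_mon, something like:)

    # One-shot status that also lists the recorded operation history,
    # e.g. the monitor_0 probe on node6 and the subsequent stop.
    crm_mon -1 -o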
> Finally, 7 seconds later, it is
> reporting correctly. The logs on node{5,6} bear this out. The resource
> was actually only ever started on node5 and never on node6.
Wrong.
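(The practical point for scripts: the output was accurate at each moment, it
just changed while the stop on node6 and the start on node5 were in flight.
If a script needs the settled placement rather than a mid-transition
snapshot, it has to wait for the transition to finish. Newer Pacemaker
releases add a crm_resource --wait option for this; on builds without it, a
rough polling sketch such as the following can stand in, with res1 being the
resource from the example above:)

    #!/bin/sh
    # Rough heuristic: poll until crm_resource reports the same non-empty
    # location twice in a row, then assume the transition has settled.
    rsc=res1        # resource name from the example above
    prev=""
    while :; do
        cur=$(crm_resource --locate -r "$rsc" 2>/dev/null)
        if [ -n "$cur" ] && [ "$cur" = "$prev" ]; then
            break
        fi
        prev=$cur
        sleep 2
    done
    echo "$cur"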