[Pacemaker] Question about the resource to fence a node
Andrew Beekhof
andrew at beekhof.net
Thu Nov 14 19:36:18 EST 2013
On 14 Nov 2013, at 5:53 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
> Hi, Andrew
>
> 2013/11/13 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
>> 2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 16 Oct 2013, at 8:51 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>>>
>>>> On 15/10/2013, at 8:24 PM, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using pacemaker-1.1 (the latest devel).
>>>>> I started the resources (f1 and f2) that fence vm3 on vm1.
>>>>>
>>>>> $ crm_mon -1
>>>>> Last updated: Tue Oct 15 15:16:37 2013
>>>>> Last change: Tue Oct 15 15:16:21 2013 via crmd on vm1
>>>>> Stack: corosync
>>>>> Current DC: vm1 (3232261517) - partition with quorum
>>>>> Version: 1.1.11-0.284.6a5e863.git.el6-6a5e863
>>>>> 3 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>> Online: [ vm1 vm2 vm3 ]
>>>>>
>>>>> pDummy (ocf::pacemaker:Dummy): Started vm3
>>>>> Resource Group: gStonith3
>>>>> f1 (stonith:external/libvirt): Started vm1
>>>>> f2 (stonith:external/ssh): Started vm1
>>>>>
>>>>>
>>>>> When vm3 was fenced, the "reset" of f1 was performed on vm2, where f1 had not been started.
>>>>>
>>>>> $ ssh vm3 'rm -f /var/run/Dummy-pDummy.state'
>>>>> $ for i in vm1 vm2; do ssh $i 'hostname; egrep " reset | off " /var/log/ha-log'; done
>>>>> vm1
>>>>> Oct 15 15:17:35 vm1 stonith-ng[14870]: warning: log_operation: f2:15076 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>>> Oct 15 15:18:06 vm1 stonith-ng[14870]: warning: log_operation: f2:15464 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>>> vm2
>>>>> Oct 15 15:17:16 vm2 stonith-ng[9160]: warning: log_operation: f1:9273 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>>> Oct 15 15:17:46 vm2 stonith-ng[9160]: warning: log_operation: f1:9588 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>>>
>>>>> Is this the intended behavior?
>>>>
>>>> Yes, although the host on which the device is started usually gets priority.
>>>> I will try to find some time to look through the report to see why this didn't happen.
>>>
>>> Reading through this again, it sounds like it should be fixed by your earlier pull request:
>>>
>>> https://github.com/beekhof/pacemaker/commit/6b4bfd6
>>>
>>> Yes?
>>
>> No.
>
> How about this change?
Thanks for this. I tweaked it a bit further and pushed:
https://github.com/beekhof/pacemaker/commit/4cbbeb0
>
> diff --git a/fencing/remote.c b/fencing/remote.c
> index 6c11ba9..68b31c5 100644
> --- a/fencing/remote.c
> +++ b/fencing/remote.c
> @@ -778,6 +778,7 @@ stonith_choose_peer(remote_fencing_op_t * op)
>  {
>      st_query_result_t *peer = NULL;
>      const char *device = NULL;
> +    uint32_t active = fencing_active_peers();
>
>      do {
>          if (op->devices) {
> @@ -790,7 +791,8 @@ stonith_choose_peer(remote_fencing_op_t * op)
>
>          if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET | FIND_PEER_VERIFIED_ONLY))) {
>              return peer;
> -        } else if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
> +        } else if ((op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active)
> +                   && (peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
>              return peer;
>          } else if ((peer = find_best_peer(device, op, FIND_PEER_TARGET_ONLY))) {
>              return peer;
> @@ -801,8 +803,13 @@ stonith_choose_peer(remote_fencing_op_t * op)
>               && stonith_topology_next(op) == pcmk_ok);
>
>      if (op->devices) {
> -        crm_notice("Couldn't find anyone to fence %s with %s", op->target,
> -                   (char *)op->devices->data);
> +        if (op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active) {
> +            crm_notice("Couldn't find anyone to fence %s with %s", op->target,
> +                       (char *)op->devices->data);
> +        } else {
> +            crm_debug("Couldn't find verified device to fence %s with %s", op->target,
> +                      (char *)op->devices->data);
> +        }
>      } else {
>          crm_debug("Couldn't find anyone to fence %s", op->target);
>      }
>
>
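For readers following the patch above, here is a minimal standalone sketch of the selection order it appears to implement: a verified peer is taken immediately, an unverified peer only once no more query replies are expected, and the target itself as a last resort. Everything in it is a simplified stand-in and an assumption on my part: struct query_state, may_use_unverified_peer(), choose_peer() and the enum are hypothetical names, not the real types or functions in fencing/remote.c, and I am assuming that query_timer == 0 means no query timer is pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the patch reads.
 * This is NOT the real remote_fencing_op_t. */
struct query_state {
    int      query_timer;       /* assumed: 0 when no query timer is pending */
    uint32_t replies;           /* query replies received so far */
    uint32_t replies_expected;  /* replies the initiator expects */
};

enum peer_choice {
    PEER_VERIFIED,    /* non-target peer with a verified device */
    PEER_UNVERIFIED,  /* non-target peer with an unverified device */
    PEER_TARGET,      /* only the target itself can run the device */
    PEER_NONE         /* nobody suitable (yet) */
};

/* Same three-way condition the patch adds: unverified peers may be
 * considered only once no further replies are expected. */
static bool
may_use_unverified_peer(const struct query_state *op, uint32_t active_peers)
{
    return op->query_timer == 0
        || op->replies >= op->replies_expected
        || op->replies >= active_peers;
}

/* Priority order of the patched stonith_choose_peer(), reduced to
 * booleans that describe which kinds of peers have replied so far. */
static enum peer_choice
choose_peer(const struct query_state *op, uint32_t active_peers,
            bool have_verified, bool have_unverified, bool target_has_device)
{
    if (have_verified) {
        return PEER_VERIFIED;
    }
    if (may_use_unverified_peer(op, active_peers) && have_unverified) {
        return PEER_UNVERIFIED;
    }
    if (target_has_device) {
        return PEER_TARGET;
    }
    return PEER_NONE;
}
```

With one reply out of three received and the timer still pending, choose_peer() returns PEER_NONE for an unverified-only reply instead of picking that peer straight away. Once all replies are in, or the timer is no longer pending, the same input yields PEER_UNVERIFIED. That would be consistent with the report at the top of the thread, where vm2 performed the reset of f1 even though f1 was started on vm1, but this is my reading of the diff, not a description of the code as pushed.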
>>>> I'm kind of swamped at the moment though.
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Kazunori INOUE
>>>>> <stopped_resource_performed_reset.tar.bz2>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org