[ClusterLabs] Pacemaker not always selecting the right stonith device

Andrei Borzenkov arvidjaar at gmail.com
Thu Jul 21 23:30:46 EDT 2016

22.07.2016 00:38, Klaus Wenninger wrote:
> On 07/21/2016 06:40 PM, Andrei Borzenkov wrote:
>> 19.07.2016 18:24, Klaus Wenninger wrote:
>>> On 07/19/2016 04:17 PM, Ken Gaillot wrote:
>>>> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote:
>>>>> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>> ...
>>>>>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>>>>>  params hostname=pg1 ipaddr= userid=root
>>>>>>> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>>>>>>> passwd_method=file interface=lan priv=OPERATOR
>>>>> ...
>>>>>> These constraints prevent each device from running on its intended
>>>>>> target, but they don't limit which nodes each device can fence. For
>>>>>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>>>>>> example:
>>>>>>    primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>>>>> Use pcmk_host_list if the fence device needs the node name as known to
>>>>>> the cluster, and pcmk_host_map if you need to translate a node name to
>>>>>> an address the device understands.
>>>>> Isn't pacemaker expected by default to query the stonith agent instance
>>>>> (sorry, I do not know the proper name for it) for a list of hosts it can
>>>>> manage? And shouldn't external/ipmi return the value of the "hostname"
>>>>> parameter here? So the question is why it does not work.
>>>> You're right -- if not told otherwise, Pacemaker will query the device
>>>> for the target list. In this case, the output of "stonith_admin -l"
>>>> suggests it's not returning the desired information. I'm not familiar
>>>> with the external agents, so I don't know why that would be. I
>>>> mistakenly assumed it worked similarly to fence_ipmilan ...
>>> I guess it worked back when pacemaker did fencing via
>>> cluster-glue code ...
>>> A grep for "gethosts" doesn't return much in the current pacemaker sources
>>> apart from some leftovers in cts.
>> Pacemaker is expected to call fence_legacy, which translates "list" into
>> "gethosts". It does so in my case. So it appears to be a problem with this
>> specific installation.
> As said in some other branch of this discussion I don't have an
> installation with legacy-fencing here at the moment ... the final
> translation to gethosts should be done by the test-binary coming
> from cluster-glue ...

fence_legacy calls the "stonith" command, which does it.

> But I'm still a little surprised as fence_legacy doesn't mention
> 'list' in the actions-section of the meta-data it creates for the
> legacy-agents.

It does now:

commit 3f2d1b1302adc40d9647e854187b7a85bd38f8fb
Author: Gao,Yan <ygao at suse.com>
Date:   Thu Jun 23 20:53:05 2016 +0200

    Fix: fencing: fence_legacy - Search capable devices by querying them through "list" action

    Cluster-glue stonith agents have their own parameters for the host
    list. We need to query the devices and get the so-called dynamic-list
    via "stonith -l", which invokes "gethosts" action of the cluster-glue
    stonith agents.

And in the past pacemaker would try both "list" and "hostlist", so it
may be that in the version in question "hostlist" has already been removed
from pacemaker but "list" is not yet present in fence_legacy.

> Playing with fence_dummy it didn't go for the dynamic-list
> anymore once I had removed the 'list' action from the
> meta-data. Hence my suggestion to give it a try with
> this action added. But I haven't checked in the code if there is
> some special-handling for legacy-fencing from pacemaker-side.
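
To make the behaviour discussed above concrete, here is a minimal sketch in Python of how the choice of host check could depend on the actions an agent advertises in its meta-data. It is a simplified model, not pacemaker code. The function name choose_host_check is hypothetical, and the fallback order (static-list, then dynamic-list, then status, then none) is an assumption taken from the pcmk_host_check description in later Pacemaker documentation, so it may differ in the version discussed here.

```python
from typing import Dict, Set


def choose_host_check(params: Dict[str, str], advertised_actions: Set[str]) -> str:
    """Return the host-check method for a fence device (simplified model).

    params: device parameters as configured in the CIB.
    advertised_actions: actions listed in the agent's meta-data.
    """
    # An explicit setting always wins.
    if "pcmk_host_check" in params:
        return params["pcmk_host_check"]
    # A configured list or map means targets are known statically.
    if "pcmk_host_list" in params or "pcmk_host_map" in params:
        return "static-list"
    # Otherwise the device is only queried if it advertises "list".
    if "list" in advertised_actions:
        return "dynamic-list"
    if "status" in advertised_actions:
        return "status"
    return "none"


if __name__ == "__main__":
    ipmi = {"hostname": "pg1"}
    # Meta-data without "list": the dynamic list is never requested.
    print(choose_host_check(ipmi, {"reboot", "off", "on", "status"}))
    # Meta-data with "list": the device is asked for its hosts.
    print(choose_host_check(ipmi, {"reboot", "off", "on", "status", "list"}))
    # A pcmk_host_map entry sidesteps the query entirely.
    print(choose_host_check({"pcmk_host_map": "pg1:pg1.ipmi.example.com"}, set()))
```

Under this model a legacy agent whose generated meta-data lacks "list" is never asked for its host list, which would match the fence_dummy observation, and setting pcmk_host_list or pcmk_host_map avoids the dependency altogether.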
