[ClusterLabs] Pacemaker not always selecting the right stonith device

Ken Gaillot kgaillot at redhat.com
Wed Jul 20 18:03:21 UTC 2016


On 07/20/2016 12:02 PM, Martin Schlegel wrote:
> Thank you Andrei, Ken & Klaus - much appreciated!
> 
> I am now including pcmk_host_list and pcmk_host_check=static-list. 
> 
> The command stonith_admin -l <node_name> is now showing the right stonith device
> - the one matching the requested <node_name>, i.e. stonith_admin -l pg1 would
> show only the registered device p_ston_pg1.
> 
> However, could you please have another look? I'd like to understand what I am
> seeing:
> 
> 1) Why does pg3 have stonith devices registered even though none of the stonith
> resources (p_ston_pg1, p_ston_pg2 or p_ston_pg3) were started on pg3 according
> to the crm_mon output?
> 2) Why does pg2 have p_ston_pg3 registered although it only runs p_ston_pg1
> according to the crm_mon output?

Where a fence device is running does not limit what targets it can
fence, or what nodes can execute fencing using the device.

A fence device may be used by any cluster node, regardless of where the
device is running, or even whether it is running at all -- unless you've
explicitly disabled the device in the configuration.

To pacemaker, having a fence device "running" on a node simply means
that the node runs the recurring monitor for the device (if one is
configured). That gives the node "verified" access to the device, and it
will be preferred to execute the fencing, if it's available -- but
another node can execute the fencing if necessary.
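
As a rough illustration of the above, here is a toy Python model (not
Pacemaker code) of the three-node setup quoted below. It assumes the
pcmk_host_list values and the -inf location constraints from this thread,
and it assumes that a node does not register a device that a -inf location
constraint bans from it, which is consistent with the stonith_admin -L
output quoted below. The function names are hypothetical.

```python
# Toy model only: names and logic are illustrative assumptions, not Pacemaker internals.
NODES = ["pg1", "pg2", "pg3"]

# device -> targets it may fence (pcmk_host_list with pcmk_host_check=static-list)
HOST_LIST = {
    "p_ston_pg1": {"pg1"},
    "p_ston_pg2": {"pg2"},
    "p_ston_pg3": {"pg3"},
}

# device -> nodes it is banned from (the "-inf:" location constraints)
BANNED_ON = {
    "p_ston_pg1": {"pg1"},
    "p_ston_pg2": {"pg2"},
    "p_ston_pg3": {"pg3"},
}

# device -> node where it is "Started" (runs the recurring monitor), per crm_mon
STARTED_ON = {"p_ston_pg1": "pg2", "p_ston_pg2": "pg1", "p_ston_pg3": "pg1"}


def registered_on(node):
    """Devices a node knows about, whether or not they are started there."""
    return sorted(d for d in HOST_LIST if node not in BANNED_ON[d])


def devices_for(node, target):
    """Devices registered on `node` that list `target` as a fencing target."""
    return [d for d in registered_on(node) if target in HOST_LIST[d]]


def executors(device):
    """Nodes able to execute fencing with `device`, preferred node first."""
    candidates = [n for n in NODES if device in registered_on(n)]
    # The node running the monitor has "verified" access, so it sorts first.
    return sorted(candidates, key=lambda n: n != STARTED_ON[device])


if __name__ == "__main__":
    for node in NODES:
        print(node, registered_on(node))
    print(executors("p_ston_pg3"))
```

In this model pg3 has two devices registered although nothing is started
there, and p_ston_pg3 can be executed from pg2 as a fallback even though it
is started on pg1.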

> (see also the detailed output for stonith_admin further below)
> 
> Cheers,
> Martin
> 
> ______________
> 
> [...]
> primitive p_ston_pg1 stonith:external/ipmi \
>     params hostname=pg1 pcmk_host_list=pg1 pcmk_host_check=static-list \
>     ipaddr=10.148.128.35 userid=root \
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" \
>     passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg2 stonith:external/ipmi \
>     params hostname=pg2 pcmk_host_list=pg2 pcmk_host_check=static-list \
>     ipaddr=10.148.128.19 userid=root \
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass" \
>     passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg3 stonith:external/ipmi \
>     params hostname=pg3 pcmk_host_list=pg3 pcmk_host_check=static-list \
>     ipaddr=10.148.128.59 userid=root \
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass" \
>     passwd_method=file interface=lan priv=OPERATOR
> [...]
> 
> 
> root@dsvt0-resiliency-test-7:~# crm_mon -1rR
> Last updated: Wed Jul 20 14:36:13 2016 Last change: Wed Jul 20 14:24:19 2016 by root via cibadmin on pg2
> Stack: corosync
> Current DC: pg2 (2) (version 1.1.14-70404b0) - partition with quorum
> 3 nodes and 25 resources configured
> 
> Online: [ pg1 (1) pg2 (2) pg3 (3) ]
> 
> Full list of resources:
> 
> p_ston_pg1 (stonith:external/ipmi): Started pg2
> p_ston_pg2 (stonith:external/ipmi): Started pg1
> p_ston_pg3 (stonith:external/ipmi): Started pg1
> 
> [...]
> 
> 
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n======\n\n' ; for node in pg{1..3}; do echo -en 'Fence node '\$node' with:\n' ; stonith_admin -l \$node ; echo '--' ; done"; done
> pg1
> ======
> 
> Fence node pg1 with:
> No devices found
> --
> Fence node pg2 with:
> 1 devices found
> p_ston_pg2
> --
> Fence node pg3 with:
> 1 devices found
> p_ston_pg3
> --
> pg2
> ======
> 
> Fence node pg1 with:
> 1 devices found
> p_ston_pg1
> --
> Fence node pg2 with:
> No devices found
> --
> Fence node pg3 with:
> 1 devices found
> p_ston_pg3
> --
> pg3
> ======
> 
> Fence node pg1 with:
> 1 devices found
> p_ston_pg1
> --
> Fence node pg2 with:
> 1 devices found
> p_ston_pg2
> --
> Fence node pg3 with:
> No devices found
> --
> 
> 
> 
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n======\n\n' ; stonith_admin -L; echo "; done
> pg1
> ======
> 
> 2 devices found
> p_ston_pg3
> p_ston_pg2
> 
> pg2
> ======
> 
> 2 devices found
> p_ston_pg3
> p_ston_pg1
> 
> pg3
> ======
> 
> 2 devices found
> p_ston_pg1
> p_ston_pg2
> 
> 
> 
>> On 20 July 2016 at 08:26, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>
>> On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel <martin at nuboreto.org> wrote:
>>>>> [...]
>>>>>
>>>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>>>     params hostname=pg1 ipaddr=10.148.128.35 userid=root \
>>>>>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" \
>>>>>     passwd_method=file interface=lan priv=OPERATOR
>>>>>
>>>>> primitive p_ston_pg2 stonith:external/ipmi \
>>>>>     params hostname=pg2 ipaddr=10.148.128.19 userid=root \
>>>>>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass" \
>>>>>     passwd_method=file interface=lan priv=OPERATOR
>>>>>
>>>>> primitive p_ston_pg3 stonith:external/ipmi \
>>>>>     params hostname=pg3 ipaddr=10.148.128.59 userid=root \
>>>>>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass" \
>>>>>     passwd_method=file interface=lan priv=OPERATOR
>>>>>
>>>>> location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 } \
>>>>>     resource-discovery=exclusive \
>>>>>     rule #uname eq pg1 \
>>>>>     rule #uname eq pg2 \
>>>>>     rule #uname eq pg3
>>>>>
>>>>> location l_ston_pg1 p_ston_pg1 -inf: pg1
>>>>> location l_ston_pg2 p_ston_pg2 -inf: pg2
>>>>> location l_ston_pg3 p_ston_pg3 -inf: pg3
>>>>
>>>> These constraints prevent each device from running on its intended
>>>> target, but they don't limit which nodes each device can fence. For
>>>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>>>> example:
>>>>
>>>> primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>>>
>>>> Use pcmk_host_list if the fence device needs the node name as known to
>>>> the cluster, and pcmk_host_map if you need to translate a node name to
>>>> an address the device understands.
>>
>>> We used the parameter "hostname". What does it do, if not that?
>>
>> hostname is a resource parameter. From Pacemaker's point of view it is
>> an opaque string, and only the resource agent knows how to interpret it.
>>
>> See the discussion in another part of this thread. The agent is supposed
>> to return information based on the "hostname" parameter, but apparently
>> it does not understand the request when Pacemaker asks it.
