[Pacemaker] stonith

Sun Apr 26 16:50:18 EDT 2015

> On 19 Apr 2015, at 11:37 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 
> On Sun, 19 Apr 2015 14:23:27 +0200
> Andreas Kurz <andreas.kurz at gmail.com> wrote:
> 
>> On 2015-04-17 12:36, Thomas Manninger wrote:
>>> Hi list,
>>> 
>>> I have a pacemaker/corosync2 setup with 4 nodes, with stonith configured
>>> over the IPMI interface.
>>> 
>>> My problem is that sometimes the wrong node is stonithed.
>>> For example:
>>> I have 4 servers: node1, node2, node3, node4
>>> 
>>> I trigger a hardware reset on node1, but both node1 and node3 get
>>> stonithed.
>> 
>> You have to tell pacemaker exactly what stonith-resource can fence what
>> node if the stonith agent you are using does not support the "list" action.
>> 
> 
> Pacemaker is expected to get this information dynamically from the
> stonith agent.

Only from those agents that support it.

> 
>> Do this by adding "pcmk_host_check=static-list" and "pcmk_host_list" to
>> every stonith-resource like:
>> 
> 
> The default for pcmk_host_check is "dynamic-list"; why does it not work
> in this case?

Because IPMI usually has no notion of host names?

> I use external/ipmi myself and I do not remember ever fiddling
> with a static list.
> 
>> primitive p_stonith_node3 stonith:external/ipmi \
>>  op monitor interval=3s timeout=20s \
>>  params hostname=node3 ipaddr=10.100.0.6 passwd_method=file \
>>  passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus \
>>  priv=OPERATOR \
>>  pcmk_host_check="static-list" pcmk_host_list="node3"
>> 
>> ... see "man stonithd".
>> 
>> Best regards,
>> Andreas
>> 
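
Applied to all four primitives from the original configuration (quoted
further down), Andreas's suggestion would look roughly like the following.
This is a sketch assembled from the parameters given in this thread, not a
tested configuration; the only additions are the pcmk_host_check and
pcmk_host_list parameters, and the line layout is an assumption.

```
# Each device is told explicitly which single node it is allowed to fence.
primitive p_stonith_node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file \
               passwd="/etc/stonith_ipmi_passwd" userid=stonith \
               interface=lanplus priv=OPERATOR \
               pcmk_host_check="static-list" pcmk_host_list="node1" \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file \
               passwd="/etc/stonith_ipmi_passwd" userid=stonith \
               interface=lanplus priv=OPERATOR \
               pcmk_host_check="static-list" pcmk_host_list="node2" \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file \
               passwd="/etc/stonith_ipmi_passwd" userid=stonith \
               interface=lanplus priv=OPERATOR \
               pcmk_host_check="static-list" pcmk_host_list="node3" \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file \
               passwd="/etc/stonith_ipmi_passwd" userid=stonith \
               interface=lanplus priv=OPERATOR \
               pcmk_host_check="static-list" pcmk_host_list="node4" \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
```

With a static list in place, running "stonith_admin --list node1" on a
cluster node should name only p_stonith_node1 as a device able to fence
node1, which is a quick way to check that the mapping took effect.
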
>>> 
>>> In cluster.log, I found the following entries:
>>> Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_action_create:       Initiating action reboot for agent fence_legacy (target=node1)
>>> Apr 17 11:02:41 [20473] node2   stonithd:    debug: make_args:   Performing reboot action for node 'node1' as 'port=node1'
>>> Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     forking
>>> Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     sending args
>>> Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_device_execute:      Operation reboot for node node1 on p_stonith_node3 now running with pid=113092, timeout=60s
>>> 
>>> So node1 is being reset through the stonith primitive for node3. Why?
>>> 
>>> My stonith config:
>>> primitive p_stonith_node1 stonith:external/ipmi \
>>>        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file \
>>>        passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus \
>>>        priv=OPERATOR \
>>>        op monitor interval=3s timeout=20s \
>>>        meta target-role=Started failure-timeout=30s
>>> primitive p_stonith_node2 stonith:external/ipmi \
>>>        op monitor interval=3s timeout=20s \
>>>        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file \
>>>        passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus \
>>>        priv=OPERATOR \
>>>        meta target-role=Started failure-timeout=30s
>>> primitive p_stonith_node3 stonith:external/ipmi \
>>>        op monitor interval=3s timeout=20s \
>>>        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file \
>>>        passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus \
>>>        priv=OPERATOR \
>>>        meta target-role=Started failure-timeout=30s
>>> primitive p_stonith_node4 stonith:external/ipmi \
>>>        op monitor interval=3s timeout=20s \
>>>        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file \
>>>        passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus \
>>>        priv=OPERATOR \
>>>        meta target-role=Started failure-timeout=30s
>>> 
>>> Can somebody help me?
>>> Thanks!
>>> 
>>> Regards,
>>> Thomas
>>> 
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> 
>> 
>> 
> 