[ClusterLabs] fence_ipmilan power_timeout ValueError

Ron Kerry ron.kerry at hpe.com
Thu Jul 13 09:11:54 EDT 2017


I have a customer who recently tried to create fence_ipmilan resources 
with the power_timeout parameter set to 60s. His reasoning was that the 
metadata declares the parameter as a string, so he expected the 's' 
suffix on the 60 to be interpreted correctly. It is not.

     <parameter name="power_timeout" unique="0" required="0">
         <getopt mixed="--power-timeout=[seconds]" />
         <content type="string" default="20"  />
         <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
     </parameter>

Here is an example of his resource definition.

primitive smc0002_fencing stonith:fence_ipmilan \
         params ipaddr=X.X.X.X login=admin passwd=pppppp lanplus=true \
         power_timeout=60s pcmk_host_check=static-list pcmk_host_list=smc0002 \
         op stop on-fail=ignore interval=0 timeout=20s \
         op monitor interval=60s timeout=20s \
         meta target-role=Started

Fencing then fails, and the following errors are logged.

-----
Jul 13 00:34:20 smc0001 stonith-ng[3921]:  notice: Client 
stonith_admin.30337.1d0dee53 wants to fence (reboot) 'smc0002' with 
device '(any)'
Jul 13 00:34:20 smc0001 stonith-ng[3921]:  notice: Requesting peer 
fencing (reboot) of smc0002
Jul 13 00:34:20 smc0001 stonith-ng[3921]:  notice: smc0002_fencing can 
fence (reboot) smc0002: static-list
Jul 13 00:34:20 smc0001 stonith-ng[3921]:  notice: smc0002_fencing can 
fence (reboot) smc0002: static-list
Jul 13 00:34:20 smc0001 python: detected unhandled Python exception in 
'/usr/sbin/fence_ipmilan'
Jul 13 00:34:20 smc0001 python: can't communicate with ABRT daemon, is 
it running? [Errno 2] No such file or directory
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [ Traceback (most recent call last): ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 204, in <module> ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     main() ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 200, in main ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     result = fence_action(None, options, set_power_status, 
get_power_status, None, reboot_fn) ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/share/fence/fencing.py", line 973, in fence_action ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     status = get_multi_power_fn(tn, options, get_power_fn) ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/share/fence/fencing.py", line 880, in 
get_multi_power_fn ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     plug_status = get_power_fn(tn, options) ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 17, in get_power_status ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     output = run_command(options, create_command(options, 
"status")) ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [   File "/usr/share/fence/fencing.py", line 1192, in run_command ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [     timeout = float(timeout) ]
Jul 13 00:34:20 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30338] 
stderr: [ ValueError: invalid literal for float(): 60s ]
Jul 13 00:34:21 smc0001 python: detected unhandled Python exception in 
'/usr/sbin/fence_ipmilan'
Jul 13 00:34:22 smc0001 python: can't communicate with ABRT daemon, is 
it running? [Errno 2] No such file or directory
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [ Traceback (most recent call last): ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 204, in <module> ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     main() ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 200, in main ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     result = fence_action(None, options, set_power_status, 
get_power_status, None, reboot_fn) ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/share/fence/fencing.py", line 973, in fence_action ]
Command failed: No route to host
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     status = get_multi_power_fn(tn, options, get_power_fn) ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/share/fence/fencing.py", line 880, in 
get_multi_power_fn ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     plug_status = get_power_fn(tn, options) ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/sbin/fence_ipmilan", line 17, in get_power_status ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     output = run_command(options, create_command(options, 
"status")) ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [   File "/usr/share/fence/fencing.py", line 1192, in run_command ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [     timeout = float(timeout) ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]: warning: fence_ipmilan[30343] 
stderr: [ ValueError: invalid literal for float(): 60s ]
Jul 13 00:34:22 smc0001 stonith-ng[3921]:   error: Operation 'reboot' 
[30343] (call 2 from stonith_admin.30337) for host 'smc0002' with device 
'smc0002_fencing' returned: -201 (Generic Pacemaker error)
Jul 13 00:34:22 smc0001 stonith-ng[3921]:  notice: Couldn't find anyone 
to fence (reboot) smc0002 with any device
Jul 13 00:34:22 smc0001 stonith-ng[3921]:   error: Operation reboot of 
smc0002 by <no-one> for stonith_admin.30337 at smc0001.a7b73c3b: No route 
to host
Jul 13 00:34:22 smc0001 crmd[3925]:  notice: Peer smc0002 was not 
terminated (reboot) by <anyone> for smc0001: No route to host 
(ref=a7b73c3b-b67f-4a85-815e-df877055eb37) by client stonith_admin.30337
----

So clearly, the code expects power_timeout to be a plain numeric value 
that float() can parse. I suspect this is also true for the other 
internal timeout values: timeout, login_timeout, shell_timeout.
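
For illustration, here is a minimal sketch of the conversion that the 
traceback above points at. It is not the actual fencing.py code; the 
helper name to_seconds_strict is made up for this example, and only the 
float() call is taken from the traceback. The exact wording of the 
error message differs between Python versions.

```python
def to_seconds_strict(value):
    """Mimic the `timeout = float(timeout)` line from the traceback.

    Hypothetical helper: the name is an assumption, only the float()
    conversion comes from the log above.
    """
    return float(value)


if __name__ == "__main__":
    # Plain numbers (including fractional ones) convert; "60s" does not.
    for candidate in ("60", "60.5", "60s"):
        try:
            print(candidate, "->", to_seconds_strict(candidate))
        except ValueError as err:
            print(candidate, "-> ValueError:", err)
```

So "60" and "60.5" are accepted, while any unit suffix makes float() 
raise ValueError before the IPMI command is ever run.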

Is this a bug? Should the metadata definitions declare these parameters 
as integers or floats instead of strings? Or should the code correctly 
interpret the 's' (or other modifiers)? Or should the metadata 
documentation clearly state that time modifiers like 's' are not 
allowed, if the string type is only there to permit fractional numbers 
of seconds? Or is this just the way it is (if so, please give me some 
logical reasoning that I can pass along to the customer)?
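
To make the "interpret the modifier" option concrete, here is a rough 
sketch of what a tolerant conversion could look like. This is purely 
hypothetical: parse_timeout, the unit table, and the set of accepted 
suffixes are my own assumptions and do not exist in the fence agents 
as far as I know.

```python
import re

# Hypothetical unit table (assumption): seconds by default, plus a few
# common suffixes. The real agents define nothing like this.
_UNIT_SECONDS = {"": 1.0, "s": 1.0, "m": 60.0, "h": 3600.0}

# A non-negative decimal number followed by an optional alphabetic suffix.
_TIMEOUT_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*([a-z]*)\s*$",
                         re.IGNORECASE)


def parse_timeout(value):
    """Convert '60', '60s', '1.5m' and similar to seconds as a float.

    Raises ValueError for anything it does not recognise, so a bad
    value is still rejected rather than silently misread.
    """
    match = _TIMEOUT_RE.match(str(value))
    if match is None:
        raise ValueError("invalid timeout value: %r" % (value,))
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError("unknown time unit %r in %r" % (unit, value))
    return float(number) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for candidate in ("60", "60s", "1.5m", "60x"):
        try:
            print(candidate, "->", parse_timeout(candidate))
        except ValueError as err:
            print(candidate, "-> ValueError:", err)
```

With something along these lines, power_timeout=60s and 
power_timeout=60 would both yield 60 seconds, and an unrecognised 
suffix would still produce a clear error.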

-- 

Ron Kerry
Global Product Support

ron.kerry at hpe.com
Hewlett Packard Enterprise




