[Pacemaker] wrong device in stonith_admin -l
laurent+pacemaker at u-picardie.fr
Tue Dec 18 12:38:50 EST 2012
laurent+pacemaker at u-picardie.fr writes:
> David Vossel <dvossel at redhat.com> writes:
>
>>> Dec 12 01:12:37 elasticsearch-06 stonith-ng[18181]: notice:
>>> dynamic_list_search_cb: Disabling port list queries for
>>> stonith-xen-eddu (1): failed: 255
>>
>> We discover what hosts an agent can fence by running this command internally in stonith.
>>
>> # agent -o list
>>
>> From there we expect an exit code of 0 and the list of nodes to be in the output.
>> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>>
>> Looking at your logs, stonith-xen-eddu is returning -1 (255) as the return code when we issue the 'list' action. That means we don't try to get the dynamic list again; we assume the 'list' action isn't supported. From there we fall back to using the 'status' action to dynamically determine if the agent can fence a particular host. I'm guessing the 'status' action is returning true (return codes 0 or 2) for hosts you wouldn't expect the agent to be able to fence for some reason.
>
> Hi,
>
> Ok it makes sense.
> The FenceAgentAPI doc gives extra information on top of this one:
> http://hg.linux-ha.org/glue/file/67224d37df80/doc/stonith/README.external
>
> returning 1 when hostlist is empty does the trick (gethosts action)
> so does returning 1 to the status action.
>
> So I guess that's the explanation for both of my issues:
> - after the timeout issue, the port list queries were disabled,
> falling back to the status action that was always returning rc=0
> - gethosts returning rc=0 with an empty hostlist also disables the
> port list queries
>
> so I guess there's no need to file a new ticket :)
> Thanks,
Hmm, it still feels like there's something funny with this issue.
Is the FenceAgentAPI relevant to Pacemaker?
I don't see why the fencing agent should return 1 when called with
"gethosts": it's reachable and working properly. It's just returning
an empty hostlist.
As for the status action, it also feels like it should return 0 (or 2
if Pacemaker supports it), as the device is reachable.
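The behaviour discussed in this thread can be modelled roughly as below.
This is only a sketch of what is described above, not stonith-ng's actual
code: the function name can_fence and its arguments are made up for
illustration, and the handling of an empty host list follows what was
observed here rather than any documented rule.

```python
# Hypothetical model of the behaviour described in this thread.
# Not taken from the Pacemaker/stonith-ng sources; all names are invented.

def can_fence(host, list_result, status_rc):
    """Return True if the device would be considered able to fence `host`.

    list_result: (rc, hosts) as returned by the agent's list/gethosts action
    status_rc:   exit code of the agent's status action for `host`
    """
    list_rc, hosts = list_result

    if list_rc == 0 and hosts:
        # The list action succeeded and returned hosts: trust that list.
        return host in hosts

    # The list action failed (e.g. rc 255), or returned rc 0 with an empty
    # host list (as observed in this thread): port list queries are
    # disabled and the status action decides instead.
    # Return codes 0 and 2 are both treated as "can fence".
    return status_rc in (0, 2)
```

With this model, an agent whose status action always returns 0 is
considered able to fence any host as soon as the list query has been
disabled, which matches the wrong device showing up in stonith_admin -l.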
In the end I'm going to file a bug.
--
Laurent