[ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15
Ken Gaillot
kgaillot at redhat.com
Mon Sep 16 10:59:25 EDT 2019
On Tue, 2019-09-03 at 10:09 +0200, Marco Marino wrote:
> Hi, I have a problem with fencing on a two node cluster. It seems
> that randomly the cluster cannot complete monitor operation for fence
> devices. In log I see:
> crmd[8206]: error: Result of monitor operation for fence-node2 on
> ld2.mydomain.it: Timed Out
> As attachment there is
> - /var/log/messages for node1 (only the important part)
> - /var/log/messages for node2 (only the important part) <-- Problem
> starts here
> - pcs status
> - pcs stonith show (for both fence devices)
>
> I think it could be a timeout problem, so how can I see timeout value
> for monitor operation in stonith devices?
> Please, someone can help me with this problem?
> Furthermore, how can I fix the state of fence devices without
> downtime?
>
> Thank you
How to investigate depends on whether this is an occasional monitor
failure, or happens every time the device start is attempted. From the
status you attached, I'm guessing it's at start.
In that case, my next step (since you've already verified ipmitool
works directly) would be to run the fence agent manually using the same
arguments used in the cluster configuration.
Check the man page for the fence agent, looking at the section for
"Stdin Parameters". These are what's used in the cluster configuration,
so make a note of what values you've configured. Then run the fence
agent like this:
echo -e "action=status\nPARAMETER=VALUE\nPARAMETER=VALUE\n..." | /path/to/agent
where PARAMETER=VALUE entries are what you have configured in the
cluster. If the problem isn't obvious from that, you can try adding a
debug_file parameter.
--
Ken Gaillot <kgaillot at redhat.com>
More information about the Users
mailing list