[ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15
jpokorny at redhat.com
Wed Sep 4 04:29:19 EDT 2019
On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
> 03.09.2019 11:09, Marco Marino wrote:
>> Hi, I have a problem with fencing on a two-node cluster. It seems that
>> the cluster randomly fails to complete the monitor operation for fence devices.
>> In the log I see:
>> crmd: error: Result of monitor operation for fence-node2 on
>> ld2.mydomain.it: Timed Out
> Can you actually access IP addresses of your IPMI ports?
Tangentially, an interesting aspect beyond that, applicable whenever
hosts refer to each other by name rather than by IP address, and one
I haven't seen mentioned anywhere so far, is the risk of DNS
resolution (used when /etc/hosts falls short) running into trouble:
stale records, a blocked port, DNS server overload (DNSSEC, etc.),
parallel IPv4/IPv6 records that the software cannot handle
gracefully, and so on. In any case, a single DNS server would be an
undesirable single point of failure (SPOF), and it would be
unfortunate to be unable to fence a node because of that.
I think the most robust approach is to use IP addresses whenever
possible, and unambiguous records in /etc/hosts when practical.
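To make the point above a bit more tangible, here is a minimal sketch
(Python, standard library only) of how one might check whether a fence
device address is a plain IP literal or a name that depends on the
resolver, and what the resolver currently returns for it. The function
names and the example hostname are hypothetical, not taken from any
fence agent, and this is only an illustration of the idea, not a
diagnostic tool shipped with the cluster stack.

```python
import ipaddress
import socket
import time


def is_ip_literal(value: str) -> bool:
    """Return True if value is an IPv4/IPv6 literal (no name lookup needed)."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def resolve_report(host: str) -> dict:
    """Resolve host via the system resolver and summarize the result.

    The summary lists the address families returned, so that parallel
    IPv4/IPv6 records become visible, and how long the lookup took.
    """
    started = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Resolution failed (e.g. unknown name, resolver unreachable).
        return {
            "host": host,
            "ok": False,
            "error": str(exc),
            "seconds": time.monotonic() - started,
        }

    families = set()
    addresses = set()
    for family, _type, _proto, _canon, sockaddr in infos:
        if family == socket.AF_INET:
            families.add("IPv4")
        elif family == socket.AF_INET6:
            families.add("IPv6")
        addresses.add(sockaddr[0])

    return {
        "host": host,
        "ok": True,
        "families": sorted(families),
        "addresses": sorted(addresses),
        "seconds": time.monotonic() - started,
    }
```

For example, is_ip_literal("ipmi-node2.example.com") (a made-up name)
returns False, which signals that fencing through that address relies
on name resolution; resolve_report() on the same name would then show
whether both IPv4 and IPv6 records come back and how long the lookup
took on the node in question.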
>> Attached are:
>> - /var/log/messages for node1 (only the important part)
>> - /var/log/messages for node2 (only the important part) <-- Problem starts
>> - pcs status
>> - pcs stonith show (for both fence devices)
>> I think it could be a timeout problem, so how can I see the timeout value for
>> the monitor operation of stonith devices?
>> Can someone please help me with this problem?
>> Furthermore, how can I fix the state of fence devices without downtime?