[ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

Marco Marino marino.mrc at gmail.com
Wed Sep 11 11:03:20 EDT 2019


Hi, any updates on this?
Thank you

On Wed, Sep 4, 2019, 10:46, Marco Marino <marino.mrc at gmail.com> wrote:

> First of all, thank you for your support.
> Andrey: sure, I can reach machines through IPMI.
> Here is a short "log":
>
> #From ld1 trying to contact ld1
> [root at ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXXX
> sdr elist all
> SEL              | 72h | ns  |  7.1 | No Reading
> Intrusion        | 73h | ok  |  7.1 |
> iDRAC8           | 00h | ok  |  7.1 | Dynamic MC @ 20h
> ...
>
> #From ld1 trying to contact ld2
> ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXXXX sdr elist all
> SEL              | 72h | ns  |  7.1 | No Reading
> Intrusion        | 73h | ok  |  7.1 |
> iDRAC7           | 00h | ok  |  7.1 | Dynamic MC @ 20h
> .......
>
>
> #From ld2 trying to contact ld1:
> [root at ld2 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXX sdr
> elist all
> SEL              | 72h | ns  |  7.1 | No Reading
> Intrusion        | 73h | ok  |  7.1 |
> iDRAC8           | 00h | ok  |  7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  |  7.1 | Logical FRU @00h
> .....
>
> #From ld2 trying to contact ld2
> [root at ld2 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXX sdr
> elist all
> SEL              | 72h | ns  |  7.1 | No Reading
> Intrusion        | 73h | ok  |  7.1 |
> iDRAC7           | 00h | ok  |  7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  |  7.1 | Logical FRU @00h
> ........
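>
> Since the fence agent layers its own session handling and timeouts on
> top of plain ipmitool, it may be worth testing it directly too. A
> minimal sketch, assuming the long option names of a recent
> fence-agents release:
>
> #Query power status through the same agent the cluster uses
> [root at ld1 ~]# fence_ipmilan --ip=192.168.254.251 --username=root \
>     --password=XXXXXX --lanplus --action=status
> Status: ON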
>
> Jan: Actually, the cluster uses /etc/hosts to resolve names:
> 172.16.77.10    ld1.mydomain.it      ld1
> 172.16.77.11    ld2.mydomain.it      ld2
>
> Furthermore, I'm using IP addresses for the IPMI interfaces in the
> configuration:
> [root at ld1 ~]# pcs stonith show fence-node1
>  Resource: fence-node1 (class=stonith type=fence_ipmilan)
>   Attributes: ipaddr=192.168.254.250 lanplus=1 login=root passwd=XXXXX
> pcmk_host_check=static-list pcmk_host_list=ld1.mydomain.it
>   Operations: monitor interval=60s (fence-node1-monitor-interval-60s)
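>
> No explicit monitor timeout is shown above, so the operation falls
> back to the defaults. A hedged sketch of how to check and raise it
> (standard pcs syntax; the 120s value is only an example):
>
> #Show operation defaults; empty output means Pacemaker's built-in
> #default (20s in most releases) applies
> [root at ld1 ~]# pcs resource op defaults
> #Set an explicit, longer timeout on the fence device's monitor op
> [root at ld1 ~]# pcs resource update fence-node1 op monitor interval=60s timeout=120s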
>
>
> Any idea?
> How can I reset the state of the cluster without downtime? Is "pcs
> resource cleanup" enough?
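>
> For reference, a cleanup sketch; as far as I know this only clears
> the failed-operation history and failcounts for the fence devices,
> nothing is restarted, so no downtime is expected:
>
> [root at ld1 ~]# pcs resource cleanup fence-node1
> [root at ld1 ~]# pcs resource cleanup fence-node2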
> Thank you,
> Marco
>
>
> On Wed, Sep 4, 2019 at 10:29, Jan Pokorný <jpokorny at redhat.com>
> wrote:
>
>> On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
>> > On 03.09.2019 11:09, Marco Marino wrote:
>> >> Hi, I have a problem with fencing on a two-node cluster. It seems
>> >> that the cluster randomly cannot complete the monitor operation for
>> >> fence devices. In the log I see:
>> >> crmd[8206]:   error: Result of monitor operation for fence-node2 on
>> >> ld2.mydomain.it: Timed Out
>> >
>> > Can you actually access the IP addresses of your IPMI ports?
>>
>> [
>> Tangentially, an interesting aspect beyond that, applicable to any
>> non-IP cross-host referential need and which I haven't seen mentioned
>> anywhere so far, is the risk of DNS resolution (where /etc/hosts comes
>> up short) running into trouble: stale records, a blocked port, DNS
>> server overload (DNSSEC, etc.), parallel IPv4/IPv6 records that the SW
>> cannot handle gracefully, and so on.  In any case, a single DNS server
>> would clearly be an undesired SPOF, and it would be unfortunate to be
>> unable to fence a node because of that.
>>
>> I think the most robust approach is to use IP addresses whenever
>> possible, and unambiguous records in /etc/hosts when practical.
>> ]
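>>
>> For the IPMI interfaces specifically, that could be a minimal
>> /etc/hosts sketch like the following (the hostnames are hypothetical,
>> only the addresses come from this thread):
>>
>> 192.168.254.250    ld1-ipmi.mydomain.it    ld1-ipmi
>> 192.168.254.251    ld2-ipmi.mydomain.it    ld2-ipmi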
>>
>> >> Attached are:
>> >> - /var/log/messages for node1 (only the important part)
>> >> - /var/log/messages for node2 (only the important part) <-- the
>> >>   problem starts here
>> >> - pcs status
>> >> - pcs stonith show (for both fence devices)
>> >>
>> >> I think it could be a timeout problem, so how can I see the timeout
>> >> value for the monitor operation on stonith devices?
>> >> Please, can someone help me with this problem?
>> >> Furthermore, how can I fix the state of fence devices without downtime?
>>
>> --
>> Jan (Poki)
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>
>