[ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

Sebastien BASTARD sebastien at domalys.com
Mon Feb 21 04:08:03 EST 2022


Hello Ulrich,

I modified your script so that it can also test TCP connectivity. There is
a firewall between servers A and B and the QDevice that does not answer
ping requests, so I tested port 5403 instead.
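
A minimal sketch of such a TCP check, for readers who want to reproduce it. It is not the actual modified script: the function names tcp_up and tcp_state are hypothetical, and it assumes a bash built with /dev/tcp support plus the coreutils timeout command.

```shell
#!/bin/bash
# Hypothetical helper (not the actual modified script): succeed if a TCP
# connection to host:port can be opened within `wait` seconds.
tcp_up() {
    local host=$1 port=$2 wait=${3:-3}
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # `timeout` bounds the attempt when a firewall silently drops packets.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Report the state the same way the logs do: 1 = reachable, 0 = not.
tcp_state() {
    if tcp_up "$@"; then echo 1; else echo 0; fi
}
```

For example, `tcp_state qdevice.example.com 5403` (a placeholder host name) would print 1 or 0. This only proves that the port accepts connections, not that the qnetd service behind it is healthy.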

Here are the results from the weekend:

Logs from Server A:

==> log_up_down_ServerB_from_ServerA.txt <==
---START 1645111039 (2022-02-17_15:17:19)
0 (11) -> 1 1645111050 (2022-02-17_15:17:30)
---EXIT 1645177062 (2022-02-18_09:37:42)
---START 1645199714 (2022-02-18_15:55:14)
0 (4) -> 1 1645199718 (2022-02-18_15:55:18)


==> log_up_down_qdevice_from_ServerA.txt <==
---START 1645117334 (2022-02-17_17:02:14)
0 (10) -> 1 1645117344 (2022-02-17_17:02:24)
*1 (27820) -> 0 1645145164 (2022-02-18_00:46:04)*
0 (10) -> 1 1645145174 (2022-02-18_00:46:14)
---EXIT 1645177062 (2022-02-18_09:37:42)
---START 1645199684 (2022-02-18_15:54:44)
0 (3) -> 1 1645199687 (2022-02-18_15:54:47)
*1 (19519) -> 0 1645219206 (2022-02-18_21:20:06)*
0 (3) -> 1 1645219209 (2022-02-18_21:20:09)

The scripts on Server A stopped working because I forgot to launch them in
the background. But we can see that Server A lost its connection to the
QDevice twice.
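
Judging from the timestamps, the number in parentheses appears to be the seconds spent in the previous state (for example 1645145164 - 1645117344 = 27820). Under that assumption, the connection losses can be pulled out of a log with a small filter. The function name list_drops is hypothetical, and the '*' stripping only matters for lines copied from this mail:

```shell
# Hypothetical filter: read an up/down log on stdin and print, for every
# "1 (N) -> 0" transition, the epoch time of the loss and the N seconds
# the link had been up before it.
list_drops() {
    tr -d '*' |
    awk '$1 == 1 && $3 == "->" && $4 == 0 {
             gsub(/[()]/, "", $2)   # "(27820)" becomes "27820"
             print $5, $2
         }'
}
```

Usage: `list_drops < log_up_down_qdevice_from_ServerA.txt` prints one line per loss, which makes it easier to compare the loss times with the fencing times.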

Logs from Server B:

==> log_up_down_ServerA_from_ServerB.txt <==
---START 1645110964 (2022-02-17_15:16:04)
0 (11) -> 1 1645110975 (2022-02-17_15:16:15)
---EXIT 1645199533 (2022-02-18_15:52:13)
---START 1645199576 (2022-02-18_15:52:56)
0 (4) -> 1 1645199580 (2022-02-18_15:53:00)


==> log_up_down_qdevice_from_ServerB.txt <==
---START 1645117428 (2022-02-17_17:03:48)
0 (10) -> 1 1645117438 (2022-02-17_17:03:58)
---EXIT 1645199529 (2022-02-18_15:52:09)
---START 1645199546 (2022-02-18_15:52:26)
0 (3) -> 1 1645199549 (2022-02-18_15:52:29)
*1 (232677) -> 0 1645432226 (2022-02-21_08:30:26)*
0 (3) -> 1 1645432229 (2022-02-21_08:30:29)


The scripts on Server B stopped working because I forgot to launch them in
the background. But we can see that Server B lost its connection to the
QDevice once.

Logs from the QDevice:

==> log_up_down_ServerA_from_qdevice.txt <==
---START 1645363302 (2022-02-20_13:21:42)
0 (4) -> 1 1645363306 (2022-02-20_13:21:46)


==> log_up_down_ServerB_from_qdevice.txt <==
---START 1645363310 (2022-02-20_13:21:50)
0 (4) -> 1 1645363314 (2022-02-20_13:21:54)


The scripts on the QDevice stopped working because the input was still
attached to the scripts, and after a few minutes the OS killed them. We can
see that the QDevice never lost its connection to the two servers.
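
One way to avoid both problems (a script tied to the foreground and a script tied to its input) is sketched below. This is only an assumption about what would help here, not something tested on these nodes; the function name run_detached and the script name in the usage line are hypothetical.

```shell
# Hypothetical wrapper: start a command in the background, detached from
# the terminal's input, appending its output to a log file.
# Prints the PID of the background process.
run_detached() {
    local log=$1; shift
    # nohup ignores SIGHUP at logout; stdin comes from /dev/null so the
    # command never waits on (or gets stopped for reading) the terminal.
    nohup "$@" </dev/null >>"$log" 2>&1 &
    echo $!
}
```

Usage would look like `run_detached log_up_down_qdevice_from_ServerA.txt ./updown.sh ...`, with the script's own arguments in place of the dots.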

I will keep monitoring the output of the scripts to see when the servers
lose their connections and when they are fenced.

Best regards.


On Fri, 18 Feb 2022 at 08:07, Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Sebastien BASTARD <sebastien at domalys.com> wrote on 17.02.2022 at
> 16:28 in message
> <CAAjZqdz9a2OorPyoSjdRFWNgJT5snOH2KehkpXdEbAuZrWOvEw at mail.gmail.com>:
> > Thank you for your script, Ulrich!
> >
> > I launched it with a 10-second delay:
> >
> >    - on Server A, to ping Server B
> >    - on Server B, to ping Server A
> >    - on the QDevice, to ping Server A and Server B
> >
> > I currently can't ping the QDevice from Servers A and B, because it is
> > behind a firewall which only allows port 5403.
> >
> > I will look at the results tomorrow.
>
> Maybe another remark: the script was not designed for a cluster, so it was
> good enough to redirect the output of the script to a file.
> However, bash may buffer some lines before they are written. If the script
> is killed, that's not a problem, but if the node is fenced, you might lose
> the last line(s).
> So maybe you want to change the echo statement in log_time() to:
> echo "$@ $t ($(date -d@"$t" -u +%F_%T))" >> your_log_file
>
> Maybe you want to use a variable or parameter for that.
>
> Regards,
> Ulrich


-- 


Sébastien BASTARD
*R&D Engineer* | Domalys • Créateurs d’autonomie

| phone : +33 5 49 83 00 08
| site : www.domalys.com
| email : sebastien at domalys.com
| address : 58 Rue du Vercors 86240 Fontaine-Le-Comte
