<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 13, 2021 at 1:43 PM damiano giuliani &lt;<a href="mailto:damianogiuliani87@gmail.com">damianogiuliani87@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi guys,<div>I'm back with some PAF postgres cluster problems.</div><div>Last night the cluster fenced the master node and promoted the PAF resource to a new node.</div><div>Everything went fine, except that I really don't know why it happened.</div><div>This morning I noticed the old master had been fenced by sbd and a new master had been promoted; this happened last night at 00.40.XX.</div><div>Filtering the logs, I can't find any reason why the old master was fenced and the promotion of the new master started (which seems to have gone perfectly). At this point I'm a bit lost, because none of us is able to find the real reason.</div><div>The cluster worked flawlessly for days with no issues, until now.</div><div>It's crucial for me to understand why this switch occurred.</div><div><br></div><div>I attached the current status, configuration and logs.</div><div>In the old master node's log I can't find any reason.</div><div>On the new master, the only things logged are the fencing and the promotion.</div><div><br></div><div><br>PS:</div><div>Could this be the reason for the fencing?</div><div><br></div><div>grep  -e sbd /var/log/messages<br>Jul 12 14:58:59 ltaoperdbs02 sbd[6107]: warning: inquisitor_child: Servant pcmk is outdated (age: 4)<br>Jul 12 14:58:59 ltaoperdbs02 sbd[6107]:  notice: inquisitor_child: Servant pcmk is healthy (age: 0)<br></div></div></blockquote><div>That was yesterday afternoon, not 0:40 this morning.</div><div>With the watchdog-timeout set to 5s, this may have been tight though.</div><div>Maybe check your other nodes for similar warnings - or check the compressed warnings.</div><div>You could also check the 
journal of sbd after startup to see whether it managed to run with real-time (rt) scheduling.</div><div>Is this a bare-metal setup, or is it running on a hypervisor?</div><div>Unfortunately I'm not familiar enough with postgres to tell whether there is anything interesting about the last messages shown before the suspected watchdog reboot.</div><div>Was any administrative work done by ltauser before the reboot? If so, what?</div><div><br></div><div>Regards,</div><div>Klaus</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><br></div><div>Any thoughts and help are really appreciated.<br></div><div><br></div><div>Damiano</div></div>


</blockquote></div></div>