[ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

Klaus Wenninger kwenning at redhat.com
Wed Jul 14 08:12:03 EDT 2021


On Wed, Jul 14, 2021 at 12:50 PM damiano giuliani <
damianogiuliani87 at gmail.com> wrote:

> Hi guys, thanks for helping,
>
> It can be quite hard to troubleshoot unexpected failures, especially if they
> are not easily tracked in the pacemaker / system logs.
> All servers are bare metal. I requested the BMC logs, hoping they contain
> some information.
> You guys said the sbd timing is too tight; can you explain that to me and
> suggest a valid configuration?
>

There is no one-size-fits-all configuration. If you are seeing issues where
sbd isn't able to trigger the hardware watchdog in time, you can consider
raising the watchdog-timeout value and, consequently, setting
stonith-watchdog-timeout to about double that time.
But you should also try to understand why your watchdog triggers and make
sure there is nothing systematically going wrong - like e.g. sbd or corosync
not being able to switch to rt-scheduling.
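
Just as an illustration, not a recommendation for concrete values: on a
RHEL-like system sbd is configured via /etc/sysconfig/sbd (iirc the variable
names below), and a more relaxed watchdog-only timing could look roughly like
this, with 10s/20s picked arbitrarily. sbd only reads this at startup, so the
cluster has to be restarted for the change to take effect.

  # /etc/sysconfig/sbd
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_WATCHDOG_TIMEOUT=10

  # on one node, roughly double the watchdog-timeout
  pcs property set stonith-watchdog-timeout=20s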

>
> PS: yesterday I resynced the old master (to slave) and rejoined it into the
> cluster.
> I found the following errors in /var/log/messages regarding sbd:
>
>  grep -r sbd messages
> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]: warning: inquisitor_child: Servant
> pcmk is outdated (age: 4)
> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]:  notice: inquisitor_child: Servant
> pcmk is healthy (age: 0)
> Jul 13 20:42:14 ltaoperdbs02 sbd[185352]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:42:14 ltaoperdbs02 sbd[185362]:      pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:42:14 ltaoperdbs02 sbd[185363]:   cluster:   notice:
> servant_cluster: Monitoring unknown cluster health
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:  notice: inquisitor_child:
> Servant cluster is healthy (age: 0)
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:  notice: watchdog_init: Using
> watchdog device '/dev/watchdog'
> Jul 13 20:42:19 ltaoperdbs02 sbd[185357]:  notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
> Jul 13 20:53:57 ltaoperdbs02 sbd[188919]:    info: main: Verbose mode
> enabled.
> Jul 13 20:53:57 ltaoperdbs02 sbd[188919]:    info: main: Watchdog enabled.
> Jul 13 20:54:28 ltaoperdbs02 sbd[189176]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:54:28 ltaoperdbs02 sbd[189178]:      pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]:  notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]:   error: watchdog_init_fd:
> Cannot open watchdog device '/dev/watchdog': Device or resource busy (16)
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]: warning: cleanup_servant_by_pid:
> Servant for pcmk (pid: 189178) has terminated
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]: warning: cleanup_servant_by_pid:
> Servant for cluster (pid: 189179) has terminated
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:   error: watchdog_init_fd:
> Cannot open watchdog device '/dev/watchdog0': Device or resource busy (16)
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:   error: watchdog_init_fd:
> Cannot open watchdog device '/dev/watchdog': Device or resource busy (16)
>
There is something strange going on if sbd isn't able to open the
watchdog device.
Check that nobody else is sitting on the watchdog device - like systemd,
watchdogd, or (with - iirc - compile-time configuration) corosync, ...
Tools like 'lsof' may be helpful for that if you catch the system in that
state.
I'm guessing it doesn't happen every time, because that should actually
prevent a successful startup of sbd, and systemd should then not bring up
pacemaker.
On the other hand, competing for /dev/watchdog shouldn't introduce unexpected
watchdog-reboots, as sbd will either fail to open the device and thus not
come up, or open the device and keep it open for the time being so that
nobody else is able to open it.
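
Something along these lines might help to spot a second user of the device -
just a sketch, and the unit/config names differ between distributions:

  lsof /dev/watchdog /dev/watchdog0
  # usual suspects
  grep -i '^RuntimeWatchdog' /etc/systemd/system.conf
  systemctl status watchdog.service     # watchdogd, if that package is installed
  grep -i watchdog /etc/corosync/corosync.conf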


> if i check the systemctl status sbd:
>
> systemctl status sbd.service
> ● sbd.service - Shared-storage based fencing daemon
>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
> preset: disabled)
>    Active: active (running) since Tue 2021-07-13 20:42:15 UTC; 13h ago
>      Docs: man:sbd(8)
>   Process: 185352 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
> watch (code=exited, status=0/SUCCESS)
>  Main PID: 185357 (sbd)
>    CGroup: /system.slice/sbd.service
>            ├─185357 sbd: inquisitor
>            ├─185362 sbd: watcher: Pacemaker
>            └─185363 sbd: watcher: Cluster
>
> Jul 13 20:42:14 ltaoperdbs02 systemd[1]: Starting Shared-storage based
> fencing daemon...
> Jul 13 20:42:14 ltaoperdbs02 sbd[185352]:   notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:42:14 ltaoperdbs02 sbd[185362]:       pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:42:14 ltaoperdbs02 sbd[185363]:    cluster:   notice:
> servant_cluster: Monitoring unknown cluster health
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> Servant cluster is healthy (age: 0)
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: watchdog_init: Using
> watchdog device '/dev/watchdog'
> Jul 13 20:42:15 ltaoperdbs02 systemd[1]: Started Shared-storage based
> fencing daemon.
> Jul 13 20:42:19 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
>
So at least for sbd there don't seem to be systematic issues switching to
rt-scheduling, as we would otherwise see it complaining in the logs above.
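
If you want to double-check that, something like the following (again just a
sketch) should show whether sbd and corosync actually got a realtime
scheduling class (FF/RR with a non-zero rtprio) and whether sbd logged any
complaints after startup:

  ps -eo pid,class,rtprio,comm | grep -E 'sbd|corosync'
  chrt -p <pid-of-sbd-inquisitor>        # <pid> is a placeholder
  journalctl -u sbd -b | grep -iE 'warn|error'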


> This is happening on all 3 nodes, any thoughts?
>
> Thanks for helping, have a good day
>
> Damiano
>
>
> On Wed, Jul 14, 2021 at 10:08 AM Klaus Wenninger <
> kwenning at redhat.com> wrote:
>
>>
>>
>> On Wed, Jul 14, 2021 at 6:40 AM Andrei Borzenkov <arvidjaar at gmail.com>
>> wrote:
>>
>>> On 13.07.2021 23:09, damiano giuliani wrote:
>>> > Hi Klaus, thanks for helping. I'm quite lost because I can't find out the
>>> > causes.
>>> > I attached the corosync logs of all three nodes, hoping you guys can find
>>> > something I can't see and give me a hint. I really appreciate the effort.
>>> > The old master log seems cut off at 00:38, so nothing interesting there.
>>> > The new master and the third slave logged what happened, but I can't
>>> > figure out why the old master was lost.
>>> >
>>>
>>> The reason it was lost is most likely outside of pacemaker. You need to
>>> check other logs on the node that was lost, maybe the BMC if this is bare
>>> metal, or the hypervisor if it is a virtualized system.
>>>
>>> All that these logs say is that ltaoperdbs02 was lost from the point of
>>> view of the two other nodes. It happened at the same time (around Jul 13
>>> 00:40), which suggests ltaoperdbs02 indeed had some problem. Whether it
>>> was a software crash, hardware failure or network outage cannot be
>>> determined from these logs.
>>>
>> What speaks against a pure network outage is that we don't see
>> the corosync membership messages on the node that died.
>> Of course it is possible that the log wasn't flushed out before the reboot,
>> but usually I'd expect that there would be enough time.
>> If something kept corosync or sbd from being scheduled, that would
>> explain why we don't see messages from these instances.
>> And that was why I was asking you to check whether, in this setup, corosync
>> and sbd are able to switch to rt-scheduling.
>> But of course that is all speculation, and from what we know it could
>> be nearly anything, from an administrative hard shutdown via
>> some BMC to whatever.
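
Adding to that: if the machines have IPMI-capable BMCs, the SEL might at
least tell you whether the BMC saw a power or reset event around 00:40. Just
a sketch, exact options depend on your ipmitool version and BMC; the address
and user below are placeholders:

  ipmitool sel elist                                        # locally on the host
  ipmitool -I lanplus -H <bmc-address> -U <user> sel elist  # or remotely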
>>
>>>
>>> > Something interesting could be the stonith logs of the new master and the
>>> > third slave:
>>> >
>>> > NEW MASTER:
>>> > grep stonith-ng /var/log/messages
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Node
>>> ltaoperdbs02
>>> > state is now lost
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Purged 1 peer
>>> > with id=1 and/or uname=ltaoperdbs02 from the membership cache
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Client
>>> > crmd.228700.154a9e50 wants to fence (reboot) 'ltaoperdbs02' with device
>>> > '(any)'
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Requesting
>>> peer
>>> > fencing (reboot) targeting ltaoperdbs02
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Couldn't find
>>> > anyone to fence (reboot) ltaoperdbs02 with any device
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Waiting 10s
>>> for
>>> > ltaoperdbs02 to self-fence (reboot) for client crmd.228700.f5d882d5
>>> > Jul 13 00:40:47 ltaoperdbs03 stonith-ng[228696]:  notice: Self-fencing
>>> > (reboot) by ltaoperdbs02 for
>>> > crmd.228700.f5d882d5-a804-4e20-bad4-7f16393d7748 assumed complete
>>> > Jul 13 00:40:47 ltaoperdbs03 stonith-ng[228696]:  notice: Operation
>>> > 'reboot' targeting ltaoperdbs02 on ltaoperdbs03 for
>>> > crmd.228700 at ltaoperdbs03.f5d882d5: OK
>>> >
>>> > THIRD SLAVE:
>>> > grep stonith-ng /var/log/messages
>>> > Jul 13 00:40:37 ltaoperdbs04 stonith-ng[77928]:  notice: Node
>>> ltaoperdbs02
>>> > state is now lost
>>> > Jul 13 00:40:37 ltaoperdbs04 stonith-ng[77928]:  notice: Purged 1 peer
>>> with
>>> > id=1 and/or uname=ltaoperdbs02 from the membership cache
>>> > Jul 13 00:40:47 ltaoperdbs04 stonith-ng[77928]:  notice: Operation
>>> 'reboot'
>>> > targeting ltaoperdbs02 on ltaoperdbs03 for
>>> crmd.228700 at ltaoperdbs03.f5d882d5:
>>> > OK
>>> >
>>> > I really appreciate the help and would like to hear what you think about it.
>>> >
>>> > PS: the stonith-watchdog-timeout should be set to 10s (pcs property set
>>> > stonith-watchdog-timeout=10s) - do you suggest a different setting?
>>> >
>>> > On Tue, Jul 13, 2021 at 2:29 PM Klaus Wenninger <
>>> > kwenning at redhat.com> wrote:
>>> >
>>> >>
>>> >>
>>> >> On Tue, Jul 13, 2021 at 1:43 PM damiano giuliani <
>>> >> damianogiuliani87 at gmail.com> wrote:
>>> >>
>>> >>> Hi guys,
>>> >>> I'm back with some PAF postgres cluster problems.
>>> >>> Tonight the cluster fenced the master node and promoted the PAF
>>> >>> resource on a new node.
>>> >>> Everything went fine, except that I really don't know why it happened.
>>> >>> This morning I noticed the old master had been fenced by sbd and a new
>>> >>> master promoted; this happened tonight at 00.40.XX.
>>> >>> Filtering the logs, I can't find any reason for the old master being
>>> >>> fenced or for the start of the promotion of the new master (which seems
>>> >>> to have gone perfectly). I'm a bit lost, because none of us is able to
>>> >>> get at the real reason.
>>> >>> The cluster worked flawlessly for days with no issues, till now.
>>> >>> It is crucial for me to understand why this switch occurred.
>>> >>>
>>> >>> I attached the current status, configuration and logs.
>>> >>> In the old master node log I can't find any reason;
>>> >>> on the new master the only thing is the fencing and the promotion.
>>> >>>
>>> >>>
>>> >>> PS:
>>> >>> could this be the reason for the fencing?
>>> >>>
>>> >>> grep  -e sbd /var/log/messages
>>> >>> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]: warning: inquisitor_child:
>>> >>> Servant pcmk is outdated (age: 4)
>>> >>> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]:  notice: inquisitor_child:
>>> >>> Servant pcmk is healthy (age: 0)
>>> >>>
>>> >> That was yesterday afternoon and not 00:40 this morning.
>>> >> With the watchdog-timeout set to 5s this may have been tight though.
>>> >> Maybe check your other nodes for similar warnings - or check the
>>> >> compressed warnings.
>>> >> Maybe you can also check the journal of sbd after start to see if it
>>> >> managed to run rt-scheduled.
>>> >> Is this a bare-metal setup or is it running on some hypervisor?
>>> >> Unfortunately I'm not familiar enough with postgres to tell if there is
>>> >> anything interesting about the last messages shown before the suspected
>>> >> watchdog-reboot.
>>> >> Was there some administrative work done by ltauser before the reboot?
>>> >> If yes, what?
>>> >>
>>> >> Regards,
>>> >> Klaus
>>> >>
>>> >>
>>> >>>
>>> >>> Any thoughts and help are really appreciated.
>>> >>>
>>> >>> Damiano
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>