[ClusterLabs] Antw: [EXT] Re: unexpected fenced node and promotion of the new master PAF ‑ postgres

Wed Jul 14 09:28:26 EDT 2021

>>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 14.07.2021 at 12:49
in message <CAG=zYNOjRmKC5az8nz2r82CRabJ3Z+GEnuW_8dE3UJFu1hD1hA at mail.gmail.com>:
> Hi guys, thanks for helping,
> 
> it can be quite hard to troubleshoot unexpected failures, especially if they
> are not easily tracked in the pacemaker / system logs.
> All servers are bare metal; I requested the BMC logs, hoping they contain
> some information.
> You said the sbd timeout is too tight; can you explain that and suggest a
> valid configuration?

You must work through the following yourself:
* What is the maximum read/write delay for your sbd device that still means
the storage is working? Before assuming something like 1s, also think of
firmware updates, bad disk sectors, etc.
* Then configure the sbd parameters accordingly.
* Finally, configure the stonith timeout to be no less than the time sbd needs
in the worst case to bring the machine down. If the cluster starts recovering
while the other node is not down yet, you may get data corruption or other
failures.
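
As a rough illustration of that ordering (storage delay first, then sbd, then
stonith), here is a minimal Python sketch. The factors are assumptions and not
something stated in this thread: a commonly documented rule of thumb is
msgwait = 2 x watchdog timeout and stonith-timeout >= msgwait + 20% for
disk-based sbd, and stonith-watchdog-timeout = 2 x watchdog timeout for
watchdog-only sbd. The function name, the margin_s parameter and its default
are hypothetical; check the sbd and Pacemaker documentation for your versions
before using any of these numbers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SbdTimeouts:
    watchdog: int                  # seconds, sbd watchdog timeout
    msgwait: int                   # seconds, sbd msgwait (disk-based sbd)
    stonith_timeout: int           # seconds, lower bound for stonith-timeout
    stonith_watchdog_timeout: int  # seconds, watchdog-only (diskless) sbd


def derive_timeouts(worst_io_delay_s: float, margin_s: int = 5) -> SbdTimeouts:
    """Derive conservative timeouts from the worst tolerable storage delay.

    Assumed rules of thumb (not taken from this thread):
      watchdog                 >= worst I/O delay + margin
      msgwait                  =  2 * watchdog
      stonith-timeout          >= msgwait + 20 %
      stonith-watchdog-timeout =  2 * watchdog
    """
    if worst_io_delay_s < 0 or margin_s < 0:
        raise ValueError("delay and margin must not be negative")
    watchdog = math.ceil(worst_io_delay_s) + margin_s
    msgwait = 2 * watchdog
    # ceil(1.2 * msgwait) in integer arithmetic, avoiding float rounding.
    stonith_timeout = (msgwait * 6 + 4) // 5
    return SbdTimeouts(watchdog, msgwait, stonith_timeout, 2 * watchdog)


if __name__ == "__main__":
    print(derive_timeouts(10))
```

With these assumed factors, derive_timeouts(0.5, margin_s=4) gives a 5s
watchdog timeout and a 10s stonith-watchdog-timeout, which are the values
mentioned elsewhere in this thread; a storage delay budget of 10s with the
default margin gives 15s / 30s / 36s / 30s.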

> 
> PS: yesterday I resynced the old master (as a slave) and rejoined it to the
> cluster.
> I found the following sbd errors in /var/log/messages:
> 
>  grep -r sbd messages
> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]: warning: inquisitor_child: Servant
> pcmk is outdated (age: 4)
> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]:  notice: inquisitor_child: Servant
> pcmk is healthy (age: 0)
> Jul 13 20:42:14 ltaoperdbs02 sbd[185352]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:42:14 ltaoperdbs02 sbd[185362]:      pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:42:14 ltaoperdbs02 sbd[185363]:   cluster:   notice:
> servant_cluster: Monitoring unknown cluster health
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:  notice: inquisitor_child:
> Servant cluster is healthy (age: 0)
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:  notice: watchdog_init: Using
> watchdog device '/dev/watchdog'
> Jul 13 20:42:19 ltaoperdbs02 sbd[185357]:  notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
> Jul 13 20:53:57 ltaoperdbs02 sbd[188919]:    info: main: Verbose mode
> enabled.
> Jul 13 20:53:57 ltaoperdbs02 sbd[188919]:    info: main: Watchdog enabled.
> Jul 13 20:54:28 ltaoperdbs02 sbd[189176]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:54:28 ltaoperdbs02 sbd[189178]:      pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]:  notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]:   error: watchdog_init_fd: Cannot
> open watchdog device '/dev/watchdog': Device or resource busy (16)

Maybe also debug the watchdog device.
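
One thing to check first is who already has the device open. In the log above,
sbd[185357] reported "Using watchdog device '/dev/watchdog'" at 20:42:15, and
the "Device or resource busy" errors come from sbd processes started later, so
it is possible (an assumption, not confirmed here) that the running sbd itself
is the holder. A minimal Python sketch that walks /proc to list holders; the
function name and parameters are hypothetical, and it needs to run as root to
see other users' file descriptors:

```python
import os
from pathlib import Path


def watchdog_holders(proc_root="/proc",
                     devices=("/dev/watchdog", "/dev/watchdog0")):
    """Return sorted (pid, command, device) tuples for processes that
    currently hold one of the given device nodes open."""
    holders = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            fds = list((entry / "fd").iterdir())
        except OSError:
            continue  # process vanished or permission denied
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue
            if target in devices:
                try:
                    comm = (entry / "comm").read_text().strip()
                except OSError:
                    comm = "?"
                holders.append((int(entry.name), comm, target))
    return sorted(holders)


if __name__ == "__main__":
    for pid, comm, device in watchdog_holders():
        print(f"{device} is held by pid {pid} ({comm})")
```

If the only holder is the sbd inquisitor started by the service, the busy
errors from a second, manually started sbd would be expected rather than a
sign of a broken watchdog.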

> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]: warning: cleanup_servant_by_pid:
> Servant for pcmk (pid: 189178) has terminated
> Jul 13 20:54:28 ltaoperdbs02 sbd[189177]: warning: cleanup_servant_by_pid:
> Servant for cluster (pid: 189179) has terminated
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:  notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:   error: watchdog_init_fd: Cannot
> open watchdog device '/dev/watchdog0': Device or resource busy (16)
> Jul 13 20:55:30 ltaoperdbs02 sbd[189484]:   error: watchdog_init_fd: Cannot
> open watchdog device '/dev/watchdog': Device or resource busy (16)
> 
> If I check systemctl status sbd:
> 
> systemctl status sbd.service
> ● sbd.service - Shared-storage based fencing daemon
>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
> preset: disabled)
>    Active: active (running) since Tue 2021-07-13 20:42:15 UTC; 13h ago
>      Docs: man:sbd(8)
>   Process: 185352 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
> watch (code=exited, status=0/SUCCESS)
>  Main PID: 185357 (sbd)
>    CGroup: /system.slice/sbd.service
>            ├─185357 sbd: inquisitor
>            ├─185362 sbd: watcher: Pacemaker
>            └─185363 sbd: watcher: Cluster
> 
> Jul 13 20:42:14 ltaoperdbs02 systemd[1]: Starting Shared-storage based
> fencing daemon...
> Jul 13 20:42:14 ltaoperdbs02 sbd[185352]:   notice: main: Doing flush +
> writing 'b' to sysrq on timeout
> Jul 13 20:42:14 ltaoperdbs02 sbd[185362]:       pcmk:   notice:
> servant_pcmk: Monitoring Pacemaker health
> Jul 13 20:42:14 ltaoperdbs02 sbd[185363]:    cluster:   notice:
> servant_cluster: Monitoring unknown cluster health
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> Servant cluster is healthy (age: 0)
> Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: watchdog_init: Using
> watchdog device '/dev/watchdog'
> Jul 13 20:42:15 ltaoperdbs02 systemd[1]: Started Shared-storage based
> fencing daemon.
> Jul 13 20:42:19 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> Servant pcmk is healthy (age: 0)
> 
> This is happening on all 3 nodes, any thoughts?

Bad watchdog? 

> 
> Thanks for helping, have a good day
> 
> Damiano
> 
> 
> On Wed, Jul 14, 2021 at 10:08 Klaus Wenninger <
> kwenning at redhat.com> wrote:
> 
>>
>>
>> On Wed, Jul 14, 2021 at 6:40 AM Andrei Borzenkov <arvidjaar at gmail.com>
>> wrote:
>>
>>> On 13.07.2021 23:09, damiano giuliani wrote:
>>> > Hi Klaus, thanks for helping. I'm quite lost because I can't find the
>>> > cause.
>>> > I attached the corosync logs of all three nodes, hoping you can find
>>> > and point me to something I can't see. I really appreciate the effort.
>>> > The old master's log seems to be cut off at 00:38, so there is nothing
>>> > interesting in it.
>>> > The new master and the third slave logged what happened, but I can't
>>> > figure out why the old master was lost.
>>> >
>>>
>>> The reason it was lost is most likely outside of pacemaker. You need to
>>> check other logs on the node that was lost, maybe the BMC if this is bare
>>> metal or the hypervisor if it is a virtualized system.
>>>
>>> All that these logs say is that ltaoperdbs02 was lost from the point of
>>> view of the two other nodes. It happened at the same time (around Jul 13
>>> 00:40) which suggests ltaoperdbs02 had some problem indeed. Whether it
>>> was a software crash, hardware failure or network outage cannot be
>>> determined from these logs.
>>>
>> What speaks against a pure network outage is that we don't see
>> the corosync membership messages on the node that died.
>> Of course it is possible that the log wasn't flushed out before reboot
>> but usually I'd expect that there would be enough time.
>> If something kept corosync or sbd from being scheduled that would
>> explain why we don't see messages from these instances.
>> And that was why I was asking to check if in the setup corosync and
>> sbd are able to switch to rt-scheduling.
>> But of course that is all speculation and from what we know it can
>> be nearly anything from an administrative hard shutdown via
>> some BMC to whatever.
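
One way to check the rt-scheduling point on a running node is to ask the
kernel for the scheduling policy of the corosync and sbd processes. A minimal
Python sketch (Linux only), assuming you pass in the PIDs yourself, for
example the ones shown by systemctl status; the helper names are hypothetical:

```python
import os

# SCHED_RESET_ON_FORK may be OR-ed into the value the kernel reports.
_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)
_REALTIME = (os.SCHED_FIFO, os.SCHED_RR)


def is_realtime_policy(policy: int) -> bool:
    """True if the raw policy value denotes SCHED_FIFO or SCHED_RR."""
    return (policy & ~_RESET_ON_FORK) in _REALTIME


def is_realtime(pid: int) -> bool:
    """True if the process with this PID runs under a real-time policy."""
    return is_realtime_policy(os.sched_getscheduler(pid))


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        state = "real-time" if is_realtime(int(arg)) else "NOT real-time"
        print(f"pid {arg}: {state}")
```

A process that could not switch to real-time scheduling reports a non-real-time
policy here, which would fit the "not being scheduled" theory but does not
prove it.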
>>
>>>
>>> > Something interesting could be the stonith logs of the new master and
>>> > the third slave:
>>> >
>>> > NEW MASTER:
>>> > grep stonith-ng /var/log/messages
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Node ltaoperdbs02 state is now lost
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Purged 1 peer with id=1 and/or uname=ltaoperdbs02 from the membership cache
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Client crmd.228700.154a9e50 wants to fence (reboot) 'ltaoperdbs02' with device '(any)'
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Requesting peer fencing (reboot) targeting ltaoperdbs02
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Couldn't find anyone to fence (reboot) ltaoperdbs02 with any device
>>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Waiting 10s for ltaoperdbs02 to self-fence (reboot) for client crmd.228700.f5d882d5
>>> > Jul 13 00:40:47 ltaoperdbs03 stonith-ng[228696]:  notice: Self-fencing (reboot) by ltaoperdbs02 for crmd.228700.f5d882d5-a804-4e20-bad4-7f16393d7748 assumed complete
>>> > Jul 13 00:40:47 ltaoperdbs03 stonith-ng[228696]:  notice: Operation 'reboot' targeting ltaoperdbs02 on ltaoperdbs03 for crmd.228700 at ltaoperdbs03.f5d882d5: OK
>>> >
>>> > THIRD SLAVE:
>>> > grep stonith-ng /var/log/messages
>>> > Jul 13 00:40:37 ltaoperdbs04 stonith-ng[77928]:  notice: Node ltaoperdbs02 state is now lost
>>> > Jul 13 00:40:37 ltaoperdbs04 stonith-ng[77928]:  notice: Purged 1 peer with id=1 and/or uname=ltaoperdbs02 from the membership cache
>>> > Jul 13 00:40:47 ltaoperdbs04 stonith-ng[77928]:  notice: Operation 'reboot' targeting ltaoperdbs02 on ltaoperdbs03 for crmd.228700 at ltaoperdbs03.f5d882d5: OK
>>> >
>>> > I really appreciate the help and would like to know what you think
>>> > about it.
>>> >
>>> > PS: the stonith timeout should be set to 10s (pcs  property set
>>> > stonith-watchdog-timeout=10s); do you suggest a different setting?
>>> >
>>> > On Tue, Jul 13, 2021 at 14:29 Klaus Wenninger <
>>> > kwenning at redhat.com> wrote:
>>> >
>>> >>
>>> >>
>>> >> On Tue, Jul 13, 2021 at 1:43 PM damiano giuliani <
>>> >> damianogiuliani87 at gmail.com> wrote:
>>> >>
>>> >>> Hi guys,
>>> >>> I'm back with some PAF postgres cluster problems.
>>> >>> Last night the cluster fenced the master node and promoted the PAF
>>> >>> resource to a new node.
>>> >>> Everything went fine, except that I really don't know why it happened.
>>> >>> This morning I noticed the old master had been fenced by sbd and a new
>>> >>> master had been promoted; this happened last night at 00.40.XX.
>>> >>> Filtering the logs, I can't find any reason why the old master was
>>> >>> fenced and why the promotion of the new master started (the promotion
>>> >>> itself seems to have gone perfectly). At this point I'm a bit lost
>>> >>> because none of us is able to find the real reason.
>>> >>> The cluster worked flawlessly for days with no issues, until now.
>>> >>> It is crucial for me to understand why this switch occurred.
>>> >>>
>>> >>> I attached the current status, configuration and logs.
>>> >>> In the old master node's log I can't find any reason;
>>> >>> on the new master the only thing is the fencing and the promotion.
>>> >>>
>>> >>>
>>> >>> PS:
>>> >>> could this be the reason for the fencing?
>>> >>>
>>> >>> grep  -e sbd /var/log/messages
>>> >>> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]: warning: inquisitor_child:
>>> >>> Servant pcmk is outdated (age: 4)
>>> >>> Jul 12 14:58:59 ltaoperdbs02 sbd[6107]:  notice: inquisitor_child:
>>> >>> Servant pcmk is healthy (age: 0)
>>> >>>
>>> >> That was yesterday afternoon and not 0:40 today in the morning.
>>> >> With the watchdog-timeout set to 5s this may have been tight though.
>>> >> Maybe check your other nodes for similar warnings - or check the
>>> >> compressed warnings.
>>> >> Maybe you can as well check the journal of sbd after start to see if it
>>> >> managed to run rt-scheduled.
>>> >> Is this a bare-metal-setup or running on some hypervisor?
>>> >> Unfortunately I'm not enough into postgres to tell if there is anything
>>> >> interesting about the last messages shown before the suspected
>>> >> watchdog-reboot.
>>> >> Was there some administrative stuff done by ltauser before the reboot?
>>> >> If yes, what?
>>> >>
>>> >> Regards,
>>> >> Klaus
>>> >>
>>> >>
>>> >>>
>>> >>> Any thoughts and help are really appreciated.
>>> >>>
>>> >>> Damiano
>>> >>> _______________________________________________
>>> >>> Manage your subscription:
>>> >>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>> >>>
>>> >>> ClusterLabs home: https://www.clusterlabs.org/ 
>>> >>>