[ClusterLabs] Antw: Re: Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Oct 12 02:42:49 EDT 2021
>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 11.10.2021 at
11:57 in message <20211011115737.7cc99e69 at firost>:
> Hi,
>
> I kept your full answer in the quoted history to keep the list informed.
>
> My answer is down below.
>
> On Mon, 11 Oct 2021 11:33:12 +0200
> damiano giuliani <damianogiuliani87 at gmail.com> wrote:
>
>> Hey guys, sorry for being late, I was busy during the weekend.
>>
>> Here I am:
>>
>>
>> > Did you see the swap activity (in/out, not just swap occupation) happen at
>> > the same time the member was lost on corosync side?
>> > Did you check corosync or some of its libs were indeed in swap?
>> >
>> No, and I don't know how to do it. I just noticed the swap occupation, which
>> suggested to me (and my colleague) to find out whether it could cause trouble.
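For what it's worth, swap in/out activity (as opposed to mere occupation) can be
watched live, and per-process swap usage can be read from /proc. Something along
these lines should do (a rough sketch, not from the original exchange):

  # si/so columns show pages swapped in/out per second
  vmstat 5

  # how much of the corosync process currently sits in swap
  grep VmSwap /proc/$(pidof corosync)/status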
>>
>> > First, corosync now sits on a lot of memory because of knet. Did you try to
>> > switch back to udpu, which uses way less memory?
>>
>>
>> No, I haven't moved to udpu; I can't stop the processes at all.
>>
>> "Could not lock memory of service to avoid page faults"
>>
>>
>> grep -rn 'Could not lock memory of service to avoid page faults' /var/log/*
>> returns nothing
Maybe the expression is too specific (try just "lock memory"), or syslog is in
the journal only (journalctl -b | grep "lock memory").
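If the startup message really never made it into the logs, you can also check
directly whether corosync managed to lock its memory; a quick sketch (assuming
corosync is running and logs to the systemd journal):

  journalctl -u corosync -b | grep -i "lock memory"

  # if mlockall() succeeded, VmLck should be roughly as large as VmRSS
  grep -e VmLck -e VmRSS /proc/$(pidof corosync)/status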
>
> This message should appear on corosync startup. Make sure the logs hadn't been
> rotated to a blackhole in the meantime...
>
>> > On my side, mlock is unlimited in the ulimit settings. Check the values
>> > in /proc/$(coro PID)/limits (be careful with the ulimit command, check the
>> > proc itself).
>>
>>
>> cat /proc/101350/limits
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            8388608              unlimited            bytes
>> Max core file size        0                    unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             770868               770868               processes
>> Max open files            1024                 4096                 files
>> Max locked memory         unlimited            unlimited            bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       770868               770868               signals
>> Max msgqueue size         819200               819200               bytes
>> Max nice priority         0                    0
>> Max realtime priority     0                    0
>> Max realtime timeout      unlimited            unlimited            us
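The output above already shows "Max locked memory: unlimited", which is what you
want. Just for reference, on systemd systems that limit can be pinned for the
corosync service with a drop-in (the file name below is only an example):

  mkdir -p /etc/systemd/system/corosync.service.d
  printf '[Service]\nLimitMEMLOCK=infinity\n' \
      > /etc/systemd/system/corosync.service.d/99-memlock.conf
  systemctl daemon-reload
  # corosync only picks this up on restart, so plan a maintenance window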
>>
>> > Ah... That's the first thing I change.
>> > In SLES, that is defaulted to 10s and so far I have never seen an
>> > environment that is stable enough for the default 1s timeout.
>>
>>
>> Old versions have a 10s default.
>> You are not going to fix the problem this way; a 1s timeout for a bonded
>> network and overkill hardware is an enormous amount of time.
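For reference, the token timeout lives in the totem section of
/etc/corosync/corosync.conf; raising it to the 10s mentioned above would look
roughly like this (the value is only the one discussed, not a recommendation):

  totem {
      # ... keep the existing cluster_name, transport, etc.
      token: 10000    # milliseconds; the upstream default is 1000
  }

On recent corosync versions the change can be applied with corosync-cfgtool -R,
otherwise restart corosync node by node during a maintenance window.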
>>
>> hostnamectl | grep Kernel
>>   Kernel: Linux 3.10.0-1160.6.1.el7.x86_64
>> [root@ltaoperdbs03 ~]# cat /etc/os-release
>> NAME="CentOS Linux"
>> VERSION="7 (Core)"
>>
>> > Indeed. But it's a trade-off between swapping process memory or freeing
>> > memory by removing data from cache. For database servers, it is advised to
>> > use a lower value for swappiness anyway, around 5-10, as a swapped process
>> > means longer queries, longer data in caches, piling sessions, etc.
>>
>>
>> Totally agree, for a DB server swappiness has to be 5-10.
>>
>> > Kernel?
>> > What are your settings for vm.dirty_* ?
>>
>>
>>
>> hostnamectl | grep Kernel
>>   Kernel: Linux 3.10.0-1160.6.1.el7.x86_64
>> [root@ltaoperdbs03 ~]# cat /etc/os-release
>> NAME="CentOS Linux"
>> VERSION="7 (Core)"
>>
>>
>> sysctl -a | grep dirty
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>
> Considering your 256GB of physical memory, this means you can dirty up to 25GB
> of pages in cache before the kernel starts to write them to storage.
>
> You might want to trigger these lighter background syncs well before hitting
> this limit.
>
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>
> This is 20% of your 256GB of physical memory. After this limit, writes have to
> go to disks directly. Considering the time to write to SSD compared to memory,
> and the amount of data to sync in the background as well (52GB), this could be
> very painful.
However (unless doing really large commits) databases should flush buffers
rather frequently, so I doubt database operations would fill the dirty buffers
up to that limit.
"watch cat /proc/meminfo" could be your friend.
>
>> vm.dirty_writeback_centisecs = 500
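If you want the background writeback to start earlier, the ratio knobs can be
replaced with absolute byte values (the kernel zeroes the corresponding *_ratio
when a *_bytes value is set). The numbers below are purely illustrative and
should be sized for the actual storage:

  # e.g. /etc/sysctl.d/90-dirty.conf (file name is only an example)
  vm.dirty_background_bytes = 268435456   # 256MB before background writeback kicks in
  vm.dirty_bytes = 1073741824             # 1GB before writers are forced to flush

Apply with "sysctl --system" or at the next reboot.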
>>
>>
>> > Do you have a proof that swap was the problem?
>>
>>
>> Not at all, but after switching swappiness to 10 the cluster hasn't suddenly
>> swapped anymore for a month.
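Good to hear. Just make sure the value survives a reboot, e.g. (the file name is
only an example):

  sysctl -w vm.swappiness=10
  echo "vm.swappiness = 10" > /etc/sysctl.d/90-swappiness.conf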