[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Wed Feb 20 16:08:24 EST 2019

On 20/02/19 21:25 +0100, Klaus Wenninger wrote:
> Hmm maybe the thing that should be scheduled is running at
> SCHED_RR as well but with just a lower prio. So it wouldn't
> profit from the sched_yield and it wouldn't get anything of
> the 5% either.

Actually, it would possibly make the situation even worse in
that case, as explained in sched_yield(2):

> since doing so will result in unnecessary context
> switches, which will degrade system performance

(I am not sure which bucket this context-switching time would
be accounted to, if any, but wall-clock time keeps ticking
in the interim...)
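
To illustrate why the yield does not help there, here is a toy model
(not kernel code) of the per-priority run lists described in sched(7):
sched_yield(2) only moves the caller to the tail of the list for its own
static priority, and the scheduler always picks the head of the highest
non-empty list. The task names and the priorities 99 and 50 are
assumptions made up for the example.

```python
from collections import deque


def pick_next(queues):
    """Return the head of the highest-priority non-empty run list."""
    for prio in sorted(queues, reverse=True):
        if queues[prio]:
            return queues[prio][0]
    return None


def sched_yield(queues, prio):
    """Model sched_yield(2): move the head of the given priority's list to its tail."""
    queues[prio].rotate(-1)


# Hypothetical setup: one high-priority SCHED_RR task, one lower-priority one.
queues = {
    99: deque(["corosync"]),      # assumed priority, for illustration only
    50: deque(["starved-task"]),  # hypothetical lower-priority SCHED_RR task
}

before = pick_next(queues)
sched_yield(queues, 99)
after = pick_next(queues)
print(before, after)

# Contrast: two tasks at the same priority do take turns after a yield.
peers = {10: deque(["a", "b"])}
sched_yield(peers, 10)
print(pick_next(peers))
```

In this model the yielding task is alone in its list, so it is picked
again right away and the lower-priority task never runs; only a peer at
the same priority would gain anything from the yield.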

I am curious whether a well-tuned SCHED_DEADLINE, as mentioned,
might be a more comprehensive solution here. It could also
automatically turn buggy still-alive-but-making-no-progress
scenarios into a deliberately exaggerated condition that is
then possibly actionable (as with token loss -> fencing).
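
As a rough sketch of what "tuning" would involve: per sched(7), a
SCHED_DEADLINE task is described by a runtime, a deadline and a period,
the kernel requires runtime <= deadline <= period, and a task that
exhausts its runtime budget is throttled until its next period. The
helper below only checks that ordering and computes the CPU share; it
does not call sched_setattr(2). The class name and the 10/50/100 ms
values are assumptions picked for illustration, not recommended settings.

```python
from dataclasses import dataclass

MS = 1_000_000  # nanoseconds per millisecond


@dataclass(frozen=True)
class DeadlineParams:
    """Hypothetical container for the three SCHED_DEADLINE parameters (in ns)."""
    runtime_ns: int
    deadline_ns: int
    period_ns: int

    def is_valid(self) -> bool:
        # Ordering required by the kernel: runtime <= deadline <= period.
        return 0 < self.runtime_ns <= self.deadline_ns <= self.period_ns

    def utilization(self) -> float:
        # Fraction of one CPU the task may consume per period.
        return self.runtime_ns / self.period_ns


# Illustrative values only: 10 ms of CPU within every 100 ms period.
params = DeadlineParams(runtime_ns=10 * MS, deadline_ns=50 * MS, period_ns=100 * MS)
print(params.is_valid(), params.utilization())

# A budget larger than the deadline breaks the required ordering.
bad = DeadlineParams(runtime_ns=60 * MS, deadline_ns=50 * MS, period_ns=100 * MS)
print(bad.is_valid())
```

With a budget like this, a task spinning without progress would be
capped at its share each period instead of holding the CPU
indefinitely, which is the kind of bounded, observable condition
meant above.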

-- 
Jan (Poki)