[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
wferi at niif.hu
Tue Sep 5 11:22:21 EDT 2017
Jan Friesse <jfriesse at redhat.com> writes:
> wferi at niif.hu writes:
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>> vhbl08 corosync: [TOTEM ] A processor failed, forming new configuration.
>> vhbl03 corosync: [TOTEM ] A processor failed, forming new configuration.
>> vhbl07 corosync: [MAIN ] Corosync main process was not scheduled for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout increase.
> ^^^ This is the main problem you have to solve. It usually means that
> the machine is overloaded. It happens quite often when corosync
> is running inside a VM and the host machine is unable to schedule the
> VM regularly.
After some extensive tracing, I think the problem lies elsewhere: my
IPMI watchdog device is slow beyond imagination. Its ioctl operations
can take seconds, starving all other functions. At least, it seems to
block the main thread of Corosync. Is this a plausible scenario?
Also, Corosync has two threads; what are their roles?
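
To illustrate the kind of measurement behind this suspicion, here is a minimal Python sketch that times a blocking call and compares the result with the 2400 ms threshold from the log line quoted above. It is not my actual tracing setup and not Corosync code: timed_call and slow_stand_in are hypothetical names, and the stand-in only sleeps rather than touching a real watchdog device.

```python
import time

# Threshold copied from the quoted log line ("threshold is 2400.0000 ms").
THRESHOLD_MS = 2400.0


def timed_call(fn, *args, threshold_ms=THRESHOLD_MS):
    """Run fn(*args) and report how long the calling thread was blocked.

    Returns (result, elapsed_ms, over_threshold).
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms > threshold_ms


def slow_stand_in(delay_s):
    # Hypothetical stand-in for a slow watchdog ioctl; it only sleeps.
    time.sleep(delay_s)
    return "done"


if __name__ == "__main__":
    _, ms, over = timed_call(slow_stand_in, 0.05)
    print(f"blocked for {ms:.1f} ms, over threshold: {over}")
```

If a single call made from the main thread blocks for longer than the threshold, that alone could account for a "not scheduled" warning, even on an otherwise idle machine.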