[ClusterLabs] Corosync: 100% cpu (corosync 2.3.5, libqb 0.17.1, pacemaker 1.1.13)

Pallai Roland pallair at magex.hu
Thu Aug 6 15:54:02 UTC 2015


Thanks, resolved.

I ran into the following libqb issue:
 https://github.com/ClusterLabs/libqb/issues/139
 https://github.com/ClusterLabs/libqb/pull/141

Applying commit 7f56f58 to libqb 0.17.1 fixed my problem.

https://github.com/davidvossel/libqb/commit/7f56f583d891859c94b24db0ec38a301c3f3466a.patch
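
For anyone comparing their own logs with the scheduling warning quoted further down: the logged threshold (2400 ms) is 80% of the token value in my corosync.conf (3000 ms). A minimal sketch in Python that pulls the two numbers out of the warning and checks that ratio follows. The 0.8 factor is only inferred from the two numbers in this thread, and the names LOG_LINE, TOKEN_MS and parse_stall are made up for illustration.

```python
import re

# Warning copied from the quoted message (rejoined into one line).
LOG_LINE = (
    "drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not "
    "scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token "
    "timeout increase."
)
TOKEN_MS = 3000  # totem.token from the quoted corosync.conf

PATTERN = re.compile(
    r"not scheduled for (?P<stall>[\d.]+) ms "
    r"\(threshold is (?P<threshold>[\d.]+) ms\)"
)


def parse_stall(line):
    """Return (stall_ms, threshold_ms) from a warning line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group("stall")), float(match.group("threshold"))


stall_ms, threshold_ms = parse_stall(LOG_LINE)
ratio = threshold_ms / TOKEN_MS          # assumption: threshold tracks the token timeout
overshoot_ms = stall_ms - threshold_ms   # how far past the threshold the stall went

print(f"stall={stall_ms} ms threshold={threshold_ms} ms "
      f"ratio={ratio:.2f} overshoot={overshoot_ms:.4f} ms")
```

Raising the token would only move the threshold; in my case the stall itself went away with the libqb patch above.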


2015-08-06 1:57 GMT+02:00 Pallai Roland <pallair at magex.hu>:

> hi,
>
> I've built a recent cluster stack from source on Debian Jessie and I
> can't get rid of CPU spikes. Corosync stalls the entire system for
> seconds on every simple transition, including itself:
>
>  drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token timeout increase.
>
> and even drbd:
>  drbdtest1 kernel: drbd p1: PingAck did not arrive in time.
>
> My previous build (corosync 1.4.6, libqb 0.17.0, pacemaker 1.1.12) works
> fine on these nodes with the same corosync/pacemaker setup.
>
> What should I try? It's a test environment, the issue is 100% reproducible
> in seconds. Network traffic is minimal all the time and there is no I/O
> load.
>
>
> *Pacemaker config:*
>
> node 167969573: drbdtest1
> node 167969574: drbdtest2
> primitive drbd_p1 ocf:linbit:drbd \
>         params drbd_resource=p1 \
>         op monitor interval=30
> primitive drbd_p2 ocf:linbit:drbd \
>         params drbd_resource=p2 \
>         op monitor interval=30
> primitive dummy_test ocf:pacemaker:Dummy \
>         meta allow-migrate=true \
>         params state="/var/run/activenode"
> primitive fence_libvirt stonith:external/libvirt \
>         params hostlist="drbdtest1,drbdtest2" hypervisor_uri="qemu+ssh://libvirt-fencing@mgx4/system" \
>         op monitor interval=30
> primitive fs_boot Filesystem \
>         params device="/dev/null" directory="/boot" fstype="*" \
>         meta is-managed=false \
>         op monitor interval=20 timeout=40 on-fail=block OCF_CHECK_LEVEL=20
> primitive fs_f1 Filesystem \
>         params device="/dev/drbd/by-res/p1" directory="/mnt/p1" fstype=ext4 options="commit=60,barrier=0,data=writeback" \
>         op monitor interval=20 timeout=40 \
>         op start timeout=300 interval=0 \
>         op stop timeout=180 interval=0
> primitive ip_10.3.3.138 IPaddr2 \
>         params ip=10.3.3.138 cidr_netmask=32 \
>         op monitor interval=10s timeout=20s
> primitive sysinfo ocf:pacemaker:SysInfo \
>         op start timeout=20s interval=0 \
>         op stop timeout=20s interval=0 \
>         op monitor interval=60s
> group dummy-group dummy_test
> ms ms_drbd_p1 drbd_p1 \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> ms ms_drbd_p2 drbd_p2 \
>         meta master-max=2 master-node-max=1 clone-max=2 notify=true
> clone fencing_by_libvirt fence_libvirt \
>         meta globally-unique=false
> clone fs_boot_clone fs_boot
> clone sysinfos sysinfo \
>         meta globally-unique=false
> location fs1_on_high_load fs_f1 \
>         rule -inf: cpu_load gte 4
> colocation dummy_coloc inf: dummy-group ms_drbd_p2:Master
> colocation f1a-coloc inf: fs_f1 ms_drbd_p1:Master
> colocation f1b-coloc inf: fs_f1 fs_boot_clone:Started
> order dummy_order inf: ms_drbd_p2:promote dummy-group:start
> order orderA inf: ms_drbd_p1:promote fs_f1:start
> property cib-bootstrap-options: \
>         dc-version=1.1.13-6052cd1 \
>         cluster-infrastructure=corosync \
>         expected-quorum-votes=2 \
>         no-quorum-policy=ignore \
>         symmetric-cluster=true \
>         placement-strategy=default \
>         last-lrm-refresh=1438735742 \
>         have-watchdog=false
> property cib-bootstrap-options-stonith: \
>         stonith-enabled=true \
>         stonith-action=reboot
> rsc_defaults rsc-options: \
>         resource-stickiness=100
>
>
> *corosync.conf:*
>
> totem {
>         version: 2
>         token: 3000
>         token_retransmits_before_loss_const: 10
>         clear_node_high_bit: yes
>         crypto_cipher: none
>         crypto_hash: none
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.3.3.37
>                 mcastaddr: 225.0.0.37
>                 mcastport: 5403
>                 ttl: 1
>         }
> }
>
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: yes
>         logfile: /var/log/corosync/corosync.log
>         to_syslog: yes
>         syslog_facility: daemon
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: QUORUM
>                 debug: off
>         }
> }
>
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
> }
>
>
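
A side note on the node IDs in the Pacemaker config above: no nodeid is set in corosync.conf, and the IDs look like the 32-bit integer form of each node's ring 0 IPv4 address. 167969573 decodes to 10.3.3.37, which matches bindnetaddr. A minimal sketch in Python of that mapping follows. The helper names are hypothetical, the second address (10.3.3.38) is derived from the ID rather than stated anywhere in this thread, and the masking only models what clear_node_high_bit: yes is understood to do.

```python
import ipaddress


def nodeid_to_ipv4(nodeid):
    """Interpret a numeric node ID as a big-endian IPv4 address."""
    return ipaddress.IPv4Address(nodeid)


def ipv4_to_nodeid(addr, clear_high_bit=True):
    """Turn an IPv4 address into a numeric ID, optionally clearing the top bit.

    clear_high_bit models the clear_node_high_bit option (an assumption about
    its effect: keep the ID within the positive signed 32-bit range).
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        value &= 0x7FFFFFFF
    return value


# Node IDs taken from the "node ..." lines of the quoted crm configuration.
for nodeid in (167969573, 167969574):
    print(nodeid, "->", nodeid_to_ipv4(nodeid))
```

For addresses below 128.0.0.0, such as the 10.3.3.x ones here, the top bit is already zero, so the masking does not change the result.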