<div dir="ltr"><div>Thanks, resolved.</div><div><br></div><div>I ran into the following libqb issue:</div><div> <a href="https://github.com/ClusterLabs/libqb/issues/139">https://github.com/ClusterLabs/libqb/issues/139</a><br></div><div> <a href="https://github.com/ClusterLabs/libqb/pull/141">https://github.com/ClusterLabs/libqb/pull/141</a></div><div><br></div><div>Applying commit 7f56f58 to libqb 0.17.1 fixed my problem.</div><div> <a href="https://github.com/davidvossel/libqb/commit/7f56f583d891859c94b24db0ec38a301c3f3466a.patch">https://github.com/davidvossel/libqb/commit/7f56f583d891859c94b24db0ec38a301c3f3466a.patch</a></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-08-06 1:57 GMT+02:00 Pallai Roland <span dir="ltr">&lt;<a href="mailto:pallair@magex.hu" target="_blank">pallair@magex.hu</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I've built a recent cluster stack from source on Debian Jessie and I can't get rid of CPU spikes. Corosync blocks the entire system for seconds on every simple transition, even itself:</div><div><br></div><div> drbdtest1 corosync[4734]: [MAIN ] Corosync main process was not scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token timeout increase.<br></div><div><br></div><div>and even DRBD:</div><div> drbdtest1 kernel: drbd p1: PingAck did not arrive in time.</div><div><br></div><div>My previous build (corosync 1.4.6, libqb 0.17.0, pacemaker 1.1.12) works fine on these nodes with the same corosync/pacemaker setup.</div><div><br></div><div>What should I try? It's a test environment, and the issue is 100% reproducible in seconds.
Network traffic is minimal all the time and there is no I/O load.</div><div><br></div><div><br></div><div><b>Pacemaker config:</b><br><br></div><div><div>node 167969573: drbdtest1</div><div>node 167969574: drbdtest2</div><div>primitive drbd_p1 ocf:linbit:drbd \</div><div> params drbd_resource=p1 \</div><div> op monitor interval=30</div><div>primitive drbd_p2 ocf:linbit:drbd \</div><div> params drbd_resource=p2 \</div><div> op monitor interval=30</div><div>primitive dummy_test ocf:pacemaker:Dummy \</div><div> meta allow-migrate=true \</div><div> params state="/var/run/activenode"</div><div>primitive fence_libvirt stonith:external/libvirt \</div><div> params hostlist="drbdtest1,drbdtest2" hypervisor_uri="qemu+ssh://libvirt-fencing@mgx4/system" \</div><div> op monitor interval=30</div><div>primitive fs_boot Filesystem \</div><div> params device="/dev/null" directory="/boot" fstype="*" \</div><div> meta is-managed=false \</div><div> op monitor interval=20 timeout=40 on-fail=block OCF_CHECK_LEVEL=20</div><div>primitive fs_f1 Filesystem \</div><div> params device="/dev/drbd/by-res/p1" directory="/mnt/p1" fstype=ext4 options="commit=60,barrier=0,data=writeback" \</div><div> op monitor interval=20 timeout=40 \</div><div> op start timeout=300 interval=0 \</div><div> op stop timeout=180 interval=0</div><div>primitive ip_10.3.3.138 IPaddr2 \</div><div> params ip=10.3.3.138 cidr_netmask=32 \</div><div> op monitor interval=10s timeout=20s</div><div>primitive sysinfo ocf:pacemaker:SysInfo \</div><div> op start timeout=20s interval=0 \</div><div> op stop timeout=20s interval=0 \</div><div> op monitor interval=60s</div><div>group dummy-group dummy_test</div><div>ms ms_drbd_p1 drbd_p1 \</div><div> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true</div><div>ms ms_drbd_p2 drbd_p2 \</div><div> meta master-max=2 master-node-max=1 clone-max=2 notify=true</div><div>clone fencing_by_libvirt fence_libvirt \</div><div> meta globally-unique=false</div><div>clone 
fs_boot_clone fs_boot</div><div>clone sysinfos sysinfo \</div><div> meta globally-unique=false</div><div>location fs1_on_high_load fs_f1 \<br></div><div> rule -inf: cpu_load gte 4</div><div>colocation dummy_coloc inf: dummy-group ms_drbd_p2:Master</div><div>colocation f1a-coloc inf: fs_f1 ms_drbd_p1:Master</div><div>colocation f1b-coloc inf: fs_f1 fs_boot_clone:Started</div><div>order dummy_order inf: ms_drbd_p2:promote dummy-group:start</div><div>order orderA inf: ms_drbd_p1:promote fs_f1:start</div><div>property cib-bootstrap-options: \</div><div> dc-version=1.1.13-6052cd1 \</div><div> cluster-infrastructure=corosync \</div><div> expected-quorum-votes=2 \</div><div> no-quorum-policy=ignore \</div><div> symmetric-cluster=true \</div><div> placement-strategy=default \</div><div> last-lrm-refresh=1438735742 \</div><div> have-watchdog=false</div><div>property cib-bootstrap-options-stonith: \</div><div> stonith-enabled=true \</div><div> stonith-action=reboot</div><div>rsc_defaults rsc-options: \</div><div> resource-stickiness=100</div></div><div><br></div><div><br></div><div><b>corosync.conf:</b></div><div><br></div><div><div>totem {</div><div> version: 2</div><div> token: 3000</div><div> token_retransmits_before_loss_const: 10</div><div> clear_node_high_bit: yes</div><div> crypto_cipher: none</div><div> crypto_hash: none</div><div> interface {</div><div> ringnumber: 0</div><div> bindnetaddr: 10.3.3.37</div><div> mcastaddr: 225.0.0.37</div><div> mcastport: 5403</div><div> ttl: 1</div><div> }</div><div>}</div><div><br></div><div>logging {</div><div> fileline: off</div><div> to_stderr: no</div><div> to_logfile: yes</div><div> logfile: /var/log/corosync/corosync.log</div><div> to_syslog: yes</div><div> syslog_facility: daemon</div><div> debug: off</div><div> timestamp: on</div><div> logger_subsys {</div><div> subsys: QUORUM</div><div> debug: off</div><div> }</div><div>}</div><div><br></div><div>quorum {</div><div> provider: corosync_votequorum</div><div> expected_votes: 
2</div><div>}</div></div><div><br></div></div>
</blockquote></div><br></div></div>