[Pacemaker] lrmd WARN on high IO load

Diego Woitasen diego at woitasen.com.ar
Fri Jul 16 18:27:53 EDT 2010

  I've installed Heartbeat 3.0.3 and Pacemaker 1.0.9. I have a
resource which executes a script to check the service:

primitive kolab_imapd ocf:heartbeat:kolab-service \
        params service="all" monitor_script="/usr/local/bin/check-imap.py" \
        meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
        operations $id="operations-imap" \
        op monitor interval="20s" timeout="30s" on-fail="restart" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"

I ran an I/O stress test using bonnie++ and started to see this message:

Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the
operation operation monitor[21] on ocf::kolab-service::kolab_imapd for
client 4722, its parameters: CRM_meta_interval=[20000]
CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000]
crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all]  stayed
in operation list for 32740 ms (longer than 10000 ms)

The problem is that I also get these messages under high I/O without
the stress test, for example while running backups. If I understand
the message correctly, the monitor operation had not started yet; it
was waiting in some work queue to start.
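
As a rough sketch, the wait time can be pulled out of these warnings
with a few lines of Python, which makes it easier to compare the delay
against the monitor timeout (30000 ms here). The regular expression is
derived only from the single message quoted above, so the exact wording
on other lrmd versions is an assumption; parse_lrmd_delay is a
hypothetical helper name, not part of Pacemaker.

```python
import re

# Pattern built from the one warning quoted in this post; other lrmd
# versions may word the message differently (assumption).
LRMD_DELAY = re.compile(
    r"operation (?P<op>\w+)\[(?P<id>\d+)\] on (?P<rsc>\S+) .*?"
    r"stayed in operation list for (?P<waited>\d+) ms "
    r"\(longer than (?P<limit>\d+) ms\)"
)


def parse_lrmd_delay(line):
    """Return (op, resource, waited_ms, limit_ms), or None if no match."""
    # Collapse whitespace so a message wrapped across lines still matches.
    text = " ".join(line.split())
    m = LRMD_DELAY.search(text)
    if m is None:
        return None
    return m["op"], m["rsc"], int(m["waited"]), int(m["limit"])


# The warning from this post, shortened in the parameter list only.
sample = (
    "Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the "
    "operation operation monitor[21] on ocf::kolab-service::kolab_imapd for "
    "client 4722, its parameters: CRM_meta_interval=[20000] "
    "CRM_meta_timeout=[30000] service=[all]  stayed "
    "in operation list for 32740 ms (longer than 10000 ms)"
)

result = parse_lrmd_delay(sample)
```

For the message above this yields a wait of 32740 ms against a 10000 ms
warning threshold, i.e. the operation waited longer than its own 30 s
timeout before it was even run.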

If I try to execute a command while the stress test is running, it is
slow (approx. 3 seconds) but it works. For example, I can run "crm
configure show" and the output appears in 3 or 4 seconds.
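
To put numbers on that kind of slowdown, a command can be timed
repeatedly while the I/O load is running. This is only a minimal
sketch; time_command is a hypothetical helper, and what to run and how
often is up to the tester.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.monotonic() - start


# Example call on a cluster node (requires the crm shell to be installed):
#   code, elapsed = time_command(["crm", "configure", "show"])
#   print(f"exit={code} elapsed={elapsed:.2f}s")
```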

The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.


Diego Woitasen

