[Pacemaker] lrmd WARN on high IO load

Mon Jul 19 18:09:11 EDT 2010

2010/7/16 Diego Woitasen <diego at woitasen.com.ar>:
> Hi,
>  I've installed Heartbeat+Pacemaker (3.0.3 and 1.0.9). I have a
> resource which executes a script to check the service:
>
> primitive kolab_imapd ocf:heartbeat:kolab-service \
>        params service="all" monitor_script="/usr/local/bin/check-imap.py" \
>        meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
>        operations $id="operations-imap" \
>        op monitor interval="20s" timeout="30s" on-fail="restart" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120"
>
> I ran an I/O stress test using bonnie++ and started to see this message:
>
> Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the
> operation operation monitor[21] on ocf::kolab-service::kolab_imapd for
> client 4722, its parameters: CRM_meta_interval=[20000]
> monitor_script=[/usr/local/bin/check-imap.py]
> CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000]
> crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all]  stayed
> in operation list for 32740 ms (longer than 10000 ms)
>
> The problem is that I also get these messages under high I/O without the
> stress testing, for example when running backups. If I understand the
> message correctly, the monitor operation hadn't started yet; it was
> waiting in some work queue to start.
>
> If I try to execute a command while the stress test is running, it's slow
> (approx. 3 seconds) but it works. For example, I can run "crm configure
> show" and the output appears in 3 or 4 seconds.
>
> The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.
>
> Regards,
>  Diego
>
> --
> Diego Woitasen
>
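
The waiting time reported in the quoted warning can be pulled out
programmatically, which may help when tracking how often this happens
during backups. A minimal sketch, assuming the wording of the single
sample line above; parse_queue_wait and the regex are illustrative
names, not part of lrmd or Pacemaker:

```python
import re

# Matches the tail of the lrmd warning quoted above. The pattern is based
# only on that one sample line, so other lrmd versions may word it differently.
QUEUE_WAIT_RE = re.compile(
    r"stayed\s+in\s+operation\s+list\s+for\s+(\d+)\s*ms\s+"
    r"\(longer\s+than\s+(\d+)\s*ms\)"
)


def parse_queue_wait(message):
    """Return (waited_ms, threshold_ms), or None if the text does not match."""
    # Drop mail quote markers and re-join lines wrapped by syslog or the mailer.
    flat = " ".join(line.lstrip("> ").strip() for line in message.splitlines())
    match = QUEUE_WAIT_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = """> crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all]  stayed
> in operation list for 32740 ms (longer than 10000 ms)"""

waited_ms, threshold_ms = parse_queue_wait(sample)
print(waited_ms, threshold_ms)
```

For the quoted line this gives 32740 and 10000, so the monitor sat in
the operation list longer than its own 30000 ms timeout.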

I've raised the priority of the process to 10 and it works now.

The documentation says that the default rtprio is 5. That's wrong; it's 1,
at least in my packages.
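
For anyone who wants to check which real-time priority a process
actually runs with on their own node, here is a rough sketch. It
assumes Linux and Python 3.3 or later (older than what RHEL 5 ships);
scheduling_info is a made-up helper name, and the PID to inspect has to
be looked up separately:

```python
import os

# Readable names for the common Linux scheduling policies.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def scheduling_info(pid=0):
    """Return (policy_name, rt_priority) for pid; 0 means the calling process."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    # Unknown or flag-combined policies are reported by their numeric value.
    return POLICY_NAMES.get(policy, str(policy)), priority


# Inspect the current process; pass another PID to inspect a running daemon.
print(scheduling_info())
```

A process under the normal scheduler reports priority 0; a real-time
process reports its SCHED_FIFO or SCHED_RR priority. The command
"chrt -p <pid>" from util-linux shows the same information.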

Regards,
 Diego

-- 
Diego Woitasen