[Pacemaker] lrmd WARN on high IO load

Diego Woitasen diego at woitasen.com.ar
Wed Aug 11 16:17:03 EDT 2010


Hi

2010/8/2 Dejan Muhamedagic <dejanmm at fastmail.fm>:
> Hi,
>
> On Mon, Jul 19, 2010 at 07:09:11PM -0300, Diego Woitasen wrote:
>> 2010/7/16 Diego Woitasen <diego at woitasen.com.ar>:
>> > Hi,
>> >  I've installed Heartbeat+Pacemaker (3.0.3 and 1.0.9). I have a
>> > resource which executes a script to check the service:
>> >
>> > primitive kolab_imapd ocf:heartbeat:kolab-service \
>> >        params service="all" monitor_script="/usr/local/bin/check-imap.py" \
>> >        meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
>> >        operations $id="operations-imap" \
>> >        op monitor interval="20s" timeout="30s" on-fail="restart" \
>> >        op start interval="0" timeout="120" \
>> >        op stop interval="0" timeout="120"
>> >
>> > I did I/O stress using bonnie++ and I started to see this message:
>> >
>> > Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the
>> > operation operation monitor[21] on ocf::kolab-service::kolab_imapd for
>> > client 4722, its parameters: CRM_meta_interval=[20000]
>> > monitor_script=[/usr/local/bin/check-imap.py]
>> > CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000]
>> > crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all]  stayed
>> > in operation list for 32740 ms (longer than 10000 ms)
>> >
>> > The problem is that I also get these messages under high I/O without the
>> > stress testing, for example while running backups. If I understand the
>> > message correctly, the monitor operation didn't start; it was waiting
>> > in some work queue to start.
>
> It was most probably waiting for the previous monitor operation
> to finish, though that one should have timed out according to
> your configuration. Or there were at least 4 operations on
> different resources running on the node. If you expect high load
> on the server, you should tune timeouts accordingly.

And what are the correct values for timeout and interval? Should timeout
be less than interval?
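
To have numbers to tune against, a rough sketch (not part of lrmd or Pacemaker) that pulls the queue time and the warning threshold out of the lrmd WARN message quoted above. The function name queue_delays and the regular expression are assumptions based only on that one message; other lrmd versions may word it differently. The sample string is the quoted message with some parameters left out.

```python
import re

# Matches the tail of the lrmd warning, e.g.
# "... stayed in operation list for 32740 ms (longer than 10000 ms)"
# \s+ is used between words because syslog lines are often re-wrapped.
QUEUE_RE = re.compile(
    r"stayed\s+in\s+operation\s+list\s+for\s+(\d+)\s+ms"
    r"\s+\(longer\s+than\s+(\d+)\s+ms\)"
)


def queue_delays(log_text):
    """Return a list of (waited_ms, threshold_ms) pairs found in log_text."""
    return [(int(waited), int(limit)) for waited, limit in QUEUE_RE.findall(log_text)]


# Abbreviated copy of the warning from this thread (hypothetical test input).
sample = (
    "Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the "
    "operation operation monitor[21] on ocf::kolab-service::kolab_imapd for "
    "client 4722, its parameters: CRM_meta_interval=[20000] "
    "CRM_meta_timeout=[30000] service=[all]  stayed\n"
    "in operation list for 32740 ms (longer than 10000 ms)"
)

if __name__ == "__main__":
    for waited, limit in queue_delays(sample):
        print(f"waited {waited} ms, threshold {limit} ms")
```

On the message above this reports a wait of 32740 ms against a 10000 ms threshold, so the monitor sat in the queue longer than its own 30 s timeout.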

>
> Thanks,
>
> Dejan
>
>> > If I try to execute a command while the stress test is running, it's slow
>> > (approx. 3 seconds) but it works. For example, I can run "crm configure
>> > show" and the output appears in 3 or 4 seconds.
>> >
>> > The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.
>> >
>> > Regards,
>> >  Diego
>> >
>> > --
>> > Diego Woitasen
>> >
>>
>>
>> I've raised the priority of the process to 10 and it works now.
>>
>> The documentation says that the default rtprio is 5. That's wrong; it's 1.
>> At least in my packages...
>>
>> Regards,
>>  Diego
>>
>> --
>> Diego Woitasen
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>




-- 
Diego Woitasen



