[Pacemaker] Resources time out

Andrew Beekhof andrew at beekhof.net
Sun Nov 25 19:40:11 EST 2012


On Mon, Nov 26, 2012 at 11:37 AM, Pedro Sousa <pgsousa at gmail.com> wrote:
> Hi,
>
> thank you for your answer.
>
> Do you think that increasing the timeout of the resource that's failing
> will solve the problem?

It couldn't hurt.
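
For reference, a minimal sketch of what raising the monitor timeout on one
of the IPaddr2 resources could look like in the crm shell configuration.
The 60-second value is an assumption chosen for illustration, not a tested
recommendation; everything except the monitor timeout is copied from the
configuration quoted below.

```
# Illustrative only: timeout="60" on the monitor op is an assumed value
primitive res_IPaddr2_Sip ocf:heartbeat:IPaddr2 \
        params ip="10.100.251.30" iflabel="Sip" \
        operations $id="res_IPaddr2_Sip-operations" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
        op monitor interval="10" timeout="60" start-delay="0"
```

The same kind of change would apply to res_IPaddr2_Admin and
res_IPaddr2_Asterisk, whose monitors timed out in the same log excerpt.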

>
> Regards,
> Pedro Sousa
>
>
> On Mon, Nov 26, 2012 at 12:23 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>> On Wed, Nov 21, 2012 at 2:02 AM, Pedro Sousa <pgsousa at gmail.com> wrote:
>> > Hi all,
>> >
>> > Some strange behavior occurs when I do more intensive work on my
>> > cluster, such as running a bash script or Wireshark: some Pacemaker
>> > resources start to time out and fail over to the other node. I was
>> > running this command:
>> >
>> > # find /sharedstorage/var/log/asterisk/cdr-csv/ -type f -size 0 -exec rm -f {} \;
>> >
>> > to clean up some unused 0-byte files on my DRBD shared storage when I
>> > saw this in my logs and some resources failing:
>> >
>> > Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Sip:monitor process (PID 30312) timed out (try 1).  Killing with signal SIGTERM (15).
>> > Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Admin:monitor process (PID 30313) timed out (try 1).  Killing with signal SIGTERM (15).
>> > Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Asterisk:monitor process (PID 30314) timed out (try 1).  Killing with signal SIGTERM (15).
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[416] on ocf::IPaddr2::res_IPaddr2_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Sip] ip=[10.100.251.30] : pid [30312] timed out
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[428] on ocf::IPaddr2::res_IPaddr2_Admin for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Admin] ip=[10.100.252.30] : pid [30313] timed out
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[442] on ocf::IPaddr2::res_IPaddr2_Asterisk for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Asterisk] ip=[10.100.251.100] : pid [30314] timed out
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 970 ms (> 300 ms) (GSource: 0x1ab0d60)
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[434] on lsb::ntpd::res_ntpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 21100 ms (longer than 10000 ms)
>> > Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[429] on lsb::dhcpd::res_dhcpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 19910 ms (longer than 10000 ms)
>> >
>> > Any hint on what can be causing this? Can anybody help? Thanks.
>>
>> I wouldn't have thought that one should be able to affect the other
>> unless the find was running as a high priority task.
>> But if it is a large filesystem then all those directory listings and
>> file lookups would keep the kernel quite busy, long enough to delay a
>> few socket()/bind()/recvmsg() calls I suppose.
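
One way to reduce the impact of such a cleanup, sketched here under
assumptions: run the find at the lowest CPU priority and, where the I/O
scheduler honours it, in the idle I/O class. The function name
clean_empty_files is hypothetical; nice, ionice and find are standard
tools, but whether this avoids the monitor timeouts on this cluster is
untested.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): remove zero-byte files
# under a directory while competing as little as possible with other work.
clean_empty_files() {
    dir="$1"
    # Use the idle I/O class only if ionice exists and is permitted here.
    if ionice -c 3 true 2>/dev/null; then
        nice -n 19 ionice -c 3 find "$dir" -type f -size 0 -exec rm -f {} +
    else
        nice -n 19 find "$dir" -type f -size 0 -exec rm -f {} +
    fi
}
```

It could then be called as
clean_empty_files /sharedstorage/var/log/asterisk/cdr-csv/
Using "-exec rm -f {} +" instead of "\;" batches the deletions into fewer
rm processes, which also reduces fork overhead on a busy node.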
>>
>> >
>> > Here's my configuration:
>> >
>> > 2 x node cluster Centos 6.0 64-bit 2GB RAM
>> > pacemaker-1.1.6-3.el6.x86_64
>> > pacemaker-libs-1.1.6-3.el6.x86_64
>> > pacemaker-cli-1.1.6-3.el6.x86_64
>> > pacemaker-cluster-libs-1.1.6-3.el6.x86_64
>> > corosync-1.4.1-4.el6.x86_64
>> > corosynclib-1.4.1-4.el6.x86_64
>> > openaislib-1.1.1-7.el6.x86_64
>> > openais-1.1.1-7.el6.x86_64
>> >
>> > corosync.conf:
>> >
>> > aisexec {
>> > user: root
>> > group: root
>> > }
>> >
>> > corosync {
>> > user: root
>> > group: root
>> > }
>> >
>> > amf {
>> > mode: disabled
>> > }
>> >
>> > logging {
>> > to_stderr: yes
>> > debug: off
>> > timestamp: on
>> > to_file: no
>> > to_syslog: yes
>> > syslog_facility: daemon
>> > }
>> >
>> > totem {
>> > version: 2
>> > token: 3000
>> > token_retransmits_before_loss_const: 10
>> > join: 60
>> > consensus: 4000
>> > vsftype: none
>> > max_messages: 20
>> > clear_node_high_bit: yes
>> > secauth: on
>> > threads: 0
>> > # nodeid: 1234
>> > rrp_mode: active
>> >
>> > interface {
>> > ringnumber: 0
>> > bindnetaddr: 10.0.0.2
>> > mcastaddr: 226.94.1.1
>> > mcastport: 4000
>> > }
>> >
>> > }
>> >
>> > service {
>> > ver: 0
>> > name: pacemaker
>> > use_mgmtd: yes
>> > }
>> >
>> > pacemaker configuration
>> >
>> > # crm configure edit
>> >
>> > node nd01.lab
>> > node nd02.lab \
>> >         attributes standby="off"
>> > primitive res_Filesystem_Sip ocf:heartbeat:Filesystem \
>> >         params device="/dev/drbd0" directory="/sharedstorage" fstype="ext4" \
>> >         operations $id="res_Filesystem_Sip-operations" \
>> >         op start interval="0" timeout="60" \
>> >         op stop interval="0" timeout="60" \
>> >         op monitor interval="20" timeout="40" start-delay="0" \
>> >         op notify interval="0" timeout="60"
>> > primitive res_IPaddr2_Admin ocf:heartbeat:IPaddr2 \
>> >         params ip="10.100.252.30" iflabel="Admin" \
>> >         operations $id="res_IPaddr2_Admin-operations" \
>> >         op start interval="0" timeout="20" \
>> >         op stop interval="0" timeout="20" \
>> >         op monitor interval="10" timeout="20" start-delay="0"
>> > primitive res_IPaddr2_Asterisk ocf:heartbeat:IPaddr2 \
>> >         params ip="10.100.251.100" iflabel="Asterisk" \
>> >         operations $id="res_IPaddr2_Asterisk-operations" \
>> >         op start interval="0" timeout="20" \
>> >         op stop interval="0" timeout="20" \
>> >         op monitor interval="10" timeout="20" start-delay="0"
>> > primitive res_IPaddr2_Sip ocf:heartbeat:IPaddr2 \
>> >         params ip="10.100.251.30" iflabel="Sip" \
>> >         operations $id="res_IPaddr2_Sip-operations" \
>> >         op start interval="0" timeout="20" \
>> >         op stop interval="0" timeout="20" \
>> >         op monitor interval="10" timeout="20" start-delay="0"
>> > primitive res_asterisk_Asterisk lsb:asterisk \
>> >         operations $id="res_asterisk_Asterisk-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta target-role="Started"
>> > primitive res_dhcpd_Sip lsb:dhcpd \
>> >         operations $id="res_dhcpd_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta is-managed="true" target-role="Started"
>> > primitive res_drbd_1 ocf:linbit:drbd \
>> >         params drbd_resource="r0" \
>> >         operations $id="res_drbd_1-operations" \
>> >         op start interval="0" timeout="240" \
>> >         op promote interval="0" timeout="90" \
>> >         op demote interval="0" timeout="90" \
>> >         op stop interval="0" timeout="100" \
>> >         op monitor interval="10" timeout="20" start-delay="0" \
>> >         op notify interval="0" timeout="90"
>> > primitive res_drbdlinks_Sip heartbeat:drbdlinks \
>> >         params 1="-c" 2="/etc/drbdlinks.conf" \
>> >         operations $id="res_drbdlinks_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta target-role="started"
>> > primitive res_faxmodems_Sip lsb:faxmodems \
>> >         operations $id="res_faxmodems_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta is-managed="true" target-role="Started"
>> > primitive res_httpd_Sip lsb:httpd \
>> >         operations $id="res_httpd_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta is-managed="true" target-role="Started"
>> > primitive res_hylafax_Sip lsb:hylafax \
>> >         operations $id="res_hylafax_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta target-role="Started"
>> > primitive res_iaxmodem_Sip lsb:iaxmodem \
>> >         operations $id="res_iaxmodem_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta is-managed="true" target-role="Started"
>> > primitive res_kamailio_Sip lsb:kamailio \
>> >         operations $id="res_kamailio_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta target-role="Started"
>> > primitive res_mysqld_Sip lsb:mysqld \
>> >         operations $id="res_mysqld_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15"
>> > primitive res_named_Sip lsb:named \
>> >         operations $id="res_named_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15" \
>> >         meta target-role="started" is-managed="true"
>> > primitive res_nfs_Sip lsb:nfs \
>> >         operations $id="res_nfs_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15"
>> > primitive res_ntpd_Sip lsb:ntpd \
>> >         operations $id="res_ntpd_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15"
>> > primitive res_postfix_Sip lsb:postfix \
>> >         operations $id="res_postfix_Sip-operations" \
>> >         op start interval="0" timeout="15" \
>> >         op stop interval="0" timeout="15" \
>> >         op monitor interval="15" timeout="15" start-delay="15"
>> > ms ms_drbd_1 res_drbd_1 \
>> >         meta clone-max="2" notify="true"
>> > colocation col_res_Filesystem_Sip_ms_drbd_1 inf: res_Filesystem_Sip ms_drbd_1:Master
>> > colocation col_res_IPaddr2_Admin_res_Filesystem_Sip inf: res_IPaddr2_Admin res_Filesystem_Sip
>> > colocation col_res_IPaddr2_Sip_res_Filesystem_Sip inf: res_IPaddr2_Sip res_Filesystem_Sip
>> > colocation col_res_asterisk_Asterisk_res_IPaddr2_Asterisk inf: res_asterisk_Asterisk res_IPaddr2_Asterisk
>> > colocation col_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_drbdlinks_Sip
>> > colocation col_res_drbdlinks_Sip_res_IPaddr2_Sip inf: res_drbdlinks_Sip res_IPaddr2_Sip
>> > colocation col_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_dhcpd_Sip res_drbdlinks_Sip
>> > colocation col_res_faxmodems_Sip_res_drbdlinks_Sip inf: res_faxmodems_Sip res_drbdlinks_Sip
>> > colocation col_res_httpd_Sip_res_drbdlinks_Sip inf: res_httpd_Sip res_drbdlinks_Sip
>> > colocation col_res_hylafax_Sip_res_drbdlinks_Sip inf: res_hylafax_Sip res_drbdlinks_Sip
>> > colocation col_res_iaxmodem_Sip_res_drbdlinks_Sip inf: res_iaxmodem_Sip res_drbdlinks_Sip
>> > colocation col_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_kamailio_Sip
>> > colocation col_res_kamailio_Sip_res_drbdlinks_Sip inf: res_kamailio_Sip res_drbdlinks_Sip
>> > colocation col_res_mysqld_Sip_res_drbdlinks_Sip inf: res_mysqld_Sip res_drbdlinks_Sip
>> > colocation col_res_mysqld_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_mysqld_Sip
>> > colocation col_res_named_Sip_res_drbdlinks_Sip inf: res_named_Sip res_drbdlinks_Sip
>> > colocation col_res_named_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_named_Sip
>> > colocation col_res_nfs_Sip_res_drbdlinks_Sip inf: res_nfs_Sip res_drbdlinks_Sip
>> > colocation col_res_ntpd_Sip_res_drbdlinks_Sip inf: res_ntpd_Sip res_drbdlinks_Sip
>> > colocation col_res_postfix_Sip_res_drbdlinks_Sip inf: res_postfix_Sip res_drbdlinks_Sip
>> > order ord_ms_drbd_1_res_Filesystem_Sip inf: ms_drbd_1:promote res_Filesystem_Sip:start
>> > order ord_res_Filesystem_Sip_res_IPaddr2_Admin inf: res_Filesystem_Sip res_IPaddr2_Admin
>> > order ord_res_Filesystem_Sip_res_IPaddr2_Sip inf: res_Filesystem_Sip res_IPaddr2_Sip
>> > order ord_res_IPaddr2_Asterisk_res_asterisk_Asterisk inf: res_IPaddr2_Asterisk res_asterisk_Asterisk
>> > order ord_res_IPaddr2_Sip_res_drbdlinks_Sip inf: res_IPaddr2_Sip res_drbdlinks_Sip
>> > order ord_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_drbdlinks_Sip res_IPaddr2_Asterisk
>> > order ord_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_drbdlinks_Sip res_dhcpd_Sip
>> > order ord_res_drbdlinks_Sip_res_faxmodems_Sip inf: res_drbdlinks_Sip res_faxmodems_Sip
>> > order ord_res_drbdlinks_Sip_res_httpd_Sip inf: res_drbdlinks_Sip res_httpd_Sip
>> > order ord_res_drbdlinks_Sip_res_hylafax_Sip inf: res_drbdlinks_Sip res_hylafax_Sip
>> > order ord_res_drbdlinks_Sip_res_iaxmodem_Sip inf: res_drbdlinks_Sip res_iaxmodem_Sip
>> > order ord_res_drbdlinks_Sip_res_kamailio_Sip inf: res_drbdlinks_Sip res_kamailio_Sip
>> > order ord_res_drbdlinks_Sip_res_mysqld_Sip inf: res_drbdlinks_Sip res_mysqld_Sip
>> > order ord_res_drbdlinks_Sip_res_named_Sip inf: res_drbdlinks_Sip res_named_Sip
>> > order ord_res_drbdlinks_Sip_res_nfs_Sip inf: res_drbdlinks_Sip res_nfs_Sip
>> > order ord_res_drbdlinks_Sip_res_ntpd_Sip inf: res_drbdlinks_Sip res_ntpd_Sip
>> > order ord_res_drbdlinks_Sip_res_postfix_Sip inf: res_drbdlinks_Sip res_postfix_Sip
>> > order ord_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_kamailio_Sip res_IPaddr2_Asterisk
>> > order ord_res_mysqld_Sip_res_kamailio_Sip inf: res_mysqld_Sip res_kamailio_Sip
>> > order ord_res_named_Sip_res_kamailio_Sip inf: res_named_Sip res_kamailio_Sip
>> > property $id="cib-bootstrap-options" \
>> >         expected-quorum-votes="2" \
>> >         stonith-enabled="false" \
>> >         dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>> >         no-quorum-policy="ignore" \
>> >         cluster-infrastructure="openais" \
>> >         last-lrm-refresh="1353415104"
>> > rsc_defaults $id="rsc-options" \
>> >         resource-stickiness="100"
>> >
>> >
>> > Regards,
>> > Pedro Sousa
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>