[Pacemaker] Resources time out

Pedro Sousa pgsousa at gmail.com
Tue Nov 20 10:02:51 EST 2012


Hi all,

Some strange behavior is happening: when I do more intensive work on my
cluster, like running a bash script or Wireshark, some Pacemaker resources
start to time out and fail over to the other node. I was running this
command:

# find /sharedstorage/var/log/asterisk/cdr-csv/ -type f -size 0 -exec rm -f {} \;

to clean up some unused 0-byte files on my DRBD shared storage when I saw the
following in my logs and some resources failing:

Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Sip:monitor process (PID 30312) timed out (try 1).  Killing with signal SIGTERM (15).
Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Admin:monitor process (PID 30313) timed out (try 1).  Killing with signal SIGTERM (15).
Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Asterisk:monitor process (PID 30314) timed out (try 1).  Killing with signal SIGTERM (15).
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[416] on ocf::IPaddr2::res_IPaddr2_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Sip] ip=[10.100.251.30] : pid [30312] timed out
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[428] on ocf::IPaddr2::res_IPaddr2_Admin for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Admin] ip=[10.100.252.30] : pid [30313] timed out
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[442] on ocf::IPaddr2::res_IPaddr2_Asterisk for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Asterisk] ip=[10.100.251.100] : pid [30314] timed out
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 970 ms (> 300 ms) (GSource: 0x1ab0d60)
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[434] on lsb::ntpd::res_ntpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 21100 ms (longer than 10000 ms)
Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[429] on lsb::dhcpd::res_dhcpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 19910 ms (longer than 10000 ms)
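For comparison, the same cleanup could be run at the lowest CPU and I/O priority, and without spawning one `rm` process per file. This is a hedged sketch, not something tested on this cluster: it assumes GNU find (for `-delete`) and util-linux `ionice`, and `clean_empty_files` is a hypothetical helper name. Lowering the priority may reduce the pressure the command puts on the DRBD device; whether that is enough to keep the monitor operations within their timeouts is not established here.

```shell
#!/bin/sh
# Hypothetical helper: delete zero-byte regular files under DIR,
# at the lowest scheduling priority available.
clean_empty_files() {
    dir=${1:?usage: clean_empty_files DIR}
    if command -v ionice >/dev/null 2>&1; then
        # -c3 = "idle" I/O class (honored by CFQ-style I/O schedulers);
        # nice -n 19 = lowest CPU priority.
        ionice -c3 nice -n 19 find "$dir" -type f -size 0 -delete
    else
        # Fallback when ionice is not installed: lower CPU priority only.
        nice -n 19 find "$dir" -type f -size 0 -delete
    fi
}
```

Usage: `clean_empty_files /sharedstorage/var/log/asterisk/cdr-csv/`. Using `-delete` instead of `-exec rm -f {} \;` avoids forking a separate `rm` for every matching file.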

Any hints on what could be causing this? Can anybody help? Thanks.

Here's my configuration:

2-node cluster, CentOS 6.0 64-bit, 2 GB RAM
pacemaker-1.1.6-3.el6.x86_64
pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-cli-1.1.6-3.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64
corosync-1.4.1-4.el6.x86_64
corosynclib-1.4.1-4.el6.x86_64
openaislib-1.1.1-7.el6.x86_64
openais-1.1.1-7.el6.x86_64

corosync.conf:

aisexec {
user: root
group: root
}

corosync {
user: root
group: root
}

amf {
mode: disabled
}

logging {
to_stderr: yes
debug: off
timestamp: on
to_file: no
to_syslog: yes
syslog_facility: daemon
}

totem {
version: 2
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 4000
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: on
threads: 0
# nodeid: 1234
rrp_mode: active

interface {
ringnumber: 0
bindnetaddr: 10.0.0.2
mcastaddr: 226.94.1.1
mcastport: 4000
}

}

service {
ver: 0
name: pacemaker
use_mgmtd: yes
}

Pacemaker configuration:

# crm configure edit

node nd01.lab
node nd02.lab \
        attributes standby="off"
primitive res_Filesystem_Sip ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/sharedstorage" fstype="ext4" \
        operations $id="res_Filesystem_Sip-operations" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40" start-delay="0" \
        op notify interval="0" timeout="60"
primitive res_IPaddr2_Admin ocf:heartbeat:IPaddr2 \
        params ip="10.100.252.30" iflabel="Admin" \
        operations $id="res_IPaddr2_Admin-operations" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
        op monitor interval="10" timeout="20" start-delay="0"
primitive res_IPaddr2_Asterisk ocf:heartbeat:IPaddr2 \
        params ip="10.100.251.100" iflabel="Asterisk" \
        operations $id="res_IPaddr2_Asterisk-operations" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
        op monitor interval="10" timeout="20" start-delay="0"
primitive res_IPaddr2_Sip ocf:heartbeat:IPaddr2 \
        params ip="10.100.251.30" iflabel="Sip" \
        operations $id="res_IPaddr2_Sip-operations" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
        op monitor interval="10" timeout="20" start-delay="0"
primitive res_asterisk_Asterisk lsb:asterisk \
        operations $id="res_asterisk_Asterisk-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta target-role="Started"
primitive res_dhcpd_Sip lsb:dhcpd \
        operations $id="res_dhcpd_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta is-managed="true" target-role="Started"
primitive res_drbd_1 ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="res_drbd_1-operations" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20" start-delay="0" \
        op notify interval="0" timeout="90"
primitive res_drbdlinks_Sip heartbeat:drbdlinks \
        params 1="-c" 2="/etc/drbdlinks.conf" \
        operations $id="res_drbdlinks_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta target-role="started"
primitive res_faxmodems_Sip lsb:faxmodems \
        operations $id="res_faxmodems_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta is-managed="true" target-role="Started"
primitive res_httpd_Sip lsb:httpd \
        operations $id="res_httpd_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta is-managed="true" target-role="Started"
primitive res_hylafax_Sip lsb:hylafax \
        operations $id="res_hylafax_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta target-role="Started"
primitive res_iaxmodem_Sip lsb:iaxmodem \
        operations $id="res_iaxmodem_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta is-managed="true" target-role="Started"
primitive res_kamailio_Sip lsb:kamailio \
        operations $id="res_kamailio_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta target-role="Started"
primitive res_mysqld_Sip lsb:mysqld \
        operations $id="res_mysqld_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15"
primitive res_named_Sip lsb:named \
        operations $id="res_named_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15" \
        meta target-role="started" is-managed="true"
primitive res_nfs_Sip lsb:nfs \
        operations $id="res_nfs_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15"
primitive res_ntpd_Sip lsb:ntpd \
        operations $id="res_ntpd_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15"
primitive res_postfix_Sip lsb:postfix \
        operations $id="res_postfix_Sip-operations" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor interval="15" timeout="15" start-delay="15"
ms ms_drbd_1 res_drbd_1 \
        meta clone-max="2" notify="true"
colocation col_res_Filesystem_Sip_ms_drbd_1 inf: res_Filesystem_Sip ms_drbd_1:Master
colocation col_res_IPaddr2_Admin_res_Filesystem_Sip inf: res_IPaddr2_Admin res_Filesystem_Sip
colocation col_res_IPaddr2_Sip_res_Filesystem_Sip inf: res_IPaddr2_Sip res_Filesystem_Sip
colocation col_res_asterisk_Asterisk_res_IPaddr2_Asterisk inf: res_asterisk_Asterisk res_IPaddr2_Asterisk
colocation col_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_drbdlinks_Sip
colocation col_res_drbdlinks_Sip_res_IPaddr2_Sip inf: res_drbdlinks_Sip res_IPaddr2_Sip
colocation col_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_dhcpd_Sip res_drbdlinks_Sip
colocation col_res_faxmodems_Sip_res_drbdlinks_Sip inf: res_faxmodems_Sip res_drbdlinks_Sip
colocation col_res_httpd_Sip_res_drbdlinks_Sip inf: res_httpd_Sip res_drbdlinks_Sip
colocation col_res_hylafax_Sip_res_drbdlinks_Sip inf: res_hylafax_Sip res_drbdlinks_Sip
colocation col_res_iaxmodem_Sip_res_drbdlinks_Sip inf: res_iaxmodem_Sip res_drbdlinks_Sip
colocation col_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_kamailio_Sip
colocation col_res_kamailio_Sip_res_drbdlinks_Sip inf: res_kamailio_Sip res_drbdlinks_Sip
colocation col_res_mysqld_Sip_res_drbdlinks_Sip inf: res_mysqld_Sip res_drbdlinks_Sip
colocation col_res_mysqld_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_mysqld_Sip
colocation col_res_named_Sip_res_drbdlinks_Sip inf: res_named_Sip res_drbdlinks_Sip
colocation col_res_named_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_named_Sip
colocation col_res_nfs_Sip_res_drbdlinks_Sip inf: res_nfs_Sip res_drbdlinks_Sip
colocation col_res_ntpd_Sip_res_drbdlinks_Sip inf: res_ntpd_Sip res_drbdlinks_Sip
colocation col_res_postfix_Sip_res_drbdlinks_Sip inf: res_postfix_Sip res_drbdlinks_Sip
order ord_ms_drbd_1_res_Filesystem_Sip inf: ms_drbd_1:promote res_Filesystem_Sip:start
order ord_res_Filesystem_Sip_res_IPaddr2_Admin inf: res_Filesystem_Sip res_IPaddr2_Admin
order ord_res_Filesystem_Sip_res_IPaddr2_Sip inf: res_Filesystem_Sip res_IPaddr2_Sip
order ord_res_IPaddr2_Asterisk_res_asterisk_Asterisk inf: res_IPaddr2_Asterisk res_asterisk_Asterisk
order ord_res_IPaddr2_Sip_res_drbdlinks_Sip inf: res_IPaddr2_Sip res_drbdlinks_Sip
order ord_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_drbdlinks_Sip res_IPaddr2_Asterisk
order ord_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_drbdlinks_Sip res_dhcpd_Sip
order ord_res_drbdlinks_Sip_res_faxmodems_Sip inf: res_drbdlinks_Sip res_faxmodems_Sip
order ord_res_drbdlinks_Sip_res_httpd_Sip inf: res_drbdlinks_Sip res_httpd_Sip
order ord_res_drbdlinks_Sip_res_hylafax_Sip inf: res_drbdlinks_Sip res_hylafax_Sip
order ord_res_drbdlinks_Sip_res_iaxmodem_Sip inf: res_drbdlinks_Sip res_iaxmodem_Sip
order ord_res_drbdlinks_Sip_res_kamailio_Sip inf: res_drbdlinks_Sip res_kamailio_Sip
order ord_res_drbdlinks_Sip_res_mysqld_Sip inf: res_drbdlinks_Sip res_mysqld_Sip
order ord_res_drbdlinks_Sip_res_named_Sip inf: res_drbdlinks_Sip res_named_Sip
order ord_res_drbdlinks_Sip_res_nfs_Sip inf: res_drbdlinks_Sip res_nfs_Sip
order ord_res_drbdlinks_Sip_res_ntpd_Sip inf: res_drbdlinks_Sip res_ntpd_Sip
order ord_res_drbdlinks_Sip_res_postfix_Sip inf: res_drbdlinks_Sip res_postfix_Sip
order ord_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_kamailio_Sip res_IPaddr2_Asterisk
order ord_res_mysqld_Sip_res_kamailio_Sip inf: res_mysqld_Sip res_kamailio_Sip
order ord_res_named_Sip_res_kamailio_Sip inf: res_named_Sip res_kamailio_Sip
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais" \
        last-lrm-refresh="1353415104"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"


Regards,
Pedro Sousa