Hi all,<div><br></div><div>I'm seeing some strange behavior when I do more intensive work on my cluster, such as running a bash script or Wireshark: some Pacemaker resources start to time out and fail over to the other node. I was running this command:</div>
<div><br></div><div># find /sharedstorage/var/log/asterisk/cdr-csv/ -type f -size 0 -exec rm -f {} \;</div><div><br></div><div>to clean up some unused 0-byte files on my DRBD shared storage when I saw the following in my logs, and some resources failed:</div>
<div><br></div><div><div>Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Sip:monitor process (PID 30312) timed out (try 1). Killing with signal SIGTERM (15).</div><div>Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Admin:monitor process (PID 30313) timed out (try 1). Killing with signal SIGTERM (15).</div>
<div>Nov 20 12:32:19 nd02 lrmd: [27143]: WARN: res_IPaddr2_Asterisk:monitor process (PID 30314) timed out (try 1). Killing with signal SIGTERM (15).</div><div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[416] on ocf::IPaddr2::res_IPaddr2_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Sip] ip=[10.100.251.30] : pid [30312] timed out</div>
<div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[428] on ocf::IPaddr2::res_IPaddr2_Admin for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Admin] ip=[10.100.252.30] : pid [30313] timed out</div>
<div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: operation monitor[442] on ocf::IPaddr2::res_IPaddr2_Asterisk for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[0] crm_feature_set=[3.0.5] CRM_meta_timeout=[20000] CRM_meta_interval=[10000] iflabel=[Asterisk] ip=[10.100.251.100] : pid [30314] timed out</div>
<div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 970 ms (> 300 ms) (GSource: 0x1ab0d60)</div><div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[434] on lsb::ntpd::res_ntpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 21100 ms (longer than 10000 ms)</div>
<div>Nov 20 12:32:20 nd02 lrmd: [27143]: WARN: perform_ra_op: the operation operation monitor[429] on lsb::dhcpd::res_dhcpd_Sip for client 27146, its parameters: CRM_meta_name=[monitor] CRM_meta_start_delay=[15000] crm_feature_set=[3.0.5] CRM_meta_timeout=[30000] CRM_meta_interval=[15000] stayed in operation list for 19910 ms (longer than 10000 ms)</div>
</div><div><br></div><div>Any hints on what could be causing this? Can anybody help? Thanks.</div><div><br></div><div>Here's my configuration:</div><div><br></div><div>2-node cluster, CentOS 6.0 64-bit, 2 GB RAM</div><div>
<div>pacemaker-1.1.6-3.el6.x86_64</div><div>pacemaker-libs-1.1.6-3.el6.x86_64</div><div>pacemaker-cli-1.1.6-3.el6.x86_64</div><div>pacemaker-cluster-libs-1.1.6-3.el6.x86_64</div></div><div><div>corosync-1.4.1-4.el6.x86_64</div>
<div>corosynclib-1.4.1-4.el6.x86_64</div></div><div><div>openaislib-1.1.1-7.el6.x86_64</div><div>openais-1.1.1-7.el6.x86_64</div></div><div><br></div><div>corosync.conf:</div><div><br></div><div><div>aisexec {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>user: root</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>group: root</div><div>}</div><div><br></div><div>corosync {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>user: root</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>group: root</div>
<div>}</div><div><br></div><div>amf {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>mode: disabled</div><div>}</div><div><br></div><div>logging {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>to_stderr: yes</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>debug: off</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>timestamp: on</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>to_file: no</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>to_syslog: yes</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>syslog_facility: daemon</div><div>}</div><div><br></div><div>totem {</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>version: 2</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>token: 3000</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>token_retransmits_before_loss_const: 10</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>join: 60</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>consensus: 4000</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vsftype: none</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>max_messages: 20</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>clear_node_high_bit: yes</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>secauth: on</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>threads: 0</div><div><span class="Apple-tab-span" style="white-space:pre"> </span># nodeid: 1234</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>rrp_mode: active</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>interface {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ringnumber: 0</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>bindnetaddr: 10.0.0.2</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>mcastaddr: 226.94.1.1</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>mcastport: 4000</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>}</div>
<div><br></div><div>}</div><div><br></div><div>service {</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ver: 0</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>name: pacemaker</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>use_mgmtd: yes</div><div>}</div></div><div><br></div><div>pacemaker configuration</div><div><br></div><div># crm configure edit</div><div><br></div><div><div>
node nd01.lab</div><div>node nd02.lab \</div><div> attributes standby="off"</div><div>primitive res_Filesystem_Sip ocf:heartbeat:Filesystem \</div><div> params device="/dev/drbd0" directory="/sharedstorage" fstype="ext4" \</div>
<div> operations $id="res_Filesystem_Sip-operations" \</div><div> op start interval="0" timeout="60" \</div><div> op stop interval="0" timeout="60" \</div>
<div> op monitor interval="20" timeout="40" start-delay="0" \</div><div> op notify interval="0" timeout="60"</div><div>primitive res_IPaddr2_Admin ocf:heartbeat:IPaddr2 \</div>
<div> params ip="10.100.252.30" iflabel="Admin" \</div><div> operations $id="res_IPaddr2_Admin-operations" \</div><div> op start interval="0" timeout="20" \</div>
<div> op stop interval="0" timeout="20" \</div><div> op monitor interval="10" timeout="20" start-delay="0"</div><div>primitive res_IPaddr2_Asterisk ocf:heartbeat:IPaddr2 \</div>
<div> params ip="10.100.251.100" iflabel="Asterisk" \</div><div> operations $id="res_IPaddr2_Asterisk-operations" \</div><div> op start interval="0" timeout="20" \</div>
<div> op stop interval="0" timeout="20" \</div><div> op monitor interval="10" timeout="20" start-delay="0"</div><div>primitive res_IPaddr2_Sip ocf:heartbeat:IPaddr2 \</div>
<div> params ip="10.100.251.30" iflabel="Sip" \</div><div> operations $id="res_IPaddr2_Sip-operations" \</div><div> op start interval="0" timeout="20" \</div>
<div> op stop interval="0" timeout="20" \</div><div> op monitor interval="10" timeout="20" start-delay="0"</div><div>primitive res_asterisk_Asterisk lsb:asterisk \</div>
<div> operations $id="res_asterisk_Asterisk-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15" \</div><div> meta target-role="Started"</div><div>primitive res_dhcpd_Sip lsb:dhcpd \</div><div> operations $id="res_dhcpd_Sip-operations" \</div>
<div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div>
<div> meta is-managed="true" target-role="Started"</div><div>primitive res_drbd_1 ocf:linbit:drbd \</div><div> params drbd_resource="r0" \</div><div> operations $id="res_drbd_1-operations" \</div>
<div> op start interval="0" timeout="240" \</div><div> op promote interval="0" timeout="90" \</div><div> op demote interval="0" timeout="90" \</div>
<div> op stop interval="0" timeout="100" \</div><div> op monitor interval="10" timeout="20" start-delay="0" \</div><div> op notify interval="0" timeout="90"</div>
</div><div><div>primitive res_drbdlinks_Sip heartbeat:drbdlinks \</div><div> params 1="-c" 2="/etc/drbdlinks.conf" \</div><div> operations $id="res_drbdlinks_Sip-operations" \</div>
<div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div>
<div> meta target-role="started"</div><div>primitive res_faxmodems_Sip lsb:faxmodems \</div><div> operations $id="res_faxmodems_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div>
<div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div><div> meta is-managed="true" target-role="Started"</div>
<div>primitive res_httpd_Sip lsb:httpd \</div><div> operations $id="res_httpd_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15" \</div><div> meta is-managed="true" target-role="Started"</div><div>primitive res_hylafax_Sip lsb:hylafax \</div>
<div> operations $id="res_hylafax_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15" \</div><div> meta target-role="Started"</div><div>primitive res_iaxmodem_Sip lsb:iaxmodem \</div><div> operations $id="res_iaxmodem_Sip-operations" \</div>
<div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div>
<div> meta is-managed="true" target-role="Started"</div><div>primitive res_kamailio_Sip lsb:kamailio \</div><div> operations $id="res_kamailio_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div>
<div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div><div> meta target-role="Started"</div>
<div>primitive res_mysqld_Sip lsb:mysqld \</div><div> operations $id="res_mysqld_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15"</div><div>primitive res_named_Sip lsb:named \</div><div> operations $id="res_named_Sip-operations" \</div>
<div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15" \</div>
<div> meta target-role="started" is-managed="true"</div><div>primitive res_nfs_Sip lsb:nfs \</div><div> operations $id="res_nfs_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15"</div><div>primitive res_ntpd_Sip lsb:ntpd \</div><div> operations $id="res_ntpd_Sip-operations" \</div><div>
op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div><div> op monitor interval="15" timeout="15" start-delay="15"</div>
<div>primitive res_postfix_Sip lsb:postfix \</div><div> operations $id="res_postfix_Sip-operations" \</div><div> op start interval="0" timeout="15" \</div><div> op stop interval="0" timeout="15" \</div>
<div> op monitor interval="15" timeout="15" start-delay="15"</div><div>ms ms_drbd_1 res_drbd_1 \</div><div> meta clone-max="2" notify="true"</div><div>colocation col_res_Filesystem_Sip_ms_drbd_1 inf: res_Filesystem_Sip ms_drbd_1:Master</div>
<div>colocation col_res_IPaddr2_Admin_res_Filesystem_Sip inf: res_IPaddr2_Admin res_Filesystem_Sip</div><div>colocation col_res_IPaddr2_Sip_res_Filesystem_Sip inf: res_IPaddr2_Sip res_Filesystem_Sip</div><div>colocation col_res_asterisk_Asterisk_res_IPaddr2_Asterisk inf: res_asterisk_Asterisk res_IPaddr2_Asterisk</div>
<div>colocation col_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_drbdlinks_Sip</div><div>colocation col_res_drbdlinks_Sip_res_IPaddr2_Sip inf: res_drbdlinks_Sip res_IPaddr2_Sip</div><div>colocation col_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_dhcpd_Sip res_drbdlinks_Sip</div>
<div>colocation col_res_faxmodems_Sip_res_drbdlinks_Sip inf: res_faxmodems_Sip res_drbdlinks_Sip</div><div>colocation col_res_httpd_Sip_res_drbdlinks_Sip inf: res_httpd_Sip res_drbdlinks_Sip</div><div>colocation col_res_hylafax_Sip_res_drbdlinks_Sip inf: res_hylafax_Sip res_drbdlinks_Sip</div>
<div>colocation col_res_iaxmodem_Sip_res_drbdlinks_Sip inf: res_iaxmodem_Sip res_drbdlinks_Sip</div><div>colocation col_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_IPaddr2_Asterisk res_kamailio_Sip</div><div>colocation col_res_kamailio_Sip_res_drbdlinks_Sip inf: res_kamailio_Sip res_drbdlinks_Sip</div>
<div>colocation col_res_mysqld_Sip_res_drbdlinks_Sip inf: res_mysqld_Sip res_drbdlinks_Sip</div><div>colocation col_res_mysqld_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_mysqld_Sip</div><div>colocation col_res_named_Sip_res_drbdlinks_Sip inf: res_named_Sip res_drbdlinks_Sip</div>
<div>colocation col_res_named_Sip_res_kamailio_Sip inf: res_kamailio_Sip res_named_Sip</div><div>colocation col_res_nfs_Sip_res_drbdlinks_Sip inf: res_nfs_Sip res_drbdlinks_Sip</div><div>colocation col_res_ntpd_Sip_res_drbdlinks_Sip inf: res_ntpd_Sip res_drbdlinks_Sip</div>
<div>colocation col_res_postfix_Sip_res_drbdlinks_Sip inf: res_postfix_Sip res_drbdlinks_Sip</div><div>order ord_ms_drbd_1_res_Filesystem_Sip inf: ms_drbd_1:promote res_Filesystem_Sip:start</div><div>order ord_res_Filesystem_Sip_res_IPaddr2_Admin inf: res_Filesystem_Sip res_IPaddr2_Admin</div>
<div>order ord_res_Filesystem_Sip_res_IPaddr2_Sip inf: res_Filesystem_Sip res_IPaddr2_Sip</div><div>order ord_res_IPaddr2_Asterisk_res_asterisk_Asterisk inf: res_IPaddr2_Asterisk res_asterisk_Asterisk</div><div>order ord_res_IPaddr2_Sip_res_drbdlinks_Sip inf: res_IPaddr2_Sip res_drbdlinks_Sip</div>
<div>order ord_res_drbdlinks_Sip_res_IPaddr2_Asterisk inf: res_drbdlinks_Sip res_IPaddr2_Asterisk</div><div>order ord_res_drbdlinks_Sip_res_dhcpd_Sip inf: res_drbdlinks_Sip res_dhcpd_Sip</div><div>order ord_res_drbdlinks_Sip_res_faxmodems_Sip inf: res_drbdlinks_Sip res_faxmodems_Sip</div>
<div>order ord_res_drbdlinks_Sip_res_httpd_Sip inf: res_drbdlinks_Sip res_httpd_Sip</div><div>order ord_res_drbdlinks_Sip_res_hylafax_Sip inf: res_drbdlinks_Sip res_hylafax_Sip</div><div>order ord_res_drbdlinks_Sip_res_iaxmodem_Sip inf: res_drbdlinks_Sip res_iaxmodem_Sip</div>
<div>order ord_res_drbdlinks_Sip_res_kamailio_Sip inf: res_drbdlinks_Sip res_kamailio_Sip</div><div>order ord_res_drbdlinks_Sip_res_mysqld_Sip inf: res_drbdlinks_Sip res_mysqld_Sip</div><div>order ord_res_drbdlinks_Sip_res_named_Sip inf: res_drbdlinks_Sip res_named_Sip</div>
<div>order ord_res_drbdlinks_Sip_res_nfs_Sip inf: res_drbdlinks_Sip res_nfs_Sip</div><div>order ord_res_drbdlinks_Sip_res_ntpd_Sip inf: res_drbdlinks_Sip res_ntpd_Sip</div><div>order ord_res_drbdlinks_Sip_res_postfix_Sip inf: res_drbdlinks_Sip res_postfix_Sip</div>
</div><div><div>order ord_res_kamailio_Sip_res_IPaddr2_Asterisk inf: res_kamailio_Sip res_IPaddr2_Asterisk</div><div>order ord_res_mysqld_Sip_res_kamailio_Sip inf: res_mysqld_Sip res_kamailio_Sip</div><div>order ord_res_named_Sip_res_kamailio_Sip inf: res_named_Sip res_kamailio_Sip</div>
<div>property $id="cib-bootstrap-options" \</div><div> expected-quorum-votes="2" \</div><div> stonith-enabled="false" \</div><div> dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \</div>
<div> no-quorum-policy="ignore" \</div><div> cluster-infrastructure="openais" \</div><div> last-lrm-refresh="1353415104"</div><div>rsc_defaults $id="rsc-options" \</div>
<div> resource-stickiness="100"</div></div><div><br></div><div><br></div><div>Regards,</div><div>Pedro Sousa</div>