[Pacemaker] pingd process dies for no reason

Patrik.Rapposch at knapp.com
Fri Jan 7 08:56:03 EST 2011


Greetings,

We have a problem: the ping daemon dies for no apparent reason, and we 
can't find out why.

We use the following versions on SLES 11.1:

libpacemaker3-1.1.2-0.6.1
pacemaker-mgmt-2.0.0-0.3.10
pacemaker-mgmt-client-2.0.0-0.3.10
drbd-pacemaker-8.3.8.1-0.2.9
libpacemaker-devel-1.1.2-0.6.1
pacemaker-1.1.2-0.6.1
pacemaker-mgmt-devel-2.0.0-0.3.10
libcorosync4-1.2.6-0.2.2
corosync-1.2.6-0.2.2
libcorosync-devel-1.2.6-0.2.2

Here is the relevant part of the log:
"
Jan  5 08:40:30 node2 lrmd: [5990]: info: rsc:OSR_IP:46535: monitor
Jan  5 08:40:30 node2 lrmd: [5990]: info: rsc:Cluster_IP:46533: monitor
Jan  5 08:40:33 node2 lrmd: [5990]: WARN: pingd:0:monitor process (PID 23937) timed out (try 1).  Killing with signal SIGTERM (15).
Jan  5 08:40:33 node2 lrmd: [5990]: WARN: operation monitor[48559] on ocf::ping::pingd:0 for client 5993, its parameters: CRM_meta_clone=[0] host_list=[xxx.xxx.xxx.xxx] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[2] CRM_meta_notify=[false] dampen=[5s] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.2] multiplier=[100] CRM_meta_name=[monitor] CRM_meta_interval=[15000] CRM_meta_timeout=[5000] : pid [23937] timed out
Jan  5 08:40:33 node2 crmd: [5993]: ERROR: process_lrm_event: LRM operation pingd:0_monitor_15000 (48559) Timed Out (timeout=5000ms)
Jan  5 08:40:33 node2 crmd: [5993]: WARN: update_failcount: Updating failcount for pingd:0 on node2 after failed monitor: rc=-2 (update=value++, time=1294213233)
Jan  5 08:40:35 node2 pengine: [5992]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan  5 08:40:35 node2 pengine: [5992]: WARN: unpack_rsc_op: Processing failed op drbd_r0:1_promote_0 on node1: unknown exec error (-2)
Jan  5 08:40:35 node2 pengine: [5992]: WARN: unpack_rsc_op: Processing failed op pingd:0_monitor_15000 on node2: unknown exec error (-2)
Jan  5 08:40:35 node2 pengine: [5992]: notice: clone_print:  Clone Set: pingdclone [pingd]
Jan  5 08:40:35 node2 pengine: [5992]: notice: native_print:      pingd:0 (ocf::pacemaker:ping):  Started node2 FAILED
Jan  5 08:40:35 node2 pengine: [5992]: notice: short_print:      Started: [ node1 ]"

The resource is configured as follows:
<clone id="pingdclone">
  <meta_attributes id="pingdclone-meta_attributes">
    <nvpair id="pingdclone-meta_attributes-globally-unique" name="globally-unique" value="false"/>
  </meta_attributes>
  <primitive class="ocf" id="pingd" provider="pacemaker" type="ping">
    <instance_attributes id="pingd-instance_attributes">
      <nvpair id="pingd-instance_attributes-host_list" name="host_list" value="xxx.xxx.xxx.xxx"/>
      <nvpair id="pingd-instance_attributes-multiplier" name="multiplier" value="100"/>
      <nvpair id="nvpair-96877c9e-2825-4d7d-997b-944652f89584" name="dampen" value="5s"/>
    </instance_attributes>
    <operations>
      <op id="pingd-monitor-15s" interval="15s" name="monitor" timeout="5s"/>
    </operations>
  </primitive>
</clone>
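The `CRM_meta_interval=[15000]` and `CRM_meta_timeout=[5000]` values in the log are the monitor op's `interval="15s"` and `timeout="5s"` from this configuration, expressed in milliseconds. A minimal Python sketch of that cross-check follows. It assumes a reduced copy of the clone definition, and `duration_to_ms` is a hypothetical helper whose unit table (including treating a bare number as seconds) is an assumption for illustration, not Pacemaker's own parser.

```python
import re
import xml.etree.ElementTree as ET

# Reduced copy of the clone definition from this post (only the parts read here).
CIB_FRAGMENT = """
<clone id="pingdclone">
  <primitive class="ocf" id="pingd" provider="pacemaker" type="ping">
    <operations>
      <op id="pingd-monitor-15s" interval="15s" name="monitor" timeout="5s"/>
    </operations>
  </primitive>
</clone>
"""

# Assumed unit table for this sketch; a bare number is treated as seconds.
UNIT_MS = {"ms": 1, "s": 1000, "sec": 1000, "m": 60_000, "min": 60_000,
           "h": 3_600_000, "hr": 3_600_000}


def duration_to_ms(value):
    """Convert a duration string such as '15s' to milliseconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    factor = UNIT_MS.get(unit or "s")
    if factor is None:
        raise ValueError(f"unknown unit in duration: {value!r}")
    return int(number) * factor


def monitor_ops(xml_text):
    """Yield (op id, interval in ms, timeout in ms) for each monitor op."""
    root = ET.fromstring(xml_text.strip())
    for op in root.iter("op"):
        if op.get("name") == "monitor":
            yield (op.get("id"),
                   duration_to_ms(op.get("interval")),
                   duration_to_ms(op.get("timeout")))


if __name__ == "__main__":
    for op_id, interval_ms, timeout_ms in monitor_ops(CIB_FRAGMENT):
        print(f"{op_id}: interval={interval_ms} ms, timeout={timeout_ms} ms")
```

For this configuration it prints `pingd-monitor-15s: interval=15000 ms, timeout=5000 ms`, the same figures lrmd logged for the operation that timed out.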

Thanks in advance for your help.

Mit freundlichen Grüßen / Best Regards

Patrik Rapposch, BSc
System Administration

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria 
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
patrik.rapposch at knapp.com 
www.KNAPP.com 

Commercial register number: FN 138870x
Commercial register court: Leoben
