[Pacemaker] primitive resource start timeout ignored by monitor-operation

Rainer Maier rainer.maier at thalesgroup.com
Tue Apr 17 06:41:55 EDT 2012


Hi,

this is my first post to this list, so please be lenient with me.

My problem is that I configured a primitive resource like this:


primitive p_fuseesb_cellx ocf:thales:fuseesb \
        params instance="cell1" fuseesb_home="/usr/lib/fuseesb" \
            javahome="/usr/lib/jdk1.6.0_31" \
        op monitor interval="60s" timeout="45s" \
        op start interval="0" timeout="45s" \
        op stop interval="0" timeout="20s"

Now when I start the resource from crm, it starts and is then immediately
stopped and restarted. This repeats in a cycle every 1-2 seconds.

In the corosync log I get the following output:

Apr 17 10:48:46 c6 lrmd: [28224]: info: operation start[1538] on p_fuseesb_cellx for client 28227: pid 27751 exited with return code 0
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation p_fuseesb_cellx_start_0 (call=1538, rc=0, cib-update=1633, confirmed=true) ok
Apr 17 10:48:46 c6 crmd: [28227]: info: do_lrm_rsc_op: Performing key=1:1017:0:084c0a4a-562e-46b2-bd13-df30802c2bd5 op=p_fuseesb_cellx_monitor_60000 )
Apr 17 10:48:46 c6 lrmd: [28224]: info: rsc:p_fuseesb_cellx monitor[1539] (pid 27830)
Apr 17 10:48:46 c6 lrmd: [28224]: info: operation monitor[1539] on p_fuseesb_cellx for client 28227: pid 27830 exited with return code 7
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation p_fuseesb_cellx_monitor_60000 (call=1539, rc=7, cib-update=1634, confirmed=false) not running
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_ais_dispatch: Update relayed from c7
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_local_callback: Expanded fail-count-p_fuseesb_cellx=value++ to 225
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_fuseesb_cellx (225)
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_perform_update: Sent update 2420: fail-count-p_fuseesb_cellx=225
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_ais_dispatch: Update relayed from c7
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_fuseesb_cellx (1334652551)
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_perform_update: Sent update 2422: last-failure-p_fuseesb_cellx=1334652551
Apr 17 10:48:46 c6 lrmd: [28224]: info: cancel_op: operation monitor[1539] on p_fuseesb_cellx for client 28227, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.1] fuseesb_home=[/usr/lib/fuseesb] CRM_meta_timeout=[45000] CRM_meta_interval=[60000] javahome=[/usr/lib/jdk1.6.0_31] instance=[cell1] cancelled
Apr 17 10:48:46 c6 crmd: [28227]: info: do_lrm_rsc_op: Performing key=2:1019:0:084c0a4a-562e-46b2-bd13-df30802c2bd5 op=p_fuseesb_cellx_stop_0 )
Apr 17 10:48:46 c6 lrmd: [28224]: info: rsc:p_fuseesb_cellx stop[1540] (pid 27897)
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation p_fuseesb_cellx_monitor_60000 (call=1539, status=1, cib-update=0, confirmed=true) Cancelled
Apr 17 10:48:46 c6 lrmd: [28224]: info: RA output: (p_fuseesb_cellx:stop:stdout) Stop FUSE ESB: fuse-esb
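
To make the order of events easier to see, the log can be reduced to one line
per completed operation. The following is only a sketch: the function name
summarize_lrm_ops is made up for this mail, and it assumes that every log
entry sits on a single line in the log file and has the process_lrm_event
format shown in the excerpt.

```shell
#!/bin/sh
# Print "<time> <operation> rc=<code>" for every completed LRM operation.
# Reads the files given as arguments, or stdin if there are none.
# Entries without an rc= field (for example cancelled operations) are skipped.
summarize_lrm_ops() {
    awk '/process_lrm_event: LRM operation/ {
        op = ""; rc = ""
        for (i = 1; i <= NF; i++) {
            # The operation name is the word right after "operation".
            if ($i == "operation") op = $(i + 1)
            # Keep only the digits of a field such as "rc=7,".
            if ($i ~ /rc=[0-9]+/) { rc = $i; gsub(/[^0-9]/, "", rc) }
        }
        if (rc != "") print $3, op, "rc=" rc
    }' "$@"
}
```

It would be called as `summarize_lrm_ops <logfile>`. For the excerpt above it
reports the start finishing with rc=0 and the first monitor returning rc=7
(not running) within the same second.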


From what I can see, the monitor operation is run immediately after the
start operation. As the start has not finished yet, the monitor detects that
the resource is not running, so the resource is immediately stopped and
restarted, and the cycle begins again.
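
For illustration: as far as I understand, the start timeout only limits how
long the start action itself may run, and a start action is expected to
return 0 only once a monitor would succeed. A minimal sketch of that pattern
follows. All names in it (demo_monitor, demo_launch, demo_start, STATE_FILE)
are hypothetical, and a state file stands in for the real service; none of it
is taken from the actual fuseesb agent.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stand-in for the real service: a state file that
# contains "up" once the service is ready.
STATE_FILE="${STATE_FILE:-/tmp/demo_service.state}"

# monitor: succeed only when the service is fully up.
demo_monitor() {
    if [ -f "$STATE_FILE" ] && [ "$(cat "$STATE_FILE")" = "up" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Launch in the background; the fake service needs about 1s to come up.
demo_launch() {
    echo "starting" > "$STATE_FILE"
    ( sleep 1; echo "up" > "$STATE_FILE" ) &
}

# start: launch, then poll monitor and return only once it succeeds.
# A real agent could loop without a limit and rely on the cluster's
# start timeout; the limit here just keeps the sketch finite.
demo_start() {
    if demo_monitor; then
        return $OCF_SUCCESS          # already running
    fi
    demo_launch
    tries=0
    until demo_monitor; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            return $OCF_ERR_GENERIC
        fi
        sleep 1
    done
    return $OCF_SUCCESS
}
```

With a start action like this, the first recurring monitor would only run
once the service already answers, and the 45s start timeout would bound how
long the polling loop may take.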

What I don't understand is: why does Pacemaker ignore the timeouts I defined?

regards
Rainer




