[Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

Tue Jun 25 06:32:04 EDT 2013

On 25/06/2013, at 5:37 PM, Francesco Namuri <f.namuri at credires.it> wrote:

> Hi,
> after an update to the new debian stable, from pacemaker 1.0.9.1 to
> 1.1.7 I'm getting some strange errors on syslog:

That's a hell of a jump there.
Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please?

I'll be able to see if it's something we've already fixed.
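
If more of these show up, a rough way to work out which pe-input file goes with which batch of errors is to scan the log for pengine ERROR lines and pair them with the next "PEngine Input stored in" line. This is only a sketch: the function name pe_inputs_with_errors is made up for illustration, the regexes are modelled on the log lines quoted below, and the /var/log/syslog path in the usage note is an assumption about where your syslog ends up.

```python
import re

# Matches the pengine line that records where a transition's input was saved
# (format taken from the quoted log; other versions may log it differently).
STORED_RE = re.compile(
    r"pengine: \[\d+\]: \w+: process_pe_message: "
    r"Transition (?P<transition>\d+): PEngine Input stored in: (?P<path>\S+)"
)
# Matches pengine lines logged at ERROR level.
ERROR_RE = re.compile(r"pengine: \[\d+\]: ERROR: (?P<msg>.*)")


def pe_inputs_with_errors(lines):
    """Return (transition, path, errors) for transitions that logged pengine errors.

    Errors are attributed to the next "PEngine Input stored in" line that
    follows them, which is the order they appear in within the quoted log.
    """
    results = []
    pending = []  # ERROR messages seen since the last stored-input line
    for raw in lines:
        # Drop mail quoting ("> ") and the trailing newline, if any.
        line = raw.lstrip("> ").rstrip("\n")
        m = ERROR_RE.search(line)
        if m:
            pending.append(m.group("msg"))
            continue
        m = STORED_RE.search(line)
        if m:
            if pending:
                results.append((int(m.group("transition")), m.group("path"), pending))
            pending = []
    return results
```

To use it, something like `with open("/var/log/syslog", errors="replace") as fh: print(pe_inputs_with_errors(fh))` should list each affected transition with its file, which is the file to attach.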

> 
> Jun 25 09:20:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 29 operations (344.00us average, 0% utilization) in the last 10min
> Jun 25 09:20:22 SERVERNAME1 lrmd: [4587]: info: operation monitor[8] on resDRBD:1 for client 4590: pid 19371 exited with return code 8
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation monitor found resource resDRBD:1 active in master mode on SERVERNAME1
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing failed op resSNORT:1_last_failure_0 on SERVERNAME1: not running (7)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation monitor found resource resDRBD:0 active in master mode on SERVERNAME2
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing failed op resSNORT:0_last_failure_0 on SERVERNAME2: not running (7)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: cloneSNORT can fail 999998 more times on SERVERNAME2 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: cloneSNORT can fail 999998 more times on SERVERNAME2 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: cloneSNORT can fail 999998 more times on SERVERNAME1 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: cloneSNORT can fail 999998 more times on SERVERNAME1 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: rsc_expand_action: Couldn't expand cloneDLM_demote_0
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1245 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: clone_update_actions_interleave: No action found for demote in resDLM:1 (first)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1245 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: clone_update_actions_interleave: No action found for demote in resDLM:0 (first)
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_te_invoke: Processing graph 2004 (ref=pe_calc-dc-1372144851-2079) derived from /var/lib/pengine/pe-input-64.bz2
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: process_pe_message: Transition 2004: PEngine Input stored in: /var/lib/pengine/pe-input-64.bz2
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: run_graph: ==== Transition 2004 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-64.bz2): Complete
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun 25 09:23:26 SERVERNAME1 lrmd: [4587]: info: rsc:resSNORTSAM:1 monitor[9] (pid 19862)
> Jun 25 09:23:27 SERVERNAME1 lrmd: [4587]: info: operation monitor[9] on resSNORTSAM:1 for client 4590: pid 19862 exited with return code 0
> Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: rsc:resDLM:0 monitor[11] (pid 20080)
> Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: operation monitor[11] on resDLM:0 for client 4590: pid 20080 exited with return code 0
> Jun 25 09:30:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 31 operations (322.00us average, 0% utilization) in the last 10min
> 
> my config is:
> 
> node SERVERNAME2
> node SERVERNAME1
> primitive resDLM ocf:pacemaker:controld \
>        op monitor interval="120s" \
>        op start interval="0" timeout="90s" \
>        op stop interval="0" timeout="100s"
> primitive resDRBD ocf:linbit:drbd \
>        params drbd_resource="SERVERNAME2CL" \
>        operations $id="resDRBD-operation" \
>        op monitor interval="20" role="Master" timeout="20" \
>        op monitor interval="30" role="Slave" timeout="20" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="100s"
> primitive resFS ocf:heartbeat:Filesystem \
>        params device="/dev/drbd0" directory="/srv" fstype="ocfs2" \
>        op monitor interval="120s" timeout="40s" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive resO2CB ocf:pacemaker:o2cb \
>        op monitor interval="120s" \
>        op start interval="0" timeout="90s" \
>        op stop interval="0" timeout="100s"
> primitive resSNORT lsb:snort \
>        op monitor interval="150s" timeout="40s" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive resSNORTSAM lsb:snortsam \
>        op monitor interval="180s" timeout="40s" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> ms msDRBD resDRBD \
>        meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cloneDLM resDLM \
>        meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneFS resFS \
>        meta interleave="true" ordered="true" target-role="Started"
> clone cloneO2CB resO2CB \
>        meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneSNORT resSNORT \
>        meta interleave="true" target-role="Started"
> clone cloneSNORTSAM resSNORTSAM \
>        meta interleave="true" target-role="Started"
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordO2CBFS 0: cloneO2CB cloneFS
> order ordSNORT inf: cloneFS cloneSNORT
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1371563001"
> 
> Thanks in advance for any suggestion.
> 
> Ciao,
> francesco
> 
> 