[Pacemaker] error with cib synchronisation on disk

Халезов Иван i.khalezov at rts.ru
Thu May 16 11:31:47 UTC 2013


On 16.05.2013 07:14, Andrew Beekhof wrote:
> On 15/05/2013, at 9:53 PM, Халезов Иван <i.khalezov at rts.ru> wrote:
>
>> Hello everyone!
>>
>> Some problems occurred while synchronising the CIB configuration to disk.
>> I have these errors in pacemaker's logfile:
> What were the messages before this?
> Did it happen once or many times?
> At startup or while the cluster was running?

I had updated the cluster configuration shortly before, so the logfile 
contains some output about that (the excerpt below does not start from 
the beginning, because the full output is rather large):

May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive 
id="Security_A" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - 
<meta_attributes id="Security_A-meta_attributes" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair 
id="Security_A-meta_attributes-target-role" name="target-role" 
value="Stopped" __crm_diff_marker__="removed:top" />
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive 
id="Security_B" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - 
<meta_attributes id="SPBEX_Security_B-meta_attributes" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair 
id="Security_B-meta_attributes-target-role" name="target-role" 
value="Started" __crm_diff_marker__="removed:top" />
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </group>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </resources>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </configuration>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </cib>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <cib 
epoch="496" num_updates="1" admin_epoch="0" 
validate-with="pacemaker-1.2" cib-last-written="Mon May 13 18:50:25 
2013" crm_feature_set="3.0.6" update-origin="iblade6.net.rts" 
update-client="cibadmin" have-quorum="1" dc-uuid="2130706433" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <configuration >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <resources >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <group 
id="FAST_SENDERS" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + 
<meta_attributes id="FAST_SENDERS-meta_attributes" 
__crm_diff_marker__="added:top" >
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <nvpair 
id="FAST_SENDERS-meta_attributes-target-role" name="target-role" 
value="Started" />
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </group>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </resources>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </configuration>
May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </cib>
May 14 13:29:13 iblade6 cib[2848]:     info: cib_process_request: 
Operation complete: op cib_replace for section resources 
(origin=local/cibadmin/2, version=0.496.1): ok (rc=0)
May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
Trades_INCR_A#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
Trades_INCR_B#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
Security_A#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
Security_B#011(iblade6.net.rts)
May 14 13:29:13 iblade6 crmd[2853]:   notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
May 14 13:29:13 iblade6 crmd[2853]:     info: do_te_invoke: Processing 
graph 41 (ref=pe_calc-dc-1368523753-125) derived from 
/var/lib/pengine/pe-input-452.bz2
May 14 13:29:13 iblade6 crmd[2853]:     info: te_rsc_command: Initiating 
action 80: start Trades_INCR_A_start_0 on iblade6.net.rts (local)
May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest 
comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 
(/var/lib/heartbeat/crm/cib.Zm249H), calculated 
bc160870924630b3907c8cb1c3128eee
May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of 
/var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is 
caused by manual changes, please refer to 
http://clusterlabs.org/wiki/FAQ#cib_changes_detected
May 14 13:29:13 iblade6 cluster:    error: crm_abort: 
write_cib_contents: Triggered fatal assert at io.c:662 : 
retrieveCib(tmp1, tmp2, FALSE) != NULL
May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: 
Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: 
Disk write failed: status=134, signo=6, exitcode=0
May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: 
Disabling disk writes after write failure
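
The last error reports status=134, signo=6, exitcode=0. Assuming that 
"status" is the raw wait status of the child process that writes the CIB 
to disk (an assumption, not something confirmed in this thread), it can be 
decoded with the usual Linux layout: the low 7 bits hold the terminating 
signal, bit 0x80 is the core-dump flag, and the next byte is the exit code. 
The helper name decode_wait_status below is hypothetical and only 
illustrates the arithmetic:

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Split a raw Linux wait status into its parts (illustrative helper)."""
    signo = status & 0x7F             # low 7 bits: terminating signal, 0 if none
    core_dumped = bool(status & 0x80)  # bit 7: core-dump flag
    exitcode = (status >> 8) & 0xFF    # next byte: exit code for a normal exit
    name = signal.Signals(signo).name if signo else None
    return {
        "signo": signo,
        "signal_name": name,
        "core_dumped": core_dumped,
        "exitcode": exitcode,
    }


if __name__ == "__main__":
    # 134 is the status value from the cib_diskwrite_complete message
    print(decode_wait_status(134))
```

Under that assumption, 134 = 128 + 6, which means the writer was killed by 
signal 6 (SIGABRT) with the core-dump flag set. That would be consistent 
with the "Triggered fatal assert" message logged just before it.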


It happened twice during the last week, both times while the cluster was running.
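
To count such incidents over a longer period, the logfile can be scanned 
for the validate_cib_digest error. The sketch below is only an 
illustration: the function name find_digest_failures and the regular 
expression are my own, and it assumes each log entry is on a single line 
(not wrapped as in this mail) and uses the exact wording shown above, 
including the spelling "comparision".

```python
import re

# Matches the validate_cib_digest error in the format seen in this thread.
DIGEST_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+.*"
    r"validate_cib_digest: Digest comparision failed: "
    r"expected (?P<expected>[0-9a-f]{32}) "
    r"\((?P<path>[^)]+)\), calculated (?P<calculated>[0-9a-f]{32})"
)


def find_digest_failures(lines):
    """Return one dict per digest-mismatch entry found in the given lines."""
    failures = []
    for line in lines:
        match = DIGEST_RE.match(line)
        if match:
            failures.append(match.groupdict())
    return failures


if __name__ == "__main__":
    sample = [
        "May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: "
        "Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 "
        "(/var/lib/heartbeat/crm/cib.Zm249H), calculated "
        "bc160870924630b3907c8cb1c3128eee",
        "May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: "
        "Disabling disk writes after write failure",
    ]
    for failure in find_digest_failures(sample):
        print(failure["stamp"], failure["path"],
              failure["expected"], failure["calculated"])
```

For a real logfile, the lines can be passed in directly from an open file 
object, for example find_digest_failures(open(path)), where path is 
wherever syslog writes the pacemaker messages on the node.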

>> May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>> May 14 13:29:13 iblade6 cluster:    error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
>> May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disabling disk writes after write failure
>>
>>
>> I didn't find anything about it at this link: http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>
>> What could be the cause of this error?
>> Why would the checksum of a CIB file be wrong?
>> Is it a problem with the HDD, a pacemaker bug, or something else? (There are no disk or filesystem errors in syslog.)
>>
>> I had a pair of such incidents during the last week.
>>
>>
>> My cluster installation:  CentOS 6.4 x86_64, pacemaker 1.1.7, corosync 2.3.0
>>
>> Thank you in advance!
>>
>> Ivan Khalezov.
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

Ivan Khalezov




