[Pacemaker] error with cib synchronisation on disk

Andrew Beekhof andrew at beekhof.net
Mon May 20 01:01:13 EDT 2013


On 16/05/2013, at 9:31 PM, Халезов Иван <i.khalezov at rts.ru> wrote:

> On 16.05.2013 07:14, Andrew Beekhof wrote:
>> On 15/05/2013, at 9:53 PM, Халезов Иван <i.khalezov at rts.ru> wrote:
>> 
>>> Hello everyone!
>>> 
>>> Some problems occurred while synchronising the CIB configuration to disk.
>>> I have these errors in Pacemaker's log file:
>> What were the messages before this?
>> Did it happen once or many times?
>> At startup or while the cluster was running?
> 
> I had updated the cluster configuration just before, so there was some output about it in the log file (not shown from the beginning here, because it is rather big):

I'm guessing some whitespace crept into the configuration.
We've had problems with that in the past; https://github.com/beekhof/pacemaker/commit/c2550cbd33a3b2ab7efcd6ef516ba124fbae9a81 is one patch that you don't have, for example.
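
To illustrate why stray whitespace matters here: the on-disk check compares a stored digest against one recalculated from the file, so any byte-level difference in the serialised XML changes the result. The following is a minimal Python sketch of that effect only, not Pacemaker's actual digest code. The helper names digest_of and normalised are made up for the example, and hashing the raw text with MD5 is an assumption suggested by the 32-hex-digit digests in the log.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest_of(xml_text: str) -> str:
    """MD5 hex digest of the XML text exactly as given (illustration only)."""
    return hashlib.md5(xml_text.encode("utf-8")).hexdigest()


def normalised(xml_text: str) -> str:
    """Re-serialise the XML after dropping whitespace-only text nodes."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


# Same configuration fragment, once compact and once with indentation.
compact = ('<group id="FAST_SENDERS">'
           '<meta_attributes id="FAST_SENDERS-meta_attributes"/>'
           '</group>')
spaced = ('<group id="FAST_SENDERS">\n'
          '  <meta_attributes id="FAST_SENDERS-meta_attributes"/>\n'
          '</group>')

raw_match = digest_of(compact) == digest_of(spaced)
normalised_match = digest_of(normalised(compact)) == digest_of(normalised(spaced))

print("raw digests match:", raw_match)
print("normalised digests match:", normalised_match)
```

The two raw digests differ even though the configuration is logically identical; once whitespace-only nodes are dropped before hashing, they agree.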

> 
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive id="Security_A" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <meta_attributes id="Security_A-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair id="Security_A-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive id="Security_B" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <meta_attributes id="SPBEX_Security_B-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair id="Security_B-meta_attributes-target-role" name="target-role" value="Started" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </group>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </resources>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </configuration>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </cib>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <cib epoch="496" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Mon May 13 18:50:25 2013" crm_feature_set="3.0.6" update-origin="iblade6.net.rts" update-client="cibadmin" have-quorum="1" dc-uuid="2130706433" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <configuration >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <resources >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <group id="FAST_SENDERS" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <meta_attributes id="FAST_SENDERS-meta_attributes" __crm_diff_marker__="added:top" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <nvpair id="FAST_SENDERS-meta_attributes-target-role" name="target-role" value="Started" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </group>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </resources>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </configuration>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </cib>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.496.1): ok (rc=0)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start Trades_INCR_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start Trades_INCR_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start Security_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start Security_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 crmd[2853]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> May 14 13:29:13 iblade6 crmd[2853]:     info: do_te_invoke: Processing graph 41 (ref=pe_calc-dc-1368523753-125) derived from /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 crmd[2853]:     info: te_rsc_command: Initiating action 80: start Trades_INCR_A_start_0 on iblade6.net.rts (local)
> May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
> May 14 13:29:13 iblade6 cluster:    error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
> May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disabling disk writes after write failure
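
For what it's worth, status=134 with signo=6 in the cib_diskwrite_complete line is what a POSIX wait status looks like when the child was killed by SIGABRT and dumped core, which fits the fatal assert logged just before it. A small Python sketch for decoding such a value follows. decode_wait_status is a hypothetical helper, and the sketch assumes the logged status is a raw wait() status on Linux.

```python
import os
import signal


def decode_wait_status(status: int) -> dict:
    """Break a raw POSIX wait() status into its parts."""
    signaled = os.WIFSIGNALED(status)
    exited = os.WIFEXITED(status)
    return {
        "signaled": signaled,
        "signo": os.WTERMSIG(status) if signaled else 0,
        "core_dumped": os.WCOREDUMP(status) if signaled else False,
        "exit_code": os.WEXITSTATUS(status) if exited else None,
    }


info = decode_wait_status(134)  # value taken from the log line above
print(info)
if info["signaled"]:
    # Map the signal number back to its symbolic name.
    print(signal.Signals(info["signo"]).name)
```

On Linux, 134 is 0x80 | 6: the low seven bits carry the signal number and the 0x80 bit is the core-dump flag.
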
> 
> 
> It happened twice during the last week, both times while the cluster was running.
> 
>>> May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
>>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
>>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>> May 14 13:29:13 iblade6 cluster:    error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
>>> May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
>>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
>>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disabling disk writes after write failure
>>> 
>>> 
>>> I didn't find anything about it at this link: http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>> 
>>> What could be the cause of this error?
>>> Why would the checksum of a CIB file be wrong?
>>> Is it a problem with the HDD, a Pacemaker bug, or something else? (There are no disk or filesystem errors in syslog.)
>>> 
>>> I had a couple of such incidents during the last week.
>>> 
>>> 
>>> My cluster installation:  CentOS 6.4 x86_64, pacemaker 1.1.7, corosync 2.3.0
>>> 
>>> Thank you in advance!
>>> 
>>> Ivan Khalezov.
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> Ivan Khalezov
> 
> 




