[Pacemaker] Possible bug?

Andrew Beekhof andrew at beekhof.net
Mon Sep 10 20:39:11 EDT 2012


On Mon, Sep 10, 2012 at 11:43 PM, Borislav Borisov
<borislav.v.borisov at gmail.com> wrote:
> Hi all,
>
> I am experiencing a very strange issue. On two test boxes I have a setup
> that should serve NFS and iSCSI Targets (SCST).
>
> When I create a couple of iSCSI Target groups, each composed of Target/Lun,
> and then remove one, an error occurs:
>
>   Sep 10 15:24:40 Cluster-Server-1 cib: [48709]: WARN: Managed
> write_cib_contents process 54564 killed by signal 6 [SIGABRT - Abort].
>   Sep 10 15:24:40 Cluster-Server-1 cib: [48709]: ERROR: Managed
> write_cib_contents process 54564 dumped core
>   Sep 10 15:24:40 Cluster-Server-1 cib: [48709]: ERROR:
> cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
>   Sep 10 15:24:40 Cluster-Server-1 cib: [48709]: ERROR:
> cib_diskwrite_complete: Disabling disk writes after write failure
>
> I then executed 'killall -PIPE cib' on both boxes to force a reload. More
> resources were added, this time NFS ones, composed of FS/ExportFS.
> When I delete any of those there is no problem. However, as soon as I
> remove an iSCSI one, disk writes are disabled with the error above.
>
> A core file gets generated when that problem occurs, but it did not help me
> much:
> Reading symbols from /usr/libexec/pacemaker/cib...done.
> [New LWP 54564]
>
> warning: Can't read pathname for load map: Input/output error.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/libexec/pacemaker/cib'.
> Program terminated with signal 6, Aborted.
> #0  0x00007fb3d07e7475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) backtrace
> #0  0x00007fb3d07e7475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fb3d07ea6f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fb3d2578b2b in crm_abort (file=0x411323 "io.c", function=0x412170
> "write_cib_contents", line=662, assert_condition=0x412120 "retrieveCib(tmp1,
> tmp2, FALSE) != NULL", do_core=1, do_fork=0) at utils.c:1659
> #3  0x0000000000406d59 in write_cib_contents (p=0x0) at io.c:662
> #4  0x00007fb3d1511863 in TempProcessTrigger (ginfo=0x6637a0) at
> GSource.c:1792
> #5  0x00007fb3d1510822 in G_TRIG_dispatch (source=0x664960, callback=0,
> user_data=0x0) at GSource.c:1403
> #6  0x00007fb3cfef94a3 in g_main_context_dispatch () from
> /lib/libglib-2.0.so.0
> #7  0x00007fb3cfef9c80 in ?? () from /lib/libglib-2.0.so.0
> #8  0x00007fb3cfefa2f2 in g_main_loop_run () from /lib/libglib-2.0.so.0
> #9  0x000000000040e905 in cib_init () at main.c:561
> #10 0x000000000040df60 in main (argc=1, argv=0x7fff7f780ff8) at main.c:247
>
> Something appears to be very wrong and I just can't figure out what. Any
> help is appreciated.

Pacemaker creates a second cib process to write the contents to disk
after a change so that the 'real' process doesn't block.
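
The parent cib learns what happened to that writer process through its
wait status. As a small illustration (not Pacemaker code), the
"status=134, signo=6" pair in the quoted log can be decoded by hand,
assuming the conventional Linux wait-status layout: the low 7 bits hold
the terminating signal and bit 0x80 is the core-dump flag. The function
name decode_wait_status is made up for this sketch.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw wait status into signal number and core-dump flag.

    Assumes the conventional Linux encoding: low 7 bits = terminating
    signal (0 means the process exited normally), bit 0x80 = core dumped.
    """
    signo = status & 0x7F
    return {
        "signaled": signo != 0,
        "signo": signo,
        "core_dumped": bool(status & 0x80),
        # The exit code lives in the next byte and is only meaningful
        # when the process was not killed by a signal.
        "exitcode": (status >> 8) & 0xFF if signo == 0 else 0,
    }


# Raw value taken from the log line "status=134, signo=6, exitcode=0".
info = decode_wait_status(134)
print(info)
```

So 134 is 0x80 + 6: killed by signal 6 (SIGABRT) with a core dump, which
matches the "killed by signal 6" and "dumped core" lines in the log.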

After the process writes the cib to disk, we then try to read it back
again to verify that everything is sane.
What you're seeing here is that read-back check failing: it is the
"retrieveCib(tmp1, tmp2, FALSE) != NULL" assertion at io.c:662 in your
backtrace.
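
To make the pattern concrete, here is a rough sketch of "write in a
child process, then read back and verify against a digest". It is not
Pacemaker's implementation (that is C, in io.c). The ".sig" sidecar
name, the function names, the use of SHA-256 over the raw file bytes,
and the exit-code convention are all assumptions for illustration; how
Pacemaker actually computes its digest is not shown in this thread.

```python
import hashlib
import os


def digest(data: bytes) -> str:
    # SHA-256 is an arbitrary choice for this sketch.
    return hashlib.sha256(data).hexdigest()


def write_with_digest(path: str, contents: bytes) -> None:
    """Write the contents and a sidecar file holding their digest."""
    with open(path, "wb") as f:
        f.write(contents)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reached the disk
    with open(path + ".sig", "w") as f:  # hypothetical sidecar name
        f.write(digest(contents))


def verify(path: str) -> bool:
    """Read the file back and compare it with the stored digest."""
    with open(path, "rb") as f:
        on_disk = f.read()
    with open(path + ".sig") as f:
        expected = f.read().strip()
    return digest(on_disk) == expected


def write_in_child(path: str, contents: bytes) -> int:
    """Fork a writer so the caller's main loop is not tied up by disk I/O.

    The child exits 0 if the read-back check passes and 1 otherwise.
    For simplicity the parent waits here; a real daemon would reap the
    child asynchronously from its main loop.
    """
    pid = os.fork()
    if pid == 0:
        try:
            write_with_digest(path, contents)
            ok = verify(path)
        except OSError:
            ok = False
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return status
```

A digest mismatch on read-back, like the one in the log below, would
make verify() return False in this sketch; in your case the real writer
process aborts instead, which is where the core file comes from.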

Looking at the logs I see:

Sep 10 15:24:40 Cluster-Server-1 cib: [54564]: ERROR:
validate_cib_digest: Digest comparision failed: expected
83751b899e758f9b138d060ace084080 (/var/lib/heartbeat/crm/cib.ANRY1Q),
calculated bd97ef3df10846e783bd64059be77e45
Sep 10 15:24:40 Cluster-Server-1 cib: [54564]: ERROR: retrieveCib:
Checksum of /var/lib/heartbeat/crm/cib.uGGnOm failed!  Configuration
contents ignored!

Which is really strange.
I see a couple of changes in related areas since 1.1.7; perhaps one of
those will fix your issue.
(1.1.8 should be out today/tomorrow)

>
> Cheers.
>
> P.S. I have attached the hb_report.