[Pacemaker] Failed in restart of Corosync.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Mon Oct 19 02:05:08 UTC 2009


I understand that the combination of Corosync and Pacemaker is not yet officially supported.
However, I am posting this because I think it is important to report the problem.

I started Corosync with the following combination (on Red Hat 5.4, x86):

* corosync trunk 2530
* Cluster-Resource-Agents-6d652f7cf9d8
* Reusable-Cluster-Components-4edc8f99701c
* Pacemaker-1-0-de2a3778ace7

Next, I stopped the corosync service.
However, the Pacemaker processes did not stop cleanly, so I killed them with kill -9.

[root at rh54-1 ~]# service Corosync stop
Stopping Corosync Cluster Engine (corosync):               [  OK  ]
Waiting for services to unload:                            [  OK  ]
[root at rh54-1 ~]# ps -ef |grep coro
root      5263  4617  0 10:54 pts/0    00:00:00 grep coro
[root at rh54-1 ~]# ps -ef |grep heartbeat 
root      4882     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/stonithd
500       4883     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/cib
root      4884     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/lrmd
500       4885     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/attrd
500       4886     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/pengine
500       4887     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/crmd
root      5278  4617  0 10:54 pts/0    00:00:00 grep heartbeat
[root at rh54-1 ~]# kill -9 4882 4883 4884 4885 4886 4887
[root at rh54-1 ~]# ps -ef |grep heartbeat 
root      5310  4617  0 10:54 pts/0    00:00:00 grep heartbeat
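For reference, the manual cleanup above could be scripted roughly as follows. This is only a sketch: stop_pids is a hypothetical helper, not part of Pacemaker or Corosync, and it is an assumption that the leftover daemons would exit on SIGTERM at all. It asks the processes to terminate first and falls back to SIGKILL only for those that are still alive after a timeout.

```shell
#!/bin/sh
# Sketch only: stop leftover processes gently before resorting to SIGKILL.
# stop_pids is a hypothetical helper; it uses plain POSIX sh, so its
# variables are global.
# Usage: stop_pids TIMEOUT_SECONDS PID...
stop_pids() {
    timeout=$1
    shift
    # Ask each process to exit cleanly first.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || true
    done
    # Wait up to $timeout seconds for all of them to disappear.
    while [ "$timeout" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Anything still running is killed forcibly, as in the transcript above.
    for pid in "$@"; do
        kill -KILL "$pid" 2>/dev/null || true
    done
    return 1
}

# Example (PIDs taken from the ps output above):
# stop_pids 10 4882 4883 4884 4885 4886 4887
```

A return value of 1 means that at least one process had to be killed with SIGKILL, which may be worth noting when reporting the problem.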


I then started Corosync again.
However, the Pacemaker cib process appears to be unable to communicate with Corosync.

Oct 19 10:55:29 rh54-1 cib: [5354]: info: startCib: CIB Initialization completed successfully
Oct 19 10:55:29 rh54-1 cib: [5354]: info: crm_cluster_connect: Connecting to OpenAIS
Oct 19 10:55:29 rh54-1 cib: [5354]: info: init_ais_connection: Creating connection to our AIS plugin
Oct 19 10:55:30 rh54-1 mgmtd: [5359]: info: login to cib live: 1, ret:-10
Oct 19 10:55:30 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
Oct 19 10:55:30 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 19 10:55:30 rh54-1 crmd: [5358]: info: crmd_init: Starting crmd's mainloop
Oct 19 10:55:31 rh54-1 mgmtd: [5359]: info: login to cib live: 2, ret:-10
Oct 19 10:55:32 rh54-1 mgmtd: [5359]: info: login to cib live: 3, ret:-10
Oct 19 10:55:32 rh54-1 crmd: [5358]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Oct 19 10:55:33 rh54-1 mgmtd: [5359]: info: login to cib live: 4, ret:-10
Oct 19 10:55:33 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
Oct 19 10:55:33 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
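
To see how often these failures repeat, the log can be summarized with a small script. This is a minimal sketch: cib_failure_summary is a hypothetical helper, and the log file path is an assumption (the messages may be written to a different file depending on the syslog configuration). It only counts lines that match the two message patterns shown in the excerpt above.

```shell
#!/bin/sh
# Sketch only: count the CIB connection failures in a syslog file.
# cib_failure_summary is a hypothetical helper.
# Usage: cib_failure_summary LOGFILE
cib_failure_summary() {
    log=$1
    # crmd retries, matched on the WARN line from do_cib_control.
    # grep -c still prints 0 when nothing matches; "|| true" only
    # hides its non-zero exit status in that case.
    crmd=$(grep -c "do_cib_control: Couldn't complete CIB registration" "$log" || true)
    # mgmtd login attempts that returned -10.
    mgmtd=$(grep -c "mgmtd: .*login to cib live: .*ret:-10" "$log" || true)
    echo "crmd_retries=$crmd mgmtd_login_failures=$mgmtd"
}

# Example (the path is an assumption):
# cib_failure_summary /var/log/messages
```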


As a result, Pacemaker never finishes starting, no matter how long I wait.

The problem seems to be that Corosync itself is somehow getting stuck (in poll?).
However, the cause may be the failed stop the first time.

[root at rh54-1 ~]# ps -ef |grep coro
root      5348     1  0 10:55 ?        00:00:00 /usr/sbin/corosync
root      5400  4617  0 10:56 pts/0    00:00:00 grep coro
[root at rh54-1 ~]# strace -p 5348
Process 5348 attached - interrupt to quit
futex(0x805c8c0, FUTEX_WAIT_PRIVATE, 2, NULL
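
Note that strace -p without -f follows only the one thread it was attached to, so the futex wait above may not show what the other corosync threads are doing. As a rough sketch, the state of every thread can be read from /proc on Linux. thread_states is a hypothetical helper, and its optional second argument (an alternative proc root) exists only to make it easy to try out.

```shell
#!/bin/sh
# Sketch only: print "TID STATE" for every thread of a process (Linux /proc).
# thread_states is a hypothetical helper.
# Usage: thread_states PID [PROC_ROOT]
thread_states() {
    pid=$1
    root=${2:-/proc}
    for t in "$root"/"$pid"/task/*; do
        # Skip threads that vanished or cannot be read.
        [ -r "$t/status" ] || continue
        # The State line looks like "State:<TAB>S (sleeping)".
        state=$(sed -n 's/^State:[[:space:]]*//p' "$t/status")
        echo "${t##*/} $state"
    done
}

# Example (PID taken from the ps output above):
# thread_states 5348
# With a recent strace, "strace -f -p 5348" should also attach to the
# other threads of the process.
```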

Is there a way to avoid this behavior?
Can I work around the problem by deleting some file?

* I hope that the combination of Corosync and Pacemaker becomes ready for practical use soon.

Best Regards,
Hideo Yamauchi.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rh54-1-message.zip
Type: application/x-zip-compressed
Size: 9947 bytes
Desc: 3324550729-rh54-1-message.zip
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091019/43d9cf75/attachment-0003.bin>
