[Pacemaker] Failed in restart of Corosync.

renayama19661014 at ybb.ne.jp
Mon Oct 19 05:12:52 UTC 2009


Hi Steven,

All right.
Thank you.

Best Regards,
Hideo Yamauchi.

--- Steven Dake <sdake at redhat.com> wrote:

> This bug has been reported and we are working on a solution.
> 
> Regards
> -steve
> 
> On Mon, 2009-10-19 at 11:05 +0900, renayama19661014 at ybb.ne.jp wrote:
> > Hi,
> > 
> > I understand that the combination of Corosync and Pacemaker is not yet officially supported.
> > However, I am posting this because I thought it was important to report the problem.
> > 
> > I started Corosync with the following combination (on Red Hat 5.4, x86):
> > 
> > * corosync trunk 2530
> > * Cluster-Resource-Agents-6d652f7cf9d8
> > * Reusable-Cluster-Components-4edc8f99701c
> > * Pacemaker-1-0-de2a3778ace7
> > 
> > Next, I stopped the corosync service.
> > However, the Pacemaker processes did not stop cleanly, so I killed them with kill -9.
> > 
> > ------------------------------------------------------------------------------------
> > [root at rh54-1 ~]# service corosync stop
> > Stopping Corosync Cluster Engine (corosync):               [  OK  ]
> > Waiting for services to unload:                            [  OK  ]
> > [root at rh54-1 ~]# ps -ef |grep coro
> > root      5263  4617  0 10:54 pts/0    00:00:00 grep coro
> > [root at rh54-1 ~]# ps -ef |grep heartbeat 
> > root      4882     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/stonithd
> > 500       4883     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/cib
> > root      4884     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/lrmd
> > 500       4885     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/attrd
> > 500       4886     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/pengine
> > 500       4887     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/crmd
> > root      5278  4617  0 10:54 pts/0    00:00:00 grep heartbeat
> > [root at rh54-1 ~]# kill -9 4882 4883 4884 4885 4886 4887
> > [root at rh54-1 ~]# ps -ef |grep heartbeat 
> > root      5310  4617  0 10:54 pts/0    00:00:00 grep heartbeat
> > 
> > ------------------------------------------------------------------------------------
> > 
> > I then started Corosync again.
> > However, Pacemaker's cib process seems unable to communicate with Corosync.
> > 
> > 
> > ------------------------------------------------------------------------------------
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: startCib: CIB Initialization completed successfully
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: crm_cluster_connect: Connecting to OpenAIS
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: init_ais_connection: Creating connection to our AIS plugin
> > Oct 19 10:55:30 rh54-1 mgmtd: [5359]: info: login to cib live: 1, ret:-10
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: info: crmd_init: Starting crmd's mainloop
> > Oct 19 10:55:31 rh54-1 mgmtd: [5359]: info: login to cib live: 2, ret:-10
> > Oct 19 10:55:32 rh54-1 mgmtd: [5359]: info: login to cib live: 3, ret:-10
> > Oct 19 10:55:32 rh54-1 crmd: [5358]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> > Oct 19 10:55:33 rh54-1 mgmtd: [5359]: info: login to cib live: 4, ret:-10
> > Oct 19 10:55:33 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > Oct 19 10:55:33 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> > 
> > ------------------------------------------------------------------------------------
> > 
> > As a result, Pacemaker never finishes starting, no matter how long it waits.
> > 
> > As for the problem, Corosync somehow seems to be failing in poll(?).
> > However, the cause may lie in the failure of the first stop.
> > 
> > ------------------------------------------------------------------------------------
> > [root at rh54-1 ~]# ps -ef |grep coro
> > root      5348     1  0 10:55 ?        00:00:00 /usr/sbin/corosync
> > root      5400  4617  0 10:56 pts/0    00:00:00 grep coro
> > [root at rh54-1 ~]# strace -p 5348
> > Process 5348 attached - interrupt to quit
> > futex(0x805c8c0, FUTEX_WAIT_PRIVATE, 2, NULL
> > ------------------------------------------------------------------------------------
> > 
> > Is there a way to avoid this problem?
> > Can I work around it by deleting some file?
> > 
> > * I hope that the combination of Corosync and Pacemaker becomes ready for practical use soon.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> 
> 
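
The manual cleanup in the quoted transcript (listing the leftover daemons under /usr/lib/heartbeat with ps, then kill -9) could be scripted roughly as follows. This is only a sketch under assumptions: the function names leftover_pids and stop_leftovers are hypothetical, the path prefix and the `ps -ef` column layout are taken from the output quoted above, and sending SIGTERM before SIGKILL is an assumed gentler order that nobody in this thread has confirmed. It does not address the Corosync bug itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Pacemaker or Corosync.

# Read `ps -ef` style lines on stdin and print the PID (field 2) of every
# process whose command (field 8) starts with /usr/lib/heartbeat/.
# Assumes the column layout shown in the transcript above.
leftover_pids() {
    awk 'index($8, "/usr/lib/heartbeat/") == 1 { print $2 }'
}

# Ask leftover daemons to exit with SIGTERM, wait a few seconds,
# then send SIGKILL to any that are still alive.
stop_leftovers() {
    pids=$(ps -ef | leftover_pids)
    [ -z "$pids" ] && return 0
    kill $pids 2>/dev/null
    sleep 5
    for pid in $pids; do
        kill -0 "$pid" 2>/dev/null && kill -9 "$pid"
    done
    return 0
}
```

After `service corosync stop` returns, sourcing this file and running stop_leftovers as root would replace the manual ps/kill steps. The `grep heartbeat` line from the transcript is not matched, because its command field is grep rather than a path under /usr/lib/heartbeat.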




