[Pacemaker] known problem with corosync 1.4.1 on centos64 ?

andreas graeper agraeper at googlemail.com
Fri Jun 21 04:56:29 EDT 2013


hi,
as soon as I add or remove resources, corosync starts to eat up all the CPU.
drbd 8.4.1 (build from source)
corosync 1.4.1
pacemaker 1.1.8
crmsh 1.2.5 (from an extra repo, because crm is missing from pacemaker-cli;
   that is not the reason for the trouble: I use pcs for everything except crm_mon)
pcs 0.9.26

When I run
 pcs resource stop xxx
 pcs resource delete xxx
I often need a cleanup afterwards to remove the 'failed actions' (the monitor
of that resource xxx).
When a resource gets stopped, shouldn't its monitor be cancelled and every
old failed action be forgotten?
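
For reference, the sequence I mean can be sketched roughly as below. This is only a sketch: the `run` and `remove_resource` helpers are names made up for illustration, `xxx` stands for the resource id, and the cleanup step uses `crm_resource --cleanup`, which I assume is equivalent to what I do by hand. DRY_RUN defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Sketch only: stop, delete, then clean up one resource.
# DRY_RUN defaults to 1, so commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: mirrors the manual sequence from this mail.
# The last step clears leftover failed actions for the resource.
remove_resource() {
    res="$1"
    run pcs resource stop "$res" &&
        run pcs resource delete "$res" &&
        run crm_resource --cleanup --resource "$res"
}

remove_resource xxx
```

Setting DRY_RUN=0 would run the commands for real. Whether the cleanup still finds the resource after the delete may depend on the versions involved, so the order of the last two steps is an open question.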

But other resources get stopped and restarted, too,
and their monitors fail with a timeout or an unknown error, although crm_mon
shows them as running / started.

Now drbd on the master was stopped, corosync is at 100% CPU,
and the other node does not take over:
 drbd:0 (slave n1) unmanaged FAILED

drbd:
  connected primary diskless (n2, corosync stopped)
  connected secondary uptodate (n1, corosync ok ?)

  When drbd is stopped, I would expect something similar to:
    1) primary -> secondary
    2) disconnect     => cs:standalone
    3) detach           => ds:diskless
  what went wrong ?
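
The three steps I expect could be walked through by hand roughly like this, on a node where the cluster is not managing the resource at that moment. This is a sketch under assumptions: `r0` is a placeholder resource name, `run` and `drbd_down_steps` are made-up helper names, and DRY_RUN defaults to 1 so nothing is executed.

```shell
#!/bin/sh
# Sketch: take one DRBD resource down step by step and look at the
# connection and disk state after each step.
# "r0" is a placeholder resource name. DRY_RUN defaults to 1 (print only).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

drbd_down_steps() {
    res="$1"
    run drbdadm secondary "$res"    # 1) primary -> secondary
    run drbdadm disconnect "$res"   # 2) connection state should become StandAlone
    run drbdadm cstate "$res"       #    show the connection state
    run drbdadm detach "$res"       # 3) disk state should become Diskless
    run drbdadm dstate "$res"       #    show the disk state
}

drbd_down_steps r0
```

Comparing the cstate / dstate output after each step with the state shown above might tell at which step things stopped.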

What can I do?
Is there a chance that CentOS 6.3 with corosync 1.4.1 and pacemaker 1.1.7 runs
more stably?

With two nodes, n1 (master) and n2 (slave), corosync was stopped on n2.
Afterwards the CIB still shows
 n2.standby="off"
Does a `corosync stop` not cause n2 to be reported as offline / standby?
And what can be the reason that lrmd and pengine are still running after
corosync was stopped,
while pacemaker (the parent of lrmd and pengine?) is no longer running?
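
To show which daemons are left over, something like the following could be used. It is a sketch: `leftover_daemons` is a made-up helper, and the list of daemon names is my assumption of what Pacemaker 1.1 starts. It reads "pid ppid command" lines and keeps only the cluster daemons; a parent pid of 1 would mean the process has lost its original parent.

```shell
#!/bin/sh
# Sketch: list Pacemaker daemons that are still alive.
# Input: lines of "pid ppid comm" on stdin.
# The daemon name list below is an assumption, adjust as needed.

leftover_daemons() {
    awk '$3 ~ /^(pacemakerd|cib|stonithd|lrmd|attrd|pengine|crmd)$/ {
        print $1, $2, $3
    }'
}

# Feed it the current process table.
ps -e -o pid=,ppid=,comm= | leftover_daemons
```

On n2 this should print lrmd and pengine, together with their parent pids, if they are really still running after corosync was stopped.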



thanks in advance
andreas