[Pacemaker] 4th node does not return to cluster

Gabriel Gomiz ggomiz at cooperativaobrera.com.ar
Tue Aug 16 18:27:14 EDT 2011


Hi to all... :)

We are having trouble with a 4-node Pacemaker cluster. Three nodes are fine, but the 4th 
node, after several corosync failures (with core dumps) and Pacemaker restarts, does not 
return to the cluster.

On the other 3 nodes the 4th appears online, but on the 4th node itself the CIB is empty when I 
display it with crm.

One odd thing in the logs is messages like this one:

Aug 16 19:07:15 lorien.cooperativaobrera.com.ar cib: [28120]: WARN: cib_peer_callback: Discarding 
cib_modify message (421) from mordor.cooperativaobrera.com.ar: not in our membership

It seems that the 4th node does not consider itself a member of the cluster. How can I make it 
rejoin?
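
To see how often this happens and which peers are affected, the log can be scanned for these warnings. The following is only a minimal sketch: the regular expression is derived from the single warning quoted above, and the names DISCARD_RE and count_discarded are made up for illustration. Other cib log lines may be formatted differently.

```python
import re
from collections import Counter

# Pattern derived from the one warning quoted in this post (an assumption:
# other cib messages may not follow exactly this layout).
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (?P<op>\S+) message \((?P<id>\d+)\) "
    r"from (?P<sender>\S+): not in our membership"
)


def count_discarded(lines):
    """Count discarded CIB messages per sending node."""
    senders = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            senders[match.group("sender")] += 1
    return senders


if __name__ == "__main__":
    # The warning from this post, joined back into a single log line.
    sample = [
        "Aug 16 19:07:15 lorien.cooperativaobrera.com.ar cib: [28120]: WARN: "
        "cib_peer_callback: Discarding cib_modify message (421) from "
        "mordor.cooperativaobrera.com.ar: not in our membership",
    ]
    for sender, count in count_discarded(sample).items():
        print(sender, count)
```

To run it against the real log, pass an open file object for the logfile named in the corosync config (/var/log/cluster/corosync.log) instead of the sample list.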

Any help you can give me will be highly appreciated.

Many thanks in advance

PS: If you need additional logs or want me to run any tests, I'm happy to do so.

-----

DATA:

OS is CentOS 6.0, 64-bit
PACEMAKER version 1.1.5
COROSYNC 1.2.3-21

NODE 1:

[DB1] gandalf # crm_mon -1
============
Last updated: Tue Aug 16 19:21:05 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============

Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar 
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]

  Resource Group: dashboard
      fs_dashboard       (ocf::heartbeat:Filesystem):    Started isildur.cooperativaobrera.com.ar
      ip_dashboard       (ocf::heartbeat:IPaddr):        Started isildur.cooperativaobrera.com.ar
      srv_httpd_dashboard        (lsb:httpd.dashboard):  Started isildur.cooperativaobrera.com.ar
      srv_dashjobs       (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar

NODE 2:

[DB2] isildur # crm_mon -1
============
Last updated: Tue Aug 16 19:21:28 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============

Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar 
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]

  Resource Group: dashboard
      fs_dashboard       (ocf::heartbeat:Filesystem):    Started isildur.cooperativaobrera.com.ar
      ip_dashboard       (ocf::heartbeat:IPaddr):        Started isildur.cooperativaobrera.com.ar
      srv_httpd_dashboard        (lsb:httpd.dashboard):  Started isildur.cooperativaobrera.com.ar
      srv_dashjobs       (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar

NODE 3:

[VM1] mordor # crm_mon -1
============
Last updated: Tue Aug 16 19:21:40 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============

Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar 
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]

  Resource Group: dashboard
      fs_dashboard       (ocf::heartbeat:Filesystem):    Started isildur.cooperativaobrera.com.ar
      ip_dashboard       (ocf::heartbeat:IPaddr):        Started isildur.cooperativaobrera.com.ar
      srv_httpd_dashboard        (lsb:httpd.dashboard):  Started isildur.cooperativaobrera.com.ar
      srv_dashjobs       (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar

NODE 4:

[VM2] lorien # crm_mon -1
============
Last updated: Tue Aug 16 19:21:54 2011
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
============

LOGS ON NODE 4:

<attached>

COROSYNC CONFIG (NODE 4; the other nodes are identical except for bindnetaddr):

compatibility: whitetank

totem {
         version: 2
         secauth: off
         threads: 0
         interface {
                 ringnumber: 0
                 bindnetaddr: 192.168.238.43
                 mcastaddr: 226.94.2.1
                 mcastport: 5405
         }
}

logging {
         fileline: off
         to_stderr: no
         to_logfile: yes
         to_syslog: yes
         logfile: /var/log/cluster/corosync.log
         debug: off
         timestamp: on
         logger_subsys {
                 subsys: AMF
                 debug: off
         }
}

amf {
         mode: disabled
}

service {
         # Load the Pacemaker Cluster Resource Manager
         name: pacemaker
         ver:  1
}

-- 
       .^.    Lic. Gabriel Gomiz - Red Hat Certified Engineer (RHCE)
       /V\    Jefe de Sistemas - Administrador Red y Servidores
      // \\   Gerencia de Sistemas - Cooperativa Obrera Ltda.
     /(   )\  Tel (0291) 456-0084
      ^^-^^   s/Window[$s]/LINUX!!/g or die;

PGP: http://admin.cooperativaobrera.com.ar/pgp/ggomiz.txt

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: corosync.log
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110816/fd8c4605/attachment.log>

