[Pacemaker] Installation problems

Erich Weiler weiler at soe.ucsc.edu
Sun Mar 7 11:33:41 EST 2010


Hi Y'all,

I'm having some issues getting things running on a stock CentOS 5.4 
install, and I was hoping someone could point me in the right direction...

Through the epel and clusterlabs repos that are referenced in the wiki, 
I installed:

corosync-1.2.0-1.el5
openais-1.1.0-1.el5
pacemaker-1.0.7-4.el5
(and all dependencies, via yum)

and it all installed fine, according to yum.  I installed 
/etc/corosync/corosync.conf as follows:

-----
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
        user:   root
        group:  root
}

totem {
        version: 2

        # How long before declaring a token lost (ms)
        token:          5000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 20

        # How long to wait for join messages in the membership protocol (ms)
        join:           1000

        # How long to wait for consensus to be achieved before starting
        # a new round of membership configuration (ms)
        consensus:      7500

        # Turn off the virtual synchrony filter
        vsftype:        none

        # Number of messages that may be sent by one processor on
        # receipt of the token
        max_messages:   20

        # Disable encryption
        secauth:        off

        # How many threads to use for encryption/decryption
        threads:        0

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Optionally assign a fixed node id (integer)
        # nodeid:         1234
        interface {
                ringnumber: 0
                bindnetaddr: 10.1.0.255
                mcastaddr: 226.94.1.90
                mcastport: 4000
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  0
}
-----

Then I tried:

# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]

but then when I run crm_mon, it hangs here:

"Attempting connection to the cluster...."

and nothing happens.  A 'ps' shows corosync in a weird state:

[root@server ~]# ps -afe | grep coro
root     12942     1  0 08:20 ?        00:00:00 corosync
root     12947 12942  0 08:20 ?        00:00:00 [corosync] <defunct>
root     12955 12858  0 08:20 pts/0    00:00:00 grep coro
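
For reference, the <defunct> entry is a zombie: a child that has exited but has not yet been reaped by its parent (PID 12942 here). A minimal sketch for listing such processes, assuming the procps 'ps' shipped with CentOS; the helper name list_defunct is made up for illustration:

```shell
#!/bin/sh
# list_defunct (hypothetical helper): print "PID PPID COMMAND" for every
# zombie process found in `ps -eo pid,ppid,stat,comm` style input on stdin.
# The first input line is assumed to be the ps header and is skipped.
list_defunct() {
    # A process state (STAT column) beginning with Z marks a zombie.
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}
```

On a live system it would be fed with: ps -eo pid,ppid,stat,comm | list_defunct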

I also tried starting corosync via '/etc/init.d/openais start' after 
changing the line in the /etc/init.d/openais script:

export COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"

and it seems to start, but crm_mon still can't connect and I still get 
"Attempting connection to the cluster...." and corosync is in a defunct 
state.  Has anyone else had this problem?  Are the rpms from 
epel/clusterlabs not jibing with each other in some way, perhaps?

Here is a clip from /var/log/corosync.log:

Mar 07 08:20:04 corosync [MAIN  ] Corosync Cluster Engine ('1.2.0'): 
started and ready to provide service.
Mar 07 08:20:04 corosync [MAIN  ] Corosync built-in features: nss rdma
Mar 07 08:20:04 corosync [MAIN  ] Successfully read main configuration 
file '/etc/corosync/corosync.conf'.
Mar 07 08:20:04 corosync [TOTEM ] Initializing transport (UDP/IP).
Mar 07 08:20:04 corosync [TOTEM ] Initializing transmit/receive 
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 07 08:20:04 corosync [MAIN  ] Compatibility mode set to whitetank. 
Using V1 and V2 of the synchronization engine.
Mar 07 08:20:04 corosync [TOTEM ] The network interface [10.1.1.84] is 
now up.
Mar 07 08:20:04 corosync [pcmk  ] info: process_ais_conf: Reading configure
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_init: Local handle: 
5650605097994944514 for logging
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_next: Processing 
additional logging options...
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Found 'off' for 
option: debug
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'off' for option: to_file
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'daemon' for option: syslog_facility
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_init: Local handle: 
2730409743423111171 for service
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_next: Processing 
additional service options...
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'pcmk' for option: clustername
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'no' for option: use_logd
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'no' for option: use_mgmtd
Mar 07 08:20:04 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Mar 07 08:20:04 corosync [pcmk  ] Logging: Initialized pcmk_startup
Mar 07 08:20:04 corosync [pcmk  ] info: pcmk_startup: Maximum core file 
size is: 18446744073709551615
Mar 07 08:20:04 corosync [pcmk  ] ERROR: pcmk_startup: Child 12947 
spawned to record non-fatal assertion failure line 544: pwentry != NULL
Mar 07 08:20:04 corosync [pcmk  ] ERROR: pcmk_startup: Cluster user 
hacluster does not exist
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: Pacemaker 
Cluster Manager 1.0.7
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
extended virtual synchrony service
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
configuration service
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
cluster closed process group service v1.01
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
cluster config database access v1.01
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
profile loading service
Mar 07 08:20:04 corosync [SERV  ] Service engine loaded: corosync 
cluster quorum service v0.1
Mar 07 08:20:04 corosync [pcmk  ] notice: pcmk_peer_update: Transitional 
membership event on ring 44: memb=0, new=0, lost=0
Mar 07 08:20:04 corosync [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 44: memb=1, new=1, lost=0
Mar 07 08:20:04 corosync [pcmk  ] info: update_member: Creating entry 
for node 1409351946 born on 44
Mar 07 08:20:04 corosync [pcmk  ] info: update_member: Node 
1409351946/unknown is now: member
Mar 07 08:20:04 corosync [pcmk  ] info: pcmk_peer_update: NEW: .pending. 
1409351946
Mar 07 08:20:05 corosync [pcmk  ] info: pcmk_peer_update: MEMB: 
.pending. 1409351946
Mar 07 08:20:05 corosync [pcmk  ] info: pcmk_update_nodeid: Local node 
id: 1409351946
Mar 07 08:20:05 corosync [pcmk  ] info: update_member: Node (null) now 
has 1 quorum votes (was 0)
Mar 07 08:20:05 corosync [pcmk  ] info: send_member_notification: 
Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [pcmk  ] info: update_member: Node (null) now 
has process list: 00000000000000000000000000000002 (2)
Mar 07 08:20:05 corosync [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
Mar 07 08:20:05 corosync [pcmk  ] info: update_member: 0xec71ac0 Node 
1409351946 now known as  (was: (null))
Mar 07 08:20:05 corosync [pcmk  ] info: send_member_notification: 
Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [MAIN  ] Completed service synchronization, 
ready to provide service.
Mar 07 08:22:59 corosync [SERV  ] Unloading all Corosync service engines.
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: Shuting down 
Pacemaker
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: crmd confirmed 
stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: pengine 
confirmed stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: attrd confirmed 
stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: lrmd confirmed 
stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: cib confirmed 
stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: stonithd 
confirmed stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: Shutdown complete
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: Pacemaker 
Cluster Manager 1.0.7
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
extended virtual synchrony service
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
configuration service
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
cluster closed process group service v1.01
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
cluster config database access v1.01
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
profile loading service
Mar 07 08:22:59 corosync [SERV  ] Service engine unloaded: corosync 
cluster quorum service v0.1
Mar 07 08:22:59 corosync [MAIN  ] Corosync Cluster Engine exiting with 
status -1 at main.c:158.
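
The only ERROR entries in the excerpt above come from pcmk_startup, and the second one names a missing 'hacluster' account. A rough sketch for pulling such lines out of the log and checking whether an account exists; the function names pcmk_errors and user_exists are hypothetical helpers, not part of corosync or Pacemaker:

```shell
#!/bin/sh
# pcmk_errors (hypothetical helper): print the ERROR lines written by the
# pcmk plugin. $1 is the log file path, e.g. /var/log/corosync.log.
pcmk_errors() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '[pcmk  ] ERROR:' "$1"
}

# user_exists (hypothetical helper): succeed if the named account can be
# resolved by the system, fail otherwise.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

Possible usage: pcmk_errors /var/log/corosync.log, followed by: user_exists hacluster || echo "hacluster account is missing"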

Any hints welcome!!

TIA,
erich



