[Pacemaker] corosync pacemaker exit after some time

Tue Jan 22 08:53:51 EST 2013

Hi,

On Fedora 17 with corosync and pacemaker version 1.1.7 (from Fedora updates),
corosync and pacemaker exit on all nodes after a while:

[root@node140 ~]# systemctl status corosync
corosync.service - Corosync Cluster Engine
          Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled)
          Active: failed (Result: exit-code) since Tue, 22 Jan 2013 08:42:47 -0500; 5min ago
         Process: 13152 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
         Process: 26754 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
        Main PID: 1442 (code=dumped, signal=BUS)
          CGroup: name=systemd:/system/corosync.service

Jan 22 08:42:47 node140 corosync[26754]: [62B blob data]
Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Unloading all Corosync service engines.
Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync configuration map access
Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync configuration service
Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01

In the log:

Jan 22 08:42:23 node140 pacemakerd[26540]:     info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
Jan 22 08:42:23 node140 pacemakerd[26540]: Could not initialize Cluster Configuration Database API instance, error 2
Jan 22 08:42:23 node140 systemd[1]: pacemaker.service: main process exited, code=exited, status=1
Jan 22 08:42:23 node140 systemd[1]: Unit pacemaker.service entered failed state.
Jan 22 08:42:23 node140 systemd[1]: pacemaker.service holdoff time over, scheduling restart.

Is this a permission problem? If so, must cores/root be owned by something other than hacluster.root?

Thanks

Franck
