[Pacemaker] Not connected to AIS

Proskurin Kirill k.proskurin at corp.mail.ru
Fri Jun 24 04:56:23 EDT 2011


Hello.

I have a strange problem.
One node in the cluster is not working correctly.


In the logs:
Jun 23 20:25:25 mysender39.example.com lrmd: [10371]: WARN: For LSB init script, no additional parameters are needed.
Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output: (onlineconf.init:3:stop:stdout) Stopping onlineconf_updater:
Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output: (onlineconf.init:3:stop:stdout) [
Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output: (onlineconf.init:3:stop:stdout)   OK
Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output: (onlineconf.init:3:stop:stdout) ]

Jun 23 20:25:25 mysender39.example.com crmd: [30682]: info: process_lrm_event: LRM operation onlineconf.init:3_stop_0 (call=181, rc=0, cib-update=683339, confirmed=true) ok
Jun 23 20:25:25 mysender39.example.com cib: [30678]: ERROR: send_ais_message: Not connected to AIS

After that come many errors, with this line repeated over and over.
But in crm_mon everything looks quiet:
Last updated: Fri Jun 24 12:35:05 2011
Stack: openais
Current DC: mysender6.example.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
4 Nodes configured, 4 expected votes
7 Resources configured.

Online: [ mysender6.example.com mysender31.example.com mysender38.example.com mysender39.example.com ]

Only the clone resource on this node is "unmanaged":

onlineconf.init:3  (lsb:onlineconf):       Started mysender39.example.com (unmanaged) FAILED

Failed actions:
     onlineconf.init:3_monitor_5000 (node=mysender39.example.com, call=180, rc=7, status=complete): not running
     onlineconf.init:3_stop_0 (node=mysender39.example.com, call=-1, rc=1, status=Timed Out): unknown error

In the logs:

Jun 24 12:43:15 mysender39.example.com attrd: [30680]: WARN: attrd_cib_callback: Update 333725 for fail-count-onlineconf.init:2=(null) failed: Remote node did not respond

But if I run it by hand, it answers immediately:
# /etc/init.d/onlineconf status
onlineconf_updater is stopped

I ran /etc/init.d/corosync restart.
I waited for 5 minutes, but it was still stuck at "Waiting for corosync services to unload",
so I killed it with -9 and restarted it.

After that, everything started normally again.
What was wrong?

Corosync-1.2.7
Pacemaker-1.0.11

-- 
Best regards,
Proskurin Kirill



