[Pacemaker] Not connected to AIS

Andrew Beekhof andrew at beekhof.net
Mon Jun 27 01:15:38 EDT 2011


On Fri, Jun 24, 2011 at 6:56 PM, Proskurin Kirill
<k.proskurin at corp.mail.ru> wrote:
> Hello.
>
> I have a strange problem.
> One node in the cluster is not working correctly.
>
>
> In logs:
> Jun 23 20:25:25 mysender39.example.com lrmd: [10371]: WARN: For LSB init
> script, no additional parameters are needed.
> Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output:
> (onlineconf.init:3:stop:stdout) Stopping onlineconf_updater:
> Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output:
> (onlineconf.init:3:stop:stdout) [
> Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output:
> (onlineconf.init:3:stop:stdout)   OK
> Jun 23 20:25:25 mysender39.example.com lrmd: [30679]: info: RA output:
> (onlineconf.init:3:stop:stdout) ]
>
> Jun 23 20:25:25 mysender39.example.com crmd: [30682]: info:
> process_lrm_event: LRM operation onlineconf.init:3_stop_0 (call=181, rc=0,
> cib-update=683339, confirmed=true) ok
> Jun 23 20:25:25 mysender39.example.com cib: [30678]: ERROR:
> send_ais_message: Not connected to AIS
>
> After that there are many errors, and this line repeats over and over.

Not enough information.
Please include a crm_report for the time between 20:20:00 and 20:30:00
on June 23.
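
A report for that window could be collected roughly as sketched below. This
assumes crm_report is installed on the node and that its -f/-t options are
used to give the start and end of the time range. The year is inferred from
the date of this thread, and the destination name is purely hypothetical. The
sketch only prints the command so it can be reviewed before being run on a
cluster node.

```shell
#!/bin/sh
# Time window requested above; the year 2011 is an assumption taken from
# the date of this thread, since the syslog lines do not include a year.
from="2011-06-23 20:20:00"
to="2011-06-23 20:30:00"

# Hypothetical destination name for the report archive.
dest="/tmp/mysender39-not-connected-to-ais"

# Build the command line and print it for review instead of running it.
report_cmd="crm_report -f \"$from\" -t \"$to\" \"$dest\""
echo "$report_cmd"
```

Once the printed command looks right, it can be pasted into a root shell on
one of the cluster nodes.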


> But in crm_mon everything seems quiet:
> Last updated: Fri Jun 24 12:35:05 2011
> Stack: openais
> Current DC: mysender6.example.com - partition with quorum
> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
> 4 Nodes configured, 4 expected votes
> 7 Resources configured.
>
> Online: [ mysender6.example.com mysender31.example.com
> mysender38.example.com mysender39.example.com ]
>
> And the clone resource on this node is "unmanaged".
>
> onlineconf.init:3  (lsb:onlineconf):       Started mysender39.example.com
> (unmanaged) FAILED
>
> Failed actions:
>    onlineconf.init:3_monitor_5000 (node=mysender39.example.com, call=180,
> rc=7, status=complete): not running
>    onlineconf.init:3_stop_0 (node=mysender39.example.com, call=-1, rc=1,
> status=Timed Out): unknown error
>
> In the logs:
>
> Jun 24 12:43:15 mysender39.example.com attrd: [30680]: WARN:
> attrd_cib_callback: Update 333725 for fail-count-onlineconf.init:2=(null)
> failed: Remote node did not respond
>
> But if I run it by hand, it answers immediately:
> # /etc/init.d/onlineconf status
> onlineconf_updater is stopped
>
> I ran /etc/init.d/corosync restart and waited for 5 minutes, but it was
> still at "Waiting for corosync services to unload".
> So I killed it with -9 and restarted.
>
> After that everything started normally again.
> What was wrong?
>
> Corosync-1.2.7
> Pacemaker-1.0.11
>
> --
> Best regards,
> Proskurin Kirill
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



