[Pacemaker] node is offline; can't bring online

Wed Nov 7 22:56:04 EST 2012

I don't really know when the trouble started. I ended up restarting
Pacemaker on all nodes, and that cleared things up, though I'm not sure
why. If the same issue comes up again, I'll run crm_report and open a bug.

Thanks,

Paul

On Wed, Nov 7, 2012 at 9:22 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Thu, Nov 8, 2012 at 1:55 PM, Paul Archer <paul at paularcher.org> wrote:
>> I'm fairly new to pacemaker, and this is hurting my head.
>> I have a four-node cluster, and one of my nodes (for no reason that I
>> can discern) has gone offline, and I can't get it to come back online.
>>
>> Offline node:
>> root@vmhost2:/var/lib/heartbeat# crm_mon -1
>> ============
>> Last updated: Wed Nov  7 20:52:16 2012
>> Last change: Wed Nov  7 20:28:06 2012 via cibadmin on vmhost2
>> Stack: openais
>> Current DC: NONE
>> 4 Nodes configured, 4 expected votes
>> 7 Resources configured.
>> ============
>>
>> OFFLINE: [ vgs1 vgs2 vmhost1 vmhost2 ]
>>
>>
>>
>> One of the online nodes:
>> root@vmhost1:/var/lib/heartbeat/crm# crm_mon -1
>> ============
>> Last updated: Wed Nov  7 20:45:32 2012
>> Last change: Wed Nov  7 20:44:59 2012 via crm_attribute on vgs2
>> Stack: openais
>> Current DC: vgs1 - partition with quorum
>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> 4 Nodes configured, 4 expected votes
>> 7 Resources configured.
>> ============
>>
>> Node vmhost2: standby
>> Online: [ vgs1 vgs2 vmhost1 ]
>>
>>  focus  (ocf::heartbeat:VirtualDomain): Started vmhost1
>>  logger (ocf::heartbeat:VirtualDomain): Started vmhost1
>>  mother (ocf::heartbeat:VirtualDomain): Started vmhost2
>>  vgsIP  (ocf::heartbeat:IPaddr2):       Started vgs2
>>  vgsWebServer   (ocf::heartbeat:apache):        Started vgs2
>>
>>
>> I don't know which log files are relevant, so I'll post specifics as
>> people ask for them, rather than dumping everything here to start
>> with.
>
> You should have crm_report and/or hb_report.
> Use it to gather everything from around about the time the node went offline.
> Probably best to open a bug at http://bugs.clusterlabs.org and attach
> the resulting tarball there.
>
> If the cluster is still in this state, it would also be useful to see
> the corosync-objctl -a output from vgs1 and vmhost2, as well as the
> output of cibadmin -Ql from vgs1.
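If the cluster gets stuck in this state again, a small script along these lines could collect the live-state outputs Andrew asked for in one pass on each node. It is a minimal sketch under stated assumptions: the output directory and the `capture` helper are hypothetical choices, and the only cluster commands it runs are the ones named in this thread (`crm_mon -1`, `corosync-objctl -a`, `cibadmin -Ql`). It does not replace crm_report, which gathers the logs from around the time the node went offline.

```shell
#!/bin/sh
# Minimal sketch: save the live cluster state requested in this thread.
# The output directory and the capture() helper are hypothetical, not a
# Pacemaker convention. Run on each node of interest (e.g. vgs1, vmhost2).

OUT_DIR="${1:-/tmp/pcmk-state-$(uname -n)}"
mkdir -p "$OUT_DIR" || exit 1

# capture NAME CMD [ARGS...]: run CMD if it is installed and save its
# stdout and stderr to $OUT_DIR/NAME.txt; otherwise record that it is missing.
capture() {
    name="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUT_DIR/$name.txt" 2>&1
    else
        echo "$1: not installed on this node" >"$OUT_DIR/$name.txt"
    fi
}

capture crm_mon          crm_mon -1          # one-shot cluster status
capture corosync-objctl  corosync-objctl -a  # corosync object database dump
capture cib              cibadmin -Ql        # CIB as seen by the local node

echo "Saved output to $OUT_DIR"
```

Running it on both vgs1 and vmhost2 would give the two sets of corosync-objctl output mentioned above; the files could then be attached to the bug alongside the crm_report tarball.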
>
>>
>>
>> Thanks for any help,
>>
>> Paul
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org