[ClusterLabs] newbie questions

Wed Jun 1 03:34:09 UTC 2016

On 31/05/16 10:41 PM, Jay Scott wrote:
> hooray for me, but, how?
> 
> I got about 3/4 of Digimer's list done and got stuck.
> I did a pcs cluster status, and, behold, the cluster was up.
> I pinged the ClusterIP and it answered.  I didn't know what
> to do with the 'delay="x"' part, that's the thing I couldn't figure
> out.  (I've been assuming the delay part is a big deal.)

Delay works like this:

Both nodes are up, but comms break (switch loop/broadcast storm,
STP/stack renegotiation, iptables oops, whatever)... Both nodes declare
their peer lost.

Node 1's stonith config (the device used to fence node 1) includes 'delay="15"'.

Node 1 looks up how to fence node 2 and calls the fence.

Node 2 looks up how to fence node 1 and calls the fence, passing the delay
to the agent.

The fence agent running on node 1 executes without delay.

The fence agent running on node 2 sees a delay of 15 seconds, and sleeps.

Node 1 kills node 2 before the sleep exits, which ensures that node 1
lives and node 2 dies. Assuming your services were running on node 1,
no recovery is needed.

Now assume that node 1 truly died. Node 2's fence agent would wake from
the sleep after 15 seconds and shoot node 1, and node 2 would then
recover any resources that had been running on node 1.
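The race above can be sketched as a small timing simulation. This is a hypothetical Python model of the timing only, not how Pacemaker or any fence agent is implemented: the node names `node1`/`node2`, the `fence_time` value and the `fence_race` function are made up for illustration. `delay` maps each node to the delay configured on the stonith device that fences it, so `{"node1": 15}` mirrors the 'delay="15"' example.

```python
import heapq


def fence_race(running, delay, fence_time=2.0):
    """Simulate a two-node fence race after cluster comms break.

    running: node name -> True if the node is up when comms break.
    delay: node name -> delay (seconds) on the stonith device that fences
        that node; the agent sleeps this long before acting.
    fence_time: hypothetical seconds an agent needs to power a node off.
    Returns (fences, survivors): fences lists (time, attacker, victim) for
    every fence action that completed; survivors lists nodes still up.
    """
    # None = alive; otherwise the time the node went down (-inf = already dead).
    dead_at = {n: (None if up else float("-inf")) for n, up in running.items()}
    events = []
    for attacker, up in running.items():
        if not up:
            continue  # a dead node cannot call its fence agent
        for victim in running:
            if victim != attacker:
                done = delay.get(victim, 0.0) + fence_time
                heapq.heappush(events, (done, attacker, victim))
    fences = []
    while events:
        t, attacker, victim = heapq.heappop(events)
        died = dead_at[attacker]
        if died is not None and died < t:
            continue  # attacker was killed while its agent slept or ran
        if dead_at[victim] is None:
            dead_at[victim] = t
        fences.append((t, attacker, victim))
    survivors = [n for n, d in dead_at.items() if d is None]
    return fences, survivors


# Comms break, both nodes up, delay="15" on the device that fences node1.
print(fence_race({"node1": True, "node2": True}, {"node1": 15}))
# ([(2.0, 'node1', 'node2')], ['node1'])

# Node 1 truly died: node 2 sleeps 15 s, then fences it.
print(fence_race({"node1": False, "node2": True}, {"node1": 15}))
# ([(17.0, 'node2', 'node1')], ['node2'])
```

With no delay at all (`fence_race({"node1": True, "node2": True}, {})`) both fence actions complete at the same moment and the model leaves no survivors; that dual fence is what the delay is there to prevent.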

digimer

> However, there are more things for me to read and more experiments
> for me to try so I'm good for now.
> 
> Thanks to everyone for the prompt help.
> 
> j.
> 
> On Tue, May 31, 2016 at 5:22 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>     On 05/31/2016 03:59 PM, Jay Scott wrote:
>     > Greetings,
>     >
>     > Cluster newbie
>     > Centos 7
>     > trying to follow the "Clusters from Scratch" intro.
>     > 2 nodes (yeah, I know, but I'm just learning)
>     > <PRE>
>     > [root at smoking ~]# pcs status
>     > Cluster name:
>     > Last updated: Tue May 31 15:32:18 2016        Last change: Tue May 31 15:02:21 2016 by root via cibadmin on smoking
>     > Stack: unknown
> 
>     "Stack: unknown" is a big problem. The cluster isn't aware of the
>     corosync configuration. Did you do the "pcs cluster setup" step?
> 
>     > Current DC: NONE
>     > 2 nodes and 1 resource configured
>     >
>     > OFFLINE: [ mars smoking ]
>     >
>     > Full list of resources:
>     >
>     >  ClusterIP    (ocf::heartbeat:IPaddr2):    Stopped
>     >
>     > PCSD Status:
>     >   smoking: Online
>     >   mars: Online
>     >
>     > Daemon Status:
>     >   corosync: active/enabled
>     >   pacemaker: active/enabled
>     >   pcsd: active/enabled
>     > </PRE>
>     >
>     > What concerns me at the moment:
>     > I did
>     > pcs resource enable ClusterIP
>     > while simultaneously doing
>     > tail -f /var/log/cluster/corosync.log
>     > (the only log in there)
> 
>     The system log (/var/log/messages or whatever your system has
>     configured) is usually the best place to start. The cluster software
>     sends messages of interest to end users there, and it includes messages
>     from all components (corosync, pacemaker, resource agents, etc.).
> 
>     /var/log/cluster/corosync.log (and in some configurations,
>     /var/log/pacemaker.log) have more detailed log information for
>     debugging.
> 
>     > and nothing happens in the log, but the ClusterIP
>     > stays "Stopped".  Should I be able to ping that addr?
>     > I can't.
>     > It also says OFFLINE:  and both of my machines are offline,
>     > though the PCSD says they're online.  Which do I trust?
> 
>     The first online/offline output is most important, and refers to the
>     node's status in the actual cluster; the "PCSD" online/offline output
>     simply tells whether the pcs daemon is running. Typically, the pcs
>     daemon is enabled at boot and is always running. The pcs daemon is not
>     part of the clustering itself; it's a front end to configuring and
>     administering the cluster.
> 
>     > [root at smoking ~]# pcs property show stonith-enabled
>     > Cluster Properties:
>     >  stonith-enabled: false
>     >
>     > yet I see entries in the corosync.log referring to stonith.
>     > I'm guessing that's normal.
> 
>     Yes. Because you can enable stonith at any time, the stonith daemon still
>     runs so that it stays aware of the cluster status.
> 
>     > My corosync.conf file says the quorum is off.
>     >
>     > I also don't know what to include in this for any of you to
>     > help me debug.
>     >
>     > Ahh, also, is this considered "long", and if so, where would I post
>     > to the web?
>     >
>     > thx.
>     >
>     > j.
> 
>     _______________________________________________
>     Users mailing list: Users at clusterlabs.org
>     http://clusterlabs.org/mailman/listinfo/users
> 
>     Project Home: http://www.clusterlabs.org
>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>     Bugs: http://bugs.clusterlabs.org
> 
> 
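Ken's point about which online/offline list to trust can be made concrete with a small parser for the `pcs status` text in Jay's first message. This is a sketch that assumes the output layout shown in this thread (pcs output differs between versions); `parse_pcs_status` and `cluster_problems` are hypothetical helpers, not part of pcs. Only the `Online:`/`OFFLINE:` lists reflect cluster membership; the `PCSD Status:` block only says whether pcsd answers.

```python
def parse_pcs_status(text):
    """Split `pcs status` text into cluster membership and pcsd reachability.

    Assumes the layout shown in this thread; other pcs versions may differ.
    """
    result = {"stack": None, "dc": None, "cluster": {}, "pcsd": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            section = None  # a blank line ends a block such as "PCSD Status:"
        elif line.startswith("Stack:"):
            result["stack"] = line.split(":", 1)[1].strip()
        elif line.startswith("Current DC:"):
            result["dc"] = line.split(":", 1)[1].strip()
        elif line.startswith(("Online:", "OFFLINE:")) and "[" in line and "]" in line:
            # Cluster membership, e.g. "OFFLINE: [ mars smoking ]"
            state = line.split(":", 1)[0].lower()
            names = line[line.index("[") + 1 : line.rindex("]")].split()
            for node in names:
                result["cluster"][node] = state
        elif line == "PCSD Status:":
            section = "pcsd"
        elif section == "pcsd" and ":" in line:
            # pcsd reachability only, e.g. "smoking: Online"
            node, state = (part.strip() for part in line.split(":", 1))
            result["pcsd"][node] = state.lower()
    return result


def cluster_problems(status):
    """List warnings based on cluster membership, ignoring pcsd status."""
    problems = []
    if status["stack"] == "unknown":
        problems.append("stack unknown: was 'pcs cluster setup' run?")
    if status["dc"] == "NONE":
        problems.append("no designated controller (DC) elected")
    for node, state in sorted(status["cluster"].items()):
        if state != "online":
            problems.append(f"{node} is {state} in the cluster")
    return problems
```

Fed the output from Jay's first message, `cluster_problems` reports the unknown stack, the missing DC, and both `mars` and `smoking` as offline in the cluster, even though both show as Online under PCSD Status.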

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?