[ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

Mon Mar 22 13:31:09 EDT 2021

Thank you.
From chapters 1 to 6, I never saw anything about configuring the floating IP address! Am I wrong?

On Monday, March 22, 2021, 07:06:47 PM GMT+4:30, Ken Gaillot <kgaillot at redhat.com> wrote: 

On Mon, 2021-03-22 at 08:15 +0000, Jason Long wrote:
> Thank you.
> 
> My test lab uses VirtualBox with two VMs:
> VM1: This VM has two NICs (NAT, Host-only Adapter)
> VM2: This VM has one NIC (Host-only Adapter)
> 
> On VM1, I use the NAT interface for port forwarding:
> "127.0.0.1:2080" on the host is forwarded to 127.0.0.1:80 on the guest.
> 
> 
> Yes, "systemctl" tells me:
> 
> # systemctl is-enabled httpd.service
> disabled
> 
> I rebooted my nodes and one of the problems was solved:
> https://paste.ubuntu.com/p/7cQQtsXFPV/
> 
> I did:
> # pcs resource defaults resource-stickiness=100
> 
> 
> When I browse to "127.0.0.1:2080", it shows me "My Test Site -
> node1".
> 
> I have two problems:
> 
> 1- When I stop the node1 VM and refresh the page, why can't I see
> "My Test Site - node2"?
> 
> # pcs cluster stop node1
> node1: Stopping Cluster (pacemaker)...
> node1: Stopping Cluster (corosync)...
> 
> # pcs status
> Error: error running crm_mon, is pacemaker running?
> Could not connect to the CIB: Transport endpoint is not connected
> crm_mon: Error: cluster is not available on this node

Hi,

pcs status doesn't test the web site, it shows the internal cluster
status. Since the cluster isn't running on that node, it can't show
anything.

However, the website is still active on the other node and reachable
from this node. You can confirm that by using wget or curl with the
public website URL (the floating IP address).
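One way to act on that: from a node where the cluster is still running, check which node currently holds the floating IP and the web server, then request the site through the floating IP. The helper below, `where_is`, is a hypothetical sketch (not part of pcs). It assumes the `pcs status` resource-line format shown in this thread (Pacemaker 2.0.5); other versions may format those lines differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pcs): read `pcs status` output on stdin
# and print the node on which the named resource is "Started".
# Assumes resource lines shaped like:
#   * ClusterIP    (ocf::heartbeat:IPaddr2):    Started node2
# A resource shown as "Stopped" produces no output.
where_is() {
  local resource="$1"
  awk -v res="$resource" \
    '$1 == "*" && $2 == res && $(NF-1) == "Started" { print $NF }'
}
```

Usage, on node2 while node1 is stopped: `pcs status | where_is ClusterIP`, then `curl http://<floating IP>/`, where `<floating IP>` is the ip= value given to the ClusterIP resource.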

> 
> # pcs resource defaults
> Error: unable to get cib
> 
> 
> I think it should forward my requests from node1 to node2
> automatically, so that I see the "My Test Site - node2" message.
> 
> 
> 2- I started node1 again, but when I browse to "IP:80", I can't
> see the "My Test Site - node1" message.
> 
> # pcs cluster start node1
> node1: Starting Cluster...
> 
> 
> # pcs status
> Cluster name: mycluster
> Cluster Summary:
>  * Stack: corosync
>  * Current DC: node2 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
>  * Last updated: Mon Mar 22 12:26:10 2021
>  * Last change:  Mon Mar 22 12:08:02 2021 by root via cibadmin on node1
>  * 2 nodes configured
>  * 2 resource instances configured
> 
> Node List:
>  * Online: [ node1 node2 ]
> 
> Full List of Resources:
>  * WebSite    (ocf::heartbeat:apache):    Started node2
>  * ClusterIP    (ocf::heartbeat:IPaddr2):    Started node2
> 
> Daemon Status:
>  corosync: active/enabled
>  pacemaker: active/enabled
>  pcsd: active/enabled
> 
> 
> 
> Logs are:
> https://paste.ubuntu.com/p/Yt4K2kPM7b/
> 
> 
> Thank you again.
> 
> 
> On Monday, March 22, 2021, 01:12:21 AM GMT+4:30, Reid Wahl <nwahl at redhat.com> wrote:
> 
> Hi, Jason.
> 
> On Sun, Mar 21, 2021 at 5:21 AM Jason Long <hack3rcon at yahoo.com> wrote:
> > Hello,
> > I used "Clusters from Scratch" to configure two nodes and got the
> > error below:
> > 
> > # pcs status
> > Cluster name: mycluster
> > Cluster Summary:
> >  * Stack: corosync
> >  * Current DC: node1 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
> >  * Last updated: Sun Mar 21 15:35:18 2021
> >  * Last change:  Sun Mar 21 15:29:38 2021 by root via cibadmin on node1
> >  * 2 nodes configured
> >  * 2 resource instances configured
> > 
> > Node List:
> >  * Online: [ node1 node2 ]
> > 
> > Full List of Resources:
> >  * WebSite    (ocf::heartbeat:apache):    Stopped
> >  * ClusterIP    (ocf::heartbeat:IPaddr2):    Started node1
> > 
> > Failed Resource Actions:
> >  * WebSite_start_0 on node1 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:45 +03:30', queued=0ms, exec=1318ms
> >  * WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:47 +03:30', queued=0ms, exec=1380ms
> > 
> > Daemon Status:
> >  corosync: active/enabled
> >  pacemaker: active/enabled
> >  pcsd: active/enabled
> > 
> > 
> > *********
> > I have some questions:
> > 
> > 1- In "Chapter 6. Add Apache HTTP Server as a Cluster Service", an
> > important note said:
> > "Do not enable the httpd service. Services that are intended to be
> > managed via the cluster software should never be managed by the OS.
> > It is often useful, however, to manually start the service, verify
> > that it works, then stop it again, before adding it to the cluster.
> > This allows you to resolve any non-cluster-related problems before
> > continuing. Since this is a simple example, we’ll skip that step
> > here."
> > 
> > If the Apache service is not enabled, then how can I connect to it
> > via the command below?
> >  
> > # wget -O - http://localhost/server-status
> > --2021-03-21 15:38:39--  http://localhost/server-status
> > Resolving localhost (localhost)... 127.0.0.1, ::1
> > Connecting to localhost (localhost)|127.0.0.1|:80... failed: Connection timed out.
> > Connecting to localhost (localhost)|::1|:80... failed: Network is unreachable.
> 
> Pacemaker starts the httpd service by starting the
> ocf:heartbeat:apache resource. The article is saying that the
> httpd.service systemd unit should not be enabled to start
> automatically at boot; it should only start when the cluster starts
> it. That is, `systemctl is-enabled httpd.service` should print
> "disabled".
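To check this on every node, a small wrapper around `systemctl is-enabled` can flag a unit that would be started at boot outside the cluster's control. `check_cluster_managed` is a hypothetical, illustrative name, not an existing tool; it only reports the state and leaves the fix (`systemctl disable httpd.service`) to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a unit that Pacemaker should manage
# is also set to start at boot via systemd (which it should not be).
check_cluster_managed() {
  local unit="$1" state
  # `systemctl is-enabled` prints the unit's state (enabled, disabled, ...).
  state=$(systemctl is-enabled "$unit" 2>/dev/null)
  case "$state" in
    enabled|enabled-runtime)
      echo "WARNING: $unit is $state; run: systemctl disable $unit" >&2
      return 1
      ;;
    *)
      # Any other state (disabled, static, masked, ...) means systemd
      # will not start it at boot on its own.
      echo "$unit: ${state:-unknown}"
      return 0
      ;;
  esac
}
```

Run `check_cluster_managed httpd.service` on both node1 and node2; with the state reported earlier in this thread it would print `httpd.service: disabled`.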
> 
> >  
> > 
> > 2- Must the commands below be run on both nodes, or on just one node?
> > 
> > # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip="IP_That_Never_Used_In_The_Network" cidr_netmask=32 op monitor interval=30s
> > 
> > # pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=20s
> 
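> Just one node. The cluster configuration (the CIB) is synchronized to
> all nodes, so resources created from one node apply to the whole cluster.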
> 
> >  
> > 
> > 3- Why is "* WebSite    (ocf::heartbeat:apache):    Stopped" shown?
> 
> The apache resource agent ran a command similar to
> `wget -O- -q -L --no-proxy --bind-address=127.0.0.1 <status_url>`
> and got an error. It tried this during the start operation on each
> node, and it failed on both nodes. When a resource fails to start on
> a given node, the default response is to prevent it from starting on
> that node again until the failure is cleared.
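Because the failure is recorded per node, `pcs status` lists one failed start per node under "Failed Resource Actions". As a hypothetical sketch (the function name and the parsing are assumptions based on the output format shown in this thread), the nodes where a resource's start failed can be pulled out like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pcs status` output on stdin and print each node
# on which <resource>_start_0 failed. Assumes unwrapped lines such as:
#   * WebSite_start_0 on node1 'error' (1): call=6, status='complete', ...
failed_start_nodes() {
  local resource="$1"
  awk -v op="${resource}_start_0" \
    '$1 == "*" && $2 == op && $3 == "on" { print $4 }'
}
```

With the output from this thread it prints node1 and node2, matching the explanation above. Once the status page works (you can repeat the agent's check by hand with `wget -O- -q -L --no-proxy --bind-address=127.0.0.1 http://localhost/server-status` while httpd is running), `pcs resource cleanup WebSite` clears the recorded failures so the cluster will try to start WebSite again.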
> 
> 
> 
> >  
> > Logs are:
> > https://paste.ubuntu.com/p/MtkfXyRX4P/
> > 
> > 
> > Thank you.
> > 
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> > 
> 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>
