[ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

Tue Mar 23 16:15:06 EDT 2021

Thanks.
The floating IP address must not be in use by any other machine. I have two VMs that use "192.168.57.6" and "192.168.57.7". Could the floating IP address be "192.168.57.8"?
Which part of my configuration is wrong? Why doesn't node2 take over when I disconnect node1?

On Wednesday, March 24, 2021, 12:33:53 AM GMT+4:30, Ken Gaillot <kgaillot at redhat.com> wrote: 

On Tue, 2021-03-23 at 19:07 +0000, Jason Long wrote:
> Thanks, but I want to have a cluster with two nodes and nothing more!

The end result is to have 2 nodes with 3 IP addresses:

* The first node has a permanently assigned IP address that it brings
up when it boots; this address is not managed by the cluster

* The second node also has a permanent address not managed by the
cluster

* A third, unused IP address from the same subnet is used as a
"floating" IP address, which means the cluster can sometimes run it on
the first node and sometimes on the second node. This IP address is the
one that users will use to contact the service.

That way, users always have a single address that they use, no matter
which node is providing the service.
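
As a rough illustration of the layout above (a sketch, not an official
procedure): with node addresses 192.168.57.6 and 192.168.57.7, a candidate
such as 192.168.57.8 can serve as the floating IP as long as nothing else on
the network uses it. The helper name is_unused_by_nodes is hypothetical, and
the script only compares the candidate against the node addresses you list;
it does not probe the network. It prints the pcs command instead of running
it.

```shell
#!/bin/sh
# Addresses taken from this thread; adjust for your own network.
FLOATING_IP="192.168.57.8"              # candidate floating address
NODE_IPS="192.168.57.6 192.168.57.7"    # permanent addresses of the two nodes

# Hypothetical helper: succeeds only if the candidate ($1) differs from
# every node address given in the remaining arguments.
is_unused_by_nodes() {
    candidate=$1
    shift
    for ip in "$@"; do
        if [ "$ip" = "$candidate" ]; then
            return 1
        fi
    done
    return 0
}

# NODE_IPS is intentionally unquoted so it splits into one argument per address.
if is_unused_by_nodes "$FLOATING_IP" $NODE_IPS; then
    # Dry run: print the command (same form as the one quoted in this thread).
    echo "pcs resource create ClusterIP ocf:heartbeat:IPaddr2" \
         "ip=$FLOATING_IP cidr_netmask=32 op monitor interval=30s"
else
    echo "Refusing: $FLOATING_IP is already assigned to a node" >&2
fi
```

Run the printed command on one node only. Once the resource is started,
fetching the site through the floating address with curl or wget from either
node is a simple way to confirm which node is answering.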

> 
> 
> 
> 
> 
> On Tuesday, March 23, 2021, 07:59:57 PM GMT+4:30, Klaus Wenninger <
> kwenning at redhat.com> wrote: 
> 
> 
> 
> 
> 
> On 3/23/21 4:07 PM, Jason Long wrote:
> > Thank you.
> > So where must I define my node2 IP address? When node1 is
> > disconnected, I want node2 to replace it.
> > 
> 
> You just need a single IP address that you assign to the virtual IP
> resource. Pacemaker is going to move that IP address, along with the
> web-proxy, between the 2 nodes.
> Of course node1 & node2 have IP addresses that are used for
> cluster communication, but they are totally independent (well, maybe
> in the same subnet for a simple setup) from the IP address your
> web-proxy is reachable at.
> 
> Klaus
> 
> > 
> > 
> > 
> > 
> > On Tuesday, March 23, 2021, 01:03:39 PM GMT+4:30, Klaus Wenninger <
> > kwenning at redhat.com> wrote:
> > 
> > 
> > 
> > 
> > 
> > On 3/23/21 9:13 AM, Jason Long wrote:
> > > Thank you.
> > > But: 
> > > https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch06.html
> > > ?
> > > 
> > > The floating IP address is: 
> > > https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_add_a_resource.html
> > > The "Warning" there says: "The chosen address must not already be
> > > in use on the network. Do not reuse an IP address one of the
> > > nodes already has configured." What does it mean?
> > 
> > It means that if you used an IP that is already in use
> > on your network - by one of your cluster nodes or something else -
> > pacemaker could activate that IP and you would have
> > a duplicate IP in your network.
> > Thus for the question below: Don't use the IP of node2 for
> > your floating IP.
> > 
> > Klaus
> > 
> > > In the command below, is "ip" the IP address of my node2?
> > > # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
> > >     ip=192.168.122.120 cidr_netmask=32 op monitor interval=30s
> > > 
> > > If yes, must I then update it with the command below?
> > > 
> > > # pcs resource update floating_ip ocf:heartbeat:IPaddr2 \
> > >     ip="Node2 IP" cidr_netmask=32 op monitor interval=30s
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On Tuesday, March 23, 2021, 12:02:15 AM GMT+4:30, Ken Gaillot <
> > > kgaillot at redhat.com> wrote:
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On Mon, 2021-03-22 at 17:31 +0000, Jason Long wrote:
> > > > Thank you.
> > > > From chapter 1 to 6, I never saw anything about configuring the
> > > > floating IP address! Am I wrong?
> > > 
> > > Hi,
> > > 
> > > Chapter 6 should be "Create an Active/Passive Cluster", which adds a
> > > floating IP, then Chapter 7 is "Add Apache HTTP Server as a Cluster
> > > Service".
> > > 
> > > 
> > > 
> > > > On Monday, March 22, 2021, 07:06:47 PM GMT+4:30, Ken Gaillot <
> > > > kgaillot at redhat.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Mon, 2021-03-22 at 08:15 +0000, Jason Long wrote:
> > > > > Thank you.
> > > > > 
> > > > > My test lab uses VirtualBox with two VMs as below:
> > > > > VM1: This VM has two NICs (NAT, Host-only Adapter)
> > > > > VM2: This VM has one NIC (Host-only Adapter)
> > > > > 
> > > > > On VM1, I use the NAT interface for port forwarding:
> > > > > "127.0.0.1:2080" on the host forwards to 127.0.0.1:80 on the
> > > > > guest.
> > > > > 
> > > > > 
> > > > > Yes, "systemctl" tells me:
> > > > > 
> > > > > # systemctl is-enabled httpd.service
> > > > > disabled
> > > > > 
> > > > > I rebooted my nodes and one of the problems was solved:
> > > > > https://paste.ubuntu.com/p/7cQQtsXFPV/
> > > > > 
> > > > > I did:
> > > > > # pcs resource defaults resource-stickiness=100
> > > > > 
> > > > > 
> > > > > When I browse "127.0.0.1:2080", it shows me "My Test Site -
> > > > > node1".
> > > > > 
> > > > > I have two problems:
> > > > > 
> > > > > 1- When I stopped the node1 VM and refreshed the page, I couldn't
> > > > > see "My Test Site - node2".
> > > > > 
> > > > > # pcs cluster stop node1
> > > > > node1: Stopping Cluster (pacemaker)...
> > > > > node1: Stopping Cluster (corosync)...
> > > > > 
> > > > > # pcs status
> > > > > Error: error running crm_mon, is pacemaker running?
> > > > > Could not connect to the CIB: Transport endpoint is not connected
> > > > > crm_mon: Error: cluster is not available on this node
> > > > 
> > > > Hi,
> > > > 
> > > > pcs status doesn't test the web site, it shows the internal cluster
> > > > status. Since the cluster isn't running on that node, it can't show
> > > > anything.
> > > > 
> > > > However the website is still active on the other node, and reachable
> > > > from this node. You can confirm that by using wget or curl with the
> > > > public web site URL (the floating IP address).
> > > > 
> > > > > # pcs resource defaults
> > > > > Error: unable to get cib
> > > > > 
> > > > > 
> > > > > I think it should forward my requests from node1 to node2
> > > > > automatically, so that I see the "My Test Site - node2" message.
> > > > > 
> > > > > 
> > > > > 2- I started node1 again, but when I browse "IP:80", I can't
> > > > > see the "My Test Site - node1" message.
> > > > > 
> > > > > # pcs cluster start node1
> > > > > node1: Starting Cluster...
> > > > > 
> > > > > 
> > > > > # pcs status
> > > > > Cluster name: mycluster
> > > > > Cluster Summary:
> > > > >      * Stack: corosync
> > > > >      * Current DC: node2 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
> > > > >      * Last updated: Mon Mar 22 12:26:10 2021
> > > > >      * Last change:  Mon Mar 22 12:08:02 2021 by root via cibadmin on node1
> > > > >      * 2 nodes configured
> > > > >      * 2 resource instances configured
> > > > > 
> > > > > Node List:
> > > > >      * Online: [ node1 node2 ]
> > > > > 
> > > > > Full List of Resources:
> > > > >      * WebSite    (ocf::heartbeat:apache):    Started node2
> > > > >      * ClusterIP    (ocf::heartbeat:IPaddr2):    Started node2
> > > > > 
> > > > > Daemon Status:
> > > > >      corosync: active/enabled
> > > > >      pacemaker: active/enabled
> > > > >      pcsd: active/enabled
> > > > > 
> > > > > 
> > > > > 
> > > > > Logs are:
> > > > > https://paste.ubuntu.com/p/Yt4K2kPM7b/
> > > > > 
> > > > > 
> > > > > Thank you again.
> > > > > 
> > > > > 
> > > > > On Monday, March 22, 2021, 01:12:21 AM GMT+4:30, Reid Wahl <
> > > > > nwahl at redhat.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Hi, Jason.
> > > > > 
> > > > > On Sun, Mar 21, 2021 at 5:21 AM Jason Long <hack3rcon at yahoo.com> wrote:
> > > > > > Hello,
> > > > > > I used "Clusters from Scratch" to configure two nodes.
> > > > > > I got the error below:
> > > > > > 
> > > > > > # pcs status
> > > > > > Cluster name: mycluster
> > > > > > Cluster Summary:
> > > > > >      * Stack: corosync
> > > > > >      * Current DC: node1 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
> > > > > >      * Last updated: Sun Mar 21 15:35:18 2021
> > > > > >      * Last change:  Sun Mar 21 15:29:38 2021 by root via cibadmin on node1
> > > > > >      * 2 nodes configured
> > > > > >      * 2 resource instances configured
> > > > > > 
> > > > > > Node List:
> > > > > >      * Online: [ node1 node2 ]
> > > > > > 
> > > > > > Full List of Resources:
> > > > > >      * WebSite    (ocf::heartbeat:apache):    Stopped
> > > > > >      * ClusterIP    (ocf::heartbeat:IPaddr2):    Started node1
> > > > > > 
> > > > > > Failed Resource Actions:
> > > > > >      * WebSite_start_0 on node1 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:45 +03:30', queued=0ms, exec=1318ms
> > > > > >      * WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:47 +03:30', queued=0ms, exec=1380ms
> > > > > > 
> > > > > > Daemon Status:
> > > > > >      corosync: active/enabled
> > > > > >      pacemaker: active/enabled
> > > > > >      pcsd: active/enabled
> > > > > > 
> > > > > > 
> > > > > > *********
> > > > > > I have some questions:
> > > > > > 
> > > > > > 1- In "Chapter 6. Add Apache HTTP Server as a Cluster Service", an
> > > > > > important note said:
> > > > > > "Do not enable the httpd service. Services that are intended to be
> > > > > > managed via the cluster software should never be managed by the OS.
> > > > > > It is often useful, however, to manually start the service, verify
> > > > > > that it works, then stop it again, before adding it to the cluster.
> > > > > > This allows you to resolve any non-cluster-related problems before
> > > > > > continuing. Since this is a simple example, we’ll skip that step
> > > > > > here."
> > > > > > 
> > > > > > If the Apache service is not enabled, then how can I connect to it
> > > > > > via the command below:
> > > > > >      
> > > > > > # wget -O - http://localhost/server-status
> > > > > > --2021-03-21 15:38:39--  http://localhost/server-status
> > > > > > Resolving localhost (localhost)... 127.0.0.1, ::1
> > > > > > Connecting to localhost (localhost)|127.0.0.1|:80... failed: Connection timed out.
> > > > > > Connecting to localhost (localhost)|::1|:80... failed: Network is unreachable.
> > > > > 
> > > > > Pacemaker starts the httpd service by starting the
> > > > > ocf:heartbeat:apache resource. The article is saying that the
> > > > > httpd.service systemd unit should not be enabled to start
> > > > > automatically at boot; it should only start when the cluster starts
> > > > > it. That is, `systemctl is-enabled httpd.service` should print
> > > > > "disabled".
> > > > > 
> > > > > >      
> > > > > > 
> > > > > > 2- Must the commands below be run on both nodes or just one node?
> > > > > > 
> > > > > > # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
> > > > > >     ip="IP_That_Never_Used_In_The_Network" cidr_netmask=32 \
> > > > > >     op monitor interval=30s
> > > > > > 
> > > > > > # pcs resource create WebSite ocf:heartbeat:apache \
> > > > > >     configfile=/etc/httpd/conf/httpd.conf \
> > > > > >     statusurl="http://localhost/server-status" op monitor interval=20s
> > > > > 
> > > > > Just one node.
> > > > > 
> > > > > >      
> > > > > > 
> > > > > > 3- Why "* WebSite    (ocf::heartbeat:apache):    Stopped" ?
> > > > > 
> > > > > The apache resource agent ran a command similar to
> > > > > `wget -O- -q -L --no-proxy --bind-address=127.0.0.1 <status_url>`
> > > > > and got an error. It tried this on a start operation on each node,
> > > > > and it failed on both nodes. When a resource fails to start on a
> > > > > given node, the default response is to prevent it from starting on
> > > > > that node again until the failure is cleared.
> > > > > 
> > > > > 
> > > > > 
> > > > > >      
> > > > > > Logs are:
> > > > > > https://paste.ubuntu.com/p/MtkfXyRX4P/
> > > > > > 
> > > > > > 
> > > > > > Thank you.
> > > > > > 
> > > > > > _______________________________________________
> > > > > > Manage your subscription:
> > > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > > > 
> > > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > > > 
> 

-- 
Ken Gaillot <kgaillot at redhat.com>
