[ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

Wed Mar 24 13:50:50 EDT 2021

On Wed, 2021-03-24 at 10:50 +0000, Jason Long wrote:
> Thank you.
> From node1 and node2, I can ping the floating IP address
> (192.168.56.9).
> I stopped node1:
> 
> # pcs cluster stop node1
> node1: Stopping Cluster (pacemaker)...
> node1: Stopping Cluster (corosync)...
> 
> And from both machines, I can ping the floating IP address:
> 
> [root@node1 ~]# ping 192.168.56.9
> PING 192.168.56.9 (192.168.56.9) 56(84) bytes of data.
> 64 bytes from 192.168.56.9: icmp_seq=1 ttl=64 time=0.504 ms
> 64 bytes from 192.168.56.9: icmp_seq=2 ttl=64 time=0.750 ms
> ...
> 
> [root@node2 ~]# ping 192.168.56.9
> PING 192.168.56.9 (192.168.56.9) 56(84) bytes of data.
> 64 bytes from 192.168.56.9: icmp_seq=1 ttl=64 time=0.423 ms
> 64 bytes from 192.168.56.9: icmp_seq=2 ttl=64 time=0.096 ms
> ...
> 
> 
> So?

Now you can proceed with the "Add Apache HTTP" section. Once apache is
set up as a cluster resource, you should be able to contact the web
server at the floating IP (or more realistically whatever name you've
associated with that IP), and have the cluster fail over both the IP
address and web server as needed.
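
As a rough sketch of what that section amounts to, the commands below reuse the resource names from earlier in this thread (ClusterIP, WebSite) and the floating IP 192.168.56.9. The colocation and ordering constraints are an assumption about how you would tie the web server to the address, and it is assumed that the /server-status handler is already set up in the httpd configuration as the guide describes. The `run` helper and the DRY_RUN variable are hypothetical, added here only so the script prints the commands instead of executing them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
FLOATING_IP="${FLOATING_IP:-192.168.56.9}"   # floating IP used in this thread

# Hypothetical helper: print the command, and run it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

setup_website() {
    # Run this on one node only; the configuration is shared cluster-wide.
    run pcs resource create WebSite ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf \
        statusurl="http://localhost/server-status" \
        op monitor interval=20s
    # Assumed constraint: keep the web server on the node holding the floating IP.
    run pcs constraint colocation add WebSite with ClusterIP INFINITY
    # Assumed constraint: bring up the IP before the web server.
    run pcs constraint order ClusterIP then WebSite
}

check_website() {
    # Fetch the site through the floating IP, not through a node's own address.
    run curl --silent --show-error --max-time 5 "http://${FLOATING_IP}/"
}

setup_website
check_website
```

Running check_website (with DRY_RUN=0) before and after `pcs cluster stop node1` is one way to confirm that both the IP and the web server fail over together.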

> On Wednesday, March 24, 2021, 02:41:44 AM GMT+4:30, Ken Gaillot
> <kgaillot at redhat.com> wrote:
> 
> 
> 
> 
> 
> On Tue, 2021-03-23 at 20:15 +0000, Jason Long wrote:
> > Thanks.
> > The floating IP address must not be used by other machines. I have two
> > VMs that use "192.168.57.6" and "192.168.57.7". Could the floating
> > IP address be "192.168.57.8"?
> 
> Yes, if it's in the same subnet and not already in use by some other
> machine.
> 
> > Which part of my configuration is wrong? Why doesn't node2 take over
> > when I disconnect node1?
> 
> The first thing I would do is configure and test fencing. Once you're
> confident fencing is working, add the floating IP address. Make sure
> you can ping the floating IP address from some other machine. Then
> test fail-over and ensure you can still ping the floating IP. From
> there it should be straightforward.
> 
> 
> > 
> > 
> > 
> > 
> > 
> > On Wednesday, March 24, 2021, 12:33:53 AM GMT+4:30, Ken Gaillot
> > <kgaillot at redhat.com> wrote:
> > 
> > 
> > 
> > 
> > 
> > On Tue, 2021-03-23 at 19:07 +0000, Jason Long wrote:
> > > Thanks, but I want to have a cluster with two nodes and nothing
> > > more!
> > 
> > The end result is to have 2 nodes with 3 IP addresses:
> > 
> > * The first node has a permanently assigned IP address that it
> > brings up when it boots; this address is not managed by the cluster
> > 
> > * The second node also has a permanent address not managed by the
> > cluster
> > 
> > * A third, unused IP address from the same subnet is used as a
> > "floating" IP address, which means the cluster can sometimes run it
> > on the first node and sometimes on the second node. This IP address
> > is the one that users will use to contact the service.
> > 
> > That way, users always have a single address that they use, no
> > matter which node is providing the service.
> > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On Tuesday, March 23, 2021, 07:59:57 PM GMT+4:30, Klaus Wenninger
> > > <kwenning at redhat.com> wrote:
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 3/23/21 4:07 PM, Jason Long wrote:
> > > > Thank you.
> > > > So where must I define my node2 IP address? When node1 is
> > > > disconnected, I want node2 to replace it.
> > > > 
> > > 
> > > You just need a single IP address that you are assigning to the
> > > virtual IP resource.
> > > And pacemaker is gonna move that IP address - along with the
> > > web-proxy - between the 2 nodes.
> > > Of course node1 & node2 have IP addresses that are being used for
> > > cluster-communication, but they are totally independent (well,
> > > maybe in the same subnet for a simple setup) from the IP address
> > > your web-proxy is reachable at.
> > > 
> > > Klaus
> > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Tuesday, March 23, 2021, 01:03:39 PM GMT+4:30, Klaus Wenninger
> > > > <kwenning at redhat.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On 3/23/21 9:13 AM, Jason Long wrote:
> > > > > Thank you.
> > > > > But: 
> > > > > https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch06.html
> > > > > ?
> > > > > 
> > > > > The floating IP address is: 
> > > > > https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_add_a_resource.html
> > > > > The "Warning" there says: "The chosen address must not already
> > > > > be in use on the network. Do not reuse an IP address one of the
> > > > > nodes already has configured." What does it mean?
> > > > 
> > > > It means that if you used an IP that is already in use on your
> > > > network - by one of your cluster nodes or something else -
> > > > pacemaker could activate that IP and you would have a duplicate
> > > > IP in your network.
> > > > Thus for the question below: Don't use the IP of node2 for
> > > > your floating IP.
> > > > 
> > > > Klaus
> > > > 
> > > > > In the command below, is "IP" the IP address of my node2?
> > > > > # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.122.120 cidr_netmask=32 op monitor interval=30s
> > > > > 
> > > > > If yes, must I then update it with the command below?
> > > > > 
> > > > > # pcs resource update floating_ip ocf:heartbeat:IPaddr2 ip="Node2 IP" cidr_netmask=32 op monitor interval=30s
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > On Tuesday, March 23, 2021, 12:02:15 AM GMT+4:30, Ken Gaillot
> > > > > <kgaillot at redhat.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > On Mon, 2021-03-22 at 17:31 +0000, Jason Long wrote:
> > > > > > Thank you.
> > > > > >     From chapter 1 to 6, I never saw anything about
> > > > > > configuring
> > > > > > the
> > > > > > floating IP address! Am I wrong?
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > > Chapter 6 should be "Create an Active/Passive Cluster", which
> > > > > > adds a floating IP, then Chapter 7 is "Add Apache HTTP Server
> > > > > > as a Cluster Service".
> > > > > 
> > > > > 
> > > > > 
> > > > > > > On Monday, March 22, 2021, 07:06:47 PM GMT+4:30, Ken Gaillot
> > > > > > > <kgaillot at redhat.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Mon, 2021-03-22 at 08:15 +0000, Jason Long wrote:
> > > > > > > Thank you.
> > > > > > > 
> > > > > > > > My test lab uses VirtualBox with two VMs as below:
> > > > > > > VM1: This VM has two NICs (NAT, Host-only Adapter)
> > > > > > > VM2: This VM has one NIC (Host-only Adapter)
> > > > > > > 
> > > > > > > > On VM1, I use the NAT interface for port forwarding:
> > > > > > > > "127.0.0.1:2080" on Host FORWARDING TO 127.0.0.1:80 on Guest.
> > > > > > > 
> > > > > > > 
> > > > > > > > Yes, "systemctl" tells me:
> > > > > > > 
> > > > > > > # systemctl is-enabled httpd.service
> > > > > > > disabled
> > > > > > > 
> > > > > > > > I rebooted my nodes and one of the problems was solved:
> > > > > > > https://paste.ubuntu.com/p/7cQQtsXFPV/
> > > > > > > 
> > > > > > > I did:
> > > > > > > # pcs resource defaults resource-stickiness=100
> > > > > > > 
> > > > > > > 
> > > > > > > > When I browse "127.0.0.1:2080", it shows me "My Test Site -
> > > > > > > > node1".
> > > > > > > > 
> > > > > > > > I have two problems:
> > > > > > > > 
> > > > > > > > 1- When I stop the node1 VM and refresh the page, I can't
> > > > > > > > see "My Test Site - node2".
> > > > > > > 
> > > > > > > # pcs cluster stop node1
> > > > > > > node1: Stopping Cluster (pacemaker)...
> > > > > > > node1: Stopping Cluster (corosync)...
> > > > > > > 
> > > > > > > # pcs status
> > > > > > > Error: error running crm_mon, is pacemaker running?
> > > > > > > > Could not connect to the CIB: Transport endpoint is not connected
> > > > > > > crm_mon: Error: cluster is not available on this node
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > > pcs status doesn't test the web site, it shows the internal
> > > > > > > cluster status. Since the cluster isn't running on that node,
> > > > > > > it can't show anything.
> > > > > > > 
> > > > > > > However the website is still active on the other node, and
> > > > > > > reachable from this node. You can confirm that by using wget
> > > > > > > or curl with the public web site URL (the floating IP
> > > > > > > address).
> > > > > > 
> > > > > > > # pcs resource defaults
> > > > > > > Error: unable to get cib
> > > > > > > 
> > > > > > > 
> > > > > > > > I think that it must forward my requests from node1 to
> > > > > > > > node2 automatically, so that I see the "My Test Site -
> > > > > > > > node2" message.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 2- I started node1 again, but when I browse "IP:80", I
> > > > > > > > can't see the "My Test Site - node1" message.
> > > > > > > 
> > > > > > > # pcs cluster start node1
> > > > > > > node1: Starting Cluster...
> > > > > > > 
> > > > > > > 
> > > > > > > # pcs status
> > > > > > > Cluster name: mycluster
> > > > > > > Cluster Summary:
> > > > > > >       * Stack: corosync
> > > > > > >       * Current DC: node2 (version 2.0.5-10.fc33-
> > > > > > > ba59be7122)
> > > > > > > -
> > > > > > > partition
> > > > > > > with quorum
> > > > > > >       * Last updated: Mon Mar 22 12:26:10 2021
> > > > > > >       * Last change:  Mon Mar 22 12:08:02 2021 by root
> > > > > > > via
> > > > > > > cibadmin on
> > > > > > > node1
> > > > > > >       * 2 nodes configured
> > > > > > >       * 2 resource instances configured
> > > > > > > 
> > > > > > > Node List:
> > > > > > >       * Online: [ node1 node2 ]
> > > > > > > 
> > > > > > > Full List of Resources:
> > > > > > >       * WebSite    (ocf::heartbeat:apache):    Started
> > > > > > > node2
> > > > > > >       * ClusterIP    (ocf::heartbeat:IPaddr2):    Started
> > > > > > > node2
> > > > > > > 
> > > > > > > Daemon Status:
> > > > > > >       corosync: active/enabled
> > > > > > >       pacemaker: active/enabled
> > > > > > >       pcsd: active/enabled
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Logs are:
> > > > > > > https://paste.ubuntu.com/p/Yt4K2kPM7b/
> > > > > > > 
> > > > > > > 
> > > > > > > Thank you again.
> > > > > > > 
> > > > > > > 
> > > > > > > > On Monday, March 22, 2021, 01:12:21 AM GMT+4:30, Reid Wahl
> > > > > > > > <nwahl at redhat.com> wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Hi, Jason.
> > > > > > > 
> > > > > > > > On Sun, Mar 21, 2021 at 5:21 AM Jason Long
> > > > > > > > <hack3rcon at yahoo.com> wrote:
> > > > > > > > Hello,
> > > > > > > > > I used "Clusters from Scratch" to configure two nodes.
> > > > > > > > > I got the error below:
> > > > > > > > 
> > > > > > > > # pcs status
> > > > > > > > Cluster name: mycluster
> > > > > > > > Cluster Summary:
> > > > > > > >       * Stack: corosync
> > > > > > > >       * Current DC: node1 (version 2.0.5-10.fc33-
> > > > > > > > ba59be7122) -
> > > > > > > > partition with quorum
> > > > > > > >       * Last updated: Sun Mar 21 15:35:18 2021
> > > > > > > >       * Last change:  Sun Mar 21 15:29:38 2021 by root
> > > > > > > > via
> > > > > > > > cibadmin
> > > > > > > > on
> > > > > > > > node1
> > > > > > > >       * 2 nodes configured
> > > > > > > >       * 2 resource instances configured
> > > > > > > > 
> > > > > > > > Node List:
> > > > > > > >       * Online: [ node1 node2 ]
> > > > > > > > 
> > > > > > > > Full List of Resources:
> > > > > > > >       * WebSite    (ocf::heartbeat:apache):    Stopped
> > > > > > > >       * ClusterIP    (ocf::heartbeat:IPaddr2):   
> > > > > > > > Started
> > > > > > > > node1
> > > > > > > > 
> > > > > > > > Failed Resource Actions:
> > > > > > > >       * WebSite_start_0 on node1 'error' (1): call=6,
> > > > > > > > status='complete', exitreason='Failed to access httpd
> > > > > > > > status
> > > > > > > > page.', last-rc-change='2021-03-21 15:23:45 +03:30',
> > > > > > > > queued=0ms,
> > > > > > > > exec=1318ms
> > > > > > > >       * WebSite_start_0 on node2 'error' (1): call=6,
> > > > > > > > status='complete', exitreason='Failed to access httpd
> > > > > > > > status
> > > > > > > > page.', last-rc-change='2021-03-21 15:23:47 +03:30',
> > > > > > > > queued=0ms,
> > > > > > > > exec=1380ms
> > > > > > > > 
> > > > > > > > Daemon Status:
> > > > > > > >       corosync: active/enabled
> > > > > > > >       pacemaker: active/enabled
> > > > > > > >       pcsd: active/enabled
> > > > > > > > 
> > > > > > > > 
> > > > > > > > *********
> > > > > > > > I have some questions:
> > > > > > > > 
> > > > > > > > > 1- In "Chapter 6. Add Apache HTTP Server as a Cluster
> > > > > > > > > Service", an important note said:
> > > > > > > > > "Do not enable the httpd service. Services that are
> > > > > > > > > intended to be managed via the cluster software should
> > > > > > > > > never be managed by the OS.
> > > > > > > > > It is often useful, however, to manually start the
> > > > > > > > > service, verify that it works, then stop it again, before
> > > > > > > > > adding it to the cluster. This allows you to resolve any
> > > > > > > > > non-cluster-related problems before continuing. Since
> > > > > > > > > this is a simple example, we’ll skip that step here."
> > > > > > > > > 
> > > > > > > > > If the Apache service is not enabled, then how can I
> > > > > > > > > connect to it via the command below:
> > > > > > > >       
> > > > > > > > # wget -O - http://localhost/server-status
> > > > > > > > --2021-03-21 15:38:39--  http://localhost/server-status
> > > > > > > > Resolving localhost (localhost)... 127.0.0.1, ::1
> > > > > > > > > Connecting to localhost (localhost)|127.0.0.1|:80... failed: Connection timed out.
> > > > > > > > > Connecting to localhost (localhost)|::1|:80... failed: Network is unreachable.
> > > > > > > 
> > > > > > > > Pacemaker starts the httpd service by starting the
> > > > > > > > ocf:heartbeat:apache resource. The article is saying that
> > > > > > > > the httpd.service systemd unit should not be enabled to
> > > > > > > > start automatically at boot; it should only start when the
> > > > > > > > cluster starts it. That is,
> > > > > > > > `systemctl is-enabled httpd.service` should print
> > > > > > > > "disabled".
> > > > > > > 
> > > > > > > >       
> > > > > > > > 
> > > > > > > > > 2- Must the commands below be run on both nodes or on just
> > > > > > > > > one node?
> > > > > > > > > 
> > > > > > > > > # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip="IP_That_Never_Used_In_The_Network" cidr_netmask=32 op monitor interval=30s
> > > > > > > > > 
> > > > > > > > > # pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=20s
> > > > > > > 
> > > > > > > Just one node.
> > > > > > > 
> > > > > > > >       
> > > > > > > > 
> > > > > > > > > 3- Why "* WebSite    (ocf::heartbeat:apache):    Stopped"?
> > > > > > > 
> > > > > > > > The apache resource agent ran a command similar to
> > > > > > > > `wget -O- -q -L --no-proxy --bind-address=127.0.0.1 <status_url>`
> > > > > > > > and got an error. It tried this on a start operation on
> > > > > > > > each node, and it failed on both nodes. When a resource
> > > > > > > > fails to start on a given node, the default response is to
> > > > > > > > prevent it from starting on that node again until the
> > > > > > > > failure is cleared.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > >       
> > > > > > > > Logs are:
> > > > > > > > https://paste.ubuntu.com/p/MtkfXyRX4P/
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Thank you.
> > > > > > > > 
> > > > > > > > _______________________________________________
> > > > > > > > Manage your subscription:
> > > > > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > > > > > 
> > > > > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > > > > > 
> > > 
> > 
> > 
-- 
Ken Gaillot <kgaillot at redhat.com>