[ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

Mon Mar 22 04:15:49 EDT 2021

Thank you.

My test lab uses VirtualBox with two VMs:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for port forwarding: "127.0.0.1:2080" on the host is forwarded to "127.0.0.1:80" on the guest.

Yes, "systemctl" tells me:

# systemctl is-enabled httpd.service
disabled

I rebooted my nodes and one of the problems was solved:
https://paste.ubuntu.com/p/7cQQtsXFPV/

I did:
# pcs resource defaults resource-stickiness=100

When I browse to "127.0.0.1:2080", it shows "My Test Site - node1".

I have two problems:

1- When I stop the node1 VM and refresh the page, I can't see "My Test Site - node2". Why?

# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...

# pcs status
Error: error running crm_mon, is pacemaker running?
Could not connect to the CIB: Transport endpoint is not connected
crm_mon: Error: cluster is not available on this node

# pcs resource defaults
Error: unable to get cib

I expected my requests to be forwarded from node1 to node2 automatically, so that I would see the "My Test Site - node2" message.

2- I started node1 again, but when I browse to "IP:80", I don't see the "My Test Site - node1" message.

# pcs cluster start node1
node1: Starting Cluster...

# pcs status
Cluster name: mycluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
  * Last updated: Mon Mar 22 12:26:10 2021
  * Last change:  Mon Mar 22 12:08:02 2021 by root via cibadmin on node1
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * WebSite    (ocf::heartbeat:apache):     Started node2
  * ClusterIP    (ocf::heartbeat:IPaddr2):     Started node2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
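A likely explanation for problem 2, assuming no location constraint prefers node1: with `resource-stickiness=100` set as a resource default, Pacemaker prefers to leave WebSite and ClusterIP where they already run, so when node1 rejoins they stay on node2, which is what the `pcs status` output above shows. The small sketch below pulls the hosting node for one resource out of `pcs status` text. `resource_location` is a hypothetical helper name (not part of pcs), and it assumes the Pacemaker 2.0.5 output format shown in this thread.

```shell
# resource_location RESOURCE
# Hypothetical helper: reads `pcs status` text on stdin and prints the node
# RESOURCE is "Started" on, or "Stopped". Exits 1 if no line for RESOURCE
# is found. Assumes lines shaped like:
#   "  * WebSite    (ocf::heartbeat:apache):     Started node2"
resource_location() {
    awk -v res="$1" '
        $1 == "*" && $2 == res {
            for (i = 3; i <= NF; i++) {
                if ($i == "Started") { print $(i + 1); found = 1; exit }
                if ($i == "Stopped") { print "Stopped"; found = 1; exit }
            }
        }
        END { if (!found) exit 1 }
    '
}

# Usage on a node where the cluster is running:
#   pcs status | resource_location WebSite
```

Matching on whitespace-separated fields rather than fixed columns keeps the sketch tolerant of the variable padding pcs uses between the resource name, agent and state.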

The logs are:
https://paste.ubuntu.com/p/Yt4K2kPM7b/

Thank you again.

On Monday, March 22, 2021, 01:12:21 AM GMT+4:30, Reid Wahl <nwahl at redhat.com> wrote: 

Hi, Jason.

On Sun, Mar 21, 2021 at 5:21 AM Jason Long <hack3rcon at yahoo.com> wrote:
> Hello,
> I used "Clusters from Scratch" to configure two nodes. I got the error below:
> 
> # pcs status
> Cluster name: mycluster
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: node1 (version 2.0.5-10.fc33-ba59be7122) - partition with quorum
>   * Last updated: Sun Mar 21 15:35:18 2021
>   * Last change:  Sun Mar 21 15:29:38 2021 by root via cibadmin on node1
>   * 2 nodes configured
>   * 2 resource instances configured
> 
> Node List:
>   * Online: [ node1 node2 ]
> 
> Full List of Resources:
>   * WebSite    (ocf::heartbeat:apache):     Stopped
>   * ClusterIP    (ocf::heartbeat:IPaddr2):     Started node1
> 
> Failed Resource Actions:
>   * WebSite_start_0 on node1 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:45 +03:30', queued=0ms, exec=1318ms
>   * WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.', last-rc-change='2021-03-21 15:23:47 +03:30', queued=0ms, exec=1380ms
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> *********
> I have some questions:
> 
> 1- In "Chapter 6. Add Apache HTTP Server as a Cluster Service", an important note said:
> "Do not enable the httpd service. Services that are intended to be managed via the cluster software should never be managed by the OS. It is often useful, however, to manually start the service, verify that it works, then stop it again, before adding it to the cluster. This allows you to resolve any non-cluster-related problems before continuing. Since this is a simple example, we’ll skip that step here."
> 
> If the Apache service is not enabled, then how can I connect to it with the command below:
>  
> # wget -O - http://localhost/server-status
> --2021-03-21 15:38:39--  http://localhost/server-status
> Resolving localhost (localhost)... 127.0.0.1, ::1
> Connecting to localhost (localhost)|127.0.0.1|:80... failed: Connection timed out.
> Connecting to localhost (localhost)|::1|:80... failed: Network is unreachable.

Pacemaker starts the httpd service by starting the ocf:heartbeat:apache resource. The article is saying that the httpd.service systemd unit should not be enabled to start automatically at boot; it should only start when the cluster starts it. That is, `systemctl is-enabled httpd.service` should print "disabled".

>  
> 
> 2- Must the commands below be run on both nodes or just one node?
> 
> # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip="IP_That_Never_Used_In_The_Network" cidr_netmask=32 op monitor interval=30s
> 
> # pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=20s

Just one node.

>  
> 
> 3- Why is "* WebSite    (ocf::heartbeat:apache):     Stopped" shown?

The apache resource agent ran a command similar to `wget -O- -q -L --no-proxy --bind-address=127.0.0.1 <status_url>` and got an error. It tried this on a start operation on each node, and it failed on both nodes. When a resource fails to start on a given node, the default response is to prevent it from starting on that node again until the failure is cleared.
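Those failure records are listed under "Failed Resource Actions" in `pcs status`. A minimal sketch for extracting the operation, node and exit reason from each entry, assuming the Pacemaker 2.0.5 line format quoted above; `failed_actions` is a hypothetical helper name, not a pcs command.

```shell
# failed_actions
# Hypothetical helper: reads `pcs status` text on stdin and prints
# "<operation> <node> <exitreason>" for each failed-action line, e.g.
#   "  * WebSite_start_0 on node1 'error' (1): ... exitreason='...', ..."
# Lines that do not have this shape are ignored.
failed_actions() {
    sed -n "s/^ *\* \([^ ]*\) on \([^ ]*\) .*exitreason='\([^']*\)'.*/\1 \2 \3/p"
}

# Usage on a node where the cluster is running:
#   pcs status | failed_actions
# After the underlying cause (here, the status page check) is fixed, the
# failure history can be cleared so Pacemaker tries the start again:
#   pcs resource cleanup WebSite
```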

>  
> Logs are:
> https://paste.ubuntu.com/p/MtkfXyRX4P/
> 
> 
> Thank you.
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA