[Pacemaker] Debian Unstable (sid) Problem with Pacemaker/Corosync Apache HA-Load Balanced cluster

Miltiadis Koutsokeras m.koutsokeras at biovista.com
Sat Oct 1 10:32:56 UTC 2011


Hello everyone,

My goal is to build a round-robin balanced, HA Apache web server cluster. The
main purpose is to balance HTTP requests evenly between the nodes and have one
machine pick up all requests if and ONLY if the others are unavailable at the
moment. The cluster will be accessible only from the internal network. Any
advice on this will be highly appreciated (resources to use, services to
install and configure, etc.). After walking through the ClusterLabs
documentation, I think the proper deployment is an active/active
Pacemaker-managed cluster.

I'm trying to follow the "Clusters from Scratch" article in order to build a
2-node cluster on an experimental setup:

2 GNU/Linux Debian Unstable (sid) Virtual Machines (Kernel 3.0.0-1-686-pae,
Apache/2.2.21 (Debian)) on same LAN network.

node-0 IP: 192.168.0.101
node-1 IP: 192.168.0.102
Desired Cluster Virtual IP: 192.168.0.100

The two nodes are set up to communicate with proper SSH keys, and this works
flawlessly. They can also reach each other by short names:

root at node-0:~# ssh node-1 -- hostname
node-1

root at node-1:~# ssh node-0 -- hostname
node-0

My problem is that, although I've reached the part where the ClusterIP
resource is set up properly, the Apache resource does not start on either
node. The logs do not explain the failure in detail, even with debug messages
enabled. All related messages report unknown errors while trying to start the
service, and after a while the cluster manager gives up. From the messages it
seems the manager is getting unexpected exit codes from the Apache resource
agent. The server-status URL is accessible from 127.0.0.1 on both nodes.
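For reference, the resource definitions I'm using are essentially the ones
from the guide with the Debian paths substituted. The exact values below are
reconstructed from memory, so treat them as an approximation of my
configuration rather than a verbatim copy:

```shell
# ClusterIP: the shared virtual IP (this part works fine).
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=192.168.0.100 cidr_netmask=24 \
    op monitor interval=30s

# Apache2: the resource that fails. The apache agent checks server-status,
# which is reachable on 127.0.0.1 on both nodes.
crm configure primitive Apache2 ocf:heartbeat:apache \
    params configfile=/etc/apache2/apache2.conf \
        statusurl="http://127.0.0.1/server-status" \
    op monitor interval=1min
```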

root at node-0:~# crm_mon -1
============
Last updated: Fri Sep 30 14:04:55 2011
Stack: openais
Current DC: node-1 - partition with quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node-1 node-0 ]

  ClusterIP    (ocf::heartbeat:IPaddr2):    Started node-1

Failed actions:
     Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
     Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
     Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
     Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error

Let's check the logs for this resource:

root at node-0:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
(Nothing)

root at node-0:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.

root at node-1:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery

root at node-1:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
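One detail I notice: the probe expected exit code 7 (OCF_NOT_RUNNING, i.e.
"resource is cleanly stopped") but got the generic 1 (OCF_ERR_GENERIC). Since
the cluster logs hide the agent's own output, my next step is to run the
resource agent by hand and watch where it fails. A sketch, assuming the agent
lives under /usr/lib/ocf as on my Debian systems:

```shell
# Invoke the apache OCF agent directly with the same parameters the
# cluster would pass in, and print the raw exit codes it returns.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
bash -x /usr/lib/ocf/resource.d/heartbeat/apache monitor
echo "monitor exit code: $?"
bash -x /usr/lib/ocf/resource.d/heartbeat/apache start
echo "start exit code: $?"
```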

Any suggestions?

File /etc/corosync/corosync.conf (only the changed sections are shown here;
see the attachment for the full file):

# Please read the openais.conf.5 manual page

totem {

... (Default)

      interface {
          # The following values need to be set based on your environment
          ringnumber: 0
          bindnetaddr: 192.168.0.0
          mcastaddr: 226.94.1.1
          mcastport: 5405
      }
}

... (Default)

service {
      # Load the Pacemaker Cluster Resource Manager
      ver:       1
      name:      pacemaker
}
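As I understand it, with ver: 1 corosync only announces the Pacemaker service
and does not fork the CRM daemons itself, so Pacemaker has to be started
separately after corosync. This is what I do on both nodes (init script names
as shipped in the Debian packages, as far as I can tell):

```shell
# corosync first (membership/messaging), then pacemaker (CRM daemons).
/etc/init.d/corosync start
/etc/init.d/pacemaker start
```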

... (Default)

logging {
         fileline: off
         to_stderr: no
         to_logfile: yes
         logfile: /var/log/corosync/corosync.log
         to_syslog: no
         syslog_facility: daemon
         debug: on
         timestamp: on
         logger_subsys {
                 subsys: AMF
                 debug: off
                 tags: enter|leave|trace1|trace2|trace3|trace4|trace6
         }
}

-- 
Koutsokeras Miltiadis M.Sc.
Software Engineer
Biovista Inc.

US Offices
2421 Ivy Road
Charlottesville, VA 22903
USA
T: +1.434.971.1141
F: +1.434.971.1144

European Offices
34 Rodopoleos Street
Ellinikon, Athens 16777
GREECE
T: +30.210.9629848
F: +30.210.9647606

www.biovista.com

Biovista is a privately held biotechnology company that finds novel uses for existing drugs, and profiles their side effects using their mechanism of action. Biovista develops its own pipeline of drugs in CNS, oncology, auto-immune and rare diseases. Biovista is collaborating with biopharmaceutical companies on indication expansion and de-risking of their portfolios and with the FDA on adverse event prediction.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: corosync.conf
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111001/f47af86f/attachment-0001.conf>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: node-0.apache2-server-status.txt
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111001/f47af86f/attachment-0006.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: node-1.apache2-server-status.txt
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111001/f47af86f/attachment-0007.txt>

