[Pacemaker] Debian Unstable (sid) Problem with Pacemaker/Corosync Apache HA-Load Balanced cluster

Andrew Beekhof andrew at beekhof.net
Tue Oct 11 20:43:37 EDT 2011


I'd be checking your Apache logs; my guess is that it doesn't like the config.
Or see where/why the Apache RA could be returning 1 (OCF_ERR_GENERIC, "unknown error").
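
A quick way to see the actual failure is to run the resource agent by hand,
outside of Pacemaker, and to ask Apache itself; a rough sketch, assuming the
stock Debian paths for the agent, config and logs:

  # does the configuration even parse?
  apache2ctl configtest

  # run the heartbeat apache agent directly with the same parameter the
  # cluster passes; a non-zero exit here is the rc=1 that lrmd reports
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
  /usr/lib/ocf/resource.d/heartbeat/apache start;   echo "start: $?"
  /usr/lib/ocf/resource.d/heartbeat/apache monitor; echo "monitor: $?"

  # the agent's monitor (and start) check the status page, so this must
  # work locally on each node
  curl -s http://127.0.0.1/server-status | head

  # Apache's own error log usually names the real problem
  tail -n 50 /var/log/apache2/error.log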

On Mon, Oct 3, 2011 at 5:58 PM, Miltiadis Koutsokeras
<m.koutsokeras at biovista.com> wrote:
> Hi again,
>
> I have gathered all the relevant config and log files into a single archive;
> see the attachment. Thanks in advance for any help/advice.
>
> Miltos
>
> On 10/02/2011 06:19 PM, Miltiadis Koutsokeras wrote:
>>
>> Hi Nick,
>>
>> Here is the output of the "crm configure show":
>>
>> node node-0
>> node node-1
>> primitive Apache2 ocf:heartbeat:apache \
>>    params configfile="/etc/apache2/apache2.conf" \
>>    op monitor interval="1min" \
>>    meta target-role="Started"
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>    params ip="192.168.0.100" cidr_netmask="32" \
>>    op monitor interval="30s" \
>>    meta target-role="Started"
>> colocation Apache2-ClusterIP-colocation inf: Apache2 ClusterIP
>> order Apache2-after-ClusterIP inf: ClusterIP Apache2
>> property $id="cib-bootstrap-options" \
>>    dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>    cluster-infrastructure="openais" \
>>    expected-quorum-votes="2" \
>>    stonith-enabled="false" \
>>    no-quorum-policy="ignore"
>> rsc_defaults $id="rsc-options" \
>>    resource-stickiness="100"
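>>
>> As a side note, the colocation plus order pair above keeps Apache2 on the
>> same node as ClusterIP and starts it after the IP; the crm shell can express
>> both constraints more compactly with a group (a sketch only; "WebGroup" is an
>> illustrative name, not part of the posted configuration):
>>
>>    group WebGroup ClusterIP Apache2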
>>
>> If you need anything else, please feel free to ask.
>>
>> On 10/01/2011 02:50 PM, Nick Khamis wrote:
>>>
>>> Can you post your crm configuration, please?
>>>
>>> Nick.
>>>
>>> On Sat, Oct 1, 2011 at 6:32 AM, Miltiadis Koutsokeras
>>> <m.koutsokeras at biovista.com>  wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> My goal is to build a round-robin balanced, HA Apache web server cluster.
>>>> The main purpose is to balance HTTP requests evenly between the nodes and
>>>> have one machine pick up all requests if and ONLY if the others are not
>>>> available at the moment. The cluster will be accessible only from the
>>>> internal network. Any advice on this (resources to use, services to install
>>>> and configure, etc.) will be highly appreciated. After walking through the
>>>> ClusterLabs documentation, I think the proper deployment is an active/active
>>>> Pacemaker-managed cluster.
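>>>>
>>>> For the active/active, load-balanced layout described there, both resources
>>>> end up cloned across the nodes, roughly as follows (a sketch of the Clusters
>>>> from Scratch approach; "WebIP" and "WebSite" are illustrative clone names,
>>>> and the cloned IP also needs clusterip_hash="sourceip" on the IPaddr2
>>>> primitive):
>>>>
>>>>    # clone the IP as "globally unique" so both instances share the address
>>>>    # and incoming clients are hashed between the nodes
>>>>    crm configure clone WebIP ClusterIP \
>>>>        meta globally-unique="true" clone-max="2" clone-node-max="2"
>>>>    # clone the web server itself onto both nodes
>>>>    crm configure clone WebSite Apache2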
>>>>
>>>> I'm trying to follow the "Clusters from Scratch" document in order to
>>>> build a 2-node cluster on an experimental setup:
>>>>
>>>> Two GNU/Linux Debian Unstable (sid) virtual machines (kernel
>>>> 3.0.0-1-686-pae, Apache/2.2.21 (Debian)) on the same LAN.
>>>>
>>>> node-0 IP: 192.168.0.101
>>>> node-1 IP: 192.168.0.102
>>>> Desired Cluster Virtual IP: 192.168.0.100
>>>>
>>>> The two nodes are set up to communicate with proper SSH keys and it works
>>>> flawlessly. They can also reach each other by short hostname:
>>>>
>>>> root at node-0:~# ssh node-1 -- hostname
>>>> node-1
>>>>
>>>> root at node-1:~# ssh node-0 -- hostname
>>>> node-0
>>>>
>>>> My problem is that although I've reached the part where the ClusterIP
>>>> resource is set up properly, the Apache resource does not get started on
>>>> either node. The logs do not contain a message explaining the failure in
>>>> detail, even with debug messages enabled. All related messages report
>>>> unknown errors while trying to start the service, and after a while the
>>>> cluster manager gives up. From the messages it seems the manager is getting
>>>> unexpected exit codes from the Apache resource. The server-status URL is
>>>> accessible from 127.0.0.1 on both nodes.
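>>>>
>>>> The status page the apache agent checks comes from mod_status; on Debian it
>>>> is normally enabled with "a2enmod status" and configured roughly like the
>>>> following Apache 2.2 snippet (a sketch, not the attached configuration):
>>>>
>>>>    <Location /server-status>
>>>>        SetHandler server-status
>>>>        Order deny,allow
>>>>        Deny from all
>>>>        Allow from 127.0.0.1 ::1
>>>>    </Location>
>>>>
>>>> A quick check on each node: curl -s http://127.0.0.1/server-status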
>>>>
>>>> root at node-0:~# crm_mon -1
>>>> ============
>>>> Last updated: Fri Sep 30 14:04:55 2011
>>>> Stack: openais
>>>> Current DC: node-1 - partition with quorum
>>>> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>>>> 2 Nodes configured, 2 expected votes
>>>> 2 Resources configured.
>>>> ============
>>>>
>>>> Online: [ node-1 node-0 ]
>>>>
>>>>  ClusterIP    (ocf::heartbeat:IPaddr2):    Started node-1
>>>>
>>>> Failed actions:
>>>>    Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
>>>>    Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
>>>>    Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
>>>>    Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error
>>>>
>>>> Let's check out the logs for this resource:
>>>>
>>>> root at node-0:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
>>>> (Nothing)
>>>>
>>>> root at node-0:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
>>>> Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
>>>> Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.
>>>>
>>>> root at node-1:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
>>>> Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery
>>>>
>>>> root at node-1:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
>>>> Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
>>>> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
>>>> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
>>>> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
>>>> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>> Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
>>>> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
>>>> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>>>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>>>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>>>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>>>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>>>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>>>
>>>> Any suggestions?
>>>>
>>>> File /etc/corosync/corosync.conf (only the changes are shown here; see the
>>>> attached file for the full configuration):
>>>>
>>>> # Please read the openais.conf.5 manual page
>>>>
>>>> totem {
>>>>
>>>> ... (Default)
>>>>
>>>>     interface {
>>>>        # The following values need to be set based on your environment
>>>>        ringnumber: 0
>>>>        bindnetaddr: 192.168.0.0
>>>>        mcastaddr: 226.94.1.1
>>>>        mcastport: 5405
>>>>    }
>>>> }
>>>>
>>>> ... (Default)
>>>>
>>>> service {
>>>>     # Load the Pacemaker Cluster Resource Manager
>>>>     ver:       1
>>>>     name:      pacemaker
>>>> }
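>>>>
>>>> With ver: 1 corosync only loads the Pacemaker plugin; the Pacemaker daemons
>>>> themselves are not spawned by corosync and have to be started as a separate
>>>> service once corosync is up, e.g. (assuming the Debian init scripts):
>>>>
>>>>    /etc/init.d/corosync start
>>>>    /etc/init.d/pacemaker start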
>>>>
>>>> ... (Default)
>>>>
>>>> logging {
>>>>        fileline: off
>>>>        to_stderr: no
>>>>        to_logfile: yes
>>>>        logfile: /var/log/corosync/corosync.log
>>>>        to_syslog: no
>>>>        syslog_facility: daemon
>>>>        debug: on
>>>>        timestamp: on
>>>>        logger_subsys {
>>>>                subsys: AMF
>>>>                debug: off
>>>>                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>>>>        }
>>>> }
>>>>
>>>> --
>>>> Koutsokeras Miltiadis M.Sc.
>>>> Software Engineer
>>>> Biovista Inc.
>>>>
>>>> US Offices
>>>> 2421 Ivy Road
>>>> Charlottesville, VA 22903
>>>> USA
>>>> T: +1.434.971.1141
>>>> F: +1.434.971.1144
>>>>
>>>> European Offices
>>>> 34 Rodopoleos Street
>>>> Ellinikon, Athens 16777
>>>> GREECE
>>>> T: +30.210.9629848
>>>> F: +30.210.9647606
>>>>
>>>> www.biovista.com
>>>>
>>>> Biovista is a privately held biotechnology company that finds novel uses
>>>> for
>>>> existing drugs, and profiles their side effects using their mechanism of
>>>> action. Biovista develops its own pipeline of drugs in CNS, oncology,
>>>> auto-immune and rare diseases. Biovista is collaborating with
>>>> biopharmaceutical companies on indication expansion and de-risking of
>>>> their
>>>> portfolios and with the FDA on adverse event prediction.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs:
>>>>
>>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>
>>>>
>>
>>
>
>



