[Pacemaker] Error starting Apache on 2 nodes cluster

Thu Nov 19 05:35:36 EST 2009

On Thu, Nov 19, 2009 at 2:39 AM, Luke Bigum <lbigum at iseek.com.au> wrote:

> Angie,
>
> I can't tell exactly what you've provided. Can you post your CRM
> configuration (the output of 'crm configure show')? While you're at it, also
> provide the output of 'crm_verify -LV' and 'crm_mon -fo1'.

Here are the outputs:

# crm configure show
node test1.localdomain
node test2.localdomain
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.102" cidr_netmask="255.255.255.0" \
        op monitor interval="10s"
primitive LoadBalancer lsb:haproxy \
        op monitor interval="10s"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
colocation LoadBalancer-with-ClusterIP inf: LoadBalancer ClusterIP
order LoadBalancer-after-ClusterIP inf: ClusterIP LoadBalancer
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
        cluster-infrastructure="openais" \
        no-quorum-policy="ignore"

# crm_verify -VL
crm_verify[14263]: 2009/11/19_12:22:57 WARN: unpack_rsc_op: Processing failed op WebSite_start_0 on test1.localdomain: unknown error
crm_verify[14263]: 2009/11/19_12:22:57 WARN: unpack_rsc_op: Processing failed op WebSite_start_0 on test2.localdomain: unknown error
crm_verify[14263]: 2009/11/19_12:22:57 WARN: common_apply_stickiness: Forcing WebSite away from test1.localdomain after 1000000 failures (max=1000000)
crm_verify[14263]: 2009/11/19_12:22:57 WARN: common_apply_stickiness: Forcing WebSite away from test2.localdomain after 1000000 failures (max=1000000)
crm_verify[14263]: 2009/11/19_12:22:57 WARN: native_color: Resource WebSite cannot run anywhere
Warnings found during check: config may not be valid

# crm_mon -fo1
============
Last updated: Thu Nov 19 12:29:41 2009
Stack: openais
Current DC: test1.localdomain - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ test1.localdomain test2.localdomain ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started test1.localdomain
LoadBalancer    (lsb:haproxy):  Started test1.localdomain

Operations:
* Node test1.localdomain:
   ClusterIP: migration-threshold=1000000
    + (4) start: rc=0 (ok)
    + (5) monitor: interval=10000ms rc=0 (ok)
   LoadBalancer: migration-threshold=1000000
    + (6) start: rc=0 (ok)
    + (7) monitor: interval=10000ms rc=0 (ok)
   WebSite: migration-threshold=1000000 fail-count=1000000
    + (9) start: rc=1 (unknown error)
    + (10) stop: rc=0 (ok)
* Node test2.localdomain:
   WebSite: migration-threshold=1000000 fail-count=1000000
    + (5) start: rc=1 (unknown error)
    + (6) stop: rc=0 (ok)

Failed actions:
    WebSite_start_0 (node=test1.localdomain, call=9, rc=1, status=complete): unknown error
    WebSite_start_0 (node=test2.localdomain, call=5, rc=1, status=complete): unknown error
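The warnings above follow from the fail counts: crm_mon shows fail-count=1000000 for WebSite on both nodes, equal to its migration-threshold of 1000000, so Pacemaker forces the resource away from each node in turn and it "cannot run anywhere". A rough sketch of that comparison, run over saved `crm_mon -fo1` output, might look like the following. The function name `banned_nodes` is hypothetical, and the awk parsing assumes exactly the 1.0.x layout shown above (`* Node <name>:` headers followed by `<resource>: migration-threshold=N fail-count=M` lines); other versions may format this differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of Pacemaker): read `crm_mon -fo1` output on
# stdin and print "<node> <resource>" for every resource whose fail-count has
# reached its migration-threshold, i.e. nodes the resource is forced away from.
# Assumes the layout shown above:
#   * Node test1.localdomain:
#      WebSite: migration-threshold=1000000 fail-count=1000000
banned_nodes() {
    awk '
        # Remember which node the following resource lines belong to.
        /^\* Node / { node = $3; sub(/:$/, "", node); next }
        # Only resource lines that carry both counters are of interest.
        /migration-threshold=/ && /fail-count=/ {
            rsc = $1; sub(/:$/, "", rsc)
            thr = ""; fc = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "migration-threshold") thr = kv[2] + 0
                if (kv[1] == "fail-count") fc = kv[2] + 0
            }
            if (fc >= thr) print node, rsc
        }
    '
}

# Example: crm_mon -fo1 | banned_nodes
```

Against the output above this would list WebSite on both test1.localdomain and test2.localdomain, which is why the fail counts have to be cleared once the underlying start error is fixed.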

> This looks suspicious though:
>
> Nov 19 01:25:08 test2 crmd: [24251]: info: process_lrm_event: LRM operation
> WebServer_monitor_60000 (call=483, rc=-2, cib-update=0, confirmed=true)
> Cancelled unknown exec error
>
> Personally I'd start with the OCF RA and leave lsb:httpd alone. From the
> above error message, something inside lsb:httpd is returning -2, which is
> not a supported return code.
>
> Depending on how confident you are with shell scripts, you might find it
> helpful to eliminate Pacemaker from the equation and call the Resource Agent
> script yourself to debug problems manually, like so...
>

I'll do this and report back to you.

> Disable your resource so Pacemaker doesn't interfere:
>
> crm_resource -r WebSite -m -p target-role -v stopped
>
> Then move into the RA directory and set a necessary environment variable:
>
> cd /usr/lib/ocf/resource.d/heartbeat
> export OCF_ROOT=/usr/lib/ocf
>
> Start testing the apache RA, setting the only mandatory environment
> variable for ocf:heartbeat:apache:
>
> export OCF_RESKEY_configfile=/path/to/your/main/apache/config
> ./apache start
> echo $?
>
> That should echo "0" for success. Judging by your logs, you can start
> Apache but the monitor is failing:
>
> ./apache monitor
> echo $?
>
> If that doesn't echo "0", you might get a helpful error message explaining
> what's wrong. You might have to read through the apache script itself to
> figure out why it's failing. Finally, test the 'stop' operation:
>
> ./apache stop
> echo $?
>
> It should echo "0" as well. If this all works for you but the resource in
> Pacemaker is still not working, then it's probably something in your CIB
> (like a bad attribute), as you've just done pretty much exactly what
> Pacemaker will do.
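The manual start/monitor/stop cycle described above can be wrapped in a few shell functions so that each exit code is printed next to its OCF name; for example, rc=1, which crm_mon reports as "unknown error", is OCF_ERR_GENERIC. This is a sketch only: the function names `ocf_rc_name`, `run_ra_op` and `test_ra` are hypothetical, and the paths in the usage comment assume the OCF_ROOT of /usr/lib/ocf used in the steps above. Disable the resource in Pacemaker first, as described earlier, so the cluster doesn't interfere.

```shell
#!/bin/bash
# Sketch: run an OCF resource agent by hand, roughly as Pacemaker does,
# and report each action's exit code. Function names are hypothetical.
#
# Usage (after disabling the resource in Pacemaker):
#   export OCF_ROOT=/usr/lib/ocf
#   export OCF_RESKEY_configfile=/etc/httpd/conf/httpd.conf
#   test_ra "$OCF_ROOT/resource.d/heartbeat/apache"

# Translate a standard OCF exit code into its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "non-standard" ;;
    esac
}

# Run one action of the agent, print "<action> rc=<n> (<name>)" and
# return the agent's exit code unchanged.
run_ra_op() {
    local agent="$1" action="$2" rc
    "$agent" "$action"
    rc=$?
    echo "$action rc=$rc ($(ocf_rc_name "$rc"))"
    return "$rc"
}

# Run start, monitor and stop in that order. Every action is attempted
# (so a failed monitor is still followed by a stop); returns 1 if any failed.
test_ra() {
    local agent="$1" action status=0
    for action in start monitor stop; do
        run_ra_op "$agent" "$action" || status=1
    done
    return "$status"
}
```

If monitor is the action that fails, running the agent under a shell trace (`sh -x ./apache monitor`) prints each command the script executes, which makes it easier to see which check returns the error.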
>
> Let us know how you go.

Sure, I will. Thank you so much.

> Tod
>
> *Luke Bigum*
>
> *Systems Administrator*
>
>  (p) 1300 661 668
>
>  (f)  1300 661 540
>
> (e)  lbigum at iseek.com.au
>
> http://www.iseek.com.au
>
> Level 1, 100 Ipswich Road Woolloongabba QLD 4102
>
> *From:* Angie T. Muhammad [mailto:angie.tawfik at gmail.com]
> *Sent:* Thursday 19 November 2009 9:57 AM
> *To:* pacemaker at oss.clusterlabs.org
> *Subject:* [Pacemaker] Error starting Apache on 2 nodes cluster
>
> Hello,
> I'm a Pacemaker and OpenAIS beginner.
> I followed the 'Clusters from Scratch' document and successfully managed
> to create and monitor the 'ClusterIP' and 'LoadBalancer' resources.
>
> But whenever I try to start Apache:
> # crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
>
> whether using ocf:heartbeat:apache or lsb:httpd, I get the following
> errors when watching crm_mon:
>
> ============
> Last updated: Thu Nov 19 01:38:33 2009
> Stack: openais
> Current DC: test1.localdomain - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ test1.localdomain test2.localdomain ]
>
> ClusterIP       (ocf::heartbeat:IPaddr2):       Started test1.localdomain
> LoadBalancer    (lsb:haproxy):  Started test1.localdomain
>
> Failed actions:
>     WebSite_start_0 (node=test1.localdomain, call=9, rc=1, status=complete): unknown error
>     WebSite_start_0 (node=test2.localdomain, call=5, rc=1, status=complete): unknown error
>
> /************************************************************************************************************/
>
> For reference, I am using:
> CentOS 5.4
> openais-0.80.5-15.1
> pacemaker-1.0.5-4.1
> # chkconfig httpd off
> server-status is not enabled in my httpd.conf.
>
> I always check the Apache processes before configuring my CRM, using:
>
> # ps aux | grep httpd
> /* to make sure no stray httpd processes are left */
>
> # /etc/init.d/httpd status
> /* to make sure it's stopped and nothing is locked */
>
> Last but not least, I am attaching the *last 100 lines of
> /var/log/messages* from the second node to help you help me.
> I have been going around in circles for four days now, and I have no idea
> why the CRM can't start Apache, even though everything runs smoothly when
> I start it manually!
>
> Thank you in advance
> --
> All the best,
> Angie
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>

-- 
All the best,
Angie