[Pacemaker] Failed start of a resource after a Debian upgrade

Wed Feb 22 08:46:27 EST 2012

Hello Dejan,
I have solved the problem described below by 
adding the parameter 

  statusurl="http://127.0.0.1/server-status"

to the apache resource and by replacing this line

  Listen 10.5.75.83:80

with the line

  Listen 80

in the file /etc/apache2/ports.conf. This makes Apache listen on all
IP addresses, including 127.0.0.1, which the status URL uses. Thank you.
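
For anyone checking a similar setup, here is a minimal sketch of a shell
helper that reports whether a ports file contains a Listen directive that
covers the loopback address. The function name check_listen is made up for
illustration. It assumes IPv4 only and the directive spelled exactly
"Listen", and it does not follow Include directives.

```shell
#!/bin/sh
# Report whether a loopback status check (http://127.0.0.1/...) can reach
# Apache, judging only by the Listen directives in the given file.
# check_listen is a hypothetical helper, not part of any package.
check_listen() {
    ports_file=$1
    # First argument of every active (non-comment) Listen directive.
    listens=$(sed -n 's/^[[:space:]]*Listen[[:space:]]\{1,\}\([^[:space:]]*\).*$/\1/p' "$ports_file")
    while read -r l; do
        [ -n "$l" ] || continue
        case $l in
            *:*) addr=${l%:*} ;;   # "address:port" form
            *)   addr='*' ;;       # bare port means all addresses
        esac
        case $addr in
            '*'|0.0.0.0|127.0.0.1)
                echo "loopback reachable via: Listen $l"
                return 0 ;;
        esac
    done <<EOF
$listens
EOF
    echo "no Listen directive covers 127.0.0.1"
    return 1
}
```

With "Listen 10.5.75.83:80" the helper returns 1; with "Listen 80" it
returns 0.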

Best regards,
Michal Vyoral

On Wed, Jan 25, 2012 at 05:53:26PM +0100, Dejan Muhamedagic wrote:
> On Wed, Jan 25, 2012 at 04:35:39PM +0100, Michal Vyoral wrote:
> > Hi Dejan,
> > 
> > On Tue, Jan 24, 2012 at 11:52:20PM +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > On Tue, Jan 24, 2012 at 06:31:54PM +0100, Michal Vyoral wrote:
> > > > Hello,
> > > > we had a cluster of two nodes, both running Debian 5.0, each with two resources,
> > > > IPaddr2 and apache, managed by pacemaker 1.0.9.1. After upgrading
> > > > one node from Debian 5.0 to 6.0, we cannot start the
> > > > apache resource on the upgraded node. Here are the details:
> > > > 
> > > > Versions of heartbeat and pacemaker before the upgrade:
> > > > pr-iso1:~# dpkg -l pacemaker heartbeat
> > > > Desired=Unknown/Install/Remove/Purge/Hold
> > > > | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> > > > |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> > > > ||/ Name           Version        Description
> > > > +++-==============-==============-============================================
> > > > ii  heartbeat      1:3.0.3-2~bpo5 Subsystem for High-Availability Linux
> > > > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > > > 
> > > > Versions of heartbeat and pacemaker after the upgrade:
> > > > pr-iso2:~# dpkg -l pacemaker heartbeat
> > > > Desired=Unknown/Install/Remove/Purge/Hold
> > > > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > > > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > > > ||/ Name           Version        Description
> > > > +++-==============-==============-============================================
> > > > ii  heartbeat      1:3.0.3-2      Subsystem for High-Availability Linux
> > > > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > > > 
> > > > Status of the resources on the upgraded node:
> > > > pr-iso2:~# crm_mon
> > > > ============
> > > > Last updated: Tue Jan 24 10:14:12 2012
> > > > Stack: Heartbeat
> > > > Current DC: pr-iso2 (511079a9-0f71-4537-bdf9-07714b454441) - partition with quorum
> > > > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > > > 2 Nodes configured, unknown expected votes
> > > > 2 Resources configured.
> > > > ============
> > > > 
> > > > Online: [ pr-iso2 ]
> > > > OFFLINE: [ pr-iso1 ]
> > > > 
> > > > ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso2
> > > > 
> > > > Failed actions:
> > > >     RTWeb_start_0 (node=pr-iso2, call=7, rc=1, status=complete): unknown error
> > > > 
> > > > Status of the resources on the non-upgraded node:
> > > > pr-iso1:~# crm_mon
> > > > ============
> > > > Last updated: Tue Jan 24 17:08:22 2012
> > > > Stack: Heartbeat
> > > > Current DC: pr-iso1 (014268aa-f234-4789-b4a1-0053cf4e61b9) - partition with quorum
> > > > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > > > 2 Nodes configured, unknown expected votes
> > > > 2 Resources configured.
> > > > ============
> > > > 
> > > > Online: [ pr-iso1 pr-iso2 ]
> > > > 
> > > > ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso1
> > > > RTWeb   (ocf::heartbeat:apache):        Started pr-iso1
> > > > 
> > > > Configuration of the resources:
> > > > pr-iso1:~# crm configure show
> > > > node $id="014268aa-f234-4789-b4a1-0053cf4e61b9" pr-iso1
> > > > node $id="511079a9-0f71-4537-bdf9-07714b454441" pr-iso2
> > > > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> > > >         params ip="10.5.75.83" cidr_netmask="24" \
> > > >         op monitor interval="30s"
> > > > primitive RTWeb ocf:heartbeat:apache \
> > > >         params configfile="/etc/apache2/apache2.conf" \
> > > >         op monitor interval="1min" \
> > > >         meta target-role="Started" is-managed="true"
> > > > colocation website-with-ip inf: RTWeb ClusterIP
> > > > order rtweb_after_clustrip inf: ClusterIP RTWeb
> > > > property $id="cib-bootstrap-options" \
> > > >         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> > > >         cluster-infrastructure="Heartbeat" \
> > > >         stonith-enabled="false" \
> > > >         last-lrm-refresh="1327399494"
> > > > rsc_defaults $id="rsc-options" \
> > > >         resource-stickiness="100"
> > > > 
> > > > Records in the /var/log/ha-log related to RTWeb resource:
> > > > pr-iso2:~# grep RTWeb /var/log/ha-log
> > > > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=7:76:7:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_monitor_0 )
> > > > Jan 24 10:04:56 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:4: probe
> > > > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: process_lrm_event: LRM operation RTWeb_monitor_0 (call=4, rc=7, cib-update=13, confirmed=true) not running
> > > > Jan 24 10:12:48 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=11:77:0:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_start_0 )
> > > > Jan 24 10:12:48 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:7: start
> > > 
> > > After this message there should be a bit more (look for "apache"
> > > or "lrmd"). Future versions of the resource agents will log the
> > > resource name too (RTWeb in this case). If you cannot find anything
> > > here, then the answer must be in the apache logs.
> > > 
> > > Thanks,
> > > 
> > > Dejan
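
The search suggested above could be sketched as a small shell helper. It
assumes the log format seen in the excerpts in this thread (lrmd lines
tagged "lrmd: [PID]", resource agent lines tagged "apache[PID]:"). The
function name ra_log_lines and the default path are illustrative only.

```shell
#!/bin/sh
# Print the ha-log lines written by lrmd or by the apache resource agent.
# ra_log_lines is a hypothetical helper; adjust the patterns if your
# log format differs from the one quoted in this thread.
ra_log_lines() {
    log_file=${1:-/var/log/ha-log}
    grep -E 'lrmd: \[[0-9]+\]|apache\[[0-9]+\]:' "$log_file"
}
```

Calling it without an argument reads /var/log/ha-log; pass another path to
inspect a copy of the log.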
> > 
> > Yes, you are right: here are two more lines after the previous line:
> > 
> >   apache[9454]:   2012/01/24_10:12:49 INFO: apache not running
> >   apache[9454]:   2012/01/24_10:12:49 INFO: waiting for apache /etc/apache2/apache2.conf to come up
> 
> That's all?
> 
> > There are no records in /var/log/apache2/error.log that give any clue, see:
> > 
> >   pr-iso2:/var/log/apache2# cat error.log
> >   [Tue Jan 24 11:12:50 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> >   [Tue Jan 24 11:13:08 2012] [notice] caught SIGTERM, shutting down
> >   [Wed Jan 25 13:09:02 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> >   [Wed Jan 25 13:09:21 2012] [notice] caught SIGTERM, shutting down
> > 
> > One interesting thing: our nodes should use UTC, but after the upgrade
> > we noticed that the time on the upgraded node was our local time (= UTC + 1).
> > I have set the system time back to UTC, but Apache still uses local time in the log.
> > 
> > We have tried to start Apache on the upgraded node alone:
> > 
> > 1. we modified the file /etc/apache2/ports2.conf so that
> > Apache listens on the physical address
> > 2. we ran the command '/etc/init.d/apache2 start'
> > 3. we downloaded an index.html page
> > 
> > Here is the record in the error log:
> > 
> >  [Wed Jan 25 13:28:11 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> >  [Wed Jan 25 13:28:27 2012] [warn] [client 10.5.77.29] incomplete redirection target of '/rt/' for URI '/' modified to 'http://10.5.75.82/rt/'
> > 
> > So Apache can run on its own.
> > 
> > Before the upgrade we made some minor changes to apache2.conf
> > on the active node, but not on the passive node. We have reverted
> > the changes, but the resource still fails; see the tail of the ha-log
> > on the upgraded node:
> 
> [...]
> > Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
> 
> You need to clean up the resource: crm resource cleanup RTWeb
> 
> Otherwise, I really cannot say what's wrong with your apache, but
> it's definitely resource specific. You can leave out the cluster
> and try to resolve the issue using ocf-tester. Also, it is
> necessary that the apache status module is enabled.
> 
> Thanks,
> 
> Dejan
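
The two steps suggested above (testing the agent outside the cluster with
ocf-tester, then clearing the fail count) might be scripted roughly as
follows. This is only a sketch: the function name recover_steps is
hypothetical, the agent path is the usual location for heartbeat agents
and may differ on your system, and the order (test first, then clean up)
is my own choice. By default it only prints the commands; set RUN=1 to
execute them.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the steps for debugging a failed resource.
# recover_steps is a made-up helper; the default agent path is an assumption.
recover_steps() {
    rsc=$1       # resource name, e.g. RTWeb
    conf=$2      # value for the agent's configfile parameter
    agent=${3:-/usr/lib/ocf/resource.d/heartbeat/apache}
    for cmd in \
        "ocf-tester -n $rsc -o configfile=$conf $agent" \
        "crm resource cleanup $rsc"
    do
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose so the string splits into command and args.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

For the configuration in this thread the call would be
recover_steps RTWeb /etc/apache2/apache2.conf. Note that ocf-tester starts
and stops the service itself, so it should not be run while the cluster is
managing the resource on that node.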
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org