[ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

Ken Gaillot kgaillot at redhat.com
Tue Jun 14 14:39:51 UTC 2016


On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
>
> We currently have a 2-node cluster with corosync and pacemaker for
> httpd, with 2 VIPs configured.
>
> Since we added ModSecurity 2.9, httpd restarts have been very slow, so I
> increased the start/stop timeouts. But sometimes, after logrotate runs,
> the following error occurs:
>
> Failed Actions:
> * WebSite_monitor_300000 on node1 'not running' (7): call=26, status=complete, exitreason='none',
>     last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> Here is the full output of crm_mon:
>
> Last updated: Tue Jun 14 07:22:28 2016          Last change: Fri Jun 10 09:28:03 2016 by root via cibadmin on node1
> Stack: corosync
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> 2 nodes and 4 resources configured
>
> Online: [ node1 node2 ]
>
> WebSite (systemd:httpd):        Started node1
> Resource Group: WAFCluster
>      VirtualIP  (ocf::heartbeat:IPaddr2):       Started node1
>      MailMon    (ocf::heartbeat:MailTo):        Started node1
>      VirtualIP2 (ocf::heartbeat:IPaddr2):       Started node1
>
> Failed Actions:
> * WebSite_monitor_300000 on node1 'not running' (7): call=26, status=complete, exitreason='none',
>     last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> # pcs resource --full
> Resource: WebSite (class=systemd type=httpd)
>   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status monitor=1min
>   Operations: monitor interval=300s (WebSite-monitor-interval-300s)
>               start interval=0s timeout=300s (WebSite-start-interval-0s)
>               stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> Group: WAFCluster
>   Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=195.70.7.74 cidr_netmask=27
>    Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
>                stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
>                monitor interval=30s (VirtualIP-monitor-interval-30s)
>   Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
>    Attributes: email=system at dfi.ch
>    Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
>                stop interval=0s timeout=10 (MailMon-stop-interval-0s)
>                monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
>   Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=195.70.7.75 cidr_netmask=27
>    Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
>                stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
>                monitor interval=30s (VirtualIP2-monitor-interval-30s)
>
> If I run crm_resource -P, the Failed Actions disappear.
>
> How can I fix the monitor “not running” error?
>
> Thanks,
> Jérémy

Why does logrotate cause the site to stop responding? Normally logrotate
only does a graceful restart of httpd, which shouldn't cause any
interruption.

Any solution will have to be in logrotate, to keep it from interrupting
service.
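
For reference, this is roughly what the stock /etc/logrotate.d/httpd
looks like on RHEL/CentOS 7 (httpd 2.4.x); it is where the graceful
restart comes from, and where any logrotate-side change would go. Treat
it as an approximation and compare it with the copy on your own nodes,
since package versions and local edits can differ.

```
# /etc/logrotate.d/httpd (approximate stock RHEL/CentOS 7 version; verify locally)
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        # "reload" runs the unit's ExecReload, expected to be "httpd -k graceful" on RHEL 7;
        # "systemctl show -p ExecReload httpd" shows what it really runs on your nodes.
        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
    endscript
}
```

Running "logrotate -d /etc/logrotate.d/httpd" is a dry run: it prints
which files would be rotated and which scripts would run, without
touching the logs or the state file.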

Personally, my preferred configuration is to make Apache log to syslog
instead of to its usual log files. You can even configure syslog to
write those messages to the usual files, so there's no major difference.
Then you don't need a separate logrotate script for Apache; its logs get
rotated along with the system logs. That avoids having to restart
Apache, which for a busy site can be a big deal. It also gives you the
option of tying into syslog tools such as remote logging.
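
A minimal sketch of that setup, assuming Apache 2.4 and rsyslog as
shipped with RHEL/CentOS 7. The facilities local1/local2, the rsyslog.d
file name, and the use of logger(1) for the access log are illustrative
assumptions, not part of the original configuration.

```
# --- httpd.conf: replace the existing ErrorLog and CustomLog lines ---
# (multiple CustomLog lines add up rather than override, so remove the old one)
# Apache 2.4 can send the error log straight to syslog on a chosen facility.
ErrorLog syslog:local1
# There is no built-in syslog target for access logs, so pipe them to logger(1).
CustomLog "|/usr/bin/logger -t httpd -p local2.info" combined

# --- /etc/rsyslog.d/httpd.conf (hypothetical file name) ---
# Write each facility back to the usual Apache file; lines gain a syslog
# prefix (timestamp, host, tag) but the content is otherwise the same.
local1.*    /var/log/httpd/error_log
local2.*    /var/log/httpd/access_log
# To keep them out of /var/log/messages too, append ";local1.none;local2.none"
# to the "*.info;mail.none;authpriv.none;cron.none" selector in /etc/rsyslog.conf.
```

To get rotation "with the system log", list these files in the syslog
logrotate stanza (/etc/logrotate.d/syslog on RHEL 7), whose postrotate
signals rsyslog rather than httpd, and drop the httpd stanza, so a
rotation no longer reloads Apache at all.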



