[ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

Tue Jun 14 14:55:28 UTC 2016

Hi,

Every action on httpd is very slow because of ModSecurity 2.9, so the reload
in the logrotate postrotate script may take a while.

Here is the log output from this morning:
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
367: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: Initiating action 4: stop
WebSite_stop_0 on node1 (local)
Jun 14 03:43:05 mail-px-** systemd: Reloading.
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
368: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Removed slice user-0.slice.
Jun 14 03:43:05 mail-px-** systemd: Stopping user-0.slice.
Jun 14 03:44:35 mail-px-** systemd: httpd.service stop-sigterm timed out.
Killing.
Jun 14 03:44:35 mail-px-** systemd: httpd.service: main process exited,
code=killed, status=9/KILL
Jun 14 03:44:35 mail-px-** systemd: Stopped The Apache HTTP Server.
Jun 14 03:44:35 mail-px-** systemd: Unit httpd.service entered failed state.
Jun 14 03:44:35 mail-px-** systemd: httpd.service failed.
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Operation WebSite_stop_0: ok
(node=node1, call=29, rc=0, cib-update=464, confirmed=true)
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Initiating action 10: start
WebSite_start_0 on node1 (local)
Jun 14 03:44:37 mail-px-** systemd: Reloading.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/run/systemd/system/httpd.service.d/50-pacemaker.conf is marked
world-inaccessible. This has no effect as configuration data is accessible
via APIs without restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Starting Cluster Controlled httpd...
Jun 14 03:44:55 mail-px-** puppet-agent[1645]: Did not receive certificate
Jun 14 03:44:57 mail-px-** systemd: Started Cluster Controlled httpd.
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Operation WebSite_start_0:
ok (node=node1, call=30, rc=0, cib-update=465, confirmed=true)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Initiating action 3: monitor
WebSite_monitor_300000 on node1 (local)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Transition 368 (Complete=4,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]
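
To put numbers on how slow the stop and start were, the crmd timestamps in the log above can be compared. The following is a minimal Python sketch and not part of the cluster setup; the regular expressions and the helper names (parse_ts, operation_durations) are assumptions made for illustration. The year is passed in separately because syslog timestamps omit it, and the month abbreviation is assumed to be in English.

```python
import re
from datetime import datetime

# crmd lines copied from the log above, with the wrapped lines rejoined.
LOG = """\
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: Initiating action 4: stop WebSite_stop_0 on node1 (local)
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Operation WebSite_stop_0: ok (node=node1, call=29, rc=0, cib-update=464, confirmed=true)
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Initiating action 10: start WebSite_start_0 on node1 (local)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Operation WebSite_start_0: ok (node=node1, call=30, rc=0, cib-update=465, confirmed=true)
"""

# Hypothetical patterns, written only for the crmd message shapes shown above.
TS = r"^(\w{3} +\d+ \d\d:\d\d:\d\d) "
INITIATED = re.compile(TS + r".*Initiating action \d+: \w+ (\S+) on ")
COMPLETED = re.compile(TS + r".*Operation (\S+): ok ")


def parse_ts(stamp, year):
    """Parse a syslog timestamp; the year is supplied because syslog omits it."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def operation_durations(log_text, year=2016):
    """Return {operation: seconds} between 'Initiating action' and 'Operation ...: ok'."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        match = INITIATED.match(line)
        if match:
            started[match.group(2)] = parse_ts(match.group(1), year)
            continue
        match = COMPLETED.match(line)
        if match and match.group(2) in started:
            delta = parse_ts(match.group(1), year) - started.pop(match.group(2))
            durations[match.group(2)] = int(delta.total_seconds())
    return durations


for op, seconds in operation_durations(LOG).items():
    print(f"{op}: {seconds} s")
```

On this excerpt the stop takes 92 seconds and the start 22 seconds. The SIGKILL at 03:44:35 comes 90 seconds after the stop was initiated, which may mean that systemd's own stop timeout, rather than the 300s Pacemaker timeout, is what ended the stop.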

The strange thing is that the problem does not occur on every logrotate run...

Jérémy

-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com]
Sent: Tuesday, June 14, 2016 16:40
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] Processing failed op monitor for WebSite on node1:
not running (7)

On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
>
> We currently have a two-node cluster with corosync and pacemaker for
> httpd. We have 2 VIPs configured.
>
> Since we added ModSecurity 2.9, restarting httpd has been very slow, so I
> increased the start / stop timeouts. But sometimes, after logrotate, the
> following error occurs:
>
> Failed Actions:
> * WebSite_monitor_300000 on node1 'not running' (7): call=26, status=complete, exitreason='none',
>     last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> Here is the full output of crm_mon:
>
> Last updated: Tue Jun 14 07:22:28 2016          Last change: Fri Jun 10 09:28:03 2016 by root via cibadmin on node1
> Stack: corosync
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> 2 nodes and 4 resources configured
>
> Online: [ node1 node2 ]
>
> WebSite (systemd:httpd):        Started node1
> Resource Group: WAFCluster
>      VirtualIP  (ocf::heartbeat:IPaddr2):       Started node1
>      MailMon    (ocf::heartbeat:MailTo):        Started node1
>      VirtualIP2 (ocf::heartbeat:IPaddr2):       Started node1
>
> Failed Actions:
> * WebSite_monitor_300000 on node1 'not running' (7): call=26, status=complete, exitreason='none',
>     last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> # pcs resource --full
> Resource: WebSite (class=systemd type=httpd)
>   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status monitor=1min
>   Operations: monitor interval=300s (WebSite-monitor-interval-300s)
>               start interval=0s timeout=300s (WebSite-start-interval-0s)
>               stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> Group: WAFCluster
>   Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=195.70.7.74 cidr_netmask=27
>    Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
>                stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
>                monitor interval=30s (VirtualIP-monitor-interval-30s)
>   Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
>    Attributes: email=system at dfi.ch
>    Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
>                stop interval=0s timeout=10 (MailMon-stop-interval-0s)
>                monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
>   Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=195.70.7.75 cidr_netmask=27
>    Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
>                stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
>                monitor interval=30s (VirtualIP2-monitor-interval-30s)
>
> If I run crm_resource -P, the Failed Actions disappear.
>
> How can I fix the monitor “not running” error?
>
> Thanks,
>
> Jérémy

Why does logrotate cause the site to stop responding? Normally it's a
graceful restart, which shouldn't cause any interruptions.

Any solution will have to be in logrotate, to keep it from interrupting
service.

Personally, my preferred configuration is to make apache log to syslog
instead of its usual log file. You can even configure syslog to log it to
the usual file, so there's no major difference. Then, you don't need a
separate logrotate script for apache, it gets rotated with the system log.
That avoids having to restart apache, which for a busy site can be a big
deal. It also gives you the option of tying into syslog tools such as remote
logging.
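
As a rough illustration of the syslog approach described above, a configuration along these lines might work. This is a sketch under assumptions, not a tested setup: the facilities local1 and local2, the tag httpd-access, the path /usr/bin/logger, the file name /etc/rsyslog.d/httpd.conf, and the target log paths are all chosen for the example, and the "combined" log format is assumed to be defined in the existing httpd.conf.

```conf
# httpd.conf: send the error log to syslog directly, and pipe the
# access log through logger(1) (facilities and tag are assumptions)
ErrorLog syslog:local1
CustomLog "|/usr/bin/logger -t httpd-access -p local2.info" combined

# /etc/rsyslog.d/httpd.conf (hypothetical file name): write the two
# facilities back to the usual httpd log files
local1.*    /var/log/httpd/error_log
local2.*    /var/log/httpd/access_log
```

If the messages go to separate files like this rather than the main system log, those files would presumably need to be listed in the syslog logrotate configuration so that they are rotated along with it.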
