[Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster

Koch, Sebastian Sebastian.Koch at netzwerk.de
Thu Jun 24 07:54:40 EDT 2010


Hi,

Thanks for your reply. It wasn't clear to me that Pacemaker issues status commands in the background even on the passive node. The problem was that on the passive node the symlinks to the Nagios configuration were broken, because the DRBD device was mounted on the other node. As a workaround I copied all the required configs into the local /mnt/cluster/ directory, so a passive node can use the configs from there. When the node becomes active, the DRBD device is mounted on /mnt/cluster.

Do you have a better idea? This feels like a needlessly roundabout solution, and I would like to solve it more elegantly. I currently have the same issue with ClusterMonitor: it tries to write its HTML to /var/www, but I symlinked that to the cluster directory, so the status command fails on the passive node.
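
For reference, one alternative to copying the configs is to have the init script's status action report "not running" (exit code 3) whenever the shared configuration is not reachable, instead of failing with a configuration error. The following is only a minimal sketch under assumptions: nagios_status is a hypothetical helper, and the config directory and pid file path are passed in as arguments because the real paths depend on the distribution's init script.

```shell
#!/bin/sh
# Hypothetical status helper for an LSB init script (name and arguments are
# assumptions, not taken from the Debian nagios3 script).
# LSB status exit codes used here:
#   0 = running, 1 = dead but pid file exists, 3 = not running.

nagios_status() {
    cfg_dir=$1     # e.g. /etc/nagios3, a symlink into the DRBD mount
    pid_file=$2    # e.g. a pid file below /var/run/nagios3

    # On the passive node the symlink target is missing. That is not a
    # configuration error (rc=6); the service is simply not running here.
    if [ ! -e "$cfg_dir/nagios.cfg" ]; then
        return 3
    fi

    # No (or empty) pid file: never started, or stopped cleanly.
    if [ ! -s "$pid_file" ]; then
        return 3
    fi

    # Pid file present: check whether that process is still alive.
    # Assumes the script runs as root, as init scripts normally do.
    pid=$(cat "$pid_file")
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

With a helper like this, the probe that Pacemaker runs on the passive node would see exit code 3 rather than 6, so the resource would not be banned from the cluster.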

Thanks in advance.

Sebastian Koch
                                                         

NETZWERK GmbH

Phone:  +49.711.220 5498 81
Note new mobile number: +49.1522.299 6524
Fax:  +49.711.220 5499 77
Email: sebastian.koch at netzwerk.de
Web:  www.netzwerk.de
NETZWERK GmbH, Kurze Str. 40, 70794 Filderstadt-Bonlanden
Managing directors: Siegfried Herner, Hans-Baldung Luley, Olaf Müller-Haberland
Registered office: Filderstadt-Bonlanden, Amtsgericht Stuttgart HRB 225547, WEEE-Reg Nr. DE 185 622 492

-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: Thursday, June 24, 2010 09:19
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster

On Wed, Jun 23, 2010 at 5:19 PM, Koch, Sebastian
<Sebastian.Koch at netzwerk.de> wrote:
> Hi,
>
>
>
> I have a 2-node cluster up and running, and right now I am trying to
> configure a Nagios3 resource. I already fixed the nagios init
> script, as it didn't pass the LSB compatibility checks described here:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
>
>
>
> I just needed to make sure the pid file gets removed when the stop function is
> called. After this small change the script passed all the LSB checks. Below is
> the error message:
>
>
>
> root at pilot01-node2:/var/run/nagios3# crm_verify -LV
>
> crm_verify[7094]: 2010/06/23_16:37:27 ERROR: unpack_rsc_op: Hard error -
> res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from
> re-starting anywhere in the cluster

Looks like it's still failing the fifth LSB check from the above URL:
"Did the command print result: 3"

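That fifth check can be repeated by hand: stop the service, then ask for its status and look at the exit code. A small sketch, where check_stopped_status is a hypothetical helper (not part of Pacemaker) and the init script path is passed in as an argument:

```shell
#!/bin/sh
# Sketch: reproduce the "status after stop" LSB check by hand.
# check_stopped_status is a hypothetical helper name.

check_stopped_status() {
    script=$1    # path to the init script, e.g. /etc/init.d/nagios3

    # Stopping the service must succeed before the status is meaningful.
    "$script" stop >/dev/null 2>&1 || { echo "stop failed"; return 1; }

    # Capture the exit code of "status" without aborting under "set -e".
    rc=0
    "$script" status >/dev/null 2>&1 || rc=$?
    echo "result: $rc"

    # LSB requires exit code 3 from "status" when the service is stopped.
    [ "$rc" -eq 3 ]
}
```

Running `check_stopped_status /etc/init.d/nagios3` on each node, including the passive one, should print `result: 3` for a compliant script. Any other value there is what the cluster sees when it probes the resource.
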
>
> crm_verify[7094]: 2010/06/23_16:37:27 WARN: native_color: Resource
> res_Nagios cannot run anywhere
>
> Warnings found during check: config may not be valid
>
>
>
> I tried to find out what an init script must provide to be usable
> in Pacemaker, but I only found the LSB compatibility hints on the
> Pacemaker website. I think I configured the primitive wrong, or maybe the
> init script is still wrong? It fails even if I configure it with an op
> monitor action. And even a crm resource cleanup res_Nagios doesn't help me
> start the resource.
>
>
>
> I can run Nagios manually on the active node. I linked all shared
> directories to my cluster storage device like this:
>
>
>
> root at pilot01-node2:/etc# ll /var/lib/nagios3* /etc/nagios*
> lrwxrwxrwx 1 root   root    25 23. Jun 13:54 /etc/nagios3 -> /mnt/cluster/etc/nagios3/
> lrwxrwxrwx 1 root   root    29 23. Jun 14:04 /var/lib/nagios3 -> /mnt/cluster/var/lib/nagios3/
>
> /etc/nagios3_bak:
> insgesamt 88K
> drwxr-xr-x  4 root root    146 23. Jun 13:54 .
> drwxr-xr-x 75 root root   4,0K 23. Jun 17:08 ..
> -rw-r--r--  1 root root   1,9K 30. Jun 2009  apache2.conf
> -rw-r--r--  1 root root    11K 23. Jun 13:49 cgi.cfg
> -rw-r--r--  1 root root   2,4K  2. Jul 2009  commands.cfg
> drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 conf.d
> -rw-r--r--  1 root root     20 23. Jun 13:49 htpasswd.users
> -rw-r--r--  1 root root    42K  2. Jul 2009  nagios.cfg
> -rw-r-----  1 root nagios 1,3K 30. Jun 2009  resource.cfg
> drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 stylesheets
>
> /etc/nagios-plugins:
> insgesamt 12K
> drwxr-xr-x  3 root root   19  7. Jun 19:16 .
> drwxr-xr-x 75 root root 4,0K 23. Jun 17:08 ..
> drwxr-xr-x  2 root root 4,0K  7. Jun 19:16 config
>
> /var/lib/nagios3_bak:
> insgesamt 20K
> drwxr-x---  4 nagios nagios     47 23. Jun 14:02 .
> drwxr-xr-x 33 root   root     4,0K 23. Jun 14:04 ..
> -rw-------  1 nagios www-data  14K 23. Jun 14:02 retention.dat
> drwx------  2 nagios www-data    6  2. Jul 2009  rw
> drwxr-x---  3 nagios nagios     25  7. Jun 19:16 spool
>
>
>
> Here is my Config.
>
>
>
> ########################
> ### 3. Cluster State ###
> ########################
>
> ============
> Last updated: Wed Jun 23 17:16:33 2010
> Stack: openais
> Current DC: pilot01-node2 - partition with quorum
> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Node pilot01-node1: standby
> Online: [ pilot01-node2 ]
>
> Full list of resources:
>
>  Resource Group: grp_MySQL
>      res_Filesystem     (ocf::heartbeat:Filesystem):    Started pilot01-node2
>      res_ClusterIP      (ocf::heartbeat:IPaddr2):       Started pilot01-node2
>      res_MySQL  (lsb:mysql):    Started pilot01-node2
>      res_Apache (lsb:apache2):  Started pilot01-node2
>      res_ClusterMonitor (ocf::pacemaker:ClusterMon):    Started pilot01-node2
>      res_Nagios (lsb:nagios3):  Stopped
>  Master/Slave Set: ms_drbd_mysql0
>      Masters: [ pilot01-node2 ]
>      Stopped: [ drbd_pilot0:0 ]
>  Clone Set: cl-pinggw
>      Started: [ pilot01-node2 ]
>      Stopped: [ pinggw:0 ]
> Monitor-Cluster (ocf::pacemaker:ClusterMon):    Started pilot01-node1 (unmanaged) FAILED
>
> Failed actions:
>     Monitor-Cluster_stop_0 (node=pilot01-node1, call=34, rc=1, status=complete): unknown error
>     res_Nagios_monitor_0 (node=pilot01-node1, call=84, rc=6, status=complete): not configured
>
> #########################
> ### 4. Cluster Config ###
> #########################
>
> node pilot01-node1 \
>         attributes standby="on"
> node pilot01-node2 \
>         attributes standby="off"
> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s"
> primitive drbd_pilot0 ocf:linbit:drbd \
>         params drbd_resource="pilot0" \
>         operations $id="drbd_pilot0-operations" \
>         op monitor interval="15s"
> primitive pinggw ocf:pacemaker:pingd \
>         params host_list="10.1.1.162" multiplier="200" \
>         op monitor interval="10s"
> primitive res_Apache lsb:apache2 \
>         operations $id="res_Apache-operations" \
>         op monitor interval="15s" timeout="20s" start-delay="15s"
> primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
>         params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
>         operations $id="res_ClusterIP_1-operations" \
>         op monitor start-delay="0" interval="10s"
> primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive res_Filesystem ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
> primitive res_MySQL lsb:mysql
> primitive res_Nagios lsb:nagios3 \
>         operations $id="res_Nagios-operations" \
>         op monitor interval="15s" timeout="20s" \
>         meta target-role="Started"
> group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
> ms ms_drbd_mysql0 drbd_pilot0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone cl-pinggw pinggw \
>         meta globally-unique="false"
> location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
>         rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
> location grp_MySQL-with-pinggw grp_MySQL \
>         rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
> colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
> order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>         cluster-infrastructure="openais" \
>         last-lrm-refresh="1277306106" \
>         symmetric-cluster="true" \
>         migration-threshold="1" \
>         default-action-timeout="240s"
>
>
>
> Thanks for your help in advance.
>
> Sebastian
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
