[Pacemaker] Single Node Cluster and Resource Management

ant1spamz-pacemaker at yahoo.com ant1spamz-pacemaker at yahoo.com
Tue Dec 7 07:24:40 EST 2010


Hi there,

I need to build a single-node cluster, primarily for resource monitoring on the 
local node, so that network load balancing from my front-end load balancers 
works correctly: the node in question should fail out if its public interface, 
its private interface, or both fail (a typical OR truth table).
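
To make the intended fail-out logic explicit, here is a minimal sketch of that OR truth table. It is only an illustration of the requirement, not anything Pacemaker runs; the function names `should_fail_out` and `truth_table` are made up for this example.

```python
from itertools import product


def should_fail_out(public_ok: bool, private_ok: bool) -> bool:
    """Node must leave the load-balancer pool if either interface is down."""
    return (not public_ok) or (not private_ok)


def truth_table():
    """Enumerate every combination of interface states with the outcome."""
    return [
        (public_ok, private_ok, should_fail_out(public_ok, private_ok))
        for public_ok, private_ok in product((True, False), repeat=2)
    ]


if __name__ == "__main__":
    print("public_ok private_ok fail_out")
    for public_ok, private_ok, fail_out in truth_table():
        print(f"{public_ok!s:9} {private_ok!s:10} {fail_out}")
```

Only the row where both interfaces are up keeps the node in service; the other three rows fail it out.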

My NLB has the following setup:

2 front-end LBs with a failover IP between them, direct routing to my nodes' 
public interfaces, and monitoring on the private interface.

My nodes:

one public interface and one private interface each. This is how things are and 
I can't change it.

=================
setup
===========

pingd to my LB1 - 192.168.0.68
pingd to my LB2 (represents a "public" ping destination) - 192.168.0.69

a location constraint to fail apache if either one of the pings times out

On startup everything is OK: apache launches along with my 2 pingd resources, 
and the fail constraint works as well.

================
the problem
=============

When I simulate a network failure (iptables -s web1.testcluster -j DROP), 
apache is correctly failed. When pingd re-establishes the connection, I expect 
the apache constraint to be reversed and apache simply started again, but 
apache stays stopped.

How do I achieve the automatic resource restart?
Could the monitor operation on the apache resource be in conflict with pingd?
Am I simply missing a "recovery" constraint to start the service?
Is my location constraint not done correctly?
Other?

Possibly related: the log says "Resource apache cannot run anywhere" (I have no 
idea what this means).

icmp is ok

Last updated: Tue Dec  7 07:09:11 2010
Stack: Heartbeat
Current DC: web1.testcluster (ae391b6f-176d-43bc-93b4-8104ff3414c8) - partition 
with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
1 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ web1.testcluster ]

 pingdnet1(ocf::pacemaker:pingd):Started web1.testcluster
 pingdnet2(ocf::pacemaker:pingd):Started web1.testcluster
crm(live)# Ctrl-C, leaving

[root@web1 ~]# date
Tue Dec  7 07:13:11 EST 2010

[root@web1 ~]# ping 192.168.0.69
PING 192.168.0.69 (192.168.0.69) 56(84) bytes of data.
64 bytes from 192.168.0.69: icmp_seq=1 ttl=64 time=0.104 ms

--- 192.168.0.69 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.104/0.104/0.104/0.000 ms
[root@web1 ~]# ping 192.168.0.68
PING 192.168.0.68 (192.168.0.68) 56(84) bytes of data.
64 bytes from 192.168.0.68: icmp_seq=1 ttl=64 time=0.151 ms

--- 192.168.0.68 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.151/0.151/0.151/0.000 ms

================
conf
====
primitive pingdnet1 ocf:pacemaker:pingd params host_list=192.168.0.69 \
    name=pingdnet1 op monitor interval=15s timeout=5s
primitive pingdnet2 ocf:pacemaker:pingd params host_list=192.168.0.68 \
    name=pingdnet2 op monitor interval=15s timeout=5s
primitive apache lsb:httpd op monitor interval=15s

location apache-ping-constraint apache rule -inf: not_defined pingdnet1 or \
    pingdnet1 lte 0
location apache-ping-constraint2 apache rule -inf: not_defined pingdnet2 or \
    pingdnet2 lte 0

order ping-then-apache inf: pingdnet1 pingdnet2 apache
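
To spell out how I read the two location rules above, here is a rough model in Python. It is a sketch of my understanding and not Pacemaker's actual code: the helper names (`rule_matches`, `apache_score`, `apache_can_run`) are hypothetical, and it assumes each pingd instance publishes a node attribute under its `name` whose value is the number of reachable hosts, compared numerically.

```python
from typing import Mapping

NEG_INF = float("-inf")

# Attribute names taken from the "name=" parameters in the config above.
PING_ATTRIBUTES = ("pingdnet1", "pingdnet2")


def rule_matches(attrs: Mapping[str, str], name: str) -> bool:
    """Model of 'not_defined <name> or <name> lte 0' for one node.

    Assumption: the attribute value is an integer stored as a string.
    """
    value = attrs.get(name)
    if value is None:  # not_defined
        return True
    return int(value) <= 0  # lte 0


def apache_score(attrs: Mapping[str, str]) -> float:
    """-INFINITY if either location rule matches, otherwise a neutral 0."""
    if any(rule_matches(attrs, name) for name in PING_ATTRIBUTES):
        return NEG_INF
    return 0.0


def apache_can_run(attrs: Mapping[str, str]) -> bool:
    """With one node, a -INFINITY score means apache cannot run anywhere."""
    return apache_score(attrs) > NEG_INF


if __name__ == "__main__":
    print(apache_can_run({"pingdnet1": "1", "pingdnet2": "1"}))  # both pings up
    print(apache_can_run({"pingdnet1": "0", "pingdnet2": "1"}))  # one ping down
    print(apache_can_run({"pingdnet2": "1"}))  # pingdnet1 never set
```

Under this reading, once both attributes are positive again neither rule matches, so nothing in these two constraints should keep apache stopped. An attribute that is missing or stuck at 0 would be enough to produce "cannot run anywhere" on a one-node cluster.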
===============================================
logs to help
======================
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) 
Starting httpd: 
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) [
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)   OK  
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) ]
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) 
Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)  
Dec  7 06:39:41 web1 lrmd: [2471]: info: Managed apache:start process 10650 
exited with return code 0.
Dec  7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation 
apache_start_0 (call=25, rc=0, cib-update=196, confirmed=true) ok
Dec  7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action 
apache_start_0 (14) confirmed on web1.testcluster (rc=0)
Dec  7 06:39:41 web1 crmd: [2474]: info: te_rsc_command: Initiating action 15: 
monitor apache_monitor_15000 on web1.testcluster (local)
Dec  7 06:39:41 web1 crmd: [2474]: info: do_lrm_rsc_op: Performing 
key=15:34:0:02fb0ab7-1384-4125-b14a-0ab5b4e9d1e8 op=apache_monitor_15000 )
Dec  7 06:39:41 web1 lrmd: [2471]: info: rsc:apache:26: monitor
Dec  7 06:39:41 web1 crmd: [2474]: info: te_pseudo_action: Pseudo action 3 fired 
and confirmed
Dec  7 06:39:41 web1 lrmd: [2471]: info: Managed apache:monitor process 10666 
exited with return code 0.
Dec  7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation 
apache_monitor_15000 (call=26, rc=0, cib-update=197, confirmed=false) ok
Dec  7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action 
apache_monitor_15000 (15) confirmed on web1.testcluster (rc=0)


Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: 
pingdnet1(ocf::pacemaker:pingd):Started web1.testcluster
Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: 
pingdnet2(ocf::pacemaker:pingd):Started web1.testcluster
Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: 
apache(lsb:httpd):Stopped 
Dec  7 06:56:08 web1 pengine: [2487]: info: native_color: Resource apache cannot 
run anywhere
Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
pingdnet1(Started web1.testcluster)
Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
pingdnet2(Started web1.testcluster)
Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
apache(Stopped)





Dec  7 07:10:15 web1 pingd: [9653]: info: ping_read: Retrying...
Dec  7 07:10:16 web1 pingd: [9521]: info: ping_read: Retrying...
Dec  7 07:10:47 web1 last message repeated 31 times
Dec  7 07:11:08 web1 last message repeated 21 times
Dec  7 07:11:08 web1 last message repeated 21 times
Dec  7 07:11:08 web1 crmd: [2474]: info: crm_timer_popped: PEngine Recheck Timer 
(I_PE_CALC) just popped!
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition 
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Progressed to 
state S_POLICY_ENGINE after C_TIMER_POPPED
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: All 1 cluster 
nodes are eligible to run resources.
Dec  7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke: Query 201: Requesting the 
current CIB: S_POLICY_ENGINE
Dec  7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke_callback: Invoking the PE: 
query=201, ref=pe_calc-dc-1291723868-103, seq=1, quorate=1
Dec  7 07:11:08 web1 pengine: [2487]: notice: unpack_config: On loss of CCM 
Quorum: Ignore
Dec  7 07:11:08 web1 pengine: [2487]: info: unpack_config: Node scores: 'red' = 
-INFINITY, 'yellow' = 0, 'green' = 0
Dec  7 07:11:08 web1 pengine: [2487]: info: determine_online_status: Node 
web1.testcluster is online
Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: 
pingdnet1(ocf::pacemaker:pingd):Started web1.testcluster
Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: 
pingdnet2(ocf::pacemaker:pingd):Started web1.testcluster
Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: 
apache(lsb:httpd):Stopped 
Dec  7 07:11:08 web1 pengine: [2487]: info: native_color: Resource apache cannot 
run anywhere
Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
pingdnet1(Started web1.testcluster)
Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
pingdnet2(Started web1.testcluster)
Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource 
apache(Stopped)
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition 
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
origin=handle_response ]
Dec  7 07:11:08 web1 crmd: [2474]: info: unpack_graph: Unpacked transition 37: 0 
actions in 0 synapses
Dec  7 07:11:08 web1 crmd: [2474]: info: do_te_invoke: Processing graph 37 
(ref=pe_calc-dc-1291723868-103) derived from /var/lib/pengine/pe-input-555.bz2
Dec  7 07:11:08 web1 crmd: [2474]: info: run_graph: 
====================================================
Dec  7 07:11:08 web1 crmd: [2474]: notice: run_graph: Transition 37 (Complete=0, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-555.bz2): Complete
Dec  7 07:11:08 web1 crmd: [2474]: info: te_graph_trigger: Transition 37 is now 
complete
Dec  7 07:11:08 web1 crmd: [2474]: info: notify_crmd: Transition 37 status: done 
- <null>
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
origin=notify_crmd ]
Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Starting PEngine 
Recheck Timer
Dec  7 07:11:08 web1 pengine: [2487]: info: process_pe_message: Transition 37: 
PEngine Input stored in: /var/lib/pengine/pe-input-555.bz2