[Pacemaker] Help with N+1 configuration

Cal Heldenbrand cal at fbsdata.com
Fri Jul 27 11:48:14 EDT 2012


Here's one more scenario, a high-load test in which my active content check
fails.  Instead of a recursive fork bomb, I slowly added CPU-intensive
processes to the system until I arrived at a load average of around 260 on a
single-CPU box.
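(Roughly that kind of ramp-up can be done with something like the following --
a sketch only, not the exact commands used here:)

    # Sketch: spawn CPU burners one at a time and let the load climb.
    # The count and sleep are illustrative, not the real test procedure.
    for i in $(seq 1 260); do
        yes > /dev/null &
        sleep 5
    done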

My memcached status check was returning:  "INFO: memcached is down"

But, this is what the cluster status looked like:

Online: [ mem1 mem2 mem3 ]

 cluster-ip-mem2        (ocf::heartbeat:IPaddr2):       Started mem2
 cluster-ip-mem1        (ocf::heartbeat:IPaddr2):       Started mem1 (unmanaged) FAILED
 Clone Set: memcached_clone [memcached]
     Started: [ mem3 mem2 ]
     Stopped: [ memcached:2 ]

Failed actions:
    memcached:2_start_0 (node=mem1, call=21, rc=-2, status=Timed Out): unknown exec error
    cluster-ip-mem1_stop_0 (node=mem1, call=22, rc=-2, status=Timed Out): unknown exec error

Why wouldn't my mem3 failover happen if it timed out stopping the cluster
IP?
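(Presumably the timed-out stop is what blocks things: without fencing,
Pacemaker can't safely recover a resource whose stop failed, so it leaves it
unmanaged.  A sketch of the usual remedy, with a placeholder STONITH device
and parameters:)

    # Sketch, not a recommendation for this cluster: with STONITH enabled,
    # a timed-out stop escalates to fencing the node, so the IP and clone
    # instance can be recovered elsewhere instead of being left unmanaged.
    # The device type and parameters below are placeholders.
    crm configure primitive fence-mem stonith:fence_ipmilan \
            params ipaddr="192.0.2.10" login="admin" passwd="secret" \
            pcmk_host_list="mem1 mem2 mem3"
    crm configure property stonith-enabled="true"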

Thank you,

--Cal

On Thu, Jul 26, 2012 at 4:09 PM, Cal Heldenbrand <cal at fbsdata.com> wrote:

> A few more questions, as I test various outage scenarios:
>
> My memcached OCF script appears to give a false positive occasionally, and
> pacemaker restarts the service.  Under the hood, it uses netcat to
> localhost with a 3 second connection timeout.  I've run my script manually
> in a loop and it never seems to time out.
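> (The check is roughly of this shape -- a sketch only; the attached script
> is the authoritative version, and the port here is assumed:)
>
>         # Sketch of an nc-based probe with a 3-second timeout, assuming the
>         # usual ocf-shellfuncs are sourced for $OCF_SUCCESS/$OCF_NOT_RUNNING.
>         if printf 'version\r\nquit\r\n' | nc -w 3 127.0.0.1 11211 | grep -q '^VERSION'; then
>                 exit $OCF_SUCCESS
>         else
>                 exit $OCF_NOT_RUNNING
>         fi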
>
> My primitive looks like this:
>
> primitive memcached ocf:fbs:memcached \
>         meta is-managed="true" target-role="Started" \
>         op monitor interval="1s" timeout="5s"
>
> I've played around with the primitive's interval and timeout.  All that
> seems to do is decrease the frequency at which the false positive happens.
> Is there any way to add logic to the monitor to say "only restart the
> service if 3 failures in a row happen"?
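>
> (One way to get that effect is to retry inside the agent's monitor before
> reporting a failure -- a sketch, not something the current script does:)
>
>         # Sketch: only report NOT_RUNNING after three consecutive misses.
>         # memcached_probe is a hypothetical helper wrapping the nc check.
>         for attempt in 1 2 3; do
>                 memcached_probe && exit $OCF_SUCCESS
>                 sleep 1
>         done
>         exit $OCF_NOT_RUNNING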
>
> Also, I've tried to create a massive load failure by using a fork bomb.  A
> few of the outages we've had on our memcache servers appear to be heavy
> loads -- the machine responds to ICMP on the ethernet card, but doesn't
> respond on ssh.  A fork bomb pretty much recreates the same problem.  When
> I fire off a fork bomb on my test machine, it seems to take 5 minutes or
> more to actually trigger the failover event.  It's difficult for me to make
> sense of all the logging going on, but these two timeout values seem to be
> interesting:
>
> crmd:    error: crm_timer_popped:         Election Timeout (I_ELECTION_DC)
> just popped in state S_ELECTION! (120000ms)
> crmd:    error: crm_timer_popped:         Integration Timer (I_INTEGRATED)
> just popped in state S_INTEGRATION! (180000ms)
>
> Can those values be adjusted?  Or is there a common configuration change
> to be more responsive to an active content check like I'm doing?
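>
> (Those two timers appear to correspond to the election-timeout and
> crmd-integration-timeout cluster properties, so they can be lowered -- a
> sketch with illustrative values, not recommendations:)
>
>         # Sketch: these map to the 120000ms / 180000ms timers in the log.
>         crm configure property election-timeout="60s"
>         crm configure property crmd-integration-timeout="90s"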
>
> For reference, please see my attached script, memcached.
>
> Thanks!
>
> --Cal
>
>
> On Thu, Jul 26, 2012 at 1:35 PM, Phil Frost <phil at macprofessionals.com> wrote:
>
>> On 07/26/2012 02:16 PM, Cal Heldenbrand wrote:
>>
>>> That seems very handy -- and I don't need to specify 3 clones?   Once my
>>> memcached OCF script reports a downed service, one of them will
>>> automatically transition to the current failover node?
>>>
>>
>> There are options for the clone on how many instances of the cloned
>> resource to create, but they default to the number of nodes in the cluster.
>> See: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch10s02s02.html
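>>
>> (For example, pinning the clone to a fixed instance count looks something
>> like this -- a sketch using the names from this thread; the defaults
>> usually make it unnecessary:)
>>
>>         # Sketch: clone-max defaults to the node count, clone-node-max to 1.
>>         clone memcached_clone memcached \
>>                 meta clone-max="3" clone-node-max="1"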
>>
>>
>>  Is there any reason you specified just a single memcache_clone, instead
>>> of both the memcache primitive and memcached_clone?  I might not be
>>> understanding exactly how a clone works.  Is it like... maybe a "symbolic
>>> link" to a primitive, with the ability to specify different metadata and
>>> parameters?
>>>
>>
>> Once you make a clone, the underlying primitive isn't referenced anywhere
>> else (that I can think of). If you want to stop memcache, you don't stop
>> the primitive; you add a location constraint forbidding the clone from
>> running on the node where you want to stop memcache ("crm resource migrate"
>> is easiest). I can't find the relevant documentation, but this is just how
>> they work. The same is true for groups -- the member primitives are never
>> referenced except by the group. I believe in most cases if you try to
>> reference the primitive, you will get an error.
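>>
>> (Concretely, banning the clone from one node looks something like this --
>> a sketch using the names from this thread:)
>>
>>         # Sketch: a -inf location constraint keeps memcached off mem1;
>>         # "crm resource migrate" creates an equivalent constraint for you.
>>         location ban-memcached-mem1 memcached_clone -inf: mem1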
>>
>>
>>  Despite the advertisement of consistent hashing with memcache clients,
>>> I've found that they still have long timeouts waiting on connecting to an
>>> IP.  So, keeping the clustered IPs up at all times is more important than
>>> having a seasoned cache behind them.
>>>
>>
>> I don't know a whole lot about memcache, but it sounds like you might
>> even want to reduce the colocation score of the IPs on memcache to a
>> large number rather than infinity. That way, if memcache is broken
>> everywhere, the IPs are still permitted to run. It might also cover you
>> in the case where a bug in your resource agent reports memcache as failed
>> everywhere when it is actually still running fine. The decision depends
>> on which failure the memcache clients handle better: the IP being down,
>> or the IP being up but with no working memcache server behind it.
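>>
>> (In crm syntax that would be a finite colocation score instead of inf --
>> a sketch using the names from this thread and an arbitrary score:)
>>
>>         # Sketch: score 1000 rather than inf, so the IP prefers a node with
>>         # a running memcached but may run even when none is available.
>>         colocation ip1-with-memcached 1000: cluster-ip-mem1 memcached_clone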
>>
>>
>