[ClusterLabs] Antw: Re: Regular pengine warnings after a transient failure

Tue Jul 12 00:25:29 EDT 2016

On Wed, Mar 9, 2016 at 6:31 PM, Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Ferenc Wágner <wferi at niif.hu> wrote on 08.03.2016 at 15:08 in message
> <87wppdoydv.fsf at lant.ki.iif.hu>:
>> Ken Gaillot <kgaillot at redhat.com> writes:
>>
>>> On 03/07/2016 02:03 PM, Ferenc Wágner wrote:
>>>
>>>> The transition-keys match, does this mean that the above is a late
>>>> result from the monitor operation which was considered timed-out
>>>> previously?  How did it reach vhbl07, if the DC at that time was vhbl03?
>>>>
>>>>> The pe-input files from the transitions around here should help.
>>>>
>>>> They are available.  What shall I look for?
>>>
>>> It's not the most user-friendly of tools, but crm_simulate can show how
>>> the cluster would react to each transition: crm_simulate -Sx $FILE.bz2
>>
>> $ /usr/sbin/crm_simulate -Sx pe-input-430.bz2 -D recover_many.dot
>> [...]
>> $ dot recover_many.dot -Tpng >recover_many.png
>> dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.573572 to
>> fit
>>
>> The result is a 32767x254 bitmap of green ellipses connected by arrows.
>
> That completely agrees with my experience on this: For real-life
> configurations those graphs are gigantic. When outputting SVG instead of PNG,
> you can at least zoom and pan in the graph (even Firefox can do it). Or (if you
> feel crazy enough) you can load the graph in Inkscape and delete the details
> you are not interested in.
>
> The best solution of course would be to omit irrelevant details from the graph
> before creating the dot file...

Personally I just use graphviz to view the raw .dot file.
Even grepping it for the resource name you're interested in can be
highly useful.
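
A minimal sketch of that grep approach, assuming a dot file such as the
recover_many.dot written by the crm_simulate -D call quoted above. The
helper names dot_grep and dot_edges, and the resource name in the usage
line, are made up for illustration:

```shell
# dot_grep FILE RESOURCE
# Print every line of the dot file that mentions RESOURCE
# (both the action nodes and the arrows between actions).
dot_grep() {
    grep -F -- "$2" "$1"
}

# dot_edges FILE RESOURCE
# Keep only the arrows: in a dot digraph an edge is written "a" -> "b".
dot_edges() {
    dot_grep "$1" "$2" | grep -F -- '->'
}
```

Usage: dot_edges recover_many.dot my_resource | less
The match is a plain substring match, so a short resource name will also
match longer names that contain it.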

>
>
>> Most arrows are impossible to follow, but the picture seems to agree
>> with the textual output from crm_simulate:

I should hope so. What the crm_simulate output misses, however, is the
detail of the ordering, because it can't express what happens in
parallel, nor how long-running actions would affect the order.

>>
>> * 30 FAILED resources on vhbl05 are to be recovered
>> * 32 Stopped resources are to be started (these are actually running,
>>   but considered Stopped as a consequence of the crmd restart on vhbl03)

I've lost all context here. Did the crmd process fail?

>>
>> On the other hand, simulation based on pe-input-431.bz2 reports
>> * only 2 FAILED resources to recover on vhbl05
>> * 36 resources to start (the 4 new are the ones whose recoveries started
>>   during the previous -- aborted -- transition)
>>
>> I failed to extract anything more out of these simulations than what was
>> already known from the logs.

The dot files and the list of actions aren't normally the interesting
parts of crm_simulate - that's just the "what" that you already knew.

The real power of crm_simulate is the ability to replay events at a
higher verbosity, using any and all of:
* extra '-V' options
* export PCMK_trace_functions="some_function,another_one"
* export PCMK_trace_tags=${resource_name}

to find out WHY it did the thing you already know it did.
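
A minimal sketch of such a replay, assuming crm_simulate is on the PATH
and that its verbosity flag is -V (repeatable). The function name
replay_trace and the resource name in the usage line are hypothetical,
and the number of -V flags is arbitrary:

```shell
# replay_trace FILE RESOURCE
# Re-run the saved transition in FILE with extra verbosity and with
# trace logging enabled for anything tagged with RESOURCE.
replay_trace() {
    # PCMK_trace_tags is set only for this one crm_simulate call.
    # 2>&1 merges stderr into stdout so all output can be paged or saved.
    PCMK_trace_tags="$2" crm_simulate -Sx "$1" -VVV 2>&1
}
```

Usage: replay_trace pe-input-430.bz2 my_resource | less
PCMK_trace_functions can be set the same way once you know which
functions you want traced.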

>> But I'm happy to see that the cluster
>> probes the disappeared resources on vhbl03 (where they disappeared with
>> the crmd restart) even though it plans to start some of them on other
>> nodes.

It may plan to start them elsewhere, but it won't go through with it
unless the probes return what we expect them to return (NOT_RUNNING).
When they fail to do this, we recompute everything and arrive at the
conclusion that at least most of them should stay where they are.

>> --
>> Regards,
>> Feri
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org