[Pacemaker] migration-threshold causing unnecessary restart of underlying resources

Mon Aug 16 22:14:17 EDT 2010

On 16.08.2010 13:29, Dejan Muhamedagic wrote:
> On Sat, Aug 14, 2010 at 06:26:58AM +0200, Cnut Jansen wrote:
>> On 12.08.2010 18:46, Dejan Muhamedagic wrote:
>>> The migration-threshold shouldn't in any way influence resources
>>> which don't depend on the resource which fails over. Couldn't
>>> reproduce it here with our example RAs.

>> So it seems that, for whatever reason, those constrained
>> resources are considered and treated just as if they were in a
>> resource group, because they move to where they all can run, instead
>> of the "eat or die" relationship of the dependent resource (mysql) to
>> the underlying resource (mount) that I had expected with the
>> constraints as I set them... shouldn't I?! o_O
> Yes, those two constraints are equivalent to a group.
So in fact migration-threshold actually does influence resources that 
are neither grouped with nor dependent on the failing resource, when the 
failing resource depends on them?!

Of course I already knew that from groups, and there it also makes sense
(imho), since defining a group is like saying "I want all these resources
to run together on one node, no matter how and where".
But when setting constraints, i.e. defining dependencies, I understand
"dependency" as one-sided, not mutual: the underlying resource is
independent of its dependent, so it can do whatever it wants and doesn't
have to care about its dependent at all, while the dependent shall only
start when and where the underlying resource it depends on is started.
So did I understand you correctly that, for Pacemaker, it is the intended
behaviour for both groups and constraints to be mutual dependencies?

And if so: Is there also any possibility to define one-sided 
dependencies/influences?

>> And, concerning the failure-timeout, quite a while later, without
>> having reset mysql's failure counter or having done anything else
>> in the meantime:
>>
>> 4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
>> Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
>> operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
>> confirmed=false) unknown error
>> Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
>> operation mysql_stop_0 (call=60, rc=0, cib-update=596,
>> confirmed=true) ok
>> Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
>> operation mount_stop_0 (call=61, rc=0, cib-update=597,
>> confirmed=true) ok
>> beta: FC(mysql)=0
>> Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
>> operation mount_start_0 (call=40, rc=0, cib-update=96,
>> confirmed=true) ok
>> Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
>> operation mysql_start_0 (call=41, rc=0, cib-update=97,
>> confirmed=true) ok
>> Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
>> operation mysql_stop_0 (call=42, rc=0, cib-update=98,
>> confirmed=true) ok
>> Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
>> operation mount_stop_0 (call=43, rc=0, cib-update=99,
>> confirmed=true) ok
>> alpha: FC(mysql)=4
>> Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
>> operation mount_start_0 (call=62, rc=0, cib-update=599,
>> confirmed=true) ok
>> Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
>> operation mysql_start_0 (call=63, rc=0, cib-update=600,
>> confirmed=true) ok
> This worked as expected, i.e. after the 150s cluster-recheck
> interval the resources were started at alpha.
Is it really "as expected" that many(!) minutes, and even several
cluster-rechecks, after the last picking-on, and with a failure-timeout
of 45 seconds, the failure counter is not only still shown as 3 but
obviously really still is 3 (not 0, as it would be after a reset), so
that the resource now migrates already on the very next picking-on?!
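
For reference, the gap between the start on beta and the move back to alpha can be checked from the two timestamps in the quoted log. This is only a small arithmetic sketch; the variable names are made up for illustration, and only the time of day is parsed since both events are on the same day.

```python
from datetime import datetime

# Times of day copied from the quoted log excerpt (both on Aug 14).
started_on_beta = datetime.strptime("04:44:47", "%H:%M:%S")
moved_back_to_alpha = datetime.strptime("04:47:17", "%H:%M:%S")

# Elapsed time between the two transitions, in seconds.
gap_seconds = (moved_back_to_alpha - started_on_beta).total_seconds()
print(gap_seconds)
```

The result equals the 150 s cluster-recheck-interval, which fits the explanation for the move back, but says nothing about whether the failure counter itself was reset.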

>>> BTW, what's the point of cloneMountMysql? If it can run only
>>> where drbd is master, then it can run on one node only:
>>>
>>> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
>>> order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
>> It's a dual-primary DRBD configuration, so there are actually (when
>> everything is ok (-; ) 2 masters of each DRBD multistate resource...
>> though I admit that at least the dual primary (i.e. dual master)
>> for msDrbdMysql is currently (quite) redundant, since in the
>> current cluster configuration there's only one primitive
>> MySQL resource, and thus there is no real need for MySQL's
>> data dir to be mounted on both nodes all the time.
>> But since it's not harmful to have it mounted on the other node too,
>> and since msDrbdOpencms and msDrbdShared need to be mounted on both
>> nodes, and since I put the complete installation and configuration of
>> the cluster into flexibly configurable shell scripts, it's easier
>> (i.e. less typing) to just put all DRBD and
>> mount resources' configuration into one common loop. (-;
> OK. It did cross my mind that it may be a dual-master drbd.
>
> Your configuration is large. If you are going to run that in
> production and don't really need a dual-master, then it'd be
> good to get rid of the ocfs2 bits to make maintenance easier.
Well, there are 3 DRBD resources, and the 2 other than the one for
MySQL's datadir must already be dual-primary now, since they need to be
mounted on all nodes for the Apache/Tomcat/Opencms teams. Therefore it is
indeed easier for maintenance to keep all 3 DRBD configurations in sync,
at the cost of only one more little line for cloning mountMysql. (-;

>>>> d) I also have the impression that fail-counters don't get reset
>>>> after their failure-timeout, because when migration-threshold=3 is
>>>> set, those issues occur upon every(!) following picking-on, even
>>>> when I've waited for nearly 5 minutes (with failure-timeout=90)
>>>> without touching the cluster at all
>>> That seems to be a bug though I couldn't reproduce it with a
>>> simple configuration.
>> I just tested this once again: It seems that
>> failure-timeout only sets scores back from -inf to around 0
>> (wherever they should normally be), allowing the resources to
>> return to the node. I tested with a location constraint set
>> for the underlying resource (see configuration): After the
>> failure-timeout has expired, on the next cluster-recheck (and
>> only then!) the underlying resource and its dependents return to the
>> underlying resource's preferred location, as you see in the logs above.
> The count gets reset, but the cluster acts on it only after the
> cluster-recheck-interval, unless something else makes the cluster
> calculate new scores.
See above, picking-on #4: More than 26 minutes after the last picking-on,
with migration-threshold=3, failure-timeout=40 and
cluster-recheck-interval=150, resources already get migrated upon the
first picking-on (and the shown failure counter rises to 4). To me that
doesn't look like the failure counter being reset to 0 after
failure-timeout, but like only the scores being reset. Actually, except
maybe by tricks or force, it shouldn't be possible at all to get the
resource running again on the node it failed on as long as its failure
counter there still meets migration-threshold's limit, right?
How then can the failure counter ever exceed migration-threshold's limit
at all (ok, I could still imagine reasons for that), and especially why
does migration-threshold from then on behave on every failure as if it
were set to 1, even when it is set to e.g. 3?
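
To make the expectation concrete, here is a minimal toy model of how I would have expected migration-threshold and failure-timeout to interact. It is only a sketch of the expected semantics, not Pacemaker's actual implementation; the class `ExpectedFailCount` and its method are hypothetical names invented for this illustration, and the failure times are made up apart from the 26-minute gap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedFailCount:
    """Toy model of the behaviour expected in this mail (NOT Pacemaker code)."""
    migration_threshold: int
    failure_timeout: float              # seconds
    count: int = 0
    last_failure: Optional[float] = None

    def record_failure(self, now: float) -> bool:
        """Register a failure at time `now`; return True if the resource must leave the node."""
        # Expectation: a failure older than failure-timeout no longer counts.
        if self.last_failure is not None and now - self.last_failure >= self.failure_timeout:
            self.count = 0
        self.count += 1
        self.last_failure = now
        return self.count >= self.migration_threshold


fc = ExpectedFailCount(migration_threshold=3, failure_timeout=40)
early = [fc.record_failure(t) for t in (0, 10, 20)]   # three failures in quick succession
late = fc.record_failure(20 + 26 * 60)                # one more failure, 26 minutes later
print(early, late, fc.count)
```

Under this model the late failure would start again at a count of 1 and would not trigger a migration, whereas the logs above show the count going from 3 to 4 and the resources moving immediately.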