[ClusterLabs] Pacemaker shows false status of a resource and doesn't react on OCF_NOT_RUNNING rc.

Tue Jan 19 13:25:08 EST 2016

On 01/19/2016 12:20 PM, Kostiantyn Ponomarenko wrote:
> I've put the wrong entry from "journalctl --since="2016-01-19"
> --until="2016-01-20"".
> The correct one is:
> 
> Jan 19 23:42:24 A2-2U12-302-LS ntpd[2204]: 0.0.0.0 c61c 0c clock_step
> -43194.111405 s
> Jan 19 11:42:29 A2-2U12-302-LS ntpd[2204]: 0.0.0.0 c614 04 freq_mode
> Jan 19 11:42:29 A2-2U12-302-LS systemd[1]: Time has been changed
> 
> Yes, the really first monitor operation was successful.
> But I still have a question.
> The first monitor op was at [Jan 19 23:42:16] and the failure was
> considered by Pacemaker at [Jan 19 12:57:53].
> So, the first one still remains older and here I don't understand why
> Pacemaker considers it failed?

23:42 is later (newer) than 12:57. In a cluster, with the possibility of
nodes leaving and rejoining, and having different loads, etc., it is
possible for updates and operation results to come into the DC out of
chronological order. So pacemaker relies on the timestamps to determine
what's current.

I haven't traced this issue through the code, but my guess is that when
pacemaker receives the 12:57 failure, it will of course log that it
happened, but if it has what it thinks is a newer result (from 23:42),
it won't treat the failure as the current state. Again, that's just a
guess at this point, but I think that's what's happening.

> Thank you,
> Kostia
> 
> On Tue, Jan 19, 2016 at 8:02 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On 01/19/2016 10:30 AM, Kostiantyn Ponomarenko wrote:
>>> The resource that wasn't running, but was reported as running, is
>>> "adminServer".
>>>
>>> Here are a brief chronological description:
>>>
>>> [Jan 19 23:42:16] The first time Pacemaker triggers its monitor function
>> at
>>> line #1107. (those lines are from its Resource Agent)
>>> [Jan 19 23:42:16] Then Pacemaker starts the resource - line #1191.
>>> [Jan 19 11:42:53] The first failure is reported by monitor operation at
>>> line #1543.
>>> [Jan 19 11:42:53] The fail-count is set, but I don't see any attempt from
>>> Pacemaker to "start" the resource - the start function is not called
>> (from
>>> the logs) - line #1553.
>>> [Jan 19 12:27:56] Then adminServer's monitor operation keeps returning
>>> $OCF_NOT_RUNNING - starts at line #1860.
>>> [Jan 19 12:57:53] Then the expired failcount is cleared at line #1969.
>>> [Jan 19 12:57:53] Another call of the monitor function happens at line
>>> #2038.
>>> [Jan 19 12:57:53] I assume that the line #2046 means "not running" (?).
>>> [Jan 19 12:57:53] The "stop" function is called - line #2150
>>> [Jan 19 12:57:53] The "start" function is called and the resource is
>>> successfully started - line #2164
>>>
>>>
>>> The time change occurred while cluster was starting, I see this from
>>> "journalctl --since="2016-01-19" --until="2016-01-20"":
>>>
>>> Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step
>>> -43193.793349 s
>>> Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode
>>> Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed
>>>
>>> I am attaching corosync.log.
>>
>> The time change is interesting. I suspect what's happening is that
>> pacemaker considers the failed monitor "older" than the original
>> successful one, and so ignores it.
>>
>> In general, we don't support large clock shifts in a running cluster (a
>> topic that has come up before on this list). But if you can reproduce
>> the behavior with 1.1.14, feel free to open a bug report. It might be
>> worth revisiting to see if there is anything we can do about it.
>>
>>> Thank you,
>>> Kostia
>>>
>>> On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <bdobrelia at mirantis.com
>>>
>>> wrote:
>>>
>>>> On 19.01.2016 16:13, Ken Gaillot wrote:
>>>>> On 01/19/2016 06:49 AM, Kostiantyn Ponomarenko wrote:
>>>>>> One of resources in my cluster is not actually running, but "crm_mon"
>>>> shows
>>>>>> it with the "Started" status.
>>>>>> Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but
>>>>>> Pacemaker doesn't react on this anyhow - crm_mon show the resource as
>>>>>> Started.
>>>>>> I couldn't find an explanation to this behavior, so I suppose it is a
>>>> bug,
>>>>>> is it?
>>>>>
>>>>> That is unexpected. Can you post the configuration and logs from around
>>>>> the time of the issue?
>>>>>
>>>>
>>>> Oh, sorry, I forgot to mention the related thread [0]. That is exactly
>>>> the case I reported there. Looks same, so I thought you've just updated
>>>> my thread :)
>>>>
>>>> These may be merged perhaps.
>>>>
>>>> [0] http://clusterlabs.org/pipermail/users/2016-January/002035.html