[Pacemaker] crm_resource -L not trustable right after restart

Andrew Beekhof andrew at beekhof.net
Wed Jan 15 01:11:36 EST 2014


On 14 Jan 2014, at 11:50 pm, Brian J. Murrell (brian) <brian at interlinx.bc.ca> wrote:

> On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote:
>> 
>>> On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote:
>>>> 
>>>> The local cib hasn't caught up yet by the looks of it.
> 
> I should have asked in my previous message: is this entirely an artifact
> of having just restarted or are there any other times where the local
> CIB can in fact be out of date (and thus crm_resource is inaccurate), if
> even for a brief period of time?  I just want to completely understand
> the nature of this situation.

Consider any long-running action, such as starting a database.
We do not update the CIB until after actions have completed, so there can and will be times when the status section is out of date to one degree or another.
Node startup is another point at which the status could be behind.
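
To make the lag concrete, here is a toy model in Python. It is only an illustration of the "record after completion" behaviour described above, not Pacemaker code: the ToyCluster class and its fields are hypothetical names invented for this sketch.

```python
import dataclasses


@dataclasses.dataclass
class ToyCluster:
    """Toy model: the recorded status only changes when an action completes."""

    recorded: str = "Stopped"  # what the (toy) status section says
    actual: str = "Stopped"    # what is really happening on the node

    def begin_start(self):
        # The long-running action begins; nothing is recorded yet.
        self.actual = "Starting"

    def finish_start(self):
        # Only on completion is the result written to the recorded status.
        self.actual = "Started"
        self.recorded = "Started"

    def is_stale(self):
        # True whenever a reader of the recorded status would be misled.
        return self.recorded != self.actual


cluster = ToyCluster()
cluster.begin_start()
during = (cluster.recorded, cluster.is_stale())
cluster.finish_start()
after = (cluster.recorded, cluster.is_stale())
print(during, after)
```

While the start is in flight, anything reading the recorded status sees "Stopped", and nothing in that record says it is behind.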

It sounds to me like you're trying to second-guess the cluster, which is a dangerous path.

> 
>> It doesn't know that it doesn't know.
> 
> But it (pacemaker at least) does know that it's just started up, and
> should also know whether it's gotten a fresh copy of the CIB since
> starting up, right?  

What if it's the first node to start up?  There'd be no fresh copy to arrive in that case.
Many things are obvious to external observers that are not at all obvious to the cluster.

If it had enough information to know it was out of date, it wouldn't be out of date.

> I think I'd consider it required behaviour that
> pacemaker not consider itself authoritative enough to provide answers
> like "location" until it has gotten a fresh copy of the CIB.
> 
>> Does it show anything as running?  Any nodes as online?
> 
> 
>> I'd not expect that it stays in that situation for more than a second or two...
> 
> You are probably right about that.  But unfortunately that second or two
> provides a large enough window to provide mis-information.
> 
>> We could add an option to force crm_resource to use the master instance instead of the local one I guess.
> 
> Or, depending on the answers to above (like can this local-is-not-true
> situation ever manifest itself at times other than "just started")
> perhaps just don't allow crm_resource (or any other tool) to provide
> information from the local CIB until it's been refreshed at least once
> since a startup.

As above, there are situations when you'd never get an answer.
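
A rough Python sketch of why an unconditional "wait for a refresh" is a problem, and what a bounded wait might look like instead. Everything here is hypothetical: is_fresh and locate are stand-in callables supplied by the caller, not existing crm_resource options or Pacemaker APIs.

```python
import time


def locate_when_fresh(is_fresh, locate, timeout=2.0, poll=0.05):
    """Return locate() once is_fresh() is true, or None ("unknown") on timeout.

    is_fresh and locate are hypothetical callables supplied by the caller.
    Without the timeout, a first node that never receives a fresh copy
    would leave the caller waiting forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_fresh():
            return locate()
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# First node up: no refresh ever arrives, so the answer is "unknown".
first_node = locate_when_fresh(lambda: False, lambda: "node1", timeout=0.1)

# A node that has been refreshed answers immediately.
refreshed = locate_when_fresh(lambda: True, lambda: "node1", timeout=0.1)
print(first_node, refreshed)
```

Note the sketch assumes a freshness test exists at all, which is the hard part: the node has no reliable way to know that it is behind.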

> 
> I would much rather crm_resource experience some latency in being able
> to provide answers than provide wrong ones.  Perhaps there needs to be a
> switch to indicate if it should block waiting for the local CIB to be
> up-to-date or should return immediately with an "unknown" type response
> if the local CIB has not yet been updated since a start.
> 
> Cheers,
> b.
