[Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

Andrew Beekhof andrew at beekhof.net
Sun Oct 5 22:30:52 EDT 2014


On 3 Oct 2014, at 5:07 am, Felix Zachlod <fz.lists at sis-gmbh.info> wrote:

> On 02.10.2014 at 18:02, Digimer wrote:
>> On 02/10/14 02:44 AM, Felix Zachlod wrote:
>>> I am currently running 8.4.5 on top of Debian Wheezy with Pacemaker 1.1.7
>> 
>> Please upgrade to 1.1.10+!
>> 
> 
> Are you referring to a specific bug or code change? I normally don't like building all this stuff from source instead of using the packages unless there are very good reasons for it. I have been running some 1.1.7 Debian-based Pacemaker clusters for a long time now without any issue, and this version seems to run very stable, so as long as I am not facing a specific problem with this version

According to git, there are 1143 specific problems with 1.1.7.
In total there have been 3815 commits and 5 releases in the last 2.5 years; we don't do all that for fun :-)

Also, since our resources are severely constrained, "get something recent" helps us focus our efforts on a limited number of recent releases (of which 1.1.7 isn't one).
It's great when something older is working for people, but we generally leave "long term support" to vendors like Red Hat and SUSE.

On the other hand, if both sides think they have up-to-date data it might not be anything to do with pacemaker at all.
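
For what it's worth, DRBD's own view of the data is easy to check before looking at any scores in Pacemaker. A rough sketch, assuming a DRBD resource named r0 (the name is only illustrative):

    drbdadm cstate r0     # connection state, e.g. StandAlone / WFConnection / Connected
    drbdadm dstate r0     # disk state, e.g. UpToDate/DUnknown vs. Consistent/DUnknown
    drbdadm role r0       # local/peer role, e.g. Secondary/Unknown
    cat /proc/drbd        # the same information in one place on 8.4

If a freshly rebooted node already claims UpToDate there while disconnected, the master score it hands to pacemaker is only a symptom.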

> I'd prefer sticking to it rather than putting brand new stuff together from source, which might face other compatibility issues later on.
> 
> 
> I am nearly sure that I found a hint to the problem:
> 
> adjust_master_score (string, [5 10 1000 10000]): master score adjustments
>    Space separated list of four master score adjustments for different scenarios:
>     - only access to 'consistent' data
>     - only remote access to 'uptodate' data
>     - currently Secondary, local access to 'uptodate' data, but remote is unknown
> 
> This is from the drbd resource agent's meta data.
> 
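
(Only three of the four scenarios made it into the quote above; the fourth value, 10000, is the one a connected and synced node ends up reporting.) These adjustments can also be overridden as a resource parameter. A rough sketch, as it might be entered at the interactive "crm configure" prompt, with illustrative names; zeroing the third value is one way to keep a Secondary with an unknown peer from looking promotable at all:

    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" adjust_master_score="5 10 0 10000" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
    ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"
    commit

Whether that is the right trade-off depends on the setup, so treat it as a sketch of the mechanism rather than a recommendation.
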
> As you can see, the RA will report a master score of 1000 if it is Secondary and (thinks) it has up-to-date data, and according to the logs it is indeed reporting 1000... I set a location rule with a score of -1001 for the Master role, and now Pacemaker waits to promote the nodes to Master until a later monitor action notices that the nodes are connected and synced and report a master score of 10000. What is interesting to me is
> 
> a) why do both DRBD nodes think they have UpToDate data when coming back online? At least one should know that it was disconnected while the other node was still up, and should consider that the data might have been changed in the meantime. And if I am rebooting a single node, it can be almost sure that it only has "Consistent" data, because the other side was still Primary when this one shut down.
> 
> b) why does apparently nobody else face this problem, since it should behave like this in any Primary/Primary cluster?
> 
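
Coming back to the -1001 rule for the Master role described above: sketched at the "crm configure" prompt it might look like this (identifiers are illustrative, 'defined #uname' is simply a rule expression that matches every node, and -1001 is chosen to outweigh the 1000 a disconnected Secondary reports):

    location l_delay_drbd_promote ms_drbd_r0 \
        rule $role="Master" -1001: defined #uname

With that in place the Master role only has a positive net score once a monitor action reports the full 10000.
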
> but I think I will try passing this on to the drbd mailing list too.
> 
> regards, Felix
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
