[ClusterLabs] corosync 2.4 CPG config change callback

Jan Friesse jfriesse at redhat.com
Mon Jul 2 02:43:37 EDT 2018


Hi Thomas,
> Hi,
> 
> On 04/25/2018 at 09:57 AM, Jan Friesse wrote:
>> Thomas Lamprecht wrote:
>>> On 4/24/18 6:38 PM, Jan Friesse wrote:
>>>>> On 4/6/18 10:59 AM, Jan Friesse wrote:
>>>>>> Thomas Lamprecht wrote:
>>>>>>> On 03/09/2018 at 05:26 PM, Jan Friesse wrote:
>>>>>>>> I've tested it too and yes, you are 100% right. The bug is there
>>>>>>>> and it's pretty easy to reproduce when the node with the lowest
>>>>>>>> nodeid is paused. It's slightly harder when a node with a higher
>>>>>>>> nodeid is paused.
>>>>>>>>
>>>>>>>
>>>>>>> Were you able to make some progress on this issue?
>>>>>>
>>>>>> Yes, kind of. Sadly I had to work on a different problem, but I'm 
>>>>>> expecting to send a patch next week.
>>>>>>
>>>>>
>>>>> I guess the different problems were the ones related to the issued 
>>>>> CVEs :)
>>>>
>>>> Yep.
>>>>
>>>> Also, I've spent quite a lot of time thinking about the best possible 
>>>> solution. CPG is quite old, it was full of weird bugs, and the risk of 
>>>> breakage is very high.
>>>>
>>>> Anyway, I've decided not to try to hack around what is apparently 
>>>> broken and to just go for the risky but proper solution (= needs a LOT 
>>>> more testing, but so far it looks good).
>>>>
>>>
>>> I did not look deeply into how your revert plays out with the
>>> mentioned commits of the heuristics approach, but would this fix
>>> bring corosync back to a state it was already in, and which was
>>> thus already battle tested?
>>
>> Yep, but not fully. The important change was to use joinlists as the 
>> authoritative source of information about other nodes' clients, so I 
>> believe that solved the problems which should have been "solved" by the 
>> downlist heuristics.
>>
>>
>>>
>>> The patch and approach seem good to me, with my limited knowledge,
>>> when looking at the various "bandaid" fix commits you mentioned.
>>>
>>>> The patch is in a PR (needle): https://github.com/corosync/corosync/pull/347
>>>>
>>>
>>> Many thanks! The first tests work well here.
>>> So far I could not reproduce the problem with the patch applied, in
>>> either testcpg or our cluster configuration file system.
>>
>> That's good to hear :)
>>
>>>
>>> I'll let it run
>>
>> Perfect.
>>
> 
> 
> Just wanted to give some quick feedback.
> We deployed this to our community repository about a week ago (after
> another week of successful testing). We have had no negative feedback
> or issues reported or seen so far, with (strong lower bound) > 10k
> systems running the fix by now.

Thanks, that's exciting news.

> 
> I saw just now that you merged it into needle and master, so, while a 
> bit late, this just backs up the confidence in the fix.

Definitely not late until it's released :)

> 
> Many thanks for your work, and that of the reviewers!

Yep, you are welcome.

Honza

> 
> cheers,
> Thomas
> 


