[ClusterLabs] [corosync] Virtual Synchrony Property guarantees in case of network partition

Sun Jul 3 09:10:12 EDT 2016

Hello Honza and others,

It seems Corosync is not reliable during a network partition. Here is the
test I ran:

Process P1 on node N1
Process P2 on node N2
Process P3 on node N2 again

Once all the processes have joined the cluster, this is what happens:

1. P1, in one of its threads, continuously multicasts messages in a
{while(1)} loop.
2. All the processes, i.e. P1, P2 and P3,
      - have a separate listening thread that calls {cpg_dispatch()}
      - inside cpg_deliver_fn_t, printf a counter that shows how many
messages have been received
      - inside cpg_confchg_fn_t, put the process to sleep as soon as any
process leaves the cluster
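
The callback logic described above can be modelled roughly as follows. This is a toy sketch in Python, not the actual test program and not Corosync API code: the class `ProcessLog` and its methods `on_deliver` and `on_confchg` are hypothetical stand-ins for the roles played by cpg_deliver_fn_t and cpg_confchg_fn_t, and "sleep" is modelled by simply freezing the counter.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessLog:
    """Toy stand-in for one test process (P1, P2 or P3). Hypothetical."""
    name: str
    delivered: int = 0      # the counter printed from the deliver callback
    frozen: bool = False    # True once some member has left ("sleep")
    events: list = field(default_factory=list)

    def on_deliver(self, msg):
        # Plays the role of cpg_deliver_fn_t in the test: count the message.
        if self.frozen:
            return
        self.delivered += 1
        self.events.append(("msg", msg))

    def on_confchg(self, members, left):
        # Plays the role of cpg_confchg_fn_t: stop counting when anyone leaves.
        self.events.append(("confchg", tuple(sorted(members)), tuple(sorted(left))))
        if left:
            self.frozen = True


p3 = ProcessLog("P3")
p3.on_confchg(members={"P1", "P2", "P3"}, left=set())   # everyone has joined
for i in range(3):
    p3.on_deliver(i)
p3.on_confchg(members={"P1", "P3"}, left={"P2"})        # P2 leaves
p3.on_deliver(3)    # arrives after the membership change, so it is not counted
print(p3.name, "counted", p3.delivered, "messages before the change")
```

The frozen counter is the number each process reports in the two test cases that follow.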

First test case: P2 was stopped forcefully.
      - P1 and P3 received the same number of messages before the
configuration change message
      - PASS: expected result, configuration message order is maintained

Second test case: manually pulled the cable connecting nodes N1 and N2
      - P1 received more messages than P2 and P3 before the configuration
change message was delivered
      - FAIL: the configuration message was not delivered in order
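
One way to compare the counters from the two test cases is sketched below. It rests on an assumption taken from Honza's earlier reply quoted further down: after a partition the two sides receive different configuration changes (C1 on one side, C2 on the other), so the sketch only compares processes that were told about the same new membership. The function `compare_counts` and all the numbers are hypothetical, chosen for illustration; they are not measurements from the test.

```python
from collections import defaultdict


def compare_counts(logs):
    """Compare pre-confchg counters among processes that share a view.

    logs maps a process name to (count_before_confchg, new_members).
    Returns {new_members: bool}, True when every process that was told
    about that membership counted the same number of messages before it.
    """
    counts_by_view = defaultdict(set)
    for name, (count, members) in logs.items():
        counts_by_view[frozenset(members)].add(count)
    return {view: len(counts) == 1 for view, counts in counts_by_view.items()}


# First test case (P2 stopped): P1 and P3 stay together. Numbers invented.
first = {
    "P1": (100, {"P1", "P3"}),
    "P3": (100, {"P1", "P3"}),
}

# Second test case (cable pulled): P1 is alone, P2 and P3 stay together.
second = {
    "P1": (120, {"P1"}),
    "P2": (100, {"P2", "P3"}),
    "P3": (100, {"P2", "P3"}),
}

print(compare_counts(first))
print(compare_counts(second))
```

Under this reading, a higher counter on P1 alone would not be flagged, because P1 ends up in a different configuration from P2 and P3; a mismatch between P2 and P3 would be. Whether that is the right criterion for the test is exactly the question raised in this thread.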

In case of a network partition, configuration messages are not ordered as
the 'extended virtual synchrony' property requires. I believe a
CPG_TYPE_SAFE implementation is required not only to guarantee that
messages are received by all the processes but also to guarantee the
ordering of configuration messages across a network partition.
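
The difference between the two guarantees mentioned here can be illustrated with a toy delivery rule. This is only a sketch of the general idea of agreed versus safe delivery, assuming that "safe" means a message is delivered only once every member of the configuration is known to hold it. It is not how Corosync is implemented (the thread notes that CPG_TYPE_SAFE is not implemented), and the function `deliverable` and its string arguments are hypothetical.

```python
def deliverable(guarantee, node, received_by, members):
    """Toy delivery rule for a single message in a single configuration.

    "agreed": the node may deliver once it holds the message itself
              (ordering against earlier messages is not modelled here).
    "safe":   the node may deliver only once every member of the
              configuration is known to hold the message.
    """
    if guarantee == "agreed":
        return node in received_by
    if guarantee == "safe":
        return node in received_by and members <= received_by
    raise ValueError(f"unknown guarantee: {guarantee}")


members = {"N1", "N2", "N3"}
received_by = {"N1"}    # m(k) never reached N2 and N3 before the partition

print("agreed:", deliverable("agreed", "N1", received_by, members))
print("safe:  ", deliverable("safe", "N1", received_by, members))
```

In this model N1 may deliver m(k) under the agreed rule but not under the safe rule, which matches step 2 of the scenario quoted below.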

--
Satish

On Mon, Jun 6, 2016 at 9:50 PM, satish kumar <satish.kr2008 at gmail.com>
wrote:

> Thanks, really appreciate your help.
>
> On Mon, Jun 6, 2016 at 9:17 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>
>>> But C1 is *guaranteed* to be delivered *before* m(k)?
>>
>> Yes
>>
>>> No case where C1 is delivered after m(k)?
>>
>> Nope.
>>
>>
>>
>>>
>>> Regards,
>>> Satish
>>>
>>> On Mon, Jun 6, 2016 at 8:10 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>
>>> satish kumar napsal(a):
>>>>
>>>> Hello honza, thanks for the response !
>>>>
>>>>>
>>>>> With state sync, I simply mean that 'k-1' messages were delivered to
>>>>> N1, N2 and N3, and they have applied these messages to change their
>>>>> program state:
>>>>> N1.state = apply(m(k-1));
>>>>> N2.state = apply(m(k-1));
>>>>> N3.state = apply(m(k-1));
>>>>>
>>>>> The document you shared cleared many doubts. However I still need one
>>>>> clarification.
>>>>>
>>>>> According to the document:
>>>>> "The configuration change messages warn the application that a
>>>>> membership change has occurred, so that the application program can
>>>>> take appropriate action based on the membership change. Extended
>>>>> virtual synchrony guarantees a consistent order of messages delivery
>>>>> across a partition, which is essential if the application program are
>>>>> to be able to reconcile their states following repair of a failed
>>>>> processor or reemerging of the partitioned network."
>>>>>
>>>>> I just want to confirm that this property does not depend on
>>>>> CPG_TYPE_SAFE, which is still not implemented.
>>>>> Please consider this scenario:
>>>>> 0. N1, N2 and N3 have received the message m(k-1).
>>>>> 1. N1 multicasts message m(k) with CPG_TYPE_AGREED.
>>>>> 2. As it is not CPG_TYPE_SAFE, m(k) is delivered to N1 but has not yet
>>>>> been delivered to N2 and N3.
>>>>> 3. A network partition separates N1 from N2 and N3. N2 and N3 can
>>>>> never see m(k).
>>>>> 4. The configuration change message is now delivered to N1, N2 and N3.
>>>>>
>>>>> Here, N1 will change its state to N1.state = apply(m(k)), thinking all
>>>>> nodes in the current configuration have received the message.
>>>>>
>>>>> According to your reply it looks like N1 will not receive m(k). So
>>>>> this is what each node will see:
>>>>> N1 will see: m(k-1) -> C1 (config change)
>>>>> N2 will see: m(k-1) -> C1 (config change)
>>>>> N3 will see: m(k-1) -> C1 (config change)
>>>>>
>>>>>
>>>> For N2 and N3, it's not the same C1, so let's call it C2: C1 for N1 is
>>>> (N2 and N3 left) and C2 for N2 and N3 is (N1 left).
>>>>
>>>>
>>>>
>>>>> Message m(k) will be discarded, and will not be delivered to N1 even
>>>>> if it was sent by N1 before the network partition.
>>>>>
>>>>>
>>>> No. m(k) will be delivered to the app running on N1, so N1 will see
>>>> m(k-1), C1, m(k). The application therefore knows exactly which nodes
>>>> got message m(k).
>>>>
>>>> Regards,
>>>>    Honza
>>>>
>>>>
>>>>
>>>>> Is this the expected behavior with CPG_TYPE_AGREED?
>>>>>
>>>>> Regards,
>>>>> Satish
>>>>>
>>>>>
>>>>> On Mon, Jun 6, 2016 at 4:15 PM, Jan Friesse <jfriesse at redhat.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>>> Virtual Synchrony Property - messages are delivered in agreed order,
>>>>>>> and configuration changes are delivered in agreed order relative to
>>>>>>> messages.
>>>>>>>
>>>>>>> What happens to this property when the network partitions the
>>>>>>> cluster into two? Consider the following scenario (which I took from
>>>>>>> a previous query by Andrei Elkin):
>>>>>>>
>>>>>>> * N1, N2 and N3 are in state sync, with m(k-1) messages delivered.
>>>>>>
>>>>>> What exactly do you mean by "state sync"?
>>>>>>
>>>>>>> * N1 sends m(k), and just then a network partition separates N1 from
>>>>>>> N2 and N3.
>>>>>>>
>>>>>>> Does CPG_TYPE_AGREED guarantee that virtual synchrony is held?
>>>>>>
>>>>>> Yes, it does (actually a higher level of VS called EVS).
>>>>>>
>>>>>>> When the property is held, configuration change message C1 is
>>>>>>> guaranteed to be delivered before m(k) to N1.
>>>>>>> N1 will see: m(k-1) C1 m(k)
>>>>>>> N2 and N3 will see: m(k-1) C1
>>>>>>>
>>>>>>> But if this property is violated:
>>>>>>> N1 will see: m(k-1) m(k) C1
>>>>>>> N2 and N3 will see: m(k-1) C1
>>>>>>>
>>>>>>> Violation will screw any user application running on the cluster.
>>>>>>>
>>>>>>> Could someone please explain what is the behavior of Corosync in this
>>>>>>> scenario with CPG_TYPE_AGREED ordering.
>>>>>>>
>>>>>>>
>>>>>> For a description of how exactly totem synchronization works, take a
>>>>>> look at
>>>>>> http://corosync.github.com/corosync/doc/DAAgarwal.thesis.ps.gz
>>>>>>
>>>>>> After totem is synchronized, there is another level of
>>>>>> synchronization, of services (not described in the above doc). All
>>>>>> services synchronize in a very similar way, so you can take CPG as an
>>>>>> example. Basically the only state held by CPG is connected clients,
>>>>>> so every node sends its connected clients list to every other node.
>>>>>> If sync is aborted (change of membership), it's restarted. These sync
>>>>>> messages have priority over user messages (actually it's not possible
>>>>>> to send messages during sync). A user app can be sure that a message
>>>>>> was delivered only after it gets its own message. The app also gets a
>>>>>> configuration change message, so it knows who got the message.
>>>>>>
>>>>>> Regards,
>>>>>>     Honza
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>> Satish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started:
>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>