[Pacemaker] Improvement for the communication failure of booth

Jiaju Zhang jjzhang at suse.de
Tue Feb 12 10:57:29 UTC 2013


Hi Yusuke,

On Tue, 2013-02-12 at 11:52 +0900, yusuke iida wrote:
> Hi, Jiaju
> 
> 2012/12/18 Jiaju Zhang <jjzhang at suse.de>:
> > Good suggestion! I think it may need to introduce a notifier callback so
> > that the failure of communicating with the problematic node can be
> > notified to the "active" node. This makes sense for the active node,
> > because it will make the admin know how many healthy "passive" nodes
> > currently there are and any potential issues might be resolved in
> > advance.
> >
> > Regarding the implementation of this feature, I think it is doable
> > although it may need a lot of changes;)
> 
> I have implemented the state indication.
> 
> This function displays the Paxos communication state of each site.
> To show the status, we have extended the output of the "booth
> client list" command.
> 
> The display format is as follows.
> # booth client list
> ticket: ticketA, owner: None, expires: INF
>         site: <site ip>, state: <init | waiting promise | promised |
> waiting accept | accepted>
> 
> Below is an example of the status display.
> 
> Configuration: 3 sites, 1 ticket
> siteA: 192.168.201.131(grant)
> siteB: 192.168.201.132
> siteC: 192.168.201.133
> 
> """initial state"""
> siteA # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/08 20:21:54
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> siteB # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/08 20:21:53
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> siteC # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/08 20:21:54
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> """siteA is down"""
> siteB # booth client list
> ticket: ticketA, owner: 192.168.201.133, expires: 2013/02/12 11:00:45
>         site: 192.168.201.131, state: waiting accept
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> siteC # booth client list
> ticket: ticketA, owner: 192.168.201.133, expires: 2013/02/12 11:00:46
>         site: 192.168.201.131, state: waiting accept
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> """siteB is down"""
> siteA # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/12 11:14:41
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: waiting accept
>         site: 192.168.201.133, state: accepted
> 
> siteC # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/12 11:14:41
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: waiting accept
>         site: 192.168.201.133, state: accepted
> 
> """communication blocked between siteB and siteC"""
> siteA # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/12 11:33:20
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: accepted
> 
> siteB # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/12 11:33:19
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: accepted
>         site: 192.168.201.133, state: waiting accept
> 
> siteC # booth client list
> ticket: ticketA, owner: 192.168.201.131, expires: 2013/02/12 11:33:19
>         site: 192.168.201.131, state: accepted
>         site: 192.168.201.132, state: waiting accept
>         site: 192.168.201.133, state: accepted
> 
> I would like it merged into the repository if there are no problems.
> https://github.com/jjzhang/booth/pull/49

Looking at the patch, it seems to me that it is meant to differentiate
every state, like "init", "waiting promise", "promised", "waiting accept"
and "accepted". However, I'm afraid that implemented this way it can only
differentiate "accepted" from "not accepted" (in the "not accepted" case
it will show "waiting accept").

In the acceptor_accepted function,

+	pi->state_monitoring = 1;
+	for (i = 0; i < booth_conf->node_count; i++)
+		pi->node[i].connect_state = PROPOSING;
+

For this acceptor, every node is set to the PROPOSING state here, but we
cannot be sure what state the other nodes are actually in at that moment.
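
To illustrate the concern, here is a minimal sketch of one possible
alternative. It is not booth code and not what the patch does: the names
peer_state, peer_event, peer_table and the functions below are all
hypothetical, chosen only to mirror the state names in the proposed
display. The idea is that a per-site state changes only when an event
concerning that specific site is observed, instead of every site being
assigned the same state at once.

```c
#include <stddef.h>

/* Hypothetical per-site states, mirroring the names in the proposed display. */
enum peer_state {
	PEER_INIT = 0,
	PEER_WAITING_PROMISE,
	PEER_PROMISED,
	PEER_WAITING_ACCEPT,
	PEER_ACCEPTED
};

/* Hypothetical events, each one tied to a single site. */
enum peer_event {
	EV_PREPARE_SENT,
	EV_PROMISE_RECEIVED,
	EV_ACCEPT_SENT,
	EV_ACCEPTED_RECEIVED
};

#define MAX_PEERS 16	/* assumed upper bound, for the sketch only */

struct peer_table {
	int count;
	enum peer_state state[MAX_PEERS];
};

/* Start every site in "init"; nothing has been observed yet. */
void peer_table_init(struct peer_table *t, int count)
{
	int i;

	if (count < 0)
		count = 0;
	if (count > MAX_PEERS)
		count = MAX_PEERS;
	t->count = count;
	for (i = 0; i < count; i++)
		t->state[i] = PEER_INIT;
}

/*
 * Update only the site the event refers to; all other sites keep the
 * state that was last observed for them.
 * Returns 0 on success, -1 if the index or the event is invalid.
 */
int peer_table_on_event(struct peer_table *t, int peer, enum peer_event ev)
{
	if (peer < 0 || peer >= t->count)
		return -1;

	switch (ev) {
	case EV_PREPARE_SENT:
		t->state[peer] = PEER_WAITING_PROMISE;
		break;
	case EV_PROMISE_RECEIVED:
		t->state[peer] = PEER_PROMISED;
		break;
	case EV_ACCEPT_SENT:
		t->state[peer] = PEER_WAITING_ACCEPT;
		break;
	case EV_ACCEPTED_RECEIVED:
		t->state[peer] = PEER_ACCEPTED;
		break;
	default:
		return -1;
	}
	return 0;
}

/* Map a state to the string used in the proposed "booth client list" output. */
const char *peer_state_name(enum peer_state s)
{
	switch (s) {
	case PEER_INIT:            return "init";
	case PEER_WAITING_PROMISE: return "waiting promise";
	case PEER_PROMISED:        return "promised";
	case PEER_WAITING_ACCEPT:  return "waiting accept";
	case PEER_ACCEPTED:        return "accepted";
	}
	return "unknown";
}
```

With something along these lines, the display would report only what was
really observed for each site. Whether an acceptor actually sees enough
of these events to fill in every state is a separate question, and that
is the part I am not sure about in the current patch.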

Thanks,
Jiaju  






