[ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

Digimer lists at alteeve.ca
Sun Apr 23 01:00:29 EDT 2017


On 23/04/17 12:51 AM, Andrei Borzenkov wrote:
> On 22.04.2017 23:33, Dmitri Maziuk wrote:
>> On 4/22/2017 12:02 PM, Digimer wrote:
>>
>>> Having SBD properly configured is *massively* safer than no fencing at
>>> all. So for people where other fence methods are not available for
>>> whatever reason, SBD is the way to go.
>>
>> Now you're talking. IMO in a 2-node cluster, a node that kills itself in
>> response to, say, losing link on eth0 is infinitely preferable to a node
>> that tries to shoot the other node when it can't ping it.
>>
> 
> How do you know whether the node actually killed itself? How do you know
> when it is safe to take over resources from this node?

Watchdog timers work outside the OS. They're hardware devices that will
reboot the host unless told not to. So it doesn't matter what state the
host is in; it can be hung, panicked, whatever. If the watchdog timer
isn't kicked, it will (effectively) press the host's reset button. That's
why, if you know the watchdog timeout, you just have to wait longer than
that to know that the lost node is no longer operational.
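A minimal Python sketch of that timing argument, assuming the lost node stops kicking its watchdog as soon as it loses contact with its peer, and that the survivor knows the watchdog timeout in advance. The `Watchdog` class, `safe_takeover_time()` and the `margin` parameter are hypothetical illustrations, not part of SBD or any real watchdog API; a real watchdog counts down in hardware no matter what the OS is doing.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Toy model of a hardware watchdog (hypothetical, for illustration).

    A real watchdog counts down in hardware, independent of the OS; this
    class only tracks timestamps so the timing argument can be checked.
    """

    timeout: float          # seconds the host may go without kicking
    last_kick: float = 0.0  # timestamp of the most recent kick

    def kick(self, now: float) -> None:
        # A healthy host kicks (pets) the watchdog well within the timeout.
        self.last_kick = now

    def has_fired(self, now: float) -> bool:
        # Once the timeout elapses with no kick, the device resets the host,
        # regardless of whether the OS is hung, panicked or still running.
        return now - self.last_kick >= self.timeout


def safe_takeover_time(last_contact: float, timeout: float, margin: float) -> float:
    """Earliest time a survivor may assume the lost node has been reset.

    Assumes the lost node kicked its watchdog for the last time no later
    than `last_contact` (it stops kicking once it loses its peer).
    `margin` is a hypothetical allowance for clock drift and reset latency.
    """
    return last_contact + timeout + margin


if __name__ == "__main__":
    wd = Watchdog(timeout=5.0)
    for t in range(11):          # node kicks every second up to t=10 ...
        wd.kick(float(t))
    # ... then loses contact with its peer and stops kicking.
    deadline = safe_takeover_time(last_contact=10.0, timeout=5.0, margin=1.0)
    print(f"fired by t=14.9? {wd.has_fired(14.9)}")
    print(f"survivor waits until t={deadline}, fired then? {wd.has_fired(deadline)}")
```

With a 5-second timeout and contact lost at t=10, the watchdog has fired by t=15, so a survivor using a 1-second margin would wait until t=16 before taking over. The whole scheme rests on the assumption above: if the isolated node keeps kicking its watchdog, waiting proves nothing.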

> As a real-life example (not Linux/pacemaker): a panicking node flushed
> disk buffers, so it was not safe to access the shared filesystem until
> this was complete. This could take quite a lot of time, so without an
> agent on the *surviving* node(s) that monitored and acknowledged this
> process, the result was data corruption.
> 
> The problem is not so much how to put a node into a known state, but how
> the other node(s) can ensure it was done.



-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
