[ClusterLabs] Antw: [EXT] Suggestions for multiple NFS mounts as LSB script

Andrei Borzenkov arvidjaar at gmail.com
Mon Jun 29 14:01:14 EDT 2020


29.06.2020 20:20, Tony Stocker wrote:
> 
>>
>>
>> The most interesting part seems to be the question how you define (and
>> detect) a failure that will cause a node switch.
> 
> That is a VERY good question! How many mounts failed is the critical
> number when you have 130+? If a single one fails, do you suddenly move
> everything to the other node (even though it's just as likely to fail
> there)? Do you just monitor and issue complaints?

Is that a rhetorical question, or who are those "you"?

Pacemaker works with resources. A resource is considered failed if either
its state does not match expectations (the resource agent reports the
resource as stopped while Pacemaker started it a while back) or the
resource agent explicitly reports the resource as failed. What constitutes
"failed" in the latter case is entirely up to the resource agent. To notice
a resource failure, the resource definition must include a periodic monitor
operation; otherwise there is no way Pacemaker will become aware that
anything happened.
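
Roughly, for one mount (resource name, server and paths here are made up,
and this assumes pcs on a RHEL-style HA stack), an individual NFS mount
with a monitor could look like:

    pcs resource create nfs_data ocf:heartbeat:Filesystem \
        device="nfsserver:/export/data" directory="/mnt/data" \
        fstype="nfs" \
        op monitor interval=30s timeout=60s

Without the "op monitor ..." part Pacemaker only ever learns what the
start/stop (and initial probe) operations report.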

There are also resource dependencies, so you can define that resource B
must always be on the same node as resource A; if A ever needs to be
switched over, B will follow.
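
For example (hypothetical resource names, pcs syntax), a colocation
constraint like

    pcs constraint colocation add nfs_data with httpd_home INFINITY

keeps nfs_data on whatever node httpd_home is running on.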

That's basically all. It is up to you: simply do not configure monitoring
of "unimportant" resources, so that after the initial start nothing ever
happens. You can even ignore initial start failures if you want.
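
As a sketch (again made-up names; this relies on the standard operation
attributes on-fail and enabled, so verify pcs passes them through as
written), an "unimportant" mount with no effective monitoring and an
ignored start failure:

    pcs resource create nfs_archive ocf:heartbeat:Filesystem \
        device="nfsserver:/export/archive" directory="/mnt/archive" \
        fstype="nfs" \
        op start on-fail=ignore \
        op monitor interval=20s enabled=false

After the (possibly failed) initial start, nothing is ever checked again
for such a resource.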

> At the moment
> there's zero checking of this, so until someone complains that they
> can't reach something, we don't know that the mount isn't working
> properly -- so apparently I guess it's not viewed as that critical.
> But at the very least, the main home directory for the https/ftps file
> server operations should be operational, or else it's all moot.
> 

With a single monolithic script, your script is responsible for
distinguishing between "important" and "unimportant" mounts. With
individual resources you just have boilerplate to fill in with each mount
point.
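
And if the boilerplate itself is the pain point with 130+ mounts, it can
be generated. A rough sketch (the list file nfs-mounts.txt with
"mountpoint export" pairs and the naming scheme are made up):

    while read dir export; do
        name="nfs_$(basename "$dir")"
        pcs resource create "$name" ocf:heartbeat:Filesystem \
            device="$export" directory="$dir" fstype="nfs" \
            op monitor interval=30s timeout=60s
    done < nfs-mounts.txt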

> Is ocf_tester still available? I installed via 'yum' from the High
> Availability repository and don't see it. I also did a 'yum
> whatprovides *bin/ocf-tester' and no package came back. Do I have to
> manually download it from somewhere? If so, could someone provide a link
> to the most up-to-date source?
> 

It is part of resource-agents:

https://github.com/ClusterLabs/resource-agents/blob/master/tools/ocf-tester.in

But ocf-tester performs exactly the actual resource operations
(start/stop/etc.) that you want to avoid. It is not a syntactic or
semantic offline checker.
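
For reference, an invocation looks roughly like this (server, paths and
resource name are illustrative); note it will really mount and unmount the
filesystem while testing:

    ocf-tester -n test_nfs \
        -o device="nfsserver:/export/data" \
        -o directory="/mnt/data" \
        -o fstype="nfs" \
        /usr/lib/ocf/resource.d/heartbeat/Filesystem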

