[ClusterLabs] Antw: [EXT] LVM and Filesystem resources ‑ ordering and starting/stopping as a unit
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Nov 5 04:01:25 EDT 2021
>>> "Neitzert, Greg A" <greg.neitzert at unisys.com> wrote on 05.11.2021 at
06:28 in message
<DM8PR07MB8854271EEC0543389891DC10888E9 at DM8PR07MB8854.namprd07.prod.outlook.com>
> Hello,
>
> With a Pacemaker 1.1.13/Corosync 2.3.5 cluster is it possible to define a
> relationship between two resources so that:
>
> 1. B depends on A (a normal order constraint)
>
> AND
>
> 2. If either fails, they both need to be stopped and restarted, in the
> order defined above (B stops, A stops, A starts, then B starts)
That would mean A depends on B, and B depends on A, probably hard to start
;-)
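To illustrate the circularity, here is a minimal sketch in Python. It is not how Pacemaker's scheduler works internally; it only treats "X depends on Y" as a graph edge and asks the standard-library `graphlib` module (Python 3.9+) for a start order. The names `A` and `B` are from the question; `start_order` is a made-up helper.

```python
from graphlib import TopologicalSorter, CycleError


def start_order(depends_on):
    """Return a valid start order, or None if the dependencies are circular.

    depends_on maps each resource to the set of resources that must start first.
    (Hypothetical helper for illustration only.)
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Normal order constraint: B depends on A.
one_way = start_order({"A": set(), "B": {"A"}})

# Mutual dependency: each resource waits for the other to start first.
mutual = start_order({"A": {"B"}, "B": {"A"}})

print(one_way)
print(mutual)
```

With the one-way dependency a start order exists; with the mutual dependency the sorter finds a cycle and the helper returns `None`.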
Reading the scenario below, I must admit that I don't fully understand the
problem:
If the SAN "seeds in" an error to LVM, then to the FS (and eventually to the
application), what will be the net effect?
Does the filesystem hang? Does the FS see write errors? Something else?
How would a recovery be done? Fix the SAN? Just wait?
Could it be that your multipath configuration is simply wrong?
regards,
Ulrich
>
>
>
> In the normal configuration, if A fails, then A and B will be restarted,
> because B depends on A. However, if B fails, only B is restarted because A
> does not depend on it. In most cases this is going to be fine, but we have a
> case where in some situations B is failing precisely because A above it is
> having a failure (but we don't know it yet).
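The restart behaviour described here can be modelled with a short Python sketch. This is a simplified model under the assumption of mandatory ordering, not Pacemaker's actual policy engine; `restart_set` and the `depends_on` mapping are made up for illustration.

```python
def restart_set(failed, depends_on):
    """Resources restarted when `failed` fails: the resource itself plus
    everything that directly or indirectly depends on it.
    (Hypothetical helper; a simplified model of mandatory ordering.)"""
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for resource, requirements in depends_on.items():
            # A resource is pulled in once any of its requirements is affected.
            if resource not in affected and requirements & affected:
                affected.add(resource)
                changed = True
    return affected


# Filesystem (B) depends on LVM (A); LVM depends on nothing.
depends_on = {"LVM": set(), "Filesystem": {"LVM"}}

lvm_failure = restart_set("LVM", depends_on)
fs_failure = restart_set("Filesystem", depends_on)

print(sorted(lvm_failure))
print(sorted(fs_failure))
```

In this model a failure of LVM pulls in Filesystem, while a failure of Filesystem affects only Filesystem, which is the asymmetry the question is about.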
>
>
>
> The order attribute takes care of the ordering of the start/stop (along with
> adding colocation so they stay on the same node).
>
>
>
> The problem I am trying to address is the case where the monitor for B fires
> first and Pacemaker attempts to restart B, but that won't work until A is
> restarted.
>
>
>
> Case in point, LVM and Filesystem2 resources.
>
>
>
> If LVM needs to be refreshed, the Filesystem above it stops working (e.g.
> I/O fails). However, Filesystem noticed a problem first, and LVM didn't have
> a chance to see it also had a problem. Therefore, Filesystem will try to
> restart itself until it exhausts its retries. At that point, a cleanup is
> required to get things going again, and LVM has to be manually restarted.
>
>
>
> We have a case where the LVM cache needs to be refreshed and the volumes
> reactivated to clear up a problem caused by SAN paths going down and coming
> back up, which leaves the LVM VG in a compromised state. The LVM problem
> causes the Filesystem I/O to fail, and Filesystem notices first: its monitor
> fails, it stops itself, and it tries in vain to restart, because it will not
> start until the LVM resource is restarted.
>
>
>
> I made the monitor interval longer for Filesystem than LVM which makes LVM
> find the problem first, but that isn't foolproof.
>
> If there were a rule that whenever a Filesystem resource needs to be stopped
> and started, the LVM resource it depends on has to be restarted first, I
> should be able to avoid the problem entirely.
>
>
>
> In essence, what I'm asking is if I can make two resources start and stop in
> a particular order, but also define that if one has to be started or stopped
> the other must as well (in my defined order).
>
>
>
> Thanks.
>
>
>
> Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B ‑
> Middleware
>
> Unisys Corp