[Pacemaker] Pseudo RAs do not work properly on Corosync stack

Keisuke MORI keisuke.mori+ha at gmail.com
Tue Mar 16 04:26:14 EDT 2010


Hi,

Sorry for the rather long mail.
I'm going to describe the issue in the Subject: line and would like to
suggest some changes to the agents package (and possibly Pacemaker, too).
I would be glad if you could give me your thoughts and comments.



Pseudo RAs that create a state file under HA_RSCTMP
(/var/run/heartbeat/rsctmp), such as Dummy and MailTo, do not
work properly on the Pacemaker+Corosync stack.

When a node crashes and is rebooted, a stale state file
survives the reboot, so the RA behaves as if the
resource were already started when the cluster is launched again
for recovery.
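
To make the failure concrete, here is a minimal sketch in Python of how
a pseudo RA decides whether it is running. The class PseudoRA, its
".state" file suffix, and the function simulate_crash_and_reboot are
hypothetical names used only for illustration (real agents such as
Dummy are shell scripts); the return codes are the standard OCF values.
The point is that the only evidence of "started" is the existence of a
file, so a file that outlives a crash is indistinguishable from a
running resource.

```python
import os
import tempfile

OCF_SUCCESS = 0        # standard OCF return code: running / action succeeded
OCF_NOT_RUNNING = 7    # standard OCF return code: resource is not running


class PseudoRA:
    """Hypothetical stand-in for a pseudo RA; not the real Dummy agent."""

    def __init__(self, rsctmp, name):
        # The state file is the only record of the resource's status.
        self.state_file = os.path.join(rsctmp, name + ".state")

    def start(self):
        open(self.state_file, "w").close()
        return OCF_SUCCESS

    def stop(self):
        if os.path.exists(self.state_file):
            os.remove(self.state_file)
        return OCF_SUCCESS

    def monitor(self):
        if os.path.exists(self.state_file):
            return OCF_SUCCESS
        return OCF_NOT_RUNNING


def simulate_crash_and_reboot():
    """Start the resource, 'crash' without stop, then probe it again."""
    with tempfile.TemporaryDirectory() as rsctmp:
        PseudoRA(rsctmp, "dummy").start()
        # The node crashes here: stop() is never called, and in this
        # sketch nothing clears rsctmp during the reboot.
        after_reboot = PseudoRA(rsctmp, "dummy")
        return after_reboot.monitor()


if __name__ == "__main__":
    # The probe reports "running" although nothing was started after boot.
    print(simulate_crash_and_reboot() == OCF_SUCCESS)
```

In this sketch the probe after the simulated reboot returns OCF_SUCCESS
even though start() was never called again, which is the misbehavior
described above.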

This problem does not occur on the Heartbeat stack because
Heartbeat removes HA_RSCTMP at startup,
whereas on the Pacemaker stack neither Pacemaker nor Corosync removes it.

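But having Pacemaker remove them does not seem correct either:
if they were removed at cluster startup, maintenance mode
would no longer work properly, because the cluster can be restarted
while the resources keep running and their state files must survive
that restart.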

In my understanding, the "correct" behavior is:
 - They should NOT be removed at the cluster startup time.
 - They should be removed at the OS bootup time.



My suggestion for addressing this issue is the following:

 - 1) change the HA_RSCTMP location to /var/run/resource-agents,
      or to any other subdirectory directly under /var/run.
 - 2) set the directory permissions to 01777 (with the sticky bit).
 - 3) change the IPaddr/SendArp RAs not to use their own subdirectory
      but instead to add a prefix to the filename.
 - 4) make /var/run/heartbeat/rsctmp obsolete;
      Heartbeat/Pacemaker could preserve the current behavior
      for a while for compatibility.


The basic idea behind these changes is to follow the
file removal procedure defined by the FHS (Filesystem Hierarchy Standard).

http://www.pathname.com/fhs/pub/fhs-2.3.html#VARRUNRUNTIMEVARIABLEDATA

The FHS specifies that any files under a subdirectory of /var/run
should be removed at OS boot time.

Unfortunately, second-level subdirectories are out of its scope, so
you cannot rely on their contents being removed (and that is the case
for /var/run/heartbeat/rsctmp).
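
The difference between the two locations can be sketched with a
hypothetical boot-time cleanup written in Python. The function
clear_run_dir models only the minimum that the paragraphs above say
can be relied upon (files directly under /var/run and in its
first-level subdirectories); it is an assumption for illustration, not
the boot script of any particular distribution, and real systems may
clean more. A temporary directory stands in for /var/run.

```python
import os
import tempfile


def clear_run_dir(run_dir):
    """Remove files in run_dir and in its first-level subdirectories only."""
    for entry in list(os.scandir(run_dir)):
        if entry.is_file(follow_symlinks=False):
            os.remove(entry.path)
        elif entry.is_dir(follow_symlinks=False):
            for sub in list(os.scandir(entry.path)):
                # Deeper directories are deliberately left untouched.
                if sub.is_file(follow_symlinks=False):
                    os.remove(sub.path)


def demo_boot_cleanup():
    """Return the state files that survive the simulated boot cleanup."""
    with tempfile.TemporaryDirectory() as run_dir:
        old = os.path.join(run_dir, "heartbeat", "rsctmp")  # second level
        new = os.path.join(run_dir, "resource-agents")      # first level
        os.makedirs(old)
        os.makedirs(new)
        for directory in (old, new):
            open(os.path.join(directory, "Dummy.state"), "w").close()

        clear_run_dir(run_dir)

        survivors = []
        for root, _dirs, files in os.walk(run_dir):
            for name in files:
                full = os.path.join(root, name)
                survivors.append(os.path.relpath(full, run_dir))
        return sorted(survivors)


if __name__ == "__main__":
    print(demo_boot_cleanup())
```

Under these assumptions only the file in heartbeat/rsctmp survives,
while the one in resource-agents is cleared, which is why the proposal
moves HA_RSCTMP up one level.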


I believe the impact on existing RAs is minimal.
If your RA is implemented "correctly" (that is, it uses HA_RSCTMP
rather than a hardcoded path) then you need to do nothing;
just note that the location of the state file has changed.

If your RA hardcodes /var/run/heartbeat/rsctmp or
creates its own subdirectory, you are encouraged to fix it because it
may not work well with maintenance mode, but you can
continue to use the old rsctmp if you like.


I would like to hear your thoughts and comments.

Regards,
-- 
Keisuke MORI



