[ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

Ken Gaillot kgaillot at redhat.com
Wed May 30 02:24:50 UTC 2018


On Tue, 2018-05-29 at 13:09 -0600, Casey & Gina wrote:
> > On May 27, 2018, at 2:28 PM, Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > 
> > Pacemaker isn't fencing because the start failed, at least not
> > directly:
> > 
> > > May 22 23:57:24 [2196] d-gp2-dbpg0-2    pengine:     info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on d-gp2-dbpg0-2
> > > May 22 23:57:24 [2196] d-gp2-dbpg0-2    pengine:   notice: LogActions:  Demote  postgresql-10-main:1    (Master -> Slave d-gp2-dbpg0-1)
> > > May 22 23:57:24 [2196] d-gp2-dbpg0-2    pengine:   notice: LogActions:  Recover postgresql-10-main:1    (Master d-gp2-dbpg0-1)
> > 
> > From the above, we can see that the initial probe after the node
> > rejoined found that the resource was already running in master mode
> > there (at least, that's what the agent thinks). So, the cluster
> > wants
> > to demote it, stop it, and start it again as a slave.
> 
> Well, it was running in master mode prior to being power-cycled.
> However, my understanding was that PAF always tries to
> initially start PostgreSQL in standby mode.  There would be no reason
> for it to promote node 1 to master since node 2 has already taken
> over the master role, and there is no location constraint set that
> would cause it to try to move this role back to node 1 after it
> rejoins the cluster.
> 
> Jehan-Guillaume wrote:  "on resource start, PAF will create the
> "PGDATA/recovery.conf" file based on your template anyway. No need to
> create it
> yourself.".  The recovery.conf file being present upon PostgreSQL
> startup is what makes it start in standby mode.
> 
> Since no new log output is ever written to the PostgreSQL log file,
> it does not seem that it's ever actually doing anything to try to
> start the resource.  The recovery.conf doesn't get copied in, and no
> new data appears in the PostgreSQL log.  As far as I can tell,
> nothing ever happens on the rejoined node at all, before it gets
> fenced.
> 
> How can I tell what the resource agent is trying to do behind the
> scenes?  Is there a way that I can see what command(s) it is trying
> to run, so that I may try them manually?

The standard ocf:heartbeat agents will turn on "set -x" (i.e. printing
out every single line executed) when the OCF_TRACE_RA environment
variable is set to 1.
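
The mechanism is simple; roughly (this is a sketch of the idea, not the
verbatim code from ocf-shellfuncs), the shared helpers that the agents
source do something like:

```shell
#!/bin/sh
# Sketch (an assumption, not the actual ocf-shellfuncs source) of how
# the ocf:heartbeat agents honor OCF_TRACE_RA: when it is set to 1,
# tracing is switched on so every subsequent command is echoed as it runs.

maybe_enable_trace() {
    if [ "${OCF_TRACE_RA:-0}" = "1" ]; then
        set -x           # echo each command to stderr before executing it
        RA_TRACING=1
    else
        RA_TRACING=0
    fi
}

# Simulate the environment that crm_resource -VV would provide:
OCF_TRACE_RA=1
maybe_enable_trace
```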

You can debug like this:

1. Unmanage the resource in pacemaker, so you can mess with it
manually.

2. Cause the desired failure for testing. Pacemaker should detect the
failure, but not do anything about it.

3. Run crm_resource with the -VV option and --force-* with whatever
action you want to attempt (in this case, demote or stop). The -VV (aka
--verbose --verbose) will turn on OCF_TRACE_RA. The --force-* command
will read the resource configuration and do the same thing pacemaker
would do to execute the command.

It will be ... verbose, but if you really want to see the commands the
agent is doing, that should do it.
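
Concretely, using your resource name (assuming it is postgresql-10-main
here), the sequence would look something like this; these commands need a
live cluster, so treat them as a sketch:

```shell
# 1. Unmanage the resource so pacemaker leaves it alone
crm_resource --resource postgresql-10-main --meta \
    --set-parameter is-managed --parameter-value false

# 2. Reproduce the failure (e.g. power-cycle the standby node)

# 3. Run the failing actions by hand with agent tracing enabled
crm_resource -VV --resource postgresql-10-main --force-demote
crm_resource -VV --resource postgresql-10-main --force-stop

# When you're done, put the resource back under cluster management
crm_resource --resource postgresql-10-main --meta \
    --set-parameter is-managed --parameter-value true
```

Setting is-managed=false via the meta attribute is equivalent to what
higher-level shells call "unmanage"; remember to set it back to true
afterward, or the cluster will never recover the resource on its own.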

> > But the demote failed
> 
> I reckon that it probably couldn't demote what was never started.
> 
> > But the stop fails too
> 
> I guess that it can't stop what is already stopped?  Although, I'm
> surprised that it would error in this case, instead of just realizing
> that it was already stopped...
> 
> > 
> > > May 22 23:57:24 [2196] d-gp2-dbpg0-2    pengine:  warning: pe_fence_node:       Node d-gp2-dbpg0-1 will be fenced because of resource failure(s)
> > 
> > which is why the cluster then wants to fence the node. (If a
> > resource
> > won't stop, the only way to recover it is to kill the entire node.)
> 
> But the resource is *never started*!?  There is never any postgres
> process running, and nothing appears in the PostgreSQL log file.  I'm
> really confused as to why pacemaker thinks it needs to fence
> something that's never running at all...  I guess what I need is to
> somehow figure out what the resource agent is doing that makes it
> think the resource is already active; is there a way to do this?
> 
> It would be really helpful if, somewhere within this verbose logging,
> there were an indication of what commands were actually being run to
> monitor, start, stop, etc.; as it is, it seems like a black box.
> 
> I'm wondering if some stale PID file is getting left around after the
> hard reboot, and that is what the resource agent is checking instead
> of the actual running status, but I would hope that the resource
> agent would be smarter than that.
> 
> Thanks,
-- 
Ken Gaillot <kgaillot at redhat.com>
