[Pacemaker] Primitive stuck after resource agent failure?

Fri Feb 18 11:41:14 EST 2011

Hi,

On Fri, Feb 18, 2011 at 10:59:52AM -0500, Jody McIntyre wrote:
> [Sorry for the partial message I sent earlier.  Here's the full one.]
> 
> I am attempting to write my own resource agent to support postgres WAL log
> shipping.

Did you consider improving the existing resource agent? We do
accept contributions, and we would rather avoid duplicating
effort and resource agent functionality.

>  My PostgreSQL primitive is currently stuck in a FAILED state due
> to a bug in the resource agent script that I have fixed, and I can't figure
> out how to get the primitive working again.
> 
> I tried moving it to another node:
> root@trustcentric2:~# crm resource move PostgreSQL trustcentric1
> 
> This does not give an error, but the primitive is still on trustcentric2:
> 
> root@trustcentric1:~# crm_mon -1
> ============
> Last updated: Fri Feb 18 07:49:13 2011
> Stack: Heartbeat
> Current DC: trustcentric2 (28ebee49-31c7-419e-a29a-c939c3a241bd) - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
> 
> Online: [ trustcentric1 trustcentric2 ]
> 
>  ClusterIP      (ocf::heartbeat:IPaddr2) Started [      trustcentric1 trustcentric2 ]
>  PostgreSQL     (ocf::trustcentric:postgresql): Started trustcentric2 (unmanaged) FAILED
> 
> Failed actions:
>     PostgreSQL_start_0 (node=trustcentric2, call=10, rc=-2, status=Timed Out): unknown exec error
>     PostgreSQL_stop_0 (node=trustcentric2, call=11, rc=1, status=complete): unknown error
> 
> How do I get PostgreSQL running again?  I have attached an XML dump.

Did you try "crm resource cleanup"?
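
For reference, a minimal sketch of that cleanup sequence. It assumes
the resource is named PostgreSQL, as in the crm_mon output above, and
that the fixed agent script is already installed on both nodes. The
script only prints the commands rather than running them. The helper
name recovery_cmds is illustrative, and the unmove step is an
assumption on my part: the earlier "crm resource move" leaves a
location constraint behind, which you probably want to drop once the
resource is healthy again.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands for a failed resource
# instead of executing them. recovery_cmds is a hypothetical helper,
# not part of Pacemaker or the crm shell.
RSC="${1:-PostgreSQL}"

recovery_cmds() {
    rsc="$1"
    # Clear the failed start/stop history so the cluster re-probes
    # the resource and can manage it again.
    echo "crm resource cleanup $rsc"
    # Remove the location constraint created by "crm resource move".
    echo "crm resource unmove $rsc"
    # One-shot status check to confirm the failed actions are gone.
    echo "crm_mon -1"
}

recovery_cmds "$RSC"
```

Run the printed commands by hand on one of the nodes, checking the
crm_mon output after the cleanup before going on to the unmove.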

Thanks,

Dejan

> Thanks,
> Jody