[Pacemaker] crm resource restart doesn't restart the correct resource

Thu Nov 25 17:03:30 EST 2010

On Thu, 25 Nov 2010 07:09:28 -0500
Vadym Chepkov <vchepkov at gmail.com> wrote:

> 
> On Nov 25, 2010, at 7:01 AM, Pavlos Parissis wrote:
> 
> > On 25 November 2010 12:44, Vadym Chepkov <vchepkov at gmail.com> wrote:
> >> 
> >> On Nov 25, 2010, at 6:31 AM, Pavlos Parissis wrote:
> >> 
> >>> Hi,
> >>> When I issue crm resource restart pbx_01, the PE restarts the wrong resource.
> >>> pbx_01 belongs to a resource group, and the last resource of that
> >>> group is restarted instead.
> >> 
> >> This is why the cluster has groups. Groups define colocation/ordering, so if you
> >> stop a resource, everything depending on it has to be stopped, and the group
> >> describes this dependency.
> > If that were the case, then sshd_01 should have been restarted as well.
> 
> Well, it tried but failed; I see it in the log.
Is this the log which you are referring to?

12:04:43 pbxsrv3 pengine: [6396]: notice: unpack_rsc_op: Hard error - sshd_01_monitor_0 failed with rc=5: Preventing sshd_01 from re-starting on pbxsrv2

This is normal, because sshd_01 is not supposed to run on the pbxsrv2 node; it runs only on pbxsrv1 and pbxsrv3. According to this post, the error is harmless: http://www.gossamer-threads.com/lists/linuxha/pacemaker/67208#67208
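
For reference, rc=5 in the OCF return-code convention is OCF_ERR_INSTALLED ("not installed"). Below is a minimal sketch of decoding such a line, assuming the wrapped log text has been joined back into one line. The regular expression is inferred from the single line quoted above, not from the Pacemaker source, and the function name decode_hard_error is hypothetical.

```python
import re

# OCF resource agent return codes (subset). Names follow the OCF convention.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Assumption: pattern inferred from the one log line in this thread.
HARD_ERROR = re.compile(
    r"unpack_rsc_op: Hard error - (?P<op>\S+) failed with rc=(?P<rc>\d+): "
    r"Preventing (?P<rsc>\S+) from re-starting on (?P<node>\S+)"
)


def decode_hard_error(line):
    """Return a dict describing a 'Hard error' line, or None if no match."""
    match = HARD_ERROR.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return {
        "operation": match.group("op"),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
        "resource": match.group("rsc"),
        "node": match.group("node"),
    }
```

Fed the line above, this would report that the probe of sshd_01 on pbxsrv2 returned "not installed", which fits a node the resource is never meant to run on.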

If you read the log on the DC, you would think that pbx_01 and sshd_01 were actually restarted:
12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource pbx_01      (pbxsrv1)
12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource sshd_01     (pbxsrv1)
12:04:43 pbxsrv3 pengine: [6396]: notice: LogActions: Stop resource mailAlert_01        (pbxsrv1)

but they weren't.
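
To compare what the PE planned with what actually happened, the LogActions lines can be pulled out of the DC log. This is only a rough sketch: it assumes each log entry is on one line and looks like the three entries above, and the function name planned_actions is made up for illustration.

```python
import re

# Assumption: format inferred from the LogActions lines quoted in this thread,
# e.g. "LogActions: Stop resource pbx_01      (pbxsrv1)".
LOG_ACTION = re.compile(
    r"LogActions: (?P<action>\w+) resource (?P<rsc>\S+)\s+\((?P<node>[^)]+)\)"
)


def planned_actions(lines):
    """Collect (action, resource, node) tuples from pengine log lines."""
    found = []
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            found.append(
                (match.group("action"), match.group("rsc"), match.group("node"))
            )
    return found
```

The resulting list can then be checked by hand against what crm_mon shows after the transition.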

Cheers,
Pavlos