[ClusterLabs] [cluster-lab] reboot standby node
Ken Gaillot
kgaillot at redhat.com
Mon Dec 12 23:23:40 CET 2016
On 12/11/2016 04:19 PM, Omar Jaber wrote:
> Hi all,
>
> I have a cluster of three nodes with different scores for the location
> constraint, and I have a group resource (it's a service that exists in
> the /etc/init.d/ folder) running on the node that has the highest
> location constraint score.
>
> When I reboot one of the standby nodes, I see that once the standby
> node comes back up, the resource is stopped on the master node and
> started again. When I check the Pacemaker status, I see the following
> error:
>
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery "
>
> Then I disabled the cluster service at boot time on the standby node
> by running the command "pcs cluster disable", then rebooted the node,
> and I see that the resource is started on the standby node (because
> the resource is stored in the /etc/init.d folder).
>
> After that, I started the cluster service on the standby node, and the
> same error is generated:
>
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery "
>
> The problem does not happen if the standby node is not rebooted. For
> example, if I stop the cluster service on the standby node, start the
> resource on the standby node, and then start the cluster service
> again, the error "error: resource 'resource_name' is active on 2 nodes
> attempting recovery" is not generated.
Make sure your resource agent returns exit codes expected by Pacemaker:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
In particular, if a monitor command returns 0 (OCF_SUCCESS), it means
the service is running.
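For a custom service, the monitor action's job is simply to map the service's real state onto these codes. Below is a minimal sketch in Python, assuming the service records its PID in a file; the `monitor` function and the pidfile path are hypothetical illustrations, not part of any existing agent.

```python
import os

# OCF return codes, per the OCF return codes documentation linked above
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(pidfile: str) -> int:
    """Hypothetical monitor action: map the service's real state to an OCF code."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        # No PID file: the service is cleanly stopped, so report 7, never 0
        return OCF_NOT_RUNNING
    except ValueError:
        # PID file exists but does not hold a number: something is wrong
        return OCF_ERR_GENERIC
    if pid <= 0:
        # os.kill(0, ...) would target a process group, not a process
        return OCF_ERR_GENERIC
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        # Stale PID file left behind by a dead process: not running
        return OCF_NOT_RUNNING
    except PermissionError:
        # Process exists but belongs to another user: it is running
        return OCF_SUCCESS
    return OCF_SUCCESS
```

A real agent would exit with this value, e.g. for a placeholder path such as `/var/run/myservice.pid`. With no PID file the sketch returns 7, so a probe on a freshly booted node where the service was never started reports it as stopped rather than running. If the service is instead managed as an lsb: resource through its /etc/init.d script, the same requirement falls on the script's status action, which the LSB specification says should exit 3 when the service is not running.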
When any node reboots, Pacemaker will "probe" the existing state of all
resources on it, by running a one-time monitor command. If the service
is not running, the command should return 7 (OCF_NOT_RUNNING).
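The error in the original report follows from this probe step. The following is a simplified model in Python, not Pacemaker's actual code: it takes probe results per node (the node names are hypothetical) and flags the case where more than one node claims the resource is running. With the default `multiple-active` resource meta-attribute (`stop_start`), Pacemaker recovers by stopping the resource on every node and starting it on one, which matches the stop and restart seen on the master node.

```python
from typing import Dict, List, Optional

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def nodes_reporting_active(probe_results: Dict[str, int]) -> List[str]:
    """Nodes whose one-time probe (monitor) returned OCF_SUCCESS."""
    return sorted(node for node, rc in probe_results.items() if rc == OCF_SUCCESS)


def check_probes(resource: str, probe_results: Dict[str, int]) -> Optional[str]:
    """Simplified model: return an error string if the resource is multiply active."""
    active = nodes_reporting_active(probe_results)
    if len(active) > 1:
        # Wording modeled on the log line quoted in the original report
        return (f"error: resource '{resource}' is active on "
                f"{len(active)} nodes attempting recovery")
    return None


# A rebooted standby node whose monitor wrongly returns 0 (or whose init
# script started the service at boot) looks like a second active copy:
probes = {"node1": OCF_SUCCESS, "node2": OCF_SUCCESS, "node3": OCF_NOT_RUNNING}
print(check_probes("resource_name", probes))
# prints: error: resource 'resource_name' is active on 2 nodes attempting recovery
```

If the standby node's probe correctly returned 7, only one node would report the resource as active and no recovery would be triggered.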
So, I'm guessing that either the resource agent is wrongly returning 0
for monitor when the service is not actually running, or the node is
wrongly starting the service at boot, outside cluster control.