[ClusterLabs] [cluster-lab] reboot standby node

Ken Gaillot kgaillot at redhat.com
Mon Dec 12 17:23:40 EST 2016


On 12/11/2016 04:19 PM, Omar Jaber wrote:
> Hi all,
>
> I have a cluster of three nodes with different scores for a location
> constraint, and a resource group (a service that exists in the
> /etc/init.d/ folder) running on the node that has the highest score
> for the location constraint.
>
> When I reboot one of the standby nodes, I see that once the standby
> node comes back up, the resource is stopped on the master node and
> restarted. When I then check the Pacemaker status, I see the
> following error:
>
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery"
> 
> Then I disabled the pcs cluster service at boot time on the standby
> node by running the command "pcs cluster disable". After rebooting the
> node, I see that the resource is started on the standby node (because
> the resource is stored in the /etc/init.d folder).
>
> After that, I started the pcs cluster service on the standby node, and
> I see that the same error is generated:
>
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery"
> 
>  
> 
> The problem does not happen without rebooting the standby node. For
> example, if I stop the pcs cluster service on the standby node, run
> the resource on the standby node, and then start the pcs cluster
> service, the error "error: resource 'resource_name' is active on
> 2 nodes attempting recovery" is not generated.

Make sure your resource agent returns exit codes expected by Pacemaker:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes

In particular, if a monitor command returns 0 (OCF_SUCCESS), it means
the service is running.

When any node reboots, Pacemaker will "probe" the existing state of all
resources on it, by running a one-time monitor command. If the service
is not running, the command should return 7 (OCF_NOT_RUNNING).
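
As a rough illustration of that contract, a monitor action might look
like the sketch below. It assumes the service writes its PID to a
pidfile; the path /var/run/myservice.pid and the function name monitor
are hypothetical and not taken from your setup. The values 0 and 7 are
the OCF_SUCCESS and OCF_NOT_RUNNING codes mentioned above.

```shell
#!/bin/sh
# OCF exit codes referenced in this thread.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical pidfile location; adjust for the real service.
PIDFILE="${PIDFILE:-/var/run/myservice.pid}"

# Return 0 only when the recorded process is really alive, 7 otherwise.
monitor() {
    # No readable pidfile: treat the service as cleanly stopped.
    [ -r "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE")

    # kill -0 sends no signal; it only checks that the process exists
    # (and that we are allowed to signal it, so run this as root).
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Empty or stale pidfile: the service is not running.
    return $OCF_NOT_RUNNING
}
```

The point of the sketch is only that a stopped service must never
produce 0; a real agent would also report genuine failures with the
other codes listed at the link above.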

So, I'm guessing that either the resource agent is wrongly returning 0
for monitor when the service is not actually running, or the node is
wrongly starting the service at boot, outside cluster control.
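
One way to tell these two cases apart is to run the script's status
action by hand on the rebooted node, before starting the cluster there,
and look at the exit code. The helper below is only a sketch: the name
probe_reading is made up, and it assumes the resource is configured as
an lsb: class resource, for which the LSB specification defines status
exit code 0 as running and 3 as not running.

```shell
#!/bin/sh
# Run a status command and report how its exit code reads under the
# LSB convention (0 = running, 3 = not running).
probe_reading() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "active" ;;
        3) echo "inactive" ;;
        *) echo "unexpected exit code $rc" ;;
    esac
}
```

For example, "probe_reading /etc/init.d/myservice status" (myservice
being a placeholder name). If it prints "active" while the service is
really down, the script's status action is at fault. If the service
really is up right after boot, the init system is starting it; depending
on the distribution, "chkconfig --list myservice" or
"systemctl is-enabled myservice" should show whether it is enabled.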



