[ClusterLabs] Why Won't Resources Move?

Ken Gaillot kgaillot at redhat.com
Wed Aug 1 17:17:23 EDT 2018


On Wed, 2018-08-01 at 03:49 +0000, Eric Robinson wrote:
> I have what seems to be a healthy cluster, but I can’t get resources
> to move.
>  
> Here’s what’s installed…
>  
> [root@001db01a cluster]# yum list installed|egrep "pacem|coro"
> corosync.x86_64                  2.4.3-2.el7_5.1                @updates
> corosynclib.x86_64               2.4.3-2.el7_5.1                @updates
> pacemaker.x86_64                 1.1.18-11.el7_5.3              @updates
> pacemaker-cli.x86_64             1.1.18-11.el7_5.3              @updates
> pacemaker-cluster-libs.x86_64    1.1.18-11.el7_5.3              @updates
> pacemaker-libs.x86_64            1.1.18-11.el7_5.3              @updates
>  
> Cluster status looks good…
>  
> [root@001db01b cluster]# pcs status
> Cluster name: 001db01ab
> Stack: corosync
> Current DC: 001db01b (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
> Last updated: Wed Aug  1 03:44:47 2018
> Last change: Wed Aug  1 03:22:18 2018 by root via cibadmin on 001db01a
>  
> 2 nodes configured
> 11 resources configured
>  
> Online: [ 001db01a 001db01b ]
>  
> Full list of resources:
>  
> p_vip_clust01  (ocf::heartbeat:IPaddr2):       Started 001db01b
> p_azip_clust01 (ocf::heartbeat:AZaddr2):       Started 001db01b
> Master/Slave Set: ms_drbd0 [p_drbd0]
>      Masters: [ 001db01b ]
>      Slaves: [ 001db01a ]
> Master/Slave Set: ms_drbd1 [p_drbd1]
>      Masters: [ 001db01b ]
>      Slaves: [ 001db01a ]
> p_fs_clust01   (ocf::heartbeat:Filesystem):    Started 001db01b
> p_fs_clust02   (ocf::heartbeat:Filesystem):    Started 001db01b
> p_vip_clust02  (ocf::heartbeat:IPaddr2):       Started 001db01b
> p_azip_clust02 (ocf::heartbeat:AZaddr2):       Started 001db01b
> p_mysql_001    (lsb:mysql_001):        Started 001db01b
>  
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>  
> Constraints look like this…
>  
> [root@001db01b cluster]# pcs constraint
> Location Constraints:
> Ordering Constraints:
>   promote ms_drbd0 then start p_fs_clust01 (kind:Mandatory)
>   promote ms_drbd1 then start p_fs_clust02 (kind:Mandatory)
>   start p_fs_clust01 then start p_vip_clust01 (kind:Mandatory)
>   start p_vip_clust01 then start p_azip_clust01 (kind:Mandatory)
>   start p_fs_clust02 then start p_vip_clust02 (kind:Mandatory)
>   start p_vip_clust02 then start p_azip_clust02 (kind:Mandatory)
>   start p_vip_clust01 then start p_mysql_001 (kind:Mandatory)
> Colocation Constraints:
>   p_azip_clust01 with p_vip_clust01 (score:INFINITY)
>   p_fs_clust01 with ms_drbd0 (score:INFINITY) (with-rsc-role:Master)
>   p_fs_clust02 with ms_drbd1 (score:INFINITY) (with-rsc-role:Master)
>   p_vip_clust01 with p_fs_clust01 (score:INFINITY)
>   p_vip_clust02 with p_fs_clust02 (score:INFINITY)
>   p_azip_clust02 with p_vip_clust02 (score:INFINITY)
>   p_mysql_001 with p_vip_clust01 (score:INFINITY)
> Ticket Constraints:
>  
> But when I issue a move command, nothing at all happens.
>  
> I see this in the log on one node…
>  
> Aug 01 03:21:57 [16550] 001db01b        cib:     info: cib_perform_op:  ++ /cib/configuration/constraints:  <rsc_location id="cli-prefer-ms_drbd0" rsc="ms_drbd0" role="Started" node="001db01a" score="INFINITY"/>
> Aug 01 03:21:57 [16550] 001db01b        cib:     info: cib_process_request:     Completed cib_modify operation for section constraints: OK (rc=0, origin=001db01a/crm_resource/4, version=0.138.0)
> Aug 01 03:21:57 [16555] 001db01b       crmd:     info: abort_transition_graph:  Transition aborted by rsc_location.cli-prefer-ms_drbd0 'create': Configuration change | cib=0.138.0 source=te_update_diff:456 path=/cib/configuration/constraints complete=true
>  
> And I see this in the log on the other node…
>  
> notice: p_drbd1_monitor_60000:69196:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]

The message likely came from the resource agent calling crm_attribute
to set a node attribute. That error usually means the cluster isn't
running on that node, which is highly suspect here given that both
nodes show as online. The cib might have crashed, which should show up
in that node's log as well. I'd look into that first.
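For reference, the drbd agent updates its promotion preference through
crm_master, a thin wrapper around crm_attribute, so the failing call is
roughly like the sketch below (the attribute name and score are only
illustrative, not taken from your configuration):

  # roughly what the agent runs on that node:
  crm_master -Q -l reboot -v 10000
  # which boils down to something like:
  crm_attribute -N 001db01a -l reboot -n master-p_drbd1 -v 10000

If the local cib isn't reachable, that is the call that fails with
"Error signing on to the CIB service". A couple of quick checks on
001db01a to see whether the daemons are really up:

  # is the cib process still running?
  ps -e | grep -E 'pacemakerd|cib'
  # anything about the cib dying or restarting?
  journalctl -u pacemaker | grep -i cib | tail -50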

>  
> Any thoughts?
>  
> --Eric
-- 
Ken Gaillot <kgaillot at redhat.com>


