[ClusterLabs] Pacemaker starts with error on LVM resource

Sun Oct 1 18:39:24 CEST 2017

On Thu, 2017-09-28 at 18:05 +0300, Octavian Ciobanu wrote:
> Hello all.
> 
> I have a test configuration with 2 nodes that is configured as iSCSI
> storage.
> 
> I've created a master/slave DRBD resource and a group that has the
> following resources, ordered as follows:
>  - iSCSI TCP IP/port block (ocf::heartbeat:portblock)
>  - LVM (ocf::heartbeat:LVM)
>  - iSCSI IP (ocf::heartbeat:IPaddr2)
>  - iSCSI Target (ocf::heartbeat:iSCSITarget) for first LVM partition
>  - iSCSI LUN (ocf::heartbeat:iSCSILogicalUnit) for first LVM
> partition
>  - iSCSI Target (ocf::heartbeat:iSCSITarget) for second LVM partition
>  - iSCSI LUN (ocf::heartbeat:iSCSILogicalUnit) for second LVM
> partition
>  - iSCSI Target (ocf::heartbeat:iSCSITarget) for third LVM partition
>  - iSCSI LUN (ocf::heartbeat:iSCSILogicalUnit) for third LVM
> partition
>  - iSCSI TCP IP/port unBlock (ocf::heartbeat:portblock)
> 
> The LVM-iSCSI group has an ordering constraint so that it starts after the
> DRBD resource, as can be seen from the output of pcs constraint list:
> 
> Ordering Constraints:
>   promote Storage-DRBD then start Storage (kind:Mandatory)
> Colocation Constraints:
>   Storage with Storage-DRBD (score:INFINITY) (with-rsc-role:Master)
> 
> All was OK until I updated from CentOS 7.3 to 7.4 via yum.
> 
> After the update every time I start the cluster I get this error:
> 
> Failed Actions:
> * Storage-LVM_monitor_0 on storage01 'unknown error' (1): call=22,
> status=complete, exitreason='LVM Volume ClusterDisk is not
> available',
>     last-rc-change='Thu Sep 28 19:16:57 2017', queued=0ms, exec=515ms
> * Storage-LVM_monitor_0 on storage02 'unknown error' (1): call=22,
> status=complete, exitreason='LVM Volume ClusterDisk is not
> available',
>     last-rc-change='Thu Sep 28 19:17:48 2017', queued=0ms, exec=746ms

The "_monitor_0" on these failures means they were the initial probes
of the resource, not a recurring monitor after it was started. Before
starting a resource, Pacemaker probes its current state on all nodes,
to make sure it matches what is expected.
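
As a rough illustration of how to read those operation names (this is a
sketch, not Pacemaker code): the key has the form
<resource>_<action>_<interval in ms>, and a monitor with interval 0 is a
probe. The helper names parse_op_key and is_probe below are hypothetical,
and the pattern is a simplification that assumes the action name contains
no underscore.

```python
import re
from typing import NamedTuple


class OpKey(NamedTuple):
    resource: str
    action: str
    interval_ms: int


# <resource>_<action>_<interval in ms>, e.g. "Storage-LVM_monitor_0".
# Simplifying assumption: the action itself contains no underscore.
_OP_KEY = re.compile(r"^(?P<rsc>.+)_(?P<action>[^_]+)_(?P<interval>\d+)$")


def parse_op_key(key: str) -> OpKey:
    """Split an operation key into resource, action and interval."""
    m = _OP_KEY.match(key)
    if m is None:
        raise ValueError(f"not an operation key: {key!r}")
    return OpKey(m["rsc"], m["action"], int(m["interval"]))


def is_probe(key: str) -> bool:
    """A probe is a one-time monitor, i.e. a monitor with interval 0."""
    op = parse_op_key(key)
    return op.action == "monitor" and op.interval_ms == 0
```

With this, is_probe("Storage-LVM_monitor_0") is true, while a recurring
monitor such as "Storage-LVM_monitor_30000" is not a probe.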

"Ordered probes" is a long-desired enhancement, where Pacemaker
wouldn't probe a resource until all its dependencies are up. It's
trickier than it sounds though, so it hasn't been implemented yet
(except for resources on guest nodes ordered after the guest resource
starts, which just got added to the master branch).

I don't remember anything specific to iSCSI+LVM in 7.4; hopefully
someone else does.

> Even with this error, once the DRBD resource has started, the LVM resource
> starts as it should on the DRBD master node.
> 
> I checked both nodes to see whether LVM services were being started by the
> system, then disabled and even masked them to be sure that they
> would not start at all, but with these changes I still get the error.
> 
> From what I see, the cluster service tries to start LVM before the
> DRBD resource is started and fails because it does not find the DRBD disk.
> 
> Any ideas on how to fix this ?
> 
> Best regards 
> Octavian Ciobanu