[ClusterLabs Developers] bug? in heartbeat/LVM OCF script

Chris Friesen chris.friesen at windriver.com
Tue Jun 21 19:20:59 UTC 2016


On 06/21/2016 11:58 AM, Vladislav Bogdanov wrote:
> 21.06.2016 20:03, Chris Friesen wrote:
>> On 06/20/2016 05:50 PM, Chris Friesen wrote:
>>>
>>> Hi,
>>>
>>> The heartbeat/LVM OCF script uses the following logic for the LVM_status()
>>> routine:
>>>
>>> if [ -d /dev/$1 ]; then
>>>      test "`cd /dev/$1 && ls`" != ""
>>>      rc=$?
>>>      if [ $rc -ne 0 ]; then
>>>          ocf_exit_reason "VG $1 with no logical volumes is not supported by this RA!"
>>>      fi
>>> fi
>>
>> <snip>
>>
>>> I think it would be better to query the activity directly, using something like
>>> "lvs -o name,selected  -S lv_active=active,vg_name=<volume_group>"
>>
>> I'm testing with the following code instead of the above snippet and it seems
>> to work okay:
>>
>>         # Ask lvm whether the volume group is active.  This maps to
>>         # the question "Are there any logical volumes that are active in
>>         # the specified volume group?".
>>         lvs --noheadings -o selected -S lv_active=active,vg_name=${1} | grep -q 1
>
> This ^^^ has a good chance of timing out in both the monitor and the subsequent stop
> operations if a clustered VG is used and clvmd is stuck because DLM is waiting for
> fencing (of another node) to finish.
> Or if a (clustered) VG is created on an iSCSI/SRP/FC/FCoE/etc. block device which
> is unavailable for some period of time due to target/network problems.
>
> Both cases lead to fencing of all cluster nodes.
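
One possible mitigation for the hang scenario described above (a sketch only, not
something proposed in the thread) is to bound the lvs query with the coreutils
timeout(1) utility, so that a stuck clvmd/DLM or an unreachable backing device
fails the probe quickly instead of blocking the monitor/stop operation until its
cluster-level timeout expires:

    # Sketch only: the vg_is_active name, the 10-second bound and the use of
    # timeout(1) are assumptions, not part of the RA or of this thread.
    vg_is_active() {
        vg="$1"
        # The bound must stay well below the resource's monitor/stop timeouts.
        out=$(timeout 10 lvs --noheadings -o selected \
                  -S "lv_active=active,vg_name=${vg}" 2>/dev/null)
        if [ $? -eq 124 ]; then
            # timeout(1) exits 124 when it kills the command; report
            # "unknown" (rc 2) rather than "inactive" so the caller can
            # decide whether to escalate.
            return 2
        fi
        echo "$out" | grep -q 1
    }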

Got any suggestions on a better way to handle it?

The current code is flawed due to arguably-buggy LVM behaviour... the existence
of a non-empty /dev/<volgroup> directory does not actually guarantee that the
volume group is activated.
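
For illustration, the status check could ask LVM directly instead of inferring
activation from the contents of /dev/<volgroup>. The following is only a sketch,
built on the hypothetical vg_is_active helper above and the standard OCF return
codes from ocf-shellfuncs, not the fix that was eventually adopted:

    # Illustrative only: replaces the /dev/$1 directory test with a direct
    # query of LV activation state for the given volume group.
    LVM_status() {
        vg_is_active "$1"
        case $? in
            0) return $OCF_SUCCESS ;;
            2) return $OCF_ERR_GENERIC ;;   # probe timed out, state unknown
            *) return $OCF_NOT_RUNNING ;;
        esac
    }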

Chris



