[ClusterLabs] Resolving cart before the horse with mounted filesystems.

Matthew Schumacher matt.s at aptalaska.net
Mon May 3 09:12:48 EDT 2021


On 5/2/21 11:10 PM, Andrei Borzenkov wrote:
> On 03.05.2021 06:27, Matthew Schumacher wrote:
>> On 4/30/21 12:08 PM, Matthew Schumacher wrote:
>>> On 4/30/21 11:51 AM, Ken Gaillot wrote:
>>>> On Fri, 2021-04-30 at 16:20 +0000, Strahil Nikolov wrote:
>>>>> Ken meant you should use the 'Filesystem' resource for mounting that
>>>>> NFS server and then clone that resource.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>> I'm currently working on understanding and implementing this
>>> suggestion from Andrei:
>>>
>>> Which is exactly what clones are for. Clone the NFS mount and order
>>> VirtualDomain after the clone. Just do not forget to set interleave=true
>>> so VirtualDomain considers only the local clone instance.
>> I tried to use this config, but it's not working for me.
>>
>> I have a group that puts together a ZFS mount (which starts an NFS
>> share), configures some iSCSI stuff, and binds a failover IP address:
>>
>> group IP-ZFS-iSCSI fence-datastore zfs-datastore ZFSiSCSI failover-ip
>>
>> Then, I made a mount to that NFS server as a resource:
>>
>> primitive mount-datastore-nfs Filesystem \
>>      params device="<ip>:/datastore" directory="/datastore" fstype=nfs \
>>      op monitor timeout=40s interval=20s
>>
>> Then I made a clone of this:
>>
>> clone clone-mount-datastore-nfs mount-datastore-nfs meta interleave=true
>> target-role=Started
>>
>> So, in theory, the NFS share from the ZFS server is mounted on all of
>> the nodes via the clone config.  Now I define some ordering constraints
>> to make sure things come up in order:
>>
>> order mount-datastore-before-vm-testvm Mandatory:
>> clone-mount-datastore-nfs vm-testvm
>> order zfs-datastore-before-mount-datastore Mandatory: IP-ZFS-iSCSI
>> clone-mount-datastore-nfs
>>
>> In theory, when a node comes online, it should check that IP-ZFS-iSCSI
>> is running somewhere in the cluster, then check the local instance of
>> mount-datastore-nfs to make sure we have the NFS mounts we need, and
>> then start vm-testvm.  However, that doesn't work.  If I kill pacemaker
>> on one node, it is fenced and rebooted, and when it comes back I see
>> this in the log:
>>
>>
>> # grep -v  pacemaker /var/log/pacemaker/pacemaker.log
>> May 03 03:02:41  VirtualDomain(vm-testvm)[1300]:    INFO: Configuration
>> file /datastore/vm/testvm/testvm.xml not readable during probe.
>> May 03 03:02:41  VirtualDomain(vm-testvm)[1300]:    INFO: environment is
>> invalid, resource considered stopped
>> May 03 03:02:42  Filesystem(mount-datastore-nfs)[1442]:    INFO: Running
>> start for 172.25.253.110:/dev/datastore-nfs-stub on /datastore
>> May 03 03:02:45  VirtualDomain(vm-testvm)[2576]:    INFO: Virtual domain
>> testvm currently has no state, retrying.
>> May 03 03:02:46  VirtualDomain(vm-testvm)[2576]:    INFO: Domain testvm
>> already stopped.
>>
> It is impossible to comment based on a couple of random lines from the
> log.  You need to provide the full log from the DC and from the node in
> question, starting from the moment pacemaker was restarted.
>
> But the obvious answer is that pacemaker runs probes when it starts, and
> these probes run asynchronously.  So this may simply be output of the
> resource agent doing a probe, in which case the result is correct - the
> probe found that the domain was not running.
>

You are right, Andrei.  Looking at the logs:

May 03 03:02:41 node2 pacemaker-attrd     [1281] (attrd_peer_update)     
notice: Setting #node-unfenced[node2]: (unset) -> 1620010887 | from node1
May 03 03:02:41 node2 pacemaker-execd     [1280] 
(process_lrmd_get_rsc_info)     info: Agent information for 'vm-testvm' 
not in cache
May 03 03:02:41 node2 pacemaker-execd     [1280] 
(process_lrmd_rsc_register)     info: Cached agent information for 
'vm-testvm'
May 03 03:02:41 node2 pacemaker-controld  [1283] (do_lrm_rsc_op) info: 
Performing key=7:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 
op=vm-testvm_monitor_0
May 03 03:02:41 node2 pacemaker-controld  [1283] (action_synced_wait) 
     info: VirtualDomain_meta-data_0[1288] exited with status 0
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_process_request) 
     info: Forwarding cib_modify operation for section status to all 
(origin=local/crmd/8)
May 03 03:02:41 node2 pacemaker-execd     [1280] 
(process_lrmd_get_rsc_info)     info: Agent information for 
'fence-datastore' not in cache
May 03 03:02:41 node2 pacemaker-execd     [1280] 
(process_lrmd_rsc_register)     info: Cached agent information for 
'fence-datastore'
May 03 03:02:41 node2 pacemaker-controld  [1283] (do_lrm_rsc_op) info: 
Performing key=8:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 
op=fence-datastore_monitor_0
May 03 03:02:41  VirtualDomain(vm-testvm)[1300]:    INFO: Configuration 
file /datastore/vm/testvm/testvm.xml not readable during probe.
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: Diff: --- 0.1608.23 2
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: Diff: +++ 0.1608.24 (null)
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: +  /cib:  @num_updates=24
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: ++ /cib/status/node_state[@id='2']: <transient_attributes id="2"/>
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: ++ <instance_attributes id="status-2">
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: ++                                       <nvpair 
id="status-2-.node-unfenced" name="#node-unfenced" value="1620010887"/>
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: ++ </instance_attributes>
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_perform_op)     
info: ++ </transient_attributes>
May 03 03:02:41 node2 pacemaker-based     [1278] (cib_process_request) 
     info: Completed cib_modify operation for section status: OK (rc=0, 
origin=node1/attrd/16, version=0.1608.24)
May 03 03:02:41  VirtualDomain(vm-testvm)[1300]:    INFO: environment is 
invalid, resource considered stopped

When node2 comes back from being fenced (I was testing a hard failure), it
probes the status of vm-testvm, because I previously did a "crm resource
move vm-testvm node2" and so it's trying to put the VirtualDomain resource
back on node2 (the leftover constraint is shown after the log excerpt
below).  The monitor finds that the config file is missing because the NFS
mount isn't up yet, so it assumes the resource is stopped (it's not), and
then it gets confused:

May 03 03:02:45  VirtualDomain(vm-testvm)[2576]:    INFO: Virtual domain 
testvm currently has no state, retrying.
May 03 03:02:46  VirtualDomain(vm-testvm)[2576]:    INFO: Domain testvm 
already stopped.

Eventually it does end up stopped on node1 and started on node2.
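
(For reference, that earlier "crm resource move" should have left a
location constraint along these lines in the CIB -- I'm reconstructing it
from memory, so the exact id and form may differ:

location cli-prefer-vm-testvm vm-testvm role=Started inf: node2

which is why pacemaker wants the domain back on node2 as soon as the node
rejoins.)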

Is there a way to configure the ordering so that we don't even run the
monitor until the dependent resource is running?

Is there a way to have a delayed start?
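
For example, I was imagining something along these lines, with a
start-delay added to the vm-testvm primitive -- completely untested, the
params and numbers are just placeholders for whatever my real primitive
uses, and I have no idea whether a start-delay would hold back the
initial probe at all:

primitive vm-testvm VirtualDomain \
     params config="/datastore/vm/testvm/testvm.xml" \
     op start timeout=90s interval=0 start-delay=30s \
     op monitor timeout=30s interval=10s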

At the end of the day, the way VirtualDomain works has been very
troublesome for me.  The second the config file isn't available,
pacemaker thinks the domain is down and starts kicking the stool out
from under things, even if the domain is running just fine.  It seems to
me that reading the config file is a poor way to test whether the domain
is working, since the domain can certainly be up even if the config file
is missing, and it has generated a lot of false positives for me.  I
wonder why it was written this way.  Wouldn't it make more sense for
monitor to get the status from virsh, and to not bother looking for the
config file unless we are starting, returning "failed to start" if it's
missing at that point?
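
Something like this rough sketch is what I had in mind for the status
side -- untested, and it ignores the hypervisor URI handling and all the
other plumbing the real agent does (the qemu:///system URI and the
"testvm" name are just from my setup):

#!/bin/sh
# rough sketch of a virsh-based status check, not the real agent code
state=$(virsh --connect qemu:///system domstate testvm 2>/dev/null)
case "$state" in
    running|paused) exit 0 ;;   # OCF_SUCCESS
    *)              exit 7 ;;   # OCF_NOT_RUNNING
esac

The config file would only come into play for start, and if it's missing
there we could just return a start failure.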

Thanks for the help,
Matt




