[ClusterLabs] trace of Filesystem RA does not log

Mon Oct 14 00:27:30 EDT 2019

On 10/11/19 10:32 PM, Lentes, Bernd wrote:
> Hi,
> 
> occasionally the stop of a Filesystem resource for an OCFS2 partition fails.

A stop failure is very bad, and avoiding it is crucial for an HA system.

You can try the o2locktop CLI to find the inode that may be to blame [1].

`o2locktop --help` gives you more usage details.

[1] o2locktop package
https://software.opensuse.org/package/o2locktop?search_term=o2locktop

> I'm currently tracing this RA hoping to find the culprit.
> I'm putting one of the two nodes into standby, hoping the error appears.
> Afterwards I set it online again and repeat the same procedure with the other node.
> Of course now the error does not appear :-))
> But I don't find any files under /var/lib/heartbeat/trace_ra/Filesystem for a stop operation.
> Resource is part of a group which is cloned.
> 
> I configured the tracing with "crm resource trace fs_ocfs2 stop".
> 
> Result:
> primitive fs_ocfs2 Filesystem \
>          params device="/dev/vg_san/lv_ocfs2" directory="/mnt/ocfs2" fstype=ocfs2 \
>          params fast_stop=no force_unmount=true \
>          op monitor interval=30 timeout=20 \
>          op start timeout=60 interval=0 \
>          op stop timeout=60 interval=0 \
>          op_params trace_ra=1 \
>          meta is-managed=true target-role=Started
> 
> I expect log files for the stop operation in /var/lib/heartbeat/trace_ra/Filesystem.
> But I don't get any.
> 
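
To double-check whether any trace file was written at all around the time of the stop, a small helper like the one below might be useful. It is only a sketch: the function name `recent_traces` and the 60-minute default are my own choices, and the default directory is simply the path from your mail.

```shell
# List regular files under a trace directory modified within the last N minutes.
# recent_traces is a hypothetical helper name; the default directory is the
# path quoted in the original mail, and 60 minutes is an arbitrary default.
recent_traces() {
    dir=${1:-/var/lib/heartbeat/trace_ra/Filesystem}
    minutes=${2:-60}
    # Fail loudly if the directory is missing, since that alone would explain "no files".
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -mmin "-$minutes" | sort
}
```

Run `recent_traces` on each node right after putting it into standby. An empty result with exit status 0 means the directory exists but nothing recent was written there.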

Maybe umount hung and did not have time to flush the log to disk.
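
One way to check this the next time the stop fails is to look for processes stuck in uninterruptible sleep (state D), which is where a hung umount would typically sit. A rough sketch, assuming procps `ps` and `awk` are available; the function names `filter_d_state` and `list_blocked` are made up for illustration:

```shell
# Keep only the lines of `ps` output whose STAT column starts with D
# (uninterruptible sleep). Reads from stdin and skips the header line.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Show PID, state, kernel wait channel and command line of blocked processes.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

If `list_blocked` shows the umount process while the stop operation is timing out, that would support the idea that the unmount itself is blocked rather than the tracing being misconfigured.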

Cheers
Roger