[ClusterLabs] Antw: [EXT] Re: OCFS2 fragmentation with snapshots

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu May 20 02:29:17 EDT 2021


>>> Gang He <ghe at suse.com> wrote on 20.05.2021 at 07:46 in message
<f6eb92ce-930b-e3ae-f8de-961b37da80a8 at suse.com>:
> Hi Ulrich,
> 
> 
> 
> On 2021/5/18 18:52, Ulrich Windl wrote:
>> Hi!
>> 
>> I thought using the reflink feature of OCFS2 would be just a nice way to make crash-consistent VM snapshots while they are running.
>> As it is a bit tricky to find out how much data is shared between snapshots, I started to write a utility to examine the blocks allocated to the VM backing files and snapshots.
>> 
>> Unfortunately (as it seems) OCFS2 fragments terribly under reflink snapshots.
>> 
>> Here is an example of a rather "good" file: It has 85 extents that are rather large (note that the extents are sorted by first block; in reality it's a bit worse):
>> DEBUG(5): update_stats: blk_list[0]: 3551627-3551632 (6, 0x2000)
>> DEBUG(5): update_stats: blk_list[1]: 3553626-3556978 (3353, 0x2000)
>> DEBUG(5): update_stats: blk_list[2]: 16777217-16780688 (3472, 0x2000)
>> DEBUG(5): update_stats: blk_list[3]: 16780689-16792832 (12144, 0x2000)
>> DEBUG(5): update_stats: blk_list[4]: 17301147-17304618 (3472, 0x2000)
>> DEBUG(5): update_stats: blk_list[5]: 17304619-17316762 (12144, 0x2000)
>> ...
>> DEBUG(5): update_stats: blk_list[81]: 31178385-31190528 (12144, 0x2000)
>> DEBUG(5): update_stats: blk_list[82]: 31191553-31195024 (3472, 0x2000)
>> DEBUG(5): update_stats: blk_list[83]: 31195025-31207168 (12144, 0x2000)
>> DEBUG(5): update_stats: blk_list[84]: 31210641-31222385 (11745, 0x2001)
>> filesystem: 655360 blocks of size 16384
>> 655360 (100%) blocks type 0x2000 (shared)
>> 
>> And here's a terrible example (33837 extents):
>> DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[5]: 263587-263587 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[6]: 263597-263610 (14, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[7]: 270414-270415 (2, 0x2000)
>> ...
>> DEBUG(4): finalize_blockstats: blk_list[90]: 382214-382406 (193, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[91]: 382791-382918 (128, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[92]: 382983-382990 (8, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[93]: 383520-383522 (3, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[94]: 384672-384692 (21, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[95]: 384860-384918 (59, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[96]: 385088-385089 (2, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[97]: 385090-385091 (2, 0x2000)
>> ...
>> DEBUG(4): finalize_blockstats: blk_list[805]: 2769213-2769213 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[806]: 2769214-2769214 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[807]: 2769259-2769259 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[808]: 2769261-2769261 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[809]: 2769314-2769314 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[810]: 2772041-2772042 (2, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[811]: 2772076-2772076 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[812]: 2772078-2772078 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[813]: 2772079-2772080 (2, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[814]: 2772096-2772096 (1, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[815]: 2772099-2772099 (1, 0x2000)
>> ...
>> DEBUG(4): finalize_blockstats: blk_list[33829]: 39317682-39317704 (23, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33830]: 39317770-39317775 (6, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33831]: 39318022-39318045 (24, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33832]: 39318274-39318284 (11, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33833]: 39318327-39318344 (18, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33834]: 39319157-39319166 (10, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33835]: 39319172-39319184 (13, 0x2000)
>> DEBUG(4): finalize_blockstats: blk_list[33836]: 39319896-39319936 (41, 0x2000)
>> filesystem: 1966076 blocks of size 16384
>> mapped=1121733 (57%)
>> 1007658 (51%) blocks type 0x2000 (shared)
>> 114075 (6%) blocks type 0x2800 (unwritten|shared)
>> 
>> So I wonder (while understanding the principle of copy-on-write for reflink snapshots):
>> Is there a way to avoid or undo the fragmentation?
> 
> Since these files (the original file and the cloned files) share the same 
> extent tree, the extents are split (fragmented) once the files are written.
> There is a defragmentation tool in ocfs2-tools upstream, but it 
> obviously does not work for this case (reflinked files).
> The workaround is to copy the cloned (fragmented) file to a new 
> file, then delete the cloned file.
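For the record, that copy-and-replace workaround could be scripted roughly like this (a minimal sketch; the OCFS2 path is hypothetical, and the copy must of course happen while the VM is stopped, since a plain copy allocates fresh, unshared extents):

```python
# Sketch of the copy-and-replace workaround for a fragmented reflink clone.
import os
import shutil

def rewrite_file(path):
    """Copy path to a temp name (allocating fresh, unshared extents),
    then atomically rename the copy over the fragmented original."""
    tmp = path + ".defrag"
    shutil.copyfile(path, tmp)   # plain data copy: no reflink, no sharing
    os.replace(tmp, path)        # atomic rename on POSIX filesystems

# rewrite_file("/mnt/ocfs2/vm.img")  # hypothetical path; VM must be shut down
```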

Thanks for answering!

I was wondering about something like this: When a filesystem allocates new blocks, it typically reserves a small range of sequential blocks.
Now when you snapshot a file, all allocated blocks are shared. My idea was this:
When one of those blocks is unshared, wouldn't it be wise to reserve a few blocks after or around the newly unshared block, in case other blocks near it have to be unshared as well?
That way, if a consecutive range of blocks is unshared, the new copy wouldn't fragment so badly.
I'm aware that this reservation would reduce the amount of free space available.
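As a toy illustration of the idea (pure simulation, not OCFS2 code; the window size, region scheme and allocator are made up), compare a first-free CoW allocator with one that reserves a small window per logical region:

```python
# Toy simulation of the reservation idea: when a shared block is first
# unshared by a CoW write, also set aside a small window of free blocks
# so later CoW writes to nearby logical blocks land next to it.

RESERVE = 8  # hypothetical reservation window, in blocks

def cow_allocate(writes, free_start=1000, reserve=0):
    """Map each CoW-written logical block to a new physical block.

    With reserve == 0, blocks are taken first-free in write order, so
    interleaved writes to distant file regions interleave on disk.
    With reserve > 0, a window of `reserve` physical blocks is set aside
    per logical region, keeping neighbouring blocks physically adjacent."""
    next_free = free_start
    region_base = {}  # logical region index -> reserved physical base
    mapping = {}
    for lblk in writes:
        if reserve:
            region = lblk // reserve
            if region not in region_base:
                region_base[region] = next_free
                next_free += reserve          # reserve the whole window
            mapping[lblk] = region_base[region] + lblk % reserve
        else:
            mapping[lblk] = next_free         # first-free allocation
            next_free += 1
    return mapping

def count_extents(mapping):
    """Count physically contiguous runs when reading in logical order."""
    phys = [mapping[l] for l in sorted(mapping)]
    return 1 + sum(1 for a, b in zip(phys, phys[1:]) if b != a + 1)

# Interleaved CoW writes to two distant regions of one file:
writes = [0, 100, 1, 101, 2, 102, 3, 103]
print(count_extents(cow_allocate(writes)))                   # → 8 extents
print(count_extents(cow_allocate(writes, reserve=RESERVE)))  # → 2 extents
```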

On defragmentation: I'm also aware that defragmenting a file with shared blocks the wrong way would completely unshare the blocks, so a clever tool would defragment only the unshared blocks. Still, this works somewhat against my first idea, as blocks unshared later would add new fragmentation after the nonshared blocks have been defragmented.
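To make that "share-aware" defragmentation concrete (the extent layout below is invented), such a tool would relocate only the unshared extents into one contiguous free area and leave shared extents where they are, so no shared data gets duplicated:

```python
# Toy illustration of share-aware defragmentation: only unshared extents
# are moved; shared extents keep their location to avoid unsharing them.

def defragment_unshared(extents, free_base):
    """extents: list of (phys_start, length, shared).

    Unshared extents are repacked back-to-back starting at free_base;
    shared extents keep their original physical location."""
    out, nxt = [], free_base
    for start, length, shared in extents:
        if shared:
            out.append((start, length, True))
        else:
            out.append((nxt, length, False))
            nxt += length
    return out

extents = [(5000, 4, True), (9123, 1, False), (7777, 2, False), (5010, 3, True)]
print(defragment_unshared(extents, free_base=20000))
# → [(5000, 4, True), (20000, 1, False), (20001, 2, False), (5010, 3, True)]
```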

Regards,
Ulrich

> 
> Thanks
> Gang
> 
>> 
>> Regards,
>> Ulrich
>> 
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
>> 




