[ClusterLabs] OCFS2 fragmentation with snapshots

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue May 18 06:52:38 EDT 2021


Hi!

I thought the reflink feature of OCFS2 would be a nice way to take crash-consistent snapshots of VMs while they are running.
As it is a bit tricky to find out how much data is shared between snapshots, I started to write a utility that examines the blocks allocated to the VM backing files and their snapshots.
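
A minimal sketch of how such a utility might obtain a file's extent list, assuming it is built on the Linux FIEMAP ioctl (the post does not say how the tool works; the flag values in the output below merely match FIEMAP's). The function names and the max_extents limit are hypothetical. FIEMAP reports byte offsets, so dividing by the block size to get block numbers is left out here.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B                 # _IOWR('f', 11, struct fiemap)
FIEMAP_HDR = struct.Struct("=QQIIII")      # struct fiemap without the extent array
FIEMAP_EXT = struct.Struct("=QQQ2QI3I")    # struct fiemap_extent


def unpack_extents(buf):
    """Decode a buffer filled in by FS_IOC_FIEMAP.

    Returns (logical, physical, length, flags) tuples, all offsets in bytes.
    """
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]   # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        # fields: fe_logical, fe_physical, fe_length, 2x reserved, fe_flags, 3x reserved
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def file_extents(path, max_extents=512):
    """Ask the kernel for up to max_extents extents of the file at path.

    A single call only; a heavily fragmented file needs repeated calls that
    continue from the end of the last extent returned.
    """
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXT.size)
    # fm_start=0, fm_length=whole file, fm_flags=0, fm_extent_count=max_extents
    FIEMAP_HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, max_extents, 0)
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)   # buffer is updated in place
    return unpack_extents(buf)
```

Keeping the decoding in unpack_extents separate from the ioctl call makes it possible to check the struct layout without a filesystem that supports FIEMAP.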

Unfortunately, it seems that OCFS2 fragments terribly under reflink snapshots.

Here is an example of a rather "good" file: it has 85 fairly large extents (note that the extents are sorted by first block; in reality it's a bit worse):
DEBUG(5): update_stats: blk_list[0]: 3551627-3551632 (6, 0x2000)
DEBUG(5): update_stats: blk_list[1]: 3553626-3556978 (3353, 0x2000)
DEBUG(5): update_stats: blk_list[2]: 16777217-16780688 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[3]: 16780689-16792832 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[4]: 17301147-17304618 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[5]: 17304619-17316762 (12144, 0x2000)
...
DEBUG(5): update_stats: blk_list[81]: 31178385-31190528 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[82]: 31191553-31195024 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[83]: 31195025-31207168 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[84]: 31210641-31222385 (11745, 0x2001)
filesystem: 655360 blocks of size 16384
655360 (100%) blocks type 0x2000 (shared)
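
The last field of each line looks like a set of FIEMAP extent flags. Assuming that is what the tool prints, the values seen in this post can be decoded with a small helper like the one below. The constants are the ones from <linux/fiemap.h>; the function name decode_flags is hypothetical, and only the three bits that occur here are named.

```python
# Subset of the FIEMAP_EXTENT_* bits from <linux/fiemap.h>, in ascending order.
FIEMAP_EXTENT_FLAGS = [
    (0x0001, "last"),        # FIEMAP_EXTENT_LAST: last extent of the file
    (0x0800, "unwritten"),   # FIEMAP_EXTENT_UNWRITTEN: allocated but not yet written
    (0x2000, "shared"),      # FIEMAP_EXTENT_SHARED: blocks shared with another file
]


def decode_flags(flags):
    """Return a 'name|name' string for the known bits; unknown bits stay in hex."""
    names = []
    rest = flags
    for bit, name in FIEMAP_EXTENT_FLAGS:
        if flags & bit:
            names.append(name)
            rest &= ~bit
    if rest:
        names.append(hex(rest))
    return "|".join(names) if names else "0"


for value in (0x2000, 0x2001, 0x2800):
    print(hex(value), decode_flags(value))
```

Under this reading, 0x2001 on the final extent is simply "shared" plus the end-of-file marker.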

And here's a terrible example (33837 extents):
DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[5]: 263587-263587 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[6]: 263597-263610 (14, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[7]: 270414-270415 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[90]: 382214-382406 (193, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[91]: 382791-382918 (128, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[92]: 382983-382990 (8, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[93]: 383520-383522 (3, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[94]: 384672-384692 (21, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[95]: 384860-384918 (59, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[96]: 385088-385089 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[97]: 385090-385091 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[805]: 2769213-2769213 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[806]: 2769214-2769214 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[807]: 2769259-2769259 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[808]: 2769261-2769261 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[809]: 2769314-2769314 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[810]: 2772041-2772042 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[811]: 2772076-2772076 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[812]: 2772078-2772078 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[813]: 2772079-2772080 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[814]: 2772096-2772096 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[815]: 2772099-2772099 (1, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[33829]: 39317682-39317704 (23, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33830]: 39317770-39317775 (6, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33831]: 39318022-39318045 (24, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33832]: 39318274-39318284 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33833]: 39318327-39318344 (18, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33834]: 39319157-39319166 (10, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33835]: 39319172-39319184 (13, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33836]: 39319896-39319936 (41, 0x2000)
filesystem: 1966076 blocks of size 16384
mapped=1121733 (57%)
1007658 (51%) blocks type 0x2000 (shared)
114075 (6%) blocks type 0x2800 (unwritten|shared)
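
To put a number on the fragmentation, the extent lines can be parsed and summarised. The sketch below assumes the exact debug line format shown above; the function names are hypothetical. It also merges extents that are physically adjacent and carry the same flags, which shows how many of the listed extents are really neighbours on disk. Whether such neighbours are also logically contiguous in the file cannot be seen from this output.

```python
import re

# Matches e.g. "blk_list[3]: 263558-263558 (1, 0x2000)"
LINE_RE = re.compile(r"blk_list\[\d+\]: (\d+)-(\d+) \((\d+), (0x[0-9a-fA-F]+)\)")


def parse_extents(text):
    """Return (first_block, last_block, flags) for every extent line in text."""
    extents = []
    for m in LINE_RE.finditer(text):
        first, last, count, flags = int(m[1]), int(m[2]), int(m[3]), int(m[4], 16)
        if last - first + 1 != count:
            raise ValueError(f"inconsistent extent line: {m[0]}")
        extents.append((first, last, flags))
    return extents


def merge_adjacent(extents):
    """Join extents whose block ranges touch and whose flags are equal."""
    merged = []
    for first, last, flags in sorted(extents):
        if merged and merged[-1][1] + 1 == first and merged[-1][2] == flags:
            merged[-1] = (merged[-1][0], last, flags)
        else:
            merged.append((first, last, flags))
    return merged


def mean_extent_blocks(extents):
    """Average extent length in blocks."""
    return sum(last - first + 1 for first, last, _ in extents) / len(extents)


sample = """
DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
"""
extents = parse_extents(sample)
print(len(extents), len(merge_adjacent(extents)), mean_extent_blocks(extents))
```

Using only the totals quoted in this post, the mean extent length is roughly 7710 blocks (655360 / 85) for the "good" file and roughly 33 blocks (1121733 / 33837) for the terrible one.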

So I wonder (while understanding the principle of copy-on-write for reflink snapshots):
Is there a way to avoid or undo the fragmentation?

Regards,
Ulrich


