The Role of Defragmentation in the Cloud – Releasing trapped value

Originally published November 21, 2010

One of the questions we get from clients moving to the Private Cloud is whether we still need to do things as we did in the physical world.

Defragmentation of the file systems (FS) within guest operating systems (OS) containerized in a virtual machine (VM) always comes up. From my own enterprise messaging and database background, this was a very important question to get answered upfront. There could be tremendous negative consequences from not doing proper housekeeping and performing defragmentation (either offline or online while service was running). This essentially represents trapped value to the business.

There are many great vendors out there in this VM defragmentation space, for example, Diskeeper’s V-locity2 or Raxco’s PerfectDisk. They make a good case for defragmentation, essentially pointing to the fact that:


1. Many industry sources point to the negative impact of fragmentation
2. Fragmentation increases the time taken to read/write a file
3. Extra system workload resulting from fragmentation
4. Free space consolidation is very important for improving write operations
5. Fragmentation contributes to higher I/O bandwidth needs
6. Resource contention for I/O by VMs
7. VM disks perpetually growing even when deleting data

The fragmentation issue, whether of files or free space, has symptoms analogous to the performance issue VMware identified with misalignment of Guest OS partitions and indeed of VMFS itself. Essentially, much more unnecessary work is being done by the ESX host server and the corresponding storage elements.

Array and vSphere 4.1 features help reduce the impact of these issues through I/O coalescing and by using array cache to bundle larger sequences of updates into contiguous writes; EMC VMAX can currently provide 1 TB of cache. Multipathing tools such as EMC PowerPath/VE alleviate the increased I/O load through queue balancing and by using all paths to the storage array concurrently.

Thin provisioning ensures ‘Just-In-Time’ space allocation to virtual disks. This is heavily enhanced in the array hardware by complementary technologies such as EMC's FAST, which further optimize storage price/performance economics. This is also changing through the VMware vStorage VAAI, such that vSphere is offloading storage operations to, surprise surprise, storage tiers that simply do the job better.

However, these do not proactively cure fragmentation within the guest OS or indeed at the VMFS level.

Indeed, when we start thinking about environments with hundreds of thousands of VMs, such as desktop virtualization using VMware Linked Clones, this issue needs to be tackled. Virtual disk compaction is an important element here: done online, it reclaims space and releases trapped value.

The ability to use FAST can support defragmentation scenarios by shifting the workload onto solid state drives (SSD) for the duration of the high I/O activity. The array will then move the corresponding sub-LUN elements back to the appropriate tier later. Many customers do this with scripts.

Essentially, using Storage vMotion, the VM can be moved to a datastore on high performance disks and then defragmented with the guest OS's internal defragmentation tools. Once completed, the VM is moved back to its original datastore with Storage vMotion. This seems easy enough for small numbers of machines, but doing it continuously for large volumes of VMs does not scale to Cloud levels.
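As a rough illustration of the kind of script customers write for this, here is a minimal Python sketch of the move, defragment, move back sequence. Everything named here is an assumption for illustration: `migrate` and `defragment` are hypothetical callables that you would back with your own Storage vMotion and guest OS tooling, and the VM and datastore names are placeholders. It is not a VMware or EMC API.

```python
from typing import Callable


def defragment_on_fast_tier(
    vm: str,
    home_datastore: str,
    fast_datastore: str,
    migrate: Callable[[str, str], None],
    defragment: Callable[[str], None],
) -> None:
    """Move a VM to fast storage, defragment it, then move it back.

    `migrate(vm, datastore)` and `defragment(vm)` are hypothetical hooks
    supplied by the caller; this function only fixes the order of the steps.
    """
    # Step 1: relocate the virtual disks to the high performance datastore.
    migrate(vm, fast_datastore)
    try:
        # Step 2: run the guest OS defragmentation while on fast disks.
        defragment(vm)
    finally:
        # Step 3: return the VM to its home datastore, even if the
        # defragmentation step raised, so it does not stay on the fast tier.
        migrate(vm, home_datastore)
```

Calling it with two small logging functions in place of the real hooks is an easy way to check the ordering before wiring in anything that touches production. The `try`/`finally` is a deliberate choice: expensive SSD capacity is released whether or not the defragmentation succeeds.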

Scheduling defragmentation cycles across an entire virtual infrastructure Cloud estate is also no trivial task. Tools are needed. The current generation of tools operates within the guest OS. VMFS also warrants an examination, although with the ability to use 8MB block sizes there is less fragmentation taking place at the VMDK level. This is still worlds away from a self-defragmenting file system!
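To give a feel for why scheduling gets awkward at scale, here is a small Python sketch that splits an estate into defragmentation "waves" so that no datastore has more than a set number of VMs defragmenting at once. The function name, the `max_per_datastore` limit and the VM-to-datastore mapping are all assumptions made up for this example; a real scheduler would also have to consider time windows, host load and array activity.

```python
from collections import defaultdict
from typing import Dict, List


def plan_defrag_waves(
    vm_to_datastore: Dict[str, str], max_per_datastore: int
) -> List[List[str]]:
    """Group VMs into waves with at most `max_per_datastore` VMs per datastore."""
    if max_per_datastore < 1:
        raise ValueError("max_per_datastore must be at least 1")

    # Collect the VMs on each datastore in a stable (sorted) order.
    by_datastore: Dict[str, List[str]] = defaultdict(list)
    for vm, datastore in sorted(vm_to_datastore.items()):
        by_datastore[datastore].append(vm)

    waves: List[List[str]] = []
    start = 0
    while True:
        # Each wave takes the next slice of VMs from every datastore.
        wave: List[str] = []
        for datastore in sorted(by_datastore):
            wave.extend(by_datastore[datastore][start:start + max_per_datastore])
        if not wave:
            break
        waves.append(wave)
        start += max_per_datastore
    return waves
```

With three VMs on one datastore, one VM on another and a limit of two, this yields two waves: the first holds two VMs from the busy datastore plus the single VM from the other, and the second holds the remaining VM. The number of waves is driven by the most heavily populated datastore, which is exactly where the Cloud-scale pain shows up.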

After all, in a busy Cloud environment, the datastores are heavily used. VMs are created and removed. This eventually causes fragmentation. Whether that is an issue for the Cloud environment is, I believe, still too early to say.

My own view is that some of the best practices of the past regarding defragmentation are still relevant, but need to be updated for the current generation of applications. For example, the issues in Exchange 2000/2003 differ in scale from those in Exchange 2007/2010. It’s the application stack that still counts, as that is what delivers service to end users. On the other hand, implementing thousands of defragmenting tools in guest OS VMs is also not my idea of fun, and the cost may well be prohibitive. Side effects, such as large growth in redo log files of any sort when defragmentation takes place, also need to be considered.

I’d like to see VMware create a defragmentation API integrated with the vStorage VAAI APIs for array awareness, much as they have for anti-virus scanning using the VMsafe API. This would allow the defrag engines to hook into the hypervisor itself and get the array to offload some of these tasks. That would also provide a consistent interface for vendors to design against, and defragmentation could then become a thing of the past, regardless of the guest OS running in the VM. The Cloud should just deal with it, when it is needed!


The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.