A comment about HDS’s Zero Page Reclaim on one of my previous posts got me thinking about the effectiveness of thin provisioning in general. In that post, I talked about the trade-off between the higher storage utilization that thin provisioning brings and the performance problems that can come with it.
Thin provisioning has some intrinsic benefits. First, new storage can be provisioned for applications with far less up-front planning. Next, application owners get the capacity they ask for, while storage admins can show that the storage systems are being utilized effectively. Also, rather than managing data growth application by application, storage admins can manage it across the enterprise as a whole.
Thin provisioning can also provide performance benefits. For example, consider a set of virtual Windows servers running across several LUNs contained in the same RAID group. Each Windows VM stores its OS files in the first few GB of its VMDK file, and the VMDK files are laid out in order within each LUN, with some free space at the end. In essence, we have a whole bunch of OS sections separated by gaps of no data. If all the VMs were booting at approximately the same time, the disk heads would have to seek back and forth across the entire disk, increasing disk latency.
Now take the same disks, configured as a thin pool, and create the same LUNs (as thin LUNs) and the same VMs. Because thin provisioning generally allocates physical space only as the application writes data, starting from the beginning of the pool, the OS files of all those Windows VMs will tend to land near the beginning of the disks. This increased data locality should reduce IO latency across all of the VMs. The effect is probably minor, but lower disk latency can translate into higher IOPS from the same set of physical disks, and the only change is the use of thin provisioning.
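To make the locality argument concrete, here is a toy Python model (not a benchmark). It treats the RAID group as one linear range of blocks and counts how far the heads travel when each VM in turn reads from its OS region. All sizes are hypothetical, and it assumes each VM’s OS files were written before its other data, so the thin pool packs them together. Real arrays allocate in multi-MB pages and have caching, queueing, and RAID striping that this ignores.

```python
"""Toy model of disk-head travel for thick vs. thin LUN layouts.

Hypothetical assumptions, not measurements: the RAID group is one linear
range of blocks, each VM's OS region is OS_BLOCKS long, and seek cost is
the absolute distance the heads move, in blocks.
"""
from __future__ import annotations

NUM_VMS = 8
LUN_BLOCKS = 10_000  # provisioned size of each LUN (hypothetical)
OS_BLOCKS = 500      # blocks holding each VM's OS files (hypothetical)


def thick_os_offsets(num_vms: int, lun_blocks: int) -> list[int]:
    # Thick LUNs are laid out back to back, so each OS region starts at the
    # beginning of its own LUN, with the rest of that LUN as a gap after it.
    return [vm * lun_blocks for vm in range(num_vms)]


def thin_os_offsets(num_vms: int, os_blocks: int) -> list[int]:
    # A thin pool allocates on first write from the start of the pool; if each
    # VM's OS files are written first, the OS regions end up packed together.
    return [vm * os_blocks for vm in range(num_vms)]


def round_robin_travel(offsets: list[int], os_blocks: int, rounds: int = 10) -> int:
    """Total head travel when every VM reads one block of its OS region in turn."""
    head = 0
    travel = 0
    for r in range(rounds):
        for start in offsets:
            target = start + (r % os_blocks)
            travel += abs(target - head)
            head = target
    return travel


thick = round_robin_travel(thick_os_offsets(NUM_VMS, LUN_BLOCKS), OS_BLOCKS)
thin = round_robin_travel(thin_os_offsets(NUM_VMS, OS_BLOCKS), OS_BLOCKS)
print(f"thick layout head travel: {thick:,} blocks")
print(f"thin layout head travel:  {thin:,} blocks")
```

With these made-up sizes the packed layout needs roughly a twentieth of the head travel, which mostly reflects the ratio of LUN size to OS-region size. On real hardware the gain is much smaller, in line with the "probably minor" caveat in the paragraph above.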
So, back to HDS Zero Page Reclaim. The biggest problem with thin provisioning is that it doesn’t stay thin for long. Windows NTFS, for example, is particularly NOT thin-friendly, since it favors previously untouched disk space for new writes rather than reusing the space of deleted files. Over time, this causes a thin LUN to grow to its maximum size, even though the actual amount of data stored in the LUN may not change. And Windows isn’t the only one with this problem. So thin provisioning may make provisioning easier, and may even improve IO latency, but it might not actually save you any money on disk.
This is where HDS’s Zero Page Reclaim (ZPR) can help. Hitachi Dynamic Provisioning (with ZPR) can scan a LUN for pages where every byte is zero and reclaim that space for other thin LUNs. This is particularly useful for converting thick LUNs into thin LUNs. But it can only see blocks of zeros, so it won’t necessarily see space freed up by deleting files. Hitachi’s own documentation points out that many file systems are not thin-friendly, and that ZPR won’t help with the long-term growth of thin LUNs caused by actively writing and then deleting data.
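Both effects can be shown in a small Python model. It is a hypothetical simplification, not NTFS or HDP internals: one filesystem block maps to one pool page, the filesystem always prefers never-written blocks for new data, deleting a file only updates metadata, and the reclaim scan frees any allocated page that is entirely zeros.

```python
"""Toy model: why a thin LUN grows, and what a zero-page scan can reclaim.

Hypothetical simplifications, not NTFS or HDP internals: one filesystem
block maps to one pool page, the filesystem prefers never-written blocks
for new data, and deleting a file only updates filesystem metadata.
"""
from __future__ import annotations

PAGE_SIZE = 16  # bytes per page; tiny on purpose


class ThinLun:
    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.pages: dict[int, bytearray] = {}  # physically allocated pages only

    def write(self, page: int, data: bytes) -> None:
        if len(data) > PAGE_SIZE:
            raise ValueError("data larger than one page")
        # Physical space is allocated on the first write to a page.
        buf = self.pages.setdefault(page, bytearray(PAGE_SIZE))
        buf[: len(data)] = data

    def zero_page_reclaim(self) -> int:
        """Free every allocated page whose bytes are all zero; return the count."""
        zero_pages = [p for p, buf in self.pages.items() if not any(buf)]
        for p in zero_pages:
            del self.pages[p]
        return len(zero_pages)


class UntouchedFirstFs:
    """Uses never-written blocks for new data before reusing freed ones."""

    def __init__(self, lun: ThinLun) -> None:
        self.lun = lun
        self.never_used = list(range(lun.num_pages))
        self.freed: list[int] = []
        self.files: dict[str, int] = {}

    def create(self, name: str, data: bytes) -> None:
        block = self.never_used.pop(0) if self.never_used else self.freed.pop(0)
        self.lun.write(block, data)
        self.files[name] = block

    def delete(self, name: str) -> None:
        # Only metadata changes; the old bytes stay on the LUN.
        self.freed.append(self.files.pop(name))


# Churn: create files and delete each one five files later.
lun = ThinLun(num_pages=100)
fs = UntouchedFirstFs(lun)
for i in range(100):
    fs.create(f"f{i}", b"scratch data")
    if i >= 5:
        fs.delete(f"f{i - 5}")
reclaimed_after_churn = lun.zero_page_reclaim()
print(f"live files: {len(fs.files)}, allocated pages: {len(lun.pages)}")
print(f"pages freed by zero-page scan after churn: {reclaimed_after_churn}")

# Thick-to-thin conversion: a thick copy writes every page, mostly zeros.
thick = ThinLun(num_pages=100)
for p in range(100):
    thick.write(p, bytes(PAGE_SIZE))
for p in range(10):
    thick.write(p, b"real data")
reclaimed_thick = thick.zero_page_reclaim()
print(f"pages freed from converted thick LUN: {reclaimed_thick}")
```

After the churn loop only five small files are live, yet all 100 pages are allocated and the zero-page scan frees nothing, because the deleted files’ bytes are still on the LUN. The thick-to-thin case is the one a zero-page scan handles well: the 90 pages that were only ever written with zeros come back.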
There are ways to script writing zeros to a server’s free space so that ZPR can reclaim it, but you would need to run that script on all of your servers, which means a separate tool for each operating system in your environment. The script would also have to run periodically, since the LUN will grow again afterward.
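As a rough illustration of such a script, here is a Python sketch that fills a filesystem’s free space with zeros and then deletes the filler file (on Windows, Sysinternals SDelete’s -z option does a similar job). The function name, filler file name, and parameters are made up for this example. It assumes the zeros reach the LUN as ordinary writes (no host-side compression, dedupe, or sparse-file handling), and it leaves a reserve so applications don’t hit a full disk mid-run. On an array that doesn’t detect zero writes inline, the thin LUN will be almost fully allocated until the next zero-page scan runs.

```python
"""Sketch of a 'zero the free space' script for one host (hypothetical helper).

Assumptions: the zeros reach the LUN as ordinary writes (no host-side
compression, dedupe, or sparse-file handling), and a reserve of free space
is left so applications don't run out of disk while this runs.
"""
import os
import shutil
from typing import Optional

CHUNK = 1024 * 1024            # write 1 MiB of zeros at a time
FILLER_NAME = "zero_fill.tmp"  # hypothetical filler file name


def zero_fill_free_space(mount_point: str, reserve_bytes: int,
                         max_bytes: Optional[int] = None) -> int:
    """Fill free space with zeros until only reserve_bytes remain free (or
    max_bytes have been written), then delete the filler. Returns bytes written."""
    filler = os.path.join(mount_point, FILLER_NAME)
    zeros = bytes(CHUNK)
    written = 0
    try:
        with open(filler, "wb") as f:
            while True:
                # Free space is re-checked per chunk, so the reserve is approximate.
                budget = shutil.disk_usage(mount_point).free - reserve_bytes
                if max_bytes is not None:
                    budget = min(budget, max_bytes - written)
                if budget <= 0:
                    break
                n = min(CHUNK, budget)
                f.write(zeros[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # push the zeros down to the LUN before deleting
    finally:
        # Deleting the filler hands the (now zeroed) blocks back to the filesystem.
        if os.path.exists(filler):
            os.remove(filler)
    return written
```

A call such as zero_fill_free_space("/data", reserve_bytes=5 * 1024**3) would then have to be scheduled (cron, Task Scheduler, and so on) on every server, which is exactly the per-OS, periodic overhead described above.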
NetApp’s SnapDrive for Windows can scan an NTFS file system, detect deleted files, and report the associated blocks back to the filer so they can be returned to the aggregate for use by other volumes/LUNs. The Space Reclamation scan can be run as needed, and I believe it can be scheduled, but it appears to be Windows-only. Again, this has to be done periodically.
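Conceptually, a host-side scan like this compares what the filesystem considers free with what the array has allocated, and tells the array about the overlap. The Python below is a hypothetical model of that idea, not SnapDrive’s implementation; it assumes a one-to-one mapping between filesystem blocks and aggregate pages.

```python
"""Conceptual model of a periodic host-side space-reclamation scan.

Not SnapDrive's implementation; all names are hypothetical. Assumes one
filesystem block maps to one page in the shared aggregate.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Aggregate:
    free_pages: int
    lun_allocated: set[int] = field(default_factory=set)

    def release(self, pages: set[int]) -> int:
        """Return pages to the shared pool; pages never allocated are ignored."""
        released = pages & self.lun_allocated
        self.lun_allocated -= released
        self.free_pages += len(released)
        return len(released)


def reclamation_scan(fs_used_blocks: set[int], lun_size: int, aggregate: Aggregate) -> int:
    """Report every block the filesystem isn't using; the array frees the allocated ones."""
    fs_free_blocks = set(range(lun_size)) - fs_used_blocks
    return aggregate.release(fs_free_blocks)


# The LUN has grown to full, but the filesystem only uses 10 blocks.
agg = Aggregate(free_pages=0, lun_allocated=set(range(100)))
live = set(range(10))
first = reclamation_scan(live, lun_size=100, aggregate=agg)
second = reclamation_scan(live, lun_size=100, aggregate=agg)
print(f"first scan freed {first} pages, second scan freed {second}")
print(f"aggregate free pages: {agg.free_pages}")
```

The second scan frees nothing until more files are written and deleted, which is why this kind of reclamation has to be repeated on a schedule.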
But what if you could solve the problem across most or all of your systems, regardless of operating system or application, with real-time reclamation? And what if you could solve other problems at the same time? Enter Symantec’s Storage Foundation with the Thin-Reclamation API. Storage Foundation consists of VxFS (Veritas File System), VxVM (Veritas Volume Manager), DMP (Dynamic Multi-Pathing), and some other tools that together provide dynamic grow/shrink, snapshots, replication, thin-friendly volume usage, and dynamic SAN multipathing across multiple operating systems. Storage Foundation’s Thin-Reclamation API is to thin provisioning what OST (NetBackup OpenStorage) is to backup deduplication: storage vendors can now add near-real-time zero page reclaim for customers who are willing to deploy VxFS/VxVM on their servers. For EMC customers, DMP can replace PowerPath, which helps offset the cost.
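The key difference with an API-based approach is when the array hears about freed space: at delete time, rather than during a later scan. The Python model below sketches that event-driven pattern. The reclaim call is a hypothetical stand-in, not Symantec’s actual Thin-Reclamation API; in spirit it resembles the SCSI UNMAP and ATA TRIM commands. The modeled filesystem still prefers never-used blocks, like the NTFS behavior described earlier, yet the LUN stays thin.

```python
"""Event-driven reclamation: the filesystem tells the pool about freed blocks
at delete time. ThinPool.reclaim is a hypothetical stand-in for a reclaim
call; it is not Symantec's actual Thin-Reclamation API.
"""
from __future__ import annotations


class ThinPool:
    def __init__(self) -> None:
        self.allocated: set[int] = set()

    def write(self, blocks: list[int]) -> None:
        self.allocated.update(blocks)  # allocate pages on first write

    def reclaim(self, blocks: list[int]) -> None:
        self.allocated.difference_update(blocks)  # hand pages back to the pool


class ReclaimingFs:
    """Prefers never-used blocks: freed blocks go to the back of the free list."""

    def __init__(self, num_blocks: int, pool: ThinPool) -> None:
        self.free = list(range(num_blocks))
        self.files: dict[str, list[int]] = {}
        self.pool = pool

    def create(self, name: str, num_blocks: int) -> None:
        blocks = [self.free.pop(0) for _ in range(num_blocks)]
        self.pool.write(blocks)
        self.files[name] = blocks

    def delete(self, name: str) -> None:
        blocks = self.files.pop(name)
        self.free.extend(blocks)
        self.pool.reclaim(blocks)  # notify the array immediately; no later scan


pool = ThinPool()
fs = ReclaimingFs(num_blocks=100, pool=pool)
for i in range(50):
    fs.create(f"f{i}", num_blocks=4)
    if i >= 2:
        fs.delete(f"f{i - 2}")
live_blocks = sum(len(b) for b in fs.files.values())
print(f"live blocks: {live_blocks}, allocated in pool: {len(pool.allocated)}")
```

After 50 create/delete cycles the pool holds only the eight blocks of the two live files, without any periodic scan or zero-filling.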
As far as I know, 3PAR is the first and only storage vendor to write to Symantec’s Thin-Reclamation API, which, in my view, gives them the most dynamic, non-disruptive zero-page-reclaim feature set on the market. As a storage engineer, I have often wondered whether VxVM/VxFS could make managing application data storage in our diverse environment easier and more dynamic. Adding thin reclamation to the mix makes it even more attractive. I’d like to see more storage vendors follow 3PAR’s lead and write to Symantec’s API. I’d also like to see Symantec open up both OST and the Thin-Reclamation API for others to use, but I doubt that will happen.