Tag Archives: ontap

Compression better than Dedup? NetApp Confirms!

Posted on by

The more I talk with customers, the more I find that the technical details of how something works is much less important than the business outcome it achieves.  When it comes to storage, most customers just want a device that will provide the capacity and performance they need, at a price they can afford–and it better not be too complicated.  Pretty much any vendor trying to sell something will attempt to make their solution fit your needs even if they really don’t have the right products.  It’s a fact of life, sell what you have.  Along these lines, there has been a lot of back and forth between vendors about dedup vs. compression technology and which one solves customer problems best.

After snapshots and thin provisioning, data reduction technology in storage arrays has become a big focus in storage efficiency lately; and there are two primary methods of data reduction — compression and deduplication.

While EMC has been marketing compression technology for block and file data in Celerra, Unified, and Clariion storage systems, NetApp has been marketing deduplication as the technology of choice for block and file storage savings.  But which one is the best choice?  The short answer is.. it depends.  Some data types benefit most from deduplication while others get better savings with compression.

Currently, EMC supports file compression on all EMC Celerra NS20, 40, 80, 120, 480, 960, VG2, and VG8 systems running DART 5.6.47.x+ and block compression on all CX4 based arrays running FLARE30.x+.  In all cases, compression is enabled on a volume/LUN level with a simple check box and processing can be paused, resumed, and disabled completely, uncompressing the data if desired.  Data is compressed out-of-band and has no impact on writes, with minimal overhead on reads.  Any or all LUN(s) and/or Filesystem(s) can be compressed if desired even if they existed prior to upgrading the array to newer code levels.

With the release of OnTap 8.0.1, NetApp has added support for in-line compression within their FAS arrays.  It is enabled per-FlexVol and as far as I have been able to determine, cannot be disabled later (I’m sure Vaughn or another NetApp representative will correct me if I’m wrong here.)  Compression requires 64-bit aggregates which are new in OnTap 8, so FlexVols that existed prior to an upgrade to 8.x cannot be compressed without a data migration which could be disruptive.  Since compression is inline, it creates overhead in the FAS controller and could impact performance of reads and writes to the data.

Vaughn Stewart, of NetApp, expertly blogged today about the new compression feature, including some of the caveats involved, and to me the most interesting part of the post was the following graphic he included showing the space savings of compression vs. dedup for various data types.

Image Credit: Vaughn Stewart, NetApp

The first thing that struck me was how much better compression performed over deduplication for all but one data type (Virtualization will usually fare well because in a typical environment there are many VMs with the same operating system files).  In fact, according to NetApp, deduplication achieves very little savings, if any, for the majority of the data types here.
 
The light green bar indicates savings with both dedupe AND compression enabled on the same dataset.  In 5 out of 9 cases, dedup adds ZERO savings over compression alone.  I can’t help but wonder why anyone would enable dedup on those data types if they already had compression, since both features use storage array CPU resources to find and compress or dedup data.  I am aware that in some cases, dedup can improve performance on NetApp systems due to dedup-aware cache, but I also believe that any performance gain is directly related to the amount of duplication in the data.  Using this chart, virtualization is really the only place where dedup seems particularly effective and hence the only place where real performance gains would likely present themselves.
 
The challenge for NetApp customers will be getting their data into a configuration that supports compression due to the 64-bit aggregate requirement, lack of an easy and non-disruptive LUN migration feature (DataMotion appears to only support iSCSI and NFS and requires several additional licenses), and no way to convert an aggregate from 32-bit to 64-bit.  Once compression has been enabled, if there is truly no way to disable it, any resulting performance impact will be very difficult to rectify.
 
On the other hand, any EMC customer with current maintenance can upgrade their NS or CX4 array to newer versions of DART or FLARE, and compression can be enabled on any existing data after the fact.  If performance becomes an issue for a particular dataset once compressed, the data can be uncompressed later.  Both operations are completely non-disruptive and run in the background.  While block compression only works on LUNs in a virtual pool, as opposed to a traditional RAID group, enabling compression on a normal LUN will automatically migrate the LUN into a virtual pool, perform zero-page reclaim, followed by compression, and the entire process is completely non-disruptive to the application.  Oh, and compressed data can still be tiered with FASTVP across SSD, FC, and SATA disk and/or benefit from up to 2TB of FASTCache.
 
I admit that there is a place for deduplication as well as compression in reducing the footprint of customer data.  However, based on what I’ve seen in my career as an IT professional, and with my customers in my current role at EMC, there are more use cases for compression than there are for deduplication when it comes to primary data, whether SAN or NAS.  Either way, if I was using a new technology for the first time on a particular data set, whether compression or deduplication, I would definitely want a backout plan in case the drawbacks outweight the benefits.

NetApp and EMC: ESX and Exchange 2007 CCR

Posted on by

The first application we tackled after deploying the NetApp system was Exchange 2007.  We had deployed Exchange 2007 recently, running in CCR clusters on VMWare ESX.  Since each node of a CCR cluster has it’s own copy of the database we wanted to put one node from each cluster onto the NetApp, leaving the other nodes on the Clariion.  This environment is entirely FiberChannel, no iSCSI deployed and as such the Exchange servers are using VMWare Raw Devices for the database and log disks.  This poses a problem that we didn’t discover until later which I will discuss in a future post about replicating Exchange with NetApp.

Re-Architecting the environment to fit the storage

The first thing we discovered was that neither IBM/NetApp nor EMC would support the same host HBAs zoned to multiple brands of storage.  So we had to split the ESX cluster into two clusters, one on each storage platform.  Luckily the Exchange environment was isolated on it’s own six node cluster so it was easy to split everything in half.

Next we learned that due to NetApp’s updated active/active mode with proxy paths in ONTap 7.3, VMWare ESX 3.x randomly selects paths when rescanning HBAs and will pick non-optimized paths to the LUNs.  This still works but is not ideal as it increases IO latency, causing the Filer to send autosupport emails periodically warning of the problem.  Installing the NetApp Host Utilities for ESX onto the ESX hosts themselves allows you to run a script that assigns persistent paths evenly across the HBAs.  The script works as advertised but as far as I can tell you have to run the script each time you add a new LUN to the ESX server.  It would be much better if it were more automated.

Actually, if you are running ESX4.0 the scenario changes since NetApp ONTap 7.3+, Clariion FLARE 26+, and ESX4 all support ALUA making this problem all but disappear and improving fabric resiliency. Unfortunately for us, ESX4 is still a bit new and hasn’t been rolled out into production yet.  NetApp also released tools for vCenter 4.0 that allow you to do the path assignment and other tasks from within vCenter rather than at the command line.  EMC also now has PowerPath available for ESX4.0 which will not only manage paths but load balance across all paths for increased performance and lower latency.

VirtualStorageGuy has blogged already about the NetApp/EMC/vSphere plug-ins and there is even a Powerpoint available.

Finally, during the sales process NetApp pushed their de-duplication features (A-SIS) quite a bit and stressed how much disk space we could save in a VMWare environment.  During deployment we were informed that if your VMs (VMDKs and VMFS) were not properly partition aligned de-duplication wouldn’t work well or at all.  Since this environment has several hundred VMs built over several years by many people, and aligning the system (C:) drive of a Windows VM is difficult, the benefit would be minimal for us.  Luckily NetApp has provided tools that can scan and align VMDKs without having to repartition the disks.  We have not tested this yet.  Partition Alignment is a best practice for ANY SAN storage system so we can’t fault NetApp for this problem; it’s just a fact of life.

But is it REALLY Redundant?

Even with two storage systems, with independent VMWare clusters, each hosting half of the Exchange cluster environment, a problem with either array could still take down and entire Exchange cluster.  This is due to the File Share Witness (FSW) component used in a Majority Node Set (MNS) cluster like Exchange CCR.  The idea behind the FSW in an MNS cluster is to prevent a condition known as Split Brain.  Since a MNS cluster does not have a quorum disk, it relies entirely on network communication between the nodes to determine cluster status and make decisions about which nodes should become active.  In the event that the two nodes lose communication with each other, each node will check for the FSW and if it is still available, it assumes that the other cluster node is down and proceeds to bring cluster resources online (if they weren’t already).  Without the FSW, both nodes would potentially go active and there could be issues with inconsistent data, etc.  This is the split-brain condition.

Typically, each cluster has a single FSW on a separate server (the CAS servers in our case).  With the redundancy storage model we moved to, the FSW became a single point of failure.  If we put the FSW on EMC storage with NodeA, and NodeB on the IBM/NetApp storage, a problem with the EMC array could take down both the cluster node AND the FSW at the same time.  The surviving cluster node on the IBM/NetApp array would go down or stay down to prevent split-brain since the FSW was not available.  Moving the FSW to the IBM/NetApp array presents the same problem on opposite side of the cluster.  Incidentally, we proved this problem in lab testing to be sure.  The solution is to move the FSW off of BOTH arrays, to either a dedicated physical server with internal disk, or a third storage array if you have one.  There was a second EMC array in production so we moved the FSW there.  In the new configuration, a complete outage of any single storage array would not take down the Exchange environment.

Crude diagram of the storage redundant Exchange CCR cluster

So far this new 3-way split environment is working fine, performance on the EMC and NetApp arrays is fine for Exchange.  Using the same number of disks on the NetApp array yields about twice as much usable space as the EMC due to RAID-DP vs RAID-10 but overall performance is similar.  Theoretically that means we could allow for more growth of the Exchange databases but in reality that is not always the case.  My next update will be about Exchange replication using SnapManager and SnapMirror and how that has effectively negated the remaining free space in the NetApp aggregate.