Monthly Archives: October 2010

EMC Unified: Guaranteed Efficiency with Better Application Availability

Posted on by

(Warning: This is a long post…)

You have a critical application that you can’t afford to lose:

So you want to replicate your critical applications because they are, well, critical.   And you are looking at the top midrange storage vendors for a solution.  NetApp touts awesome efficiency, awesome snapshots, etc while EMC is throwing considerable weight behind it’s 20% Efficiency Guarantee.  While EMC guarantees to be 20% more efficient in any unified storage solution, there is perhaps no better scenario than a replication solution to prove it.

I’m going to describe a real-world scenario using Microsoft Exchange as the example application and show why the EMC Unified platform requires less storage, and less WAN bandwidth for replication, while maintaining the same or better application availability vs. a NetApp FAS solution.  The example will use a single Microsoft Exchange 2007 SP2 server with ten 100GB mail databases connected via FibreChannel to the storage array.  A second storage array exists in a remote site connected via IP to the primary site and a standby Exchange server is attached to that array.

Basic Assumptions:

  • 100GB per database, 1 database per storage group, 1 storage group per LUN, 130GB LUNs
  • 50GB Log LUNs, ensure enough space for extra log creation during maintenance, etc
  • 10% change rate per day average
  • Nightly backup truncates logs as required
  • Best Practices followed by all vendors
  • 1500 users (Heavy Users 0.4IOPS), 10% of users leverage Blackberry (BES Server = 4X IOPS per user)
  • Approximate IOPS requirement for Exchange: 780IOPS for this server.
  • EMC Solution: 2 x EMC Unified Storage systems with SnapView/SANCopy and Replication Manager
  • NetApp Solution: 2 x NetApp FAS Storage systems with SnapMirror and SnapManager for Exchange
  • RPO: 4 hours (remote site replication update frequency)

Based on those assumptions we have 10 x 130GB DB LUNs and 10 x 50GB Log LUNs and we need approximately 780 host IOPS 50/50 read/write from the backend storage array.

Disk IOPS calculation: (50/50 read/write)

  • RAID10, 780 host IOPS translates to 1170 disk IOPS (r+w*2)
  • RAID5, 780 host IOPS translates to 1950 disk IOPS (r+w*4)
  • RAIDDP is essentially RAID6 so we have about 2730 disk IOPS (r + w*6)

Note: NetApp can create sequential stripes on writes to improve write performance for RAIDDP but that advantage drops significantly as the volumes fill up and free space becomes fragmented which is extremely likely to happen after a few months or less of activity.

Assuming 15K FiberChannel drives can make 180 IOPS with reasonable latencies for a database we’d need:

  • RAID10, Database 6.5 disks (round up to 8), using 450GB 15K drives =  1.7TB usable (1 x 4+4)
  • RAID5, 10.8 disks for RAID5 (round up to 12), using 300GB 15K drives = 2.8TB usable (2 x 5+1)
  • RAID6/DP, 15.1 disks for RAID6 (round up to 16), using 300GB 15K drives = 3.9TB usable (1 x 14+2)

Log writes are highly cachable so we generally need fewer disks; for both the RAID10 and RAID5 EMC options we’ll use a single RAID1 1+1 raid group with 2 x 600GB 15K drives.  Since we can’t do RAID1 or RAID10 on NetApp we’ll have to use at least 3 disks (1 data and 2 parity) for the 500GB worth of Log LUNs but we’ll actually need more than that.

Picking a RAID Configuration and Sizing for snapshots:

For EMC, the RAID10 solution uses fewer disks and provides the most appropriate amount of disk space for LUNs vs. the RAID5 solution.  With the NetApp solution there really isn’t another alternative so we’ll stick with the 16 disk RAID-DP config.  We have loads of free space but we need some of that for snapshots which we’ll see next.  We also need to allocate more space to the Log disks for those snapshots.

Since we expect about 10% change per day in the databases (about 10GB per database) we’ll double that to be safe and plan for 20GB of changes per day per LUN (DB and Log).

NetApp arrays store snapshot data in the same volume (FlexVol) as the application data/LUN so you need to size the FlexVol’s and Aggregates appropriately.  We need 200GB for the DB LUNs and 200GB for the Log LUNs to cover our daily change rate but we’re doubling that to 400GB each to cover our 2 day contingency.  In the case of the DB LUNs the aggregate has more than enough space for the 400GB of snapshot data we are planning for but we need to add 400GB to the Log aggregate as well so we need 4 x 600GB 15K drives to cover the Exchange logs and snapshot data.

EMC Unified arrays store snapshot data for all LUNs in centralized location called the Reserve LUN Pool or RLP.  The RLP actually consists of a number of LUNs that can be used and released as needed by snapshot operations occurring across the entire array.  The RLP LUNs can be created on any number of disks, using any RAID type to handle various IO loads and sizing an RLP is based on the total change rate of all simultaneously active snapshots across the array.  Since we need 400GB of space in the Reserve LUN Pool for one day of changes, we’ll again be safe by doubling that to 800GB which we’ll provide with 6 dedicated 300GB 15K drives in RAID10.

At this point we have 20 disks on the NetApp array and 16 disks on the EMC array.  We have loads of free space in the primary database aggregate on the NetApp but we can’t use that free space because it’s sized for the IOPS workload we expect from the Exchange server.

In order to replicate this data to an alternate site, we’ll configure the appropriate tools.

EMC:

  1. Install Replication Manager on a server and deploy an agent to each Exchange server
  2. Configure SANCopy connectivity between the two arrays over the IP ports built-in to each array
  3. In Replication Manager, Configure a job that quiesces Exchange, then uses SANCopy to incrementally update a copy of the database and log LUNs on the remote array and schedule for every 4 hours using RM’s built in scheduler.

NetApp:

  1. Install SnapManager for Exchange on each Exchange server
  2. Configure SnapMirror connectivity betweeen the two arrays over the IP ports built-in to each array
  3. In SnapManager, Configure a backup job that quiesces Exchange and takes a Snapshot of the Exchange DBs and Logs, then starts a SnapMirror session to replicate the updated FlexVol (including the snapshot) to the remote array.  Configure a schedule in Windows Task Manager to run the backup job every 4 hours.

Both the EMC and NetApp solutions run on schedule, create remote copies, and everything runs fine, until...

Tuesday night during the weekly maintenance window, the Exchange admins decide to migrate half of the users from DB1, to DB2 and DB3 and half of the users from DB4, to DB5 and DB6.  About 80GB of data is moved (25GB to each of the target DBs.)  The transactions logs on DB1 and DB4 jump to almost 50GB, 35GB each on DB2, DB3, DB5, and DB6.

On the NetApp array, the 50GB log LUNs already have about 10GB of snapshot data stored and as the migration is happening, new snapshot data is tracked on all 6 of the affected DB and Log LUNs.  The 25GB of new data plus the 10GB of existing data exceeds the 20GB of free space in the FlexVol that each LUN is contained in and guess what…  Exchange chokes because it can no longer write to the LUNs.

There are workarounds: First, you enable automatic volume expansion for the FlexVols and automatic Snapshot deletion as a secondary fallback.  In the above scenario, the 6 affected FlexVols autoextend to approximately 100GB each equaling 300GB of snapshot data for those 6 LUNs and another 40GB for the remaining 4 LUNs.  There is only 60GB free in the aggregate for any additional snapshot data across all 10 LUNs.  Now, SnapMirror struggles to update the 1200GB of new data (application data + snapshot data) across the WAN link and as it falls behind more data changes on the production LUNs increasing the amount of snapshot data and the aggregate runs out of space.  By default, SnapMirror snapshots are not included in the “automatically delete snapshots” option so Exchange goes down.  You can set a flag to allow SnapMirror owned snapshots to be automatically deleted but then you have to resync the databases from scratch.  In order to prevent this problem from ever occurring, you need to size the aggregate to handle >100% change meaning more disks.

Consider how the EMC array handles this same scenario using SANCopy.  The same changes occur to the databases and approximately 600GB of data is changed across 12 LUNs (6 DB and 6 Log).  When the Replication Manager job starts, SANCopy takes a new snapshot of all of the blocks that just changed for purposes of the current update and begins to copy those changed blocks across the WAN.

EMC Advantages:

  • SANCopy/Inc is not tracking the changes that occur AS they occur, only while an update is in process so the Reserve LUN Pool is actually empty before the update job starts.  If you want additional snapshots on top of the ones used for replication, that will increase the amount of data in the Reserve LUN Pool for tracking changes, but snapshots are created on both arrays independently and the snapshot data is NOT replicated.  This nuance allows you to have different snapshot schedules in production vs. disaster recovery for example.
  • Because SANCopy/Inc only replicates the blocks that have changed on the production LUNs, NOT the snapshot data, it copies only half of the data across the WAN vs SnapMirror which reduces the time out of sync.  This translates to lower WAN utilization AND a better RPO.
  • IF an update was occurring when the maintenance took place, the amount of data put in the Reserve LUN pool would be approximately 600GB (leaving 200GB free for more changed data).  More efficient use of the Snapshot pool and more flexibility.
  • IF the Reserve LUN Pool ran out of space, the SANCopy update would fail but the production LUNs ARE NEVER AFFECTED.  Higher availability for the critical application that you devoted time and money to replicate.
  • Less spinning disk on the EMC array vs. the NetApp.

EMC has several replication products available that each act differently.  I used SANCopy because, combined with Replication Manager, it provides similar functionality to NetApp SnapMirror and SnapManager.  MirrorView/Async has the same advantages as SANCopy/Incremental in these scenarios and can replicate Exchange, SQL, and other applications without any host involvement.

Higher Application availability, lower WAN Utilization , Better RPO, Fewer Spinning Disks, without even leveraging advanced features for even better efficiency and performance.

Why pNFS can be a big deal even if NFS4.1 isn’t…

Posted on by

It’s been a little while since I’ve posted, mostly due to my life being turned on it’s rear after our first child was born 8 weeks ago.  As things start to settle into a rhythm (as much as is possible) I’ve been back online more, reading blogs, following Twitter, and working with customers regularly.  As some of you may know, EMC announced support for pNFS in Celerra with the release of DART 6.x and there have been several recent posts about the technology which piqued my interest a little.

The other bloggers have done a good job of describing what pNFS is and what is new in NFS4.1 itself so I won’t repeat all of that.  I want to focus specifically on pNFS and why it IS a big deal.

Prior to my coming to work for EMC, I worked in internal IT at company that deals with large binary files in support of product development, as well as video editing for marketing purposes.  I had a chance to evaluate, implement, and support multiple clustered file system technologies.  The first was for an HD video editing solution using Mac’s and we followed the likely path of implementing Apple’s XSAN solution which you may know is an OEM’d version of Quantum(ADIC) StorNext.  StorNext allows you to create large filesystems across many disks and access them as local disk on many clients.  File Open, Close, byte-range locking, etc are handled by MetaData Controllers (MDCs) across an IP network while the actual heavy lifting of read/write IO is done over FibreChannel from the clients to the storage directly.  All the shared filesystem benefits of NAS with the performance benefits of SAN.

The second project was specifically targeted at moving large files (4+GB each) through a workflow across many computers as quickly as possible so we could ship products.  Faster processing of the workflow translated to more completed projects per person/per day which meant better margins and keeping our partners and customers happy.  The workflow was already established, using Windows based computers and a file server.  The file server was running out of steam and the amount of data being stored at any given time had increased from 500GB to 8TB over the past 12 months.  We needed a simple way to increase the performance of the file server and also allow for better scalability.  Working with our local EMC SE, we tested and deployed MPFSi using a Celerra NS40 with integrated storage.

MPFS has been around a long time (also known as High Road) and works with Windows and various *nix based platforms.  It is similar to XSAN/StorNext in that open/close/locking activity is handled over IP by the metadata controller (the Celerra datamover in the case of MPFS) while the read/write IO is handled over block storage technology (MPFS supports FibreChannel and iSCSI connectivity to storage).  The advantage of MPFS over many other solutions is that the metadata controller and storage are all built-in to the EMC Celerra storage device and you don’t have to deploy any other servers.

In our case we chose iSCSI due to the cost of FC (switches and HBAs) and used the GigE ports on the Celerra’s CX3 backend for block connectivity.  In testing we showed that CIFS alone provided approximately 240mbps of throughput over GigE connections while enabling MPFSi netted about 750mbps, even if we used the same NIC on the client.  So we tripled throughput over the same LAN by installing a software client.  Had we gone the extra mile to deploy FibreChannel for the block IO we would have seen much higher throughput.

Even better, the use of MPFS did not preclude the use of NDMP for backup to tape directly from the Celerra, accelerating backup many times over the old fileserver.  For clients that did not have MPFS software installed, they accessed the same files over traditional CIFS with no problems.  Another side benefit of MPFS over traditional CIFS, is that the block I/O stack is much more efficient than the NAS I/O stack so even with increased throughput, CPU utilization is lower on the client returning cycles to the application which is doing work for your business.

There are many clustered file system / clustered NAS solutions on the market from a variety of vendors (StorNext, MPFS, GFS, Polyserve, etc) and most of these products are trying to solve the same basic problems of storing more data and increasing performance.  The problem is they are all proprietary and because of that you end up with multiple solutions deployed in the same company.  In our case we couldn’t use MPFS for the video editing solution because EMC has not provided a client for Mac OSX.  And this is where pNFS really becomes attractive.  Storage vendors and operating system vendors alike will be upgrading the already ubiquitous NFS stack in their code to support NFS4.1 and pNFS.  And that support means that I could deploy an EMC Celerra MPFS like solution using the same Celerra based storage, with no extra servers, and no special client software, just the native NFS client in my operating system of choice.  Perhaps Apple will include a pNFS capable client in a future version of Mac OSX.

If you look at the pNFS standard you’ll see that it supports the use of not only block storage, but object and file based storage as well.  So as we build out larger and larger environments and private clouds start to expand into public clouds you could tier your pNFS data across FiberChannel storage, object storage (think Atmos on premises), as well as out to a service provider cloud (ie: AT&T Synaptic).  Now you’ve dramatically increased performance for the data that needs it, saved money storing the data that you need to keep long term, and geographically dispersed the data that needs to be close to users, with a single protocol supported by most of the industry and a single point of management.

Personally I think pNFS could kill off proprietary solutions over the long run unless they include support for it in their products.

This is just my opinion of course…