Tag Archives: powerpath

If You Are Using SSDs, You Should Be Encrypting

Posted on by

I saw the following article come across Twitter today.

http://www.zdnet.com/blog/storage/ssd-security-the-worst-of-all-worlds/1326

In it, Robin Harris describes the issues around data recovery and secure erasure specific to SSD disks.  In layman’s terms, since SSDs do all sorts of fancy things with writes to increase longevity and performance, disk erasure is nearly impossible using normal methods, and forensic or malicious data recovery is quite easy.  So if you have sensitive data being stored on SSDs, that data is at risk of being read by someone, some day, in the future.  It seems that pretty much the only way to mitigate this risk is to use encryption at some level outside the SSD disk itself.

Did you know that EMC Symmetrix VMAX offers data-at-rest encryption that is completely transparent to hosts and applications, and has no performance impact?  With Symmetrix D@RE, each individual disk is encrypted with a unique key, managed by a built-in RSA key manager, so disks are unreadable if removed from the array.   Since the data is encrypted as the VMAX is writing to the physical disk, attempting to read data off an individual disk without the key is pointless, even for SSD disks.

The beauty of this feature is that it’s set-it-and-forget it.  No management needed, it’s enabled during installation and that’s it.  All disks are encrypted, all the time.

  • Ready to decomm an old array and return it, trade it, or sell it?  Destroy the keys and the data is gone.  No need for an expensive Data Erasure professional services engagement.
  • Failed disk replaced by your vendor?  No need for special arrangements with your vendor to keep those disks onsite, or certify erasure of a disk every time one is replaced.  The key stays with the array and the data on that disk is unreadable.

If you have to comply with PCI and/or other compliance rules that require secure erasure of disks, you should consider putting that data on a VMAX with data-at-rest encryption.

Now, What if you have an existing EMC storage system and the same need to encrypt data?  You can encrypt at the volume level with PowerPath Encryption.  PowerPath encrypts the data at the host with a unique key managed by an RSA Key Manager.  And it works with the non-EMC arrays that PowerPath supports as well.

Under normal circumstances, PowerPath Encryption does have some level of performance impact to the host however HBA vendors, such as Emulex, are now offering HBAs with encryption offload that works with PowerPath.  If you combine PowerPath Encryption with Emulex Encryption HBAs, you get in-flight AND at-rest encryption with near-zero performance impact.

  • Do you replicate your sensitive data to a 3rd party remote datacenter for business continuity?  PowerPath Encryption prevents unauthorized access to the data because no host can read it without the proper key.

Saving $90,000 per minute with VMAX Federated Live Migration

Posted on by

One of the customers I work with regularly has a set of key MS SQL databases totaling over 100TB in size which process transactions for their customer facing systems. If these databases are down, no transactions can be processed and customers looking to spend money will go elsewhere. The booking rate related to these databases is ~$90,000 per minute, all day, every day.

Today these databases live on Symmetrix DMX storage which has been very reliable and performant. As happens in an IT world, some of these Symmetrix systems are getting long in the tooth and we are working on refreshing several older DMX systems into fewer, newer VMAX systems. Aside from the efficiency and performance benefits of sub-LUN tiering (EMC FASTVP), and the higher scalability of VMAX vs DMX as well as pretty much any other storage array on the market, EMC has added another new feature to VMAX that is particularly advantageous to this particular customer — Federated Live Migration

Tech Refresh and Data Migrations:

Tech Refresh is a fact of life and data migrations, as a result are common place. The challenge with a data migration is finding a workable process and then shoehorning that process into existing SLAs and maintenance windows. In general, data migrations are disruptive in some way or another, and the level of disruption depends on the technology used and the type of migration. EMC has a long list of migration tools that cover our midrange storage systems, NAS systems, as well as high-end Symmetrix arrays. The downside with these and other vendors’ migration tools is that there is some level of downtime required.

There are 3 basic approaches:

  1. Highest Amount of Downtime:
    • Take system down, copy data, reconfigure system, bring system online
  2. Less Downtime:
    • Copy Data, Take system down, copy recent changes, reconfigure system, bring system online (EMC Open Replicator, EMCopy, RoboCopy, Rsync, EMC SANCopy, etc)
  3. Even Less Downtime:
    • Set up new storage to proxy old storage, reconfigure system, serve data while copying in the background. (EMC Celerra CDMS, EMC Symmetrix Open Replicator)
    • (Traditional virtualization systems like Hitachi USP/VSP, IBM SVC, etc are similar to this as well since you must take some level of downtime to get hosts configured through the virtualization layer, after which you can non-disruptively move data).

Common theme in all 3? Downtime!

Non-Disruptive Migrations are becoming more realistic:

A while ago now, EMC added a feature to PowerPath called Migration Enabler which is a way to non-disruptively migrate data between LUNs and/or Storage arrays from the host perspective. PowerPath Migration Enabler could also leverage storage based copy mechanisms and help with cut over. This is very helpful and I have multiple customers who have successfully migrated data with PPME. The challenge has been getting customers to upgrade to newer versions of PowerPath that include the PPME features they need for their specific environment. Software upgrades usually require some level of downtime, at least on a per-host basis, and we run into maintenance windows and SLAs again.

EMC Symmetrix Federated Live Migration (FLM)

For Symmetrix VMAX customers, EMC has added a new capability that could make data migrations easy as pie for many customers. With FLM, a VMAX can migrate data into itself from another storage array, and perform the host cutover, automatically, and non-disruptively. The VMAX does not have to be inserted in front of the other storage array, and there is no downtime required to reconfigure the host before or after the migration.

Federation is the key to non-disruptive migrations:

The way FLM works is really pretty simple. First, the host is connected to the new VMAX storage without removing the connectivity to the old storage. The VMAX is also connected to the old storage array directly (through the fabric in reality). The VMAX system then sets up a replication session for the devices owned by the host and begins copying data. This is all pretty straightforward. The smart stuff happens during the cut over process.

As you probably know, the VMAX is an Active/Active storage system, so all paths from a host to a LUN are active. Clariion, similar to most other midrange systems is an Active/Passive storage system. On Clariion, paths to the storage processor that owns the LUN are active, while paths to the non-owner are passive until needed. PowerPath and many other path management tools support both active/active and active/passive storage systems.

Symmetrix FLM leverages this active/passive support for the cut over. Essentially the old storage system is the “owning SP” before and during the migration. At cut over time, the VMAX essentially becomes the owning SP and it’s paths go active while the old paths go passive. PowerPath follows the “trespass” and the host keeps on chugging. To make this work, VMAX actually spoofs the LUN WWN and host ID, etc from the old array, so the host thinks it’s still talking to the old array. Filesystems and LVMs that signature and track LUNs based on WWN and/or Host ID are unaffected as a result. The cut over process is nothing more than a path change at the HBA level. The data copy and cut over are managed directly from the VMAX by an administrator.

Of course there are limitations… FLM initially supports only EMC Symmetrix DMX arrays as the source and requires PowerPath. From what I gather, this is not a technical limitation however, it’s just what EMC has tested and supports. Other EMC arrays will be supported later and I have no doubt it will support non-EMC arrays as a source as well.

Solving the $5 million problem:

Symmetrix Federated Live Migration became a signature feature in our VMAX discussions with the aforementioned customer because they know that for every minute their database is down, they lose $90,000 in revenue. Even with the fastest of the 3 traditional methods of migration, they would be down for up to an hour while 12 cluster nodes are rezoned to new storage, LUNs masked, drive letters assigned, etc. A reboot alone takes up to 30 minutes. 1 hour of downtime equates to $5,400,000 of lost revenue. By the way, that’s quite a bit more than the cost of the VMAX, and FLM is included at no additional license cost, so FLM just paid for the VMAX, and then some.

EMC CLARiiON and Celerra Updates – Defining Unified Storage

Posted on by

This past week, during EMC World 2010 in Boston, EMC made several announcements of updates to the Celerra and CLARiiON midrange platforms.  Some of the most impressive were new capabilities coming to CLARiiON FLARE in just a couple short months.  Major updates to Celerra DART will coincide with the FLARE updates and if you are already running CLARiiON CX4 hardware, or are evaluating CX4 (or Celerra), you will want to check these new features out.  They will be available to existing CX4(120,240,480,960)/NS(120,480,960) systems as part of a software update.

Here’s a list of key changes in FLARE 30:

  • Unified management for midrange storage platforms including CLARiiON and Celerra today, plus RecoverPoint, Replication Manager and more in the future.  This is a true single pane of glass for monitoring AND managing SAN, NAS, and data protection and it’s built in to the platform.  “EMC Unisphere” replaces Navisphere Manager and Celerra Manager and supports multiple storage systems simultaneously in a single window. (Video Demo)
  • Extremely large cache (ie: FASTCache) – Up to 2TB of additional read/write cache in CLARiiON using SSDs (Video Demo)
  • Block level Fully Automated Storage Tiering (ie: sub-LUN FAST) – Fully automated assignment of data across multiple disk types
  • Block Level Compression – Compress LUNs in the CLARiiON to reduce disk space requirements
  • VAAI Support – Integrate with vSphere ESX for improved performance

These features are in addition to existing features like:

  • Seamless and non-disruptive mobility of LUNs within a storage array – (via Virtual LUNs)
  • Non-Disruptive Data Migration – (via PowerPath Migration Enabler)
  • VMWare Aware Storage Management – (Navisphere, Unisphere, and vSphere Plugins giving complete visibility  and self-service provisioning for VMWare admins (Video Demo) AND Storage Admins
  • CIFS and NFS Compression – Compress production data on Celerra to reduce disk space requirements including VMs
  • Dynamic SAN path load balancing – (via PowerPath)
  • At-Rest-Encryption – (via PowerPath w/RSA)
  • SSD, FC, and SATA drives in the same system – Balance performance and capacity as needed for your application
  • Local and Remote replication with array level consistency – (SnapView, MirrorView, etc)
  • Hot-swap, Hot-Add, Hot-Upgrade IO Modules – Upgrade connectivity for FC, FCoE, and iSCSI with no downtime
  • Scale to 1.8PB of storage in a single system
  • Simultaneously provide FC, iSCSI, MPFS, NFS, and CIFS access

All together, this is an impressive list of features for a single platform. In fact, while many of EMC’s competitors have similar features, none of them have all of them in the same platform, or leverage them all simultaneously to gain efficiency.  When CLARiiON CX4 and Celerra NS are integrated and managed as a single Unified storage system with EMC Unisphere there is tremendous value as I’ll point out below…

Improve Performance easily…

  • Install a couple SSD drives into a CLARiiON and enable FASTCache to increase the array’s read/write cache from the industry competive 4GB-32GB up to 2TB of array based non-volatile Read AND Write cache available to ALL applications including NAS data hosted by the array.
  • Install PowerPath on Windows, Linux, Solaris, AND VMWare ESX hosts to automatically balance IO across all available paths to storage.  PowerPath detects latency and queuing occuring on each path and adjusts automatically, improving performance at the storage array AND for your hosts.  This is a huge benefit in VMWare environments especially.
  • When VMWare releases the updated version of vSphere ESX that supports VAAI, ESX will be able to leverage VAAI support in the CLARiiON to reduce the amount of IO required to do many tasks, improving performance across the environment again.
  • Upgrade from 1gbe iSCSI to 10gbe iSCSI, or from 4gbe FiberChannel to 8gbe FiberChannel, without a screwdriver or downtime.
  • Provide NAS shared file access with block-level performance for any application using EMC’s MPFS protocol.

Improve Efficiency and cost easily…

  • Create a single pool of storage containing some SSD, some FC, and some SATA drives, that automatically monitors and moves portions of data to the appropriate disk type to both improve performance AND decrease cost simultaneously.
  • Non-disruptively compress volumes and/or files with a single click to save 50% of your disk space in many cases.
  • Convert traditional LUNs to more efficient Thin-LUNs non-disruptively using PowerPath Migration Enabler, saving more disk space.

Increase and Manage Capacity easily…

  • Add additional storage non-disruptively with SSD, FC, and SATA drives in any mix up to 1.8PB of raw storage in a single CLARiiON CX4.
  • Using FASTCache, iSCSI, FC, and FCoE connectivity simultaneously does not reduce total capacity of the system.
  • Expanding LUNs, RAID Groups, and Storage Pools is non-disruptive.
  • Migrating LUNs between RAID groups and/or Storage Pools is non-disruptive using built-in CLARiiON LUN Migration, as is migrating data to a different storage array (using PowerPath Migration Enabler)!
  • Balancing workload between storage processors is non-disruptive and at individual LUN granularity.

Protect your data easily…

  • Snapshot, Clone, and Replicate any of the data to anywhere with built in array tools that can maintain complete data consistency across a single, or multiple applications without installing software.
  • Maintain application consistency for Exchange, SQL, Oracle, SAP, and much more, even within VMWare VMs, while replicating to anywhere with a single pane-of-glass.
  • Encrypt sensitive data seamlessly using PowerPath Encryption w/RSA.

Maintain Flexibility…

  • While you can do all of these things quickly and simply, you still have the flexibility to create traditional RAID sets using RAID 0, 1, 5, 6, and 10 where you need highly predicable performance, or tune read and write cache at the array and LUN level for specific workloads.  Do you want read/write snapshots? How about full copy clones on completely separate disks for workload isolation and failure protection? What about the ability to rollback data to different points in time using snapshots without deleting any other snapshots?  EMC Storage arrays have been able to do this for a long time and that hasn’t changed.

There are few manufacturers aside from EMC that can provide all of these capabilities, let alone provide them within a single platform.  That’s the definition of simple, efficient, Unified Storage in my opinion.

NetApp and EMC: ESX and Exchange 2007 CCR

Posted on by

The first application we tackled after deploying the NetApp system was Exchange 2007.  We had deployed Exchange 2007 recently, running in CCR clusters on VMWare ESX.  Since each node of a CCR cluster has it’s own copy of the database we wanted to put one node from each cluster onto the NetApp, leaving the other nodes on the Clariion.  This environment is entirely FiberChannel, no iSCSI deployed and as such the Exchange servers are using VMWare Raw Devices for the database and log disks.  This poses a problem that we didn’t discover until later which I will discuss in a future post about replicating Exchange with NetApp.

Re-Architecting the environment to fit the storage

The first thing we discovered was that neither IBM/NetApp nor EMC would support the same host HBAs zoned to multiple brands of storage.  So we had to split the ESX cluster into two clusters, one on each storage platform.  Luckily the Exchange environment was isolated on it’s own six node cluster so it was easy to split everything in half.

Next we learned that due to NetApp’s updated active/active mode with proxy paths in ONTap 7.3, VMWare ESX 3.x randomly selects paths when rescanning HBAs and will pick non-optimized paths to the LUNs.  This still works but is not ideal as it increases IO latency, causing the Filer to send autosupport emails periodically warning of the problem.  Installing the NetApp Host Utilities for ESX onto the ESX hosts themselves allows you to run a script that assigns persistent paths evenly across the HBAs.  The script works as advertised but as far as I can tell you have to run the script each time you add a new LUN to the ESX server.  It would be much better if it were more automated.

Actually, if you are running ESX4.0 the scenario changes since NetApp ONTap 7.3+, Clariion FLARE 26+, and ESX4 all support ALUA making this problem all but disappear and improving fabric resiliency. Unfortunately for us, ESX4 is still a bit new and hasn’t been rolled out into production yet.  NetApp also released tools for vCenter 4.0 that allow you to do the path assignment and other tasks from within vCenter rather than at the command line.  EMC also now has PowerPath available for ESX4.0 which will not only manage paths but load balance across all paths for increased performance and lower latency.

VirtualStorageGuy has blogged already about the NetApp/EMC/vSphere plug-ins and there is even a Powerpoint available.

Finally, during the sales process NetApp pushed their de-duplication features (A-SIS) quite a bit and stressed how much disk space we could save in a VMWare environment.  During deployment we were informed that if your VMs (VMDKs and VMFS) were not properly partition aligned de-duplication wouldn’t work well or at all.  Since this environment has several hundred VMs built over several years by many people, and aligning the system (C:) drive of a Windows VM is difficult, the benefit would be minimal for us.  Luckily NetApp has provided tools that can scan and align VMDKs without having to repartition the disks.  We have not tested this yet.  Partition Alignment is a best practice for ANY SAN storage system so we can’t fault NetApp for this problem; it’s just a fact of life.

But is it REALLY Redundant?

Even with two storage systems, with independent VMWare clusters, each hosting half of the Exchange cluster environment, a problem with either array could still take down and entire Exchange cluster.  This is due to the File Share Witness (FSW) component used in a Majority Node Set (MNS) cluster like Exchange CCR.  The idea behind the FSW in an MNS cluster is to prevent a condition known as Split Brain.  Since a MNS cluster does not have a quorum disk, it relies entirely on network communication between the nodes to determine cluster status and make decisions about which nodes should become active.  In the event that the two nodes lose communication with each other, each node will check for the FSW and if it is still available, it assumes that the other cluster node is down and proceeds to bring cluster resources online (if they weren’t already).  Without the FSW, both nodes would potentially go active and there could be issues with inconsistent data, etc.  This is the split-brain condition.

Typically, each cluster has a single FSW on a separate server (the CAS servers in our case).  With the redundancy storage model we moved to, the FSW became a single point of failure.  If we put the FSW on EMC storage with NodeA, and NodeB on the IBM/NetApp storage, a problem with the EMC array could take down both the cluster node AND the FSW at the same time.  The surviving cluster node on the IBM/NetApp array would go down or stay down to prevent split-brain since the FSW was not available.  Moving the FSW to the IBM/NetApp array presents the same problem on opposite side of the cluster.  Incidentally, we proved this problem in lab testing to be sure.  The solution is to move the FSW off of BOTH arrays, to either a dedicated physical server with internal disk, or a third storage array if you have one.  There was a second EMC array in production so we moved the FSW there.  In the new configuration, a complete outage of any single storage array would not take down the Exchange environment.

Crude diagram of the storage redundant Exchange CCR cluster

So far this new 3-way split environment is working fine, performance on the EMC and NetApp arrays is fine for Exchange.  Using the same number of disks on the NetApp array yields about twice as much usable space as the EMC due to RAID-DP vs RAID-10 but overall performance is similar.  Theoretically that means we could allow for more growth of the Exchange databases but in reality that is not always the case.  My next update will be about Exchange replication using SnapManager and SnapMirror and how that has effectively negated the remaining free space in the NetApp aggregate.