Monthly Archives: May 2010

EMC Unified: The benefit of having options

Posted on by

I’ve been having some fun discussions with one of my customers recently about how to tackle various application problems within the storage environment and it got me thinking about the value of having “options”.  This customer has an EMC Celerra Unified Storage Array that has Fiber Channel, iSCSI, NFS, and CIFS protocols enabled.  This single storage system supports VMWare, SQL, Web, Business Intelligence, and many custom applications.

The discussion was specifically centered on ensuring adequate storage performance for several different applications, each with a different type of workload…

1.)  Web Servers – Primarily VMs with general-purpose IO loads and low write ratios.

2.)  SQL Servers – Physical and Virtual machines with 30-40% write ratios and low latency requirements.

3.)  Custom Application  – A custom application database with 100% random read profiles running across 50 servers.

The EMC Unified solution:

EMC Storage already sports virtual provisioning in order to provision LUNs from large pools of disk to improve overall performance and reduce complexity.  In addition, QoS features in the array can be used to provide guaranteed levels of performance for specific datasets by specifying minimum and maximum bandwidth, response time, and IO requirements on a per-LUN basis.  This can help alleviate disk contention when many LUNs share the same disks, as in a virtual pool.  Enterprise Flash Drives (EFD) are also available for EMC Storage arrays to provide extremely high performance to applications that require it and they can coexist with FC and SATA drives in the same array.  Read and write cache can also be tuned at an array and LUN level to help with specific workloads.  With the updates to the EMC Unified Platform that I discussed previously, Sub-LUN FAST (auto tiering), and FAST Cache (EFD used as array cache) will be available to existing customers after a simple, non-disruptive, microcode upgrade, providing two new ways to tackle these issues.

So which feature should my customer use to address their 3 different applications?

Sub-LUN FAST (Fully Automated Storage Tiering)

Put all of the data into large Virtual Provisioning pools on the array, add a few EFD (SSD) and SATA disks to the mix and enable FAST to automatically move the blocks to the appropriate tier of storage.  Over time the workload would even out across the various tiers and performance would increase for all of the workloads with much fewer drives, saving on power, floor space, cooling, and potentially disk cost depending on the configuration.  This happens non-disruptively in the background.  Seems like a no-brainer right?

For this customer, FAST helps the web server VMs and the general-purpose SQL databases where the workload is predominately read and much of the same data is being accessed repeatedly (high locality of reference).   As long as the blocks being accessed most often are generally the same, day-to-day, automated tiering (FAST) is a great solution.  But what if the workload is much more random?  FAST would want to push all of the data into EFD, which generally wouldn’t be possible due to capacity requirements.  Okay, so tiering won’t solve all of their problems.  What about FAST Cache?

FAST Cache

Exponentially increase the size of the storage array’s read AND write cache with EFD (SSD) disks.  This would improve performance across the entire array for all “cache friendly” applications.

For this customer, increasing the size of write cache definitely helps performance for SQL (50% increase in TPM, 50% better response time as an example) but what about their custom database that is 100% random read?  Increasing the size of read cache will help get more data into cache and reduce the need to go to disk for reads, but the more random the data, the less useful cache is.   Okay, so very large caches won’t solve all of their problems.   EFDs must be the answer right?

EFD Disks

Forget SATA and FC disks; just use EFD for everything and it will be super fast!!   EFD has extremely high random read/write performance, low latency at high loads, and very high bandwidth.  You will even save money on power and cooling.

The total amount of data this customer is dealing with in these three applications alone exceeds 20TB.  To store that much in EFD would be cost prohibitive to say the least.  So, while EFD can solve all of this customer’s technical problems, they couldn’t afford to acquire enough EFD for the capacity requirements.

But wait, it’s not OR, it’s AND

The beauty of the EMC Unified solution is that you can use all of these technologies, together, on the same array, simultaneously.

In this customer’s case, we put FC and SATA into a virtual pool with FAST enabled and provision the web and general-purpose SQL servers from it.  FAST will eventually migrate the least used blocks to SATA, freeing the FC disks for the more demanding blocks.

Next, we extend the array cache using a couple EFDs and FAST Cache to help with random read, sequential pre-fetching, and bursty writes across the whole array.

Finally, for the custom 100% random read database, we dedicate a few EFDs to just that application, snapshot the DB and present copies to each server.  We disable read and write cache for the EFD backed volumes which leaves more cache available to the rest of the applications on the array, further improving total system performance.

Now, if and when the customer starts to see disk contention in the virtual pool that might affect performance of the general-purpose SQL databases, QoS can be tuned to ensure low response times on just the SQL volumes ensuring consistent performance.  If the disks become saturated to the point where QoS cannot maintain the response time or the other LUNs are suffering from load generated by SQL, any of the volumes can be migrated (non-disruptively) to a different virtual pool in the array to reduce disk contention.

Options

If you look at offerings from the various storage vendors, many promote large virtual pools, some also promote large caches of some kind, others promote block level tiering, and a few promote EFD (aka SSDs) to solve performance problems.  But, when you are consolidating multiple workloads into a single platform, you will discover that there are weaknesses in every one of those features and you are going to wish you had the option to use most or all of those features together.

You have that option on EMC Unified.

EMC CLARiiON and Celerra Updates – Defining Unified Storage

Posted on by

This past week, during EMC World 2010 in Boston, EMC made several announcements of updates to the Celerra and CLARiiON midrange platforms.  Some of the most impressive were new capabilities coming to CLARiiON FLARE in just a couple short months.  Major updates to Celerra DART will coincide with the FLARE updates and if you are already running CLARiiON CX4 hardware, or are evaluating CX4 (or Celerra), you will want to check these new features out.  They will be available to existing CX4(120,240,480,960)/NS(120,480,960) systems as part of a software update.

Here’s a list of key changes in FLARE 30:

  • Unified management for midrange storage platforms including CLARiiON and Celerra today, plus RecoverPoint, Replication Manager and more in the future.  This is a true single pane of glass for monitoring AND managing SAN, NAS, and data protection and it’s built in to the platform.  “EMC Unisphere” replaces Navisphere Manager and Celerra Manager and supports multiple storage systems simultaneously in a single window. (Video Demo)
  • Extremely large cache (ie: FASTCache) – Up to 2TB of additional read/write cache in CLARiiON using SSDs (Video Demo)
  • Block level Fully Automated Storage Tiering (ie: sub-LUN FAST) – Fully automated assignment of data across multiple disk types
  • Block Level Compression – Compress LUNs in the CLARiiON to reduce disk space requirements
  • VAAI Support – Integrate with vSphere ESX for improved performance

These features are in addition to existing features like:

  • Seamless and non-disruptive mobility of LUNs within a storage array – (via Virtual LUNs)
  • Non-Disruptive Data Migration – (via PowerPath Migration Enabler)
  • VMWare Aware Storage Management – (Navisphere, Unisphere, and vSphere Plugins giving complete visibility  and self-service provisioning for VMWare admins (Video Demo) AND Storage Admins
  • CIFS and NFS Compression – Compress production data on Celerra to reduce disk space requirements including VMs
  • Dynamic SAN path load balancing – (via PowerPath)
  • At-Rest-Encryption – (via PowerPath w/RSA)
  • SSD, FC, and SATA drives in the same system – Balance performance and capacity as needed for your application
  • Local and Remote replication with array level consistency – (SnapView, MirrorView, etc)
  • Hot-swap, Hot-Add, Hot-Upgrade IO Modules – Upgrade connectivity for FC, FCoE, and iSCSI with no downtime
  • Scale to 1.8PB of storage in a single system
  • Simultaneously provide FC, iSCSI, MPFS, NFS, and CIFS access

All together, this is an impressive list of features for a single platform. In fact, while many of EMC’s competitors have similar features, none of them have all of them in the same platform, or leverage them all simultaneously to gain efficiency.  When CLARiiON CX4 and Celerra NS are integrated and managed as a single Unified storage system with EMC Unisphere there is tremendous value as I’ll point out below…

Improve Performance easily…

  • Install a couple SSD drives into a CLARiiON and enable FASTCache to increase the array’s read/write cache from the industry competive 4GB-32GB up to 2TB of array based non-volatile Read AND Write cache available to ALL applications including NAS data hosted by the array.
  • Install PowerPath on Windows, Linux, Solaris, AND VMWare ESX hosts to automatically balance IO across all available paths to storage.  PowerPath detects latency and queuing occuring on each path and adjusts automatically, improving performance at the storage array AND for your hosts.  This is a huge benefit in VMWare environments especially.
  • When VMWare releases the updated version of vSphere ESX that supports VAAI, ESX will be able to leverage VAAI support in the CLARiiON to reduce the amount of IO required to do many tasks, improving performance across the environment again.
  • Upgrade from 1gbe iSCSI to 10gbe iSCSI, or from 4gbe FiberChannel to 8gbe FiberChannel, without a screwdriver or downtime.
  • Provide NAS shared file access with block-level performance for any application using EMC’s MPFS protocol.

Improve Efficiency and cost easily…

  • Create a single pool of storage containing some SSD, some FC, and some SATA drives, that automatically monitors and moves portions of data to the appropriate disk type to both improve performance AND decrease cost simultaneously.
  • Non-disruptively compress volumes and/or files with a single click to save 50% of your disk space in many cases.
  • Convert traditional LUNs to more efficient Thin-LUNs non-disruptively using PowerPath Migration Enabler, saving more disk space.

Increase and Manage Capacity easily…

  • Add additional storage non-disruptively with SSD, FC, and SATA drives in any mix up to 1.8PB of raw storage in a single CLARiiON CX4.
  • Using FASTCache, iSCSI, FC, and FCoE connectivity simultaneously does not reduce total capacity of the system.
  • Expanding LUNs, RAID Groups, and Storage Pools is non-disruptive.
  • Migrating LUNs between RAID groups and/or Storage Pools is non-disruptive using built-in CLARiiON LUN Migration, as is migrating data to a different storage array (using PowerPath Migration Enabler)!
  • Balancing workload between storage processors is non-disruptive and at individual LUN granularity.

Protect your data easily…

  • Snapshot, Clone, and Replicate any of the data to anywhere with built in array tools that can maintain complete data consistency across a single, or multiple applications without installing software.
  • Maintain application consistency for Exchange, SQL, Oracle, SAP, and much more, even within VMWare VMs, while replicating to anywhere with a single pane-of-glass.
  • Encrypt sensitive data seamlessly using PowerPath Encryption w/RSA.

Maintain Flexibility…

  • While you can do all of these things quickly and simply, you still have the flexibility to create traditional RAID sets using RAID 0, 1, 5, 6, and 10 where you need highly predicable performance, or tune read and write cache at the array and LUN level for specific workloads.  Do you want read/write snapshots? How about full copy clones on completely separate disks for workload isolation and failure protection? What about the ability to rollback data to different points in time using snapshots without deleting any other snapshots?  EMC Storage arrays have been able to do this for a long time and that hasn’t changed.

There are few manufacturers aside from EMC that can provide all of these capabilities, let alone provide them within a single platform.  That’s the definition of simple, efficient, Unified Storage in my opinion.

EMC VPLEX enables the private cloud.. But what is a “private cloud”?

Posted on by

Buzzword Much?

If you have seen any of EMC’s marketing for EMC World, or you are attending EMC World in Boston this week, you no doubt noticed a ton of talk about the “Private Cloud”.  There has been a lot more talk from vendors as of late about the “cloud” and “cloud computing” and you may be reminded about how every few years the word “cloud” is shouted out by vendors of all kinds and how inevitably the talk quiets and nothing is really different.  So is it different this time?  I think so.

What is a Cloud?

In the context of IT, there are examples of clouds already.  The Internet and public telephone system are two examples of clouds.  Facebook, Flickr, and Salesforce are examples of clouds as well.  The common theme is that each of these examples provides some sort of service to the end user without requiring the end user to purchase or build any infrastructure to support it.  You can plug a phone into a wall and immediately call nearly anyone in the world.  Cloud is a fancy word (or buzzword) for providing something “as-a-service”.  Salesforce.com is software-as-a-service (SaaS).

So what is the Private Cloud?  

In the context of enterprise datacenters, the focus of EMC’s vision, the Private Cloud is Infrastructure-as-a-service (IaaS) and it enables corporate IT to transition from a necessary expense, to a profit center within the business, providing IT-as-a-Service to the rest of the business.  It decouples infrastructure from applications providing unprecedented levels of scalability, availability, and flexibility at lower cost.

What if…
a.) your corporate applications could run from anywhere, and users had access from anywhere?
b.) you could relocate your applications from anywhere to anywhere else, at any time, without disruption to your users.
c.) you could replace any piece of physical hardware in your infrastructure without impacting your applications.

Sounds too good to be true right? Maybe not…

This week, EMC announced a completely new product called VPLEX.  VPLEX has the ability to take your existing storage arrays and pool them into a cooperative pool of storage for hosts and applications.  It then allows you to move application data within and across those arrays as needed without disrupting the application or users.  If you are familiar with EMC’s Invista, IBM’s SVC, or Hitachi’s USP-V products you may be thinking that VPLEX is just another storage virtualization product.  But I assure you it’s different.  VPLEX virtualizes storage within the datacenter similar to how the above products can, but VPLEX can ALSO combine storage across multiple datacenters and allow an application to run from any of them or all of them, simultaneously, through the power of Federation.

Active/Active Datacenters

With VPLEX Federation, you can move a virtual machine and all of its data from datacenter A to datacenter B in a matter of minutes without user disruption; or hundreds of VMs, or thousands of VMs.  You can run the same application in both locations, sharing a single dataset.  Armed with EMC VPLEX and VMWare vSphere, you can upgrade, replace, and reconfigure any part of your infrastructure (storage, servers, network, power distribution, etc) without ever having to take your applications offline.  How’s that for availability?

The ability to create a virtual infrastructure from the storage layer through to the server layer and host any application on that infrastructure is the key to creating providing Infrastructure-as-a-Service, building the Private Cloud, and provisioning IT-as-a-Service within your organization.  Imagine running the IT department as a business within the business and actually showing financial value to the business.

There is a lot more to this concept but I wanted to at least bring some context around “cloud” as well as EMC’s new VPLEX product.  There will be more to come on this topic.

Chuck Hollis wrote about VPLEX as a new Storage Platform today, and VirtualGeek called it a Virtual Machine teleporter in his quite detailed write up of this new technology.  The key is to step back with an open mind and think about how application design and disaster recovery planning could be approached in entirely new ways when the data is no longer confined to a particular physical location.

Much ado about the future.. (of IT)

Posted on by

The 8:50am Alaska Air flight from Seattle to Boston today may as well have been an EMC chartered flight. Full of my current EMC peers, previous coworkers from my past 12 years in IT, as well as other EMC customers; all of us making the pilgrimage to Boston for EMC World 2010. The five and a half hour flight was both a networking opportunity and a reunion at the same time.

Despite the time away from home while my wife and I prepared for some big life changes, I’m excited to attend my 4th EMC World in 5 years, my first as an employee of EMC. This year promises to be extremely exciting as we make a number of huge announcements during the week, some of which have the potential to change the landscape of information storage and management. As an IT professional for over a decade, I’m a techie at heart and this is really exciting stuff. As a new EMC employee, the position and direction of the company validates many of the reasons I chose to take on this new career path, and with this company.

I plan to provide some commentary on the announcements we make during the week, particularly around virtual storage and the concept of “cloud computing”. I’m not a fan of industry buzzwords and “cloud” is one of the worst offenders but I think it’s important for IT professionals to understand what the vendors really mean when they talk about cloud, and how it affects every day life in IT.

If you are attending EMC World this year, I hope you feel the excitement, and I hope you start to see the bright future we are all headed for.

Do you have a recovery plan? You should!

Posted on by

In my new role at EMC, I am one of the first people to learn of major problems that my customers experience.  In general, customers seem to call their sales team before technical support when a big problem happens.  In the past week, I’ve been involved in recovery efforts with two different customers, both resulting from complete power outages in their production datacenters.

Both of these customers process millions of dollars through their global customer facing websites.  The smaller customer of the two does not have a disaster recovery site of any kind, while the other (larger) customer does have a recovery site, but it is not designed for 100% operation and is hundreds of miles away.

What became clear through both of these incidents is that having a very clear, very well known recovery plan is critical to the business.  Interestingly, these experiences drove home the point that even if you don’t have a recovery site, aren’t using replication, and otherwise don’t have any way to recover the data offsite, you still need a plan that encompasses what you CAN do.  More often than not, major outages are short lived and you will be recovering in your primary datacenter anyway, so you need to have a pre-determined plan to prevent major issues and shorten the time to recover.

Here are some things to think about when creating a recovery plan:

  • Get the application owners together and build a list of all the applications running in your environment.  Document the purpose of each application and map dependencies that each application has on other applications.
  • Next, involve the server/systems admins and document the server names, database names, IP addresses, and DNS names for each application on the list.
  • Finally, involve the infrastructure teams (storage, network, datacenter) and document the network dependencies (subnets, routers, VPN connections, load balancers, etc).  Document any SAN storage used by the servers/applications.  Also document how each infrastructure component affects others (ie: the SAN switches are required to be operational before servers can connect to storage arrays.)
  • Work with business leaders to prioritize the applications.  The idea is to understand how much impact each application has to the business both from a productivity perspective as well as direct financial impact.  There may be legal requirements or service level agreements with customers to consider as well.
  • If possible, identify the maximum amount of time each application can be down in the event of a catastrophic event (RTO – Recovery Time Objective) and how much data can be lost without significant impact to the business (RPO – Recovery Point Objective).  These metrics are usually measured in minutes, hours, and days.
  • Document the backup method for each server and application.  How often are backups run?  What is the retention period?  How long does it take to complete backups?   What is the expected time to restore the data?  How long does it take to recall tapes from offsite storage?
  • At this point you have a prioritized list of applications, now build a step by step recovery plan that lists the exact order in which you must recover systems.  The list should include server names as well as validation points to ensure certain systems are working before moving to the next step.  For example:
    • Step 1: bring up the network switches and routers
    • Step 2: bring up the DNS/DHCP servers
    • Step 3: bring up Active Directory servers
    • Step 4: bring up SAN fabric switches
    • Step 5: bring up SAN storage arrays, verify health of arrays with help from vendor
    • Step 6: …

I recommend that one of the first steps before starting recovery is to contact your key vendors (storage array vendors at least) to notify them of your outage so they can get support resources ready to troubleshoot any hardware issues you may run into during the recovery.

  • Identify key players needed in a recovery, at least primary and secondary contacts for every application and vendor contacts for hardware/software, facilities, UPS/Generator support teams, etc.
  • Establish a standard communication plan to include at least the following…
    • A method to notify employees of an outage and give instructions
    • A method to notify key players for recovery
    • A mechanism for key players to communicate with each other during the recovery
    • Personal (not corporate/business) contact information for all of the key players

The key thing to remember here is that you cannot rely on any communication tools that are part of your infrastructure.   You must assume your PBX/VOIP system will be down, Email will be down, corporate instant messenger will be down, Sharepoint will be unavailable, etc.

  • If you have a remote recovery site, with or without replication technology, and intend to use the remote site to recover production applications in the event of a large failure, be sure to document the triggers for moving to the recovery site.  As an example, you may want to attempt recovery in the primary site, and then move to the recovery site if recovery at the primary site will take too long — be sure to document that time and get executive buyoff.  You should not hear “how long do we wait until we move to the DR site?” during an active recovery operation.  That decision needs to be made during the planning exercise.
  • Document the entire plan and store the digital copies in a readily accessible place (file shares, Sharepoint site, etc).  Keep additional copies on USB sticks or CDs stored in a safe place.  Keep even MORE copies in another location outside the primary datacenter facility (ie: safe deposit box, remote office safe, etc).  Print copies as well and store the printed copy in similar safe places.  Assume that a building may not be accessible due to fire or flood.  I know one customer who issues fingerprint secured USB sticks to every manager.  Each manager must sync their USB stick to a server at least monthly or upper management is notified.
  • Make sure that everyone is aware of the recovery plan, who has access to the plan, where the copies are stored, and what role each of the key players is expected to play during a recovery.

There is far more to think about but hopefully you can get a good start with what I’ve listed above.  If you have a recovery plan already, you should review it regularly and think about anything that needs to be added or modified in the plan.

If you are trying to get approval for a remote recovery site and replication technology and having trouble getting executive approval, going through this exercise and defining application priority with RPO/RTO for each could give you the ammo you need.  Traditional backup architectures aren’t designed for RPO’s under 24 hours while storage array based replication can get RPOs down into the minutes and restoring from tape takes way longer than restoring from replicated data.

Last but not least, keep the plan updated as your environment changes, add new application and server details to the plan as part of the implementation process for new applications, or as part of change control procedures for significant changes to the infrastructure.