In the past–in the days of 2GB,4GB,9GB,18GB and even 36GB drives–when you were tasked with purchasing and configuring hard drives for an application, you were given the amount of storage space required for the application and that was pretty much good enough. If you or your company were more organized you’d do an analysis of the performance requirements for that application (ie: IOPS, read/write ratios, bandwidth, etc.) to make sure you had enough spindles to accommodate the application. More often than not, the capacity requirements necessitated more disk than the performance so you’d build your RAID group and fill it up all the way.
Fast forward a few years and 72GB drives are no longer available, 146GB drives are getting close to end-of-sale and there are 300, 400, 600GB drives, and terabyte SATA drives available for almost any storage system or server. The problem is that as these hard drives get bigger, they aren’t getting any faster. In fact, SATA drives are relatively new in the Enterprise space and are slower than traditional 10,000 and 15,000 RPM SCSI drives. But they hold terabytes of data. Today, performance is the primary requirement and capacity is second because in general you need more spindles for the performance of your application than you do to achieve the capacity requirement.
As an example, let’s take a 100GB SQL database that requires 800 IOPS at 50% Read/50% Write.
Back in the day with 18GB drives you’d need 12 disks to provide ~100GB of space in RAID10. Using SCSI-3 10K drives, you can expect about 140 IOPS per disk giving you 1680 IOPS available. Accounting for RAID10 write penalties, you’d have an effective 1100 IOPS, more than enough for your workload of 800 IOPS.
Today, a single 146GB 10K disk can provide all the capacity required for this database; but you still need at least 10 disks to achieve your 800 IOPS workload with RAID10, or 15 disks with RAID5. The capacity of a RAID10 group with ten 146GB drives is approximately 680GB, leaving you with 580GB of free (or slack) space in the RAID group. The trouble is that you can’t use that space for any of your other applications because the SQL database requires all of the performance available in that RAID group. Change it to RAID5, or use new larger disks, and it’s even worse. Switching to 15K RPM drives can help, but it’s only a 30% increase in performance.
If you are managing SAN storage for a large company, your management probably wants you to show them high disk capacity utilization on the SAN to help justify the cost of storage consolidation. But as the individual disk sizes get larger, it becomes increasingly difficult to keep the capacity utilization high, and for many companies it ends up dropping. Thin Provisioning and De-Duplication technologies are all the rage right now as storage companies push their wares, and customers everywhere are hoping that those buzzwords can somehow save them money on storage costs by increasing capacity utilization. But be aware, if you have slack space due to performance requirements, those technologies won’t do you any good and could hurt you. They are useful for certain types of applications, something I’ll discuss in a later post.
So what do you do? Well, there’s not a lot you can do except educate your management on the difference between sizing for performance and sizing for capacity. They should be aware that slack space is a byproduct of the ever increasing size of hard disk drives. Some vendors are selling high speed flash or SSD disks for their SAN storage systems which can be 30-50X faster than a 15K RPM drive and have similar capacities. But flash has a significant cost which only makes sense if you can leverage most of the IOPS available in each disk. In the next installment I’ll discuss tiered data techniques and how they can overcome some of these problems, increasing performance in some cases while also increasing utilization rates.