It’s been a little while since I’ve posted, mostly because my life was turned on its rear after our first child was born 8 weeks ago. As things start to settle into a rhythm (as much as is possible), I’ve been back online more, reading blogs, following Twitter, and working with customers regularly. As some of you may know, EMC announced support for pNFS in Celerra with the release of DART 6.x, and there have been several recent posts about the technology which piqued my interest a little.
- Chuck Hollis – I Want My pNFS
- Chuck Hollis – More on pNFS
- Storagebod – Deja Vu
- Chad Sakac – pNFS – it’s here! (Almost!)
- Steve Foskett – Is NFSv3 really that bad?
- Storagezilla – NFSv4 vs NFSv4? FIGHT!
The other bloggers have done a good job of describing what pNFS is and what is new in NFS4.1 itself, so I won’t repeat all of that. I want to focus specifically on pNFS and why it IS a big deal.
Prior to coming to work for EMC, I worked in internal IT at a company that deals with large binary files in support of product development, as well as video editing for marketing purposes. I had a chance to evaluate, implement, and support multiple clustered file system technologies. The first was for an HD video editing solution using Macs, and we followed the likely path of implementing Apple’s XSAN solution, which you may know is an OEM’d version of Quantum (ADIC) StorNext. StorNext allows you to create large filesystems across many disks and access them as local disk on many clients. File open, close, byte-range locking, etc. are handled by MetaData Controllers (MDCs) across an IP network, while the actual heavy lifting of read/write IO is done over Fibre Channel from the clients to the storage directly. You get all the shared filesystem benefits of NAS with the performance benefits of SAN.
The second project was specifically targeted at moving large files (4+GB each) through a workflow across many computers as quickly as possible so we could ship products. Faster processing of the workflow translated to more completed projects per person per day, which meant better margins and kept our partners and customers happy. The workflow was already established, using Windows-based computers and a file server. The file server was running out of steam, and the amount of data being stored at any given time had increased from 500GB to 8TB over the past 12 months. We needed a simple way to increase the performance of the file server and also allow for better scalability. Working with our local EMC SE, we tested and deployed MPFSi using a Celerra NS40 with integrated storage.
MPFS (also known as High Road) has been around a long time and works with Windows and various *nix-based platforms. It is similar to XSAN/StorNext in that open/close/locking activity is handled over IP by the metadata controller (the Celerra datamover in the case of MPFS), while the read/write IO is handled over block storage technology (MPFS supports Fibre Channel and iSCSI connectivity to storage). The advantage of MPFS over many other solutions is that the metadata controller and storage are all built into the EMC Celerra storage device, so you don’t have to deploy any other servers.
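To make the split between the control path and the data path concrete, here is a minimal Python sketch. It is not how MPFS or StorNext is actually implemented; the class names, the `Extent` structure, and the tiny block size are all made up for illustration. The point is only that the metadata server answers "where does the file live?" and the client then pulls the blocks from shared storage itself.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


@dataclass
class Extent:
    """A contiguous run of blocks on the shared volume (illustrative only)."""
    start_block: int
    block_count: int


class MetadataServer:
    """Stands in for the metadata controller: hands out block maps, never moves data."""

    def __init__(self):
        self._maps = {}

    def register(self, name, extents, length):
        self._maps[name] = (extents, length)

    def get_map(self, name):
        # In a real deployment this request would travel over the IP network.
        return self._maps[name]


class Client:
    """Stands in for a host with the clustered file system client installed."""

    def __init__(self, mds, volume):
        self.mds = mds
        self.volume = volume  # stands in for a LUN reached over Fibre Channel or iSCSI

    def read_file(self, name):
        extents, length = self.mds.get_map(name)      # control path
        data = bytearray()
        for ext in extents:                            # data path, straight to storage
            start = ext.start_block * BLOCK_SIZE
            end = start + ext.block_count * BLOCK_SIZE
            data += self.volume[start:end]
        return bytes(data[:length])


# The file's blocks are deliberately out of order on the "volume".
volume = b"NFS!" + b"...." + b"Hell" + b"o, p"
mds = MetadataServer()
mds.register("demo.bin", [Extent(2, 2), Extent(0, 1)], length=12)
client = Client(mds, volume)
print(client.read_file("demo.bin"))  # b'Hello, pNFS!'
```

Notice that the file contents never pass through `MetadataServer`, which is why the metadata controller does not become the throughput bottleneck in this kind of design.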
In our case we chose iSCSI due to the cost of FC (switches and HBAs) and used the GigE ports on the Celerra’s CX3 backend for block connectivity. In testing we showed that CIFS alone provided approximately 240 Mbps of throughput over GigE connections, while enabling MPFSi netted about 750 Mbps, even when using the same NIC on the client. So we roughly tripled throughput over the same LAN by installing a software client. Had we gone the extra mile to deploy Fibre Channel for the block IO, we would have seen much higher throughput.
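As a rough back-of-the-envelope check on what those two numbers mean for a single 4 GB file, here is a small Python calculation. It assumes decimal units (1 GB = 8,000 megabits), a sustained transfer rate, and no protocol overhead, so treat the results as ballpark figures rather than measurements.

```python
def transfer_seconds(size_gb, rate_mbps):
    """Seconds to move size_gb gigabytes at rate_mbps megabits per second."""
    megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    return megabits / rate_mbps


cifs_seconds = transfer_seconds(4, 240)    # plain CIFS over GigE
mpfsi_seconds = transfer_seconds(4, 750)   # MPFSi over the same NIC
speedup = 750 / 240

print(f"CIFS:  {cifs_seconds:.0f} s")      # about 133 s
print(f"MPFSi: {mpfsi_seconds:.0f} s")     # about 43 s
print(f"Speedup: {speedup:.2f}x")          # 3.12x
```

Roughly a minute and a half saved per file may not sound like much, but it adds up quickly when every project pushes many of these files through several workstations.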
Even better, the use of MPFS did not preclude the use of NDMP for backup to tape directly from the Celerra, accelerating backup many times over the old fileserver. Clients that did not have the MPFS software installed accessed the same files over traditional CIFS with no problems. Another side benefit of MPFS over traditional CIFS is that the block I/O stack is much more efficient than the NAS I/O stack, so even with increased throughput, CPU utilization is lower on the client, returning cycles to the application which is doing work for your business.
There are many clustered file system / clustered NAS solutions on the market from a variety of vendors (StorNext, MPFS, GFS, Polyserve, etc.), and most of these products are trying to solve the same basic problems of storing more data and increasing performance. The problem is that they are all proprietary, and because of that you end up with multiple solutions deployed in the same company. In our case we couldn’t use MPFS for the video editing solution because EMC has not provided a client for Mac OS X. And this is where pNFS really becomes attractive. Storage vendors and operating system vendors alike will be upgrading the already ubiquitous NFS stack in their code to support NFS4.1 and pNFS. That support means I could deploy an MPFS-like solution using the same Celerra-based storage, with no extra servers and no special client software, just the native NFS client in my operating system of choice. Perhaps Apple will include a pNFS-capable client in a future version of Mac OS X.
If you look at the pNFS standard, you’ll see that it supports the use of not only block storage, but object and file based storage as well. So as we build out larger and larger environments and private clouds start to expand into public clouds, you could tier your pNFS data across Fibre Channel storage, object storage (think Atmos on premises), as well as out to a service provider cloud (e.g., AT&T Synaptic). Now you’ve dramatically increased performance for the data that needs it, saved money storing the data that you need to keep long term, and geographically dispersed the data that needs to be close to users, with a single protocol supported by most of the industry and a single point of management.
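For reference, the three layout types correspond to constants defined in the NFSv4.1 specification (RFC 5661): files, objects, and block volumes. The Python sketch below lists them and pairs them with a purely hypothetical tiering policy of the kind described above. The data class names ("hot", "archive") and the `pick_layout` function are my own illustration, not part of any standard or product; in real pNFS the server decides which layout types it offers.

```python
from enum import IntEnum


class LayoutType(IntEnum):
    """pNFS layout type values as defined in RFC 5661."""
    NFSV4_1_FILES = 1   # file-based layout
    OSD2_OBJECTS = 2    # object-based layout
    BLOCK_VOLUME = 3    # block-based layout (e.g. Fibre Channel or iSCSI LUNs)


def pick_layout(data_class):
    """Hypothetical policy: map a class of data to the layout type that would back it."""
    policy = {
        "hot": LayoutType.BLOCK_VOLUME,      # data that needs raw performance
        "archive": LayoutType.OSD2_OBJECTS,  # data kept long term on cheaper storage
    }
    # Anything else falls back to the file layout in this sketch.
    return policy.get(data_class, LayoutType.NFSV4_1_FILES)


for data_class in ("hot", "archive", "general"):
    print(data_class, "->", pick_layout(data_class).name)
```

Whatever the layout behind a given file, the client still talks plain NFS4.1 to the metadata server, which is what keeps it a single protocol from the client's point of view.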
Personally I think pNFS could kill off proprietary solutions over the long run unless they include support for it in their products.
This is just my opinion of course…