Accelerating storage using PernixData’s FVP. A perspective from customer #0001

Recently, I described in "Hunting down unnecessary I/O before you buy that next storage solution" the efforts around addressing "technical debt" that was contributing to unnecessary I/O. The goal was to get better performance out of my storage infrastructure. It’s been a worthwhile endeavor that I would recommend to anyone, but at the end of the day, one might still need faster storage. That usually means, free up another 3U of rack space, and open checkbook

Or does it?  Do I have to go the traditional route of adding more spindles, or investing heavily in a faster storage fabric?  Well, the answer was an unequivocal "yes" not too long ago, but times are a changing, and here is my way to tackle the problem in a radically different way.

I’ve chosen to delay any purchases of an additional storage array, or the infrastructure backing it, and opted to go PernixData FVP.  In fact, I was customer #0001 after PernixData announced GA of FVP 1.0.  So why did I go this route?

1.  Clustered host based caching.  Leveraging server side flash brings compute and data closer together, but thanks to FVP, it does so in such a way that works in a highly available clustered fashion that aligns perfectly with the feature sets of the hypervisor.

2.  Write-back caching. The ability to deliver writes to flash is really important. Write-through caching, which waits for the acknowledgement from the underlying storage, just wasn’t good enough for my environment. Rotational latencies, as well as physical transport latencies would still be there on over 80% of all of my traffic. I needed true write-back caching that would acknowledge the write immediately, while eventually de-staging it down to the underlying storage.

3.  Cost. The gold plated dominos of upgrading storage is not fun for anyone on the paying side of the equation. Going with PernixData FVP was going to address my needs for a fraction of the cost of a traditional solution.

4.  It allows for a significant decoupling of "storage for capacity" versus "storage for performance" dilemma when addressing additional storage needs.

5.  Another array would have been to a certain degree, more of the same. Incremental improvement, with less than enthusiastic results considering the amount invested.  I found myself not very excited to purchase another array. With so much volatility in the storage market, it almost seemed like an antiquated solution.

6.  Quick to implement. FVP installation consists of installing a VIB via Update Manager or the command line, installing the Management services and vCenter plugin, and you are off to the races.

7.  Hardware independent.  I didn’t have to wait for a special controller upgrade, firmware update, or wonder if my hardware would work with it. (a common problem with storage array solutions). Nor did I have to make a decision to perhaps go with a different storage vendor if I wanted to try a new technology.  It is purely a software solution with the flexibility of working with multiple types of flash; SSDs, or PCIe based. 

A different way to solve a classic problem
While my write intensive workload is pretty unique, my situation is not.  Our storage performance needs outgrew what the environment was designed for; capacity at a reasonable cost. This is an all too common problem.  With the increased capacities of spinning disks, it has actually made this problem worse, not better.  Fewer and fewer spindles are serving up more and more data.

My goal was to deliver the results our build VMs were capable of delivering with faster storage, but unable to because of my existing infrastructure.  For me it was about reducing I/O contention to allow the build system CPU cycles to deliver the builds without waiting on storage.  For others it might delivering lower latencies to their SQL backed ERP or CRM servers.

The allure of utilizing flash has been an intriguing one.  I often found myself looking at my vSphere hosts and all of it’s processing goodness, but disappointed those SSD sitting in the hosts couldn’t help to augment my storage performance needs.  Being an active participant in the PernixData beta program allowed me to see how it would help me in my environment, and if it would deliver the needs of the business.

Lessons learned so far
Don’t skimp on quality SSDs.  Would you buy an ESXi host with one physical core?  Of course you wouldn’t. Same thing goes with SSDs.  Quality flash is a must! I can tell you from first hand experience that it makes a huge difference.  I thought the Dell OEM SSDs that came with my M620 blades were fine, but by way of comparison, they were terrible. Don’t cripple a solution by going with cheap flash.  In this 4 node cluster, I went with 4 EMLC based, 400GB Intel S3700s. I also had the opportunity to test some Micron P400M EMLC SSDs, which also seemed to perform very well.

While I went with 400GB SSDs in each host (giving approximately 1.5TB of cache space for a 4 node cluster), I did most of my testing using 100GB SSDs. They seemed adequate in that they were not showing a significant amount of cache eviction, but I wanted to leverage my purchasing opportunity to get larger drives. Knowing the best size can be a bit of a mystery until you get things in place, but having a larger cache size allows for a larger working set of data available for future reads, as well as giving head room for the per-VM write-back redundancy setting available.

An unexpected surprise is how FVP has given me visibility into the one area of I/O monitoring that is traditional very difficult to see;  I/O patterns. See Iometer. As good as you want to make it.  Understanding this element of your I/O needs is critical, and the analytics in FVP has helped me discover some very interesting things about my I/O patterns that I will surely be investigating in the near future.

In the read-caching world, the saying goes that the fastest storage I/O is the I/O the array never will see. Well, with write caching, it eventually needs to be de-staged to the array.  While FVP will improve delivery of storage to the array by absorbing the I/O spikes and turning random writes to sequential writes, the I/O will still eventually have to be delivered to the backend storage. In a more write intensive environment, if the delta between your fast flash and your slow storage is significant, and your duty cycle of your applications driving the I/O is also significant, there is a chance it might not be able to keep up.  It might be a corner case, but it is possible.

What’s next
I’ll be posting more specifics on how running PernixData FVP has helped our environment.  So, is it really "disruptive" technology?  Time will ultimately tell.  But I chose to not purchase an array along with new SAN switchgear because of it.  Using FVP has lead to less traffic on my arrays, with higher throughput and lower read and write latencies for my VMs.  Yeah, I would qualify that as disruptive.

 

Helpful Links

Frank Denneman – Basic elements of the flash virtualization platform – Part 1
http://frankdenneman.nl/2013/06/18/basic-elements-of-the-flash-virtualization-platform-part-1/

Frank Denneman – Basic elements of the flash virtualization platform – Part 2
http://frankdenneman.nl/2013/07/02/basic-elements-of-fvp-part-2-using-own-platform-versus-in-place-file-system/

Frank Denneman – FVP Remote Flash Access
http://frankdenneman.nl/2013/08/07/fvp-remote-flash-access/

Frank Dennaman – Design considerations for the host local FVP architecture
http://frankdenneman.nl/2013/08/16/design-considerations-for-the-host-local-architecture/

Satyam Vaghani introducing PernixData FVP at Storage Field Day 3
http://www.pernixdata.com/SFD3/

Write-back deepdive by Frank and Satyam
http://www.pernixdata.com/files/wb-deepdive.html

Seattle VMUG Meeting for August 2013

VMworld 2013 is just a few short weeks away, and with that comes a lot of excitement in the virtualization community. Whether or not you plan on attending VMworld, there is plenty to be excited for with the next Seattle VMUG on Wednesday, August 14th (5:00pm to 8:00pm) at the Bellevue Art Museum. If you are located in the greater Seattle area, here are a few reasons why you should block out your calendar for this event.

Great content is what the Seattle VMUG has been striving for, and I think this next one is a fine example of that. Two fantastic sponsors will be presenting.

  • Veeam Backup & Replication has a tremendously loyal following. If you get a chance to use it, you quickly understand why. Big improvements are coming in Veeam Backup & Replication v7 that may alter your onsite and offsite protection strategies for the better. This is a great chance to find out more about it.
  • PernixData. Billed as the first flash hypervisor, see how this solution (known as FVP) leverages server-side cache to accelerate storage performance for vSphere environments, and why it is turning heads in the industry. It just may change how you design your infrastructure

Need a few more reasons to go? Okay, well how about:

  • You have a great chance at one of two iPad Minis that will be given away. Yes, the Seattle VMUG manages to find some pretty classy prizes to give away. You can’t win if you don’t show up.

  • Chat it up with your fellow VMUGers. Come on over and introduce yourself. Find out what others are doing in their environments, and share a little bit about what your environment is like. If you work with virtualization or manage infrastructures, and claim that nobody understands what you do, here is your chance to surround yourself with others who do!

  • Soak up the sights at the Bellevue Art Museum. Located in the heart of downtown Bellevue, it’s a great venue that has plenty of parking, and will give you an opportunity to see the exhibits
    So be sure to register and come by and say hi. You might walk away with a new iPad mini, and new ideas on how to improve your environment.