March 2011

Rack it up, plug it in, and away you go. Those are basically the steps needed to expand a storage pool by adding another PS array using the Dell/EqualLogic architecture. A few weeks ago I took delivery of a new PS4000e to complement the PS6000e at my primary site. The purpose of this additional array was simple: we needed raw storage capacity.

My initial proposal and deployment of my virtualized infrastructure a few years ago was a good one, but I deliberately left our big flat-file storage servers out of the initial scope of storage space requirements. There was plenty to keep me occupied between the initial deployment and now. It allowed me to get most of my infrastructure virtualized, and gave the skeptics who thought all of this new-fangled technology was too good to be true a chance to buy in. Since that time, storage prices have fallen, and larger drive sizes have become available. Delaying the purchase aligned well with “just-in-time” purchasing principles, and also gave me an opportunity to address the storage issue in the correct way.

At first, I thought all of this was subject matter not worthy of writing about. After all, EqualLogic makes it easy to add storage. But that only addresses part of the problem. Many of you face the same dilemma regardless of what your storage solution is: user facing storage growth.

Before I start rambling about my dilemma, let me clarify what I mean by a few terms I’ll be using: “user facing storage” and “non user facing storage.”

User Facing Storage is simply the storage that is presented to end users via file shares (in Windows) and NFS mounts (in Linux). User facing storage is waiting there, ready to be sucked up by an overzealous end user.
Non User Facing Storage is the storage occupied by the servers themselves, and the services they provide. Most end users have no idea how much space a server reserves for, say, SQL databases or transaction logs (nor should they!). Non user facing storage is easier to forecast and manage because it is only exposed to system administrators.

Which array…

I decided to go with the PS4000e because of the value it returns, and how it addresses my specific need. If I had targeted VDI or storage for other I/O intensive services, I would have opted for one of the other offerings in the EqualLogic lineup. I virtualized the majority of my infrastructure on one PS6000e with 16 1TB drives in it, but it wasn’t capable of the raw capacity that we now needed to virtualize our flat-file storage. While the effective number of 1Gb Ethernet ports on the PS4000e is half that of the PS6000e, none of the usage statistics I have gathered from my traditional storage servers suggest that the throughput of the PS4000e will be insufficient. The PS4000e allowed me to trim a few dollars off of my budget line estimates, and may work well at our CoLo facility if we ever need to demote it.

I chose to create a storage pool so that I could keep my volumes that require higher performance on the PS6000, and have the dedicated storage volumes on the PS4000. I will do the same when I eventually add other array types geared for specific roles, such as VDI.

Truth be told, we all know that 16 2-terabyte drives do not equal 32 terabytes of real world space. The RAID50 penalty knocks that down to about 21TB. Cut that by about half for average snapshot reserves, and it’s more like 11TB. Keeping a little bit of free pool space available is always a good idea, so let’s just say it effectively adds 10TB of full fledged enterprise class storage. This adds to my effective storage space of 5TB on my PS6000. Fantastic. …but wait, one problem. No, several problems.
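The arithmetic above can be written out as a small Python sketch. The 21TB RAID50 figure and the 50% snapshot reserve are the numbers quoted in this post and are taken as given. The 5% free pool headroom is an assumption chosen only so the result lands near the 10TB estimate, and the function name is made up for illustration.

```python
def effective_capacity_tb(after_raid_tb, snapshot_reserve=0.5, free_pool_headroom=0.05):
    """Return the TB left once snapshot reserve and pool headroom are set aside.

    snapshot_reserve and free_pool_headroom are fractions between 0 and 1.
    The 0.05 headroom default is an assumption, not an EqualLogic rule.
    """
    after_snapshots = after_raid_tb * (1 - snapshot_reserve)
    return after_snapshots * (1 - free_pool_headroom)


raw_tb = 16 * 2        # 16 drives x 2TB each
after_raid_tb = 21     # the RAID50 figure quoted in the post, taken as given
usable_tb = effective_capacity_tb(after_raid_tb)

print(f"raw: {raw_tb} TB, after RAID50: {after_raid_tb} TB, usable: about {usable_tb:.1f} TB")
```

Plugging in your own reserve percentages makes it easy to see how quickly a generous snapshot policy eats into the headline number.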

The Dilemma

Turning up the new array was the easy part. In less than 30 minutes, I had it mounted, turned on, and configured to work with my existing storage group. Now for the hard part: figuring out how to utilize the space in the most efficient way. User facing storage is a wildcard; do it wrong and you’ll pay for it later. While I didn’t know the answer, I did know some things that would help me come to an educated decision.

If I migrate all of the data on my remaining physical storage servers (two of them, one Linux, and one Windows) over to my SAN, it will consume virtually all of my newly acquired storage space.
If I add a large amount of user-facing storage, and present that to end users, it will get sucked up like a vacuum.
If I blindly add large amounts of great storage at the primary site without careful thought, I will not have enough storage at the offsite facility to replicate to.
Large volumes (2TB or larger) not only run into technical limitations, but are difficult to manage. At that size, there may also be a co-mingling of data that is not necessarily business critical. Doling out user facing storage in large volumes is easy to do. It will come back to bite you later on.
Manipulating the old data in the same volume as new data does not bode well for replication and snapshots, which look at block changes. Breaking them into separate volumes is more effective.
Users will not take the time or the effort to clean up old data.
If data retention policies are in place, users will generally be okay with it after a substantial amount of complaining. It’s not too different from the complaining you might hear when there are no data retention policies, but you have no space. Pick your poison.
Your users will not understand data retention policies if you do not understand them.

Time for a plan.

I needed a way to compartmentalize some of the data so that it could be identified as “less important” and then perhaps live on less important storage. “Less important storage” could mean a part of the SAN that is not replicated or, in a worst case scenario, even some old decommissioned physical servers, where the data resides for a defined amount of time before it is permanently archived and removed from the probationary location.
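Before settling on any thresholds, it helps to know how much of the data is actually old. The following is a rough Python sketch, assuming last-modified time is a usable signal for your data (it may not be). It walks a directory tree and totals file sizes by the number of whole years since each file last changed. `bytes_by_age` is a hypothetical helper name, and the demo runs against a throwaway temporary directory rather than a real share.

```python
import os
import tempfile
import time
from collections import Counter

SECONDS_PER_YEAR = 365 * 24 * 3600


def bytes_by_age(root, now=None):
    """Total file sizes under root, keyed by whole years since last modification."""
    now = time.time() if now is None else now
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)   # lstat: do not follow symbolic links
            except OSError:
                continue                # file vanished or is unreadable; skip it
            age_years = int(max(0.0, now - info.st_mtime) // SECONDS_PER_YEAR)
            totals[age_years] += info.st_size
    return totals


# Demo on a temporary directory: one fresh file, one backdated about five years.
with tempfile.TemporaryDirectory() as demo_root:
    old_path = os.path.join(demo_root, "old.dat")
    new_path = os.path.join(demo_root, "new.dat")
    with open(old_path, "wb") as fh:
        fh.write(b"x" * 10)
    with open(new_path, "wb") as fh:
        fh.write(b"new")
    five_years_ago = time.time() - 5 * SECONDS_PER_YEAR - 3600
    os.utime(old_path, (five_years_ago, five_years_ago))
    report = bytes_by_age(demo_root)

for years, size in sorted(report.items()):
    print(f"{years} year(s) since last change: {size} bytes")
```

A report like this is read-only, so it is a safe first step before anything gets moved.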

The Solution (for now)

Data lifecycle management. For many this means some really expensive commercial package. This might be the way to go for you too. To me, this is really nothing more than determining what is important data, and what isn’t as important, and having a plan to help automate the demotion or retirement of that data. However, there is a fundamental problem with this approach. Who decides what’s important? What are the thresholds? Last accessed time? Last modified time? What are the ramifications of cherry-picking files from a directory structure because they exceed policy thresholds? What is this going to break? How easy is it to recover data that has been demoted? There are a few steps that I need to take to accomplish this.

1. Poor man’s storage tiering. If you are out of SAN space, re-provision an old server. The purpose of this will be to serve up volumes that can be linked to the primary storage location through symbolic links. These volumes can then be backed up at a less frequent interval, as it would be considered less important. If you eventually have enough SAN storage space, these could be easily moved onto the SAN, but in a less critical role, or on a SAN array that has larger, slower disks.

2. Breaking up large volumes. I’m convinced that giant volumes do nothing for you when it comes to understanding and managing the contents. Turning larger blobs into smaller blobs also serves another very important role. It allows the intelligence of the EqualLogic solution to do its work on where the data should live in a collection of arrays. A storage group that consists of, say, an SSD based array, a PS6000, and a PS4000 can effectively store each volume on the array that best suits the demand.

3. Automating the process. This will come in two parts: a.) deciding on structure, policies, etc. and b.) making or using tools to move the files from one location to another. On the Linux side, this could mean anything from a bash script to something written in Python, with cron scheduling the runs. In Windows, you could leverage PowerShell, VBScript, or batch files. This would be as simple, or as complex, as your needs require. However, if you are like me, you have limited time to tinker with scripting. If there is something turn-key that does the job, go for it. For me, that is an affordable little utility called “TreeSize Pro.” This gives you not only the ability to analyze the contents of NTFS volumes, but can easily automate the pruning of this data to another location.

4. Monitoring the result. This one is easy to overlook, but you will need to monitor the fruits of your labor, and make sure it is doing what it should be doing; maintaining available storage space on critical storage devices. There are a handful of nice scripts that have been written for both platforms that help you monitor free storage space at the server level.
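For anyone taking the scripted route instead of a turn-key utility, steps 1, 3 and 4 above might be sketched in Python roughly as follows. This is an illustration under several assumptions, not the tool used here: it treats last-modified time as the only policy threshold, assumes a POSIX-style system where symbolic links can be created freely, and assumes the archive location sits outside the primary tree. The function names `demote_stale_files` and `free_fraction` are invented for this example, and the demo works on temporary directories only.

```python
import os
import shutil
import tempfile
import time


def free_fraction(path):
    """Fraction of free space on the filesystem holding path (step 4)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def demote_stale_files(primary_root, archive_root, max_age_days, dry_run=True):
    """Move files untouched for max_age_days to archive_root (steps 1 and 3).

    A symbolic link is left at the old location so existing paths keep working.
    Returns the list of files selected; with dry_run=True nothing is moved.
    """
    cutoff = time.time() - max_age_days * 86400
    selected = []
    for dirpath, _dirnames, filenames in os.walk(primary_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                continue                      # already demoted, or not ours to touch
            if os.stat(src).st_mtime >= cutoff:
                continue                      # modified recently enough to stay put
            selected.append(src)
            if dry_run:
                continue
            rel = os.path.relpath(src, primary_root)
            dst = os.path.abspath(os.path.join(archive_root, rel))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)             # works across filesystems too
            os.symlink(dst, src)              # leave a pointer behind
    return selected


# Demo on throwaway directories so nothing real is touched.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as archive:
    old_file = os.path.join(primary, "old_report.txt")
    new_file = os.path.join(primary, "current.txt")
    for path in (old_file, new_file):
        with open(path, "w") as fh:
            fh.write("data")
    two_years_ago = time.time() - 730 * 86400
    os.utime(old_file, (two_years_ago, two_years_ago))

    moved = demote_stale_files(primary, archive, max_age_days=365, dry_run=False)
    old_is_link = os.path.islink(old_file)
    new_is_link = os.path.islink(new_file)
    with open(old_file) as fh:
        old_content = fh.read()               # still readable through the link
    print(f"demoted {len(moved)} file(s); free space: {free_fraction(primary):.0%}")
```

Running with the default `dry_run=True` first and reviewing the returned list is the cautious way to start. A scheduled job could then call `free_fraction` on the critical volume and only demote, or alert, when free space drops below whatever threshold your policy sets.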

The result

The illustration below helps demonstrate how this would work.

As seen below, once a system is established to automatically move and house demoted data, you can more effectively use storage on the SAN.

Separation anxiety…

In order to make this work, you will have to work hard to make sure that all of this is pretty transparent to the end user. If you have data with complex external references, you will want to preserve the integrity of the data that relies on those dependent files. Hey, I never said this was going to be easy.

A few things worth remembering…

If 17 years in IT, and a little observation of human nature, has taught me one thing, it is that we all undervalue our current data, and overvalue our old data. You see it time and time again. Storage runs out, and there are cries for running down to the local box store and picking up some $99 hard drives. What needs to reside on there is mission critical (hence the undervaluing of the new data). Conversely, efforts to have users clean up old data from 10+ years ago had users hiding files in special locations, even though the records showed the files had not been modified, or even accessed, in 4+ years. All of this of course lives on enterprise class storage. An all too common example of overvaluing old data.

Tip. Remember your Service Level Agreements. It is common in IT to not only have SLAs for systems and data, but for one’s position. These without doubt are tied to one another. Make sure that one doesn’t compromise the other. Stop gap measures to accommodate more storage will trigger desperate, affordable solutions (e.g. adding cheap non-redundant drives in an old server somewhere). Don’t do it! All of those arm-chair administrators in your organization will be nowhere to be found when those drives fail, and you are left to clean up the mess.

Tip. Don’t ever thin provision user facing storage. Fortunately, I was clued into this early on, but I can only imagine the well intentioned administrator who wanted to present a nice amount of storage space to the user, only to find it sucked up a few days later. Save the thin provisioning for non user facing storage (servers with SQL databases and transaction logs, etc.).

Tip. If you are presenting proposals to management, or general information updates to users, I would suggest quoting only the amount of effective, usable space that will be added. In other words, don’t say you are adding 32TB to your storage infrastructure, when in fact, it is closer to 10TB. Say that it is 10TB of extremely sophisticated, redundant enterprise class storage that you can “bet the business” on. Its scalability, flexibility and robustness are needed for the 24/7 environments we insist upon. It will just make it easier that way.

Tip. It may seem unnecessary to you, but continue to show off snapshots, replication, and other unique aspects of SAN storage if you still have those who doubt the power of this kind of technology, especially when they see the cost per TB. Repeat to them how long (if even possible) it would take to protect that same data under traditional storage. Do everything you can to help those who approve these purchases. More than likely, they won’t be as impressed by, say, how quick a snapshot is, but rather shocked at how poorly traditional storage can be protected.

You may have noticed I do not have any rock-solid answers for managing the growth and sustainability of user facing data. Situations vary, but the factors that help determine the path to a solution are quite similar. Whether you decide on a turn-key solution, or choose to demonstrate a little ingenuity in times of tight budgets, the topic is one that you will probably have to face at some point.


Zero to 32 Terabytes in 30 minutes. My new EqualLogic PS4000e