November 16, 2015
There is no shortage of mysteries in the Data Center. These stealthy influencers can undermine performance and consistency of your environment, while remaining elusive to identify, quantify, and control. Virtualization helped expose some of this information, as it provided an ideal control plane for visibility. But it does not, and cannot properly expose all data necessary to account for these influencers. The hypervisor also has a habit of presenting the data in ways that can be misinterpreted.
One such mystery as it relates to modern day virtualized Data Centers is known as the “working set.” This term certainly has historical meaning in the realm of computer science, but the practical definition has evolved to include other components of the Data Center; storage in particular. Many find it hard to define, let alone understand how it impacts their Data Center, and how to even begin measuring it.
We often focus on what we know, and what we can control. However, a lack of visibility into the influencing factors in the Data Center does not make them unimportant. Unfortunately, this is how working sets are usually treated. They are often left out of a Data Center design exercise because they are completely unknown, and they are rarely written about for the very same reason. This is ironic, considering that every modern architecture deals with some concept of localizing data in order to improve performance: cached content versus its persistent home. How much of it is there? How often is it accessed? The answers to these types of questions are critically important to know.
What is it?
For all practical purposes, a working set refers to the amount of data that a process or workflow uses in a given time period. Think of it as the hot, commonly accessed portion of your overall persistent storage capacity. But that simple explanation leaves a handful of terms that are difficult to qualify, and quantify. How long is the time period? Does “amount” mean reads, writes, or both? And does it count the same data written over and over again, or only new data? Let’s explore this more.
There are a few traits of working sets that are worth reviewing.
- Working sets are driven by the workload, the applications driving the workload, and the VMs that they run on. Whether the persistent storage is local, shared, or distributed, it really doesn’t matter from the perspective of how the VMs see it. The size will be largely the same.
- Working sets always relate to a time period. However, it’s a continuum. And there will be cycles in the data activity over time.
- A working set comprises both reads and writes. The amount of each is important to know because reads and writes have different characteristics, and demand different things from your storage system.
- Working set size refers to an amount, or capacity, but what and how many I/Os it took to make up that capacity will vary due to ever changing block sizes.
- Data access type may be different. Is one block read a thousand times, or are a thousand blocks read one time? Are the writes mostly overwriting existing data, or is it new data? This is part of what makes workloads so unique.
- Working set sizes evolve and change as your workloads and Data Center change. Like everything else, they are not static.
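To make these traits concrete, here is a minimal sketch of how a working set could be counted from an I/O trace. It is only an illustration, not how any particular product measures it. The `IO` record, the 4 KB tracking granularity (`BLOCK`), and the sample trace are all assumptions made up for this example. The point it shows is that unique data touched, split by reads and writes, is a different number from bytes transferred.

```python
from dataclasses import dataclass

BLOCK = 4096  # assumed tracking granularity in bytes (hypothetical choice)


@dataclass(frozen=True)
class IO:
    t: float      # seconds since the start of the trace
    op: str       # "R" for read, "W" for write
    offset: int   # byte offset on the virtual disk
    size: int     # I/O size in bytes (varies per I/O)


def blocks(io):
    """Return the BLOCK-sized block numbers that one I/O touches."""
    first = io.offset // BLOCK
    last = (io.offset + io.size - 1) // BLOCK
    return range(first, last + 1)


def working_set(trace, start, end):
    """Unique bytes read, written, and touched overall in [start, end)."""
    read, written = set(), set()
    for io in trace:
        if start <= io.t < end:
            (read if io.op == "R" else written).update(blocks(io))
    return {
        "read_bytes": len(read) * BLOCK,
        "write_bytes": len(written) * BLOCK,
        "total_bytes": len(read | written) * BLOCK,
    }


# Hypothetical trace: repeated reads, an overwrite, and mixed I/O sizes.
trace = [
    IO(0.0, "R", 0, 4096),      # block 0
    IO(1.0, "R", 0, 4096),      # block 0 read again
    IO(2.0, "W", 8192, 8192),   # blocks 2 and 3 (one 8 KB write)
    IO(3.0, "W", 8192, 4096),   # block 2 overwritten
    IO(4.0, "R", 12288, 4096),  # block 3 read back
]

ws = working_set(trace, 0, 10)
transferred = sum(io.size for io in trace)
print(ws, transferred)
```

In this made-up trace, 24 KB is transferred but only 12 KB of unique data is touched, because re-reads and overwrites land on blocks that were already counted.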
A simplified, visual interpretation of data activity that would define a working set, might look like below.
If a working set is always related to a period of time, then how can we ever define it? In fact, you can. A workload often has a period of activity followed by a period of rest. This is sometimes referred to as the “duty cycle.” A duty cycle might be the pattern that shows up after a day of activity on a mailbox server, an hour of batch processing on a SQL server, or 30 minutes compiling code. Taking a look over a larger period of time, duty cycles of a VM might look something like below.
Working sets can be defined at whatever time increment desired, but the goal in calculating a working set is to capture at least one full duty cycle of each individual workload.
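The effect of the measurement window can be sketched with a synthetic workload. Everything here is assumed for illustration: the 60-second duty cycle, the block numbers, and the helper name `unique_blocks_per_window`. The sketch counts unique blocks per fixed-size window, and shows that a window shorter than the duty cycle sees only a slice of the data the workload actually cycles through.

```python
def unique_blocks_per_window(events, window):
    """Count unique blocks touched in each fixed-size time window.

    events: iterable of (time_seconds, block_number) pairs.
    Returns {window_index: unique_block_count}; idle windows are omitted.
    """
    buckets = {}
    for t, block in events:
        buckets.setdefault(int(t // window), set()).add(block)
    return {i: len(b) for i, b in sorted(buckets.items())}


# Synthetic workload (assumption): a 60 s duty cycle that touches
# blocks 0..29 during the first 30 s, then rests for 30 s, three times.
events = [(cycle * 60 + s, s) for cycle in range(3) for s in range(30)]

short = unique_blocks_per_window(events, 10)  # window shorter than the cycle
full = unique_blocks_per_window(events, 60)   # window covers one duty cycle

print(max(short.values()))  # largest working set seen with 10 s windows
print(full)                 # working set per full duty cycle
```

With 10-second windows, no window ever sees more than 10 blocks, while a window that covers the whole duty cycle sees all 30. A cache sized from the short windows would be too small for this workload.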
Why it matters
Determining working set sizes helps you understand the behaviors of your workloads in order to better design, operate, and optimize your environment. For the same reason you pay attention to compute and memory demands, it is also important to understand storage characteristics, which include working sets. Understanding and accurately calculating working sets can have a profound effect on the consistency of a Data Center. Have you ever heard about a real workload performing poorly, or inconsistently, on a tiered storage array, hybrid array, or Hyper Converged environment? This is because all of these are extremely sensitive to right sizing the caching layer. Not accurately accounting for working set sizes of the production workloads is a common reason for such issues.
Classic methods for calculation
Over the years, this mystery around working set sizes has resulted in all sorts of sad attempts at trying to calculate them. Those attempts have included:
- Calculate using known (but not very helpful) factors. This generally consists of looking at some measurement of IOPS over the course of a given time period. Maybe dress it up with a few other factors to make it look neat. This is terribly flawed, as it assumes one knows all of the various block sizes for that given workload, and that block sizes for a workload are consistent over time. It also assumes all reads and writes use the same block size, which is also false.
- Measure working sets at the array, as a feature of the array’s caching layer. This attempt often fails because it sits at the wrong location. It may know what blocks of data are commonly accessed, but there is no context to the VM or workload imparting the demand. Most of that intelligence about the data is lost the moment the data exits the HBA of the vSphere host. Without VM awareness, even an accurately guessed cache size on an array can be insufficient at times, due to cache pollution from noisy neighbor VMs.
- Take an incremental backup, and look at the amount of changed data. Seems logical, but this can be misleading because it will not account for data that is written over and over, nor does it account for reads. The incremental time period of the backup may also not be representative of the duty cycle of the workload.
- Guesswork. You might see “recommendations” that say a certain percentage of your total storage capacity used is hot data, but this is a more formal way to admit that it’s nearly impossible to determine. Guess large enough, and the impact of being wrong will be less, but this introduces a number of technical and financial implications on Data Center design.
As you can see, these old strategies do not hold up well, and still leave the Administrator without a real answer. A Data Center Architect deserves better when factoring in this element to the design or optimization of an environment.
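A small worked example shows why the IOPS-based method in particular breaks down. The two workloads below are invented for illustration, and the helper names (`iops_estimate`, `unique_bytes`) are hypothetical. Both workloads issue the same number of I/Os at the same block size, so an estimate built from IOPS alone gives the same answer for both, even though the unique data they touch differs by a factor of 1,000.

```python
BLOCK = 4096  # assumed block size in bytes for both workloads


def iops_estimate(io_count, assumed_block_size):
    """Naive estimate: treat every I/O as touching new data of one fixed size."""
    return io_count * assumed_block_size


def unique_bytes(block_numbers):
    """Unique data actually touched, given the block number of each I/O."""
    return len(set(block_numbers)) * BLOCK


hot_spot = [7] * 1000      # one block accessed a thousand times
scan = list(range(1000))   # a thousand blocks accessed one time each

naive = iops_estimate(1000, BLOCK)  # identical for both workloads

print(naive, unique_bytes(hot_spot), unique_bytes(scan))
```

The naive figure happens to match the scan workload and overstates the hot spot workload a thousandfold. Real workloads also mix block sizes, which this sketch does not model, so the gap can be wider in either direction.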
PernixData Architect, and working sets
The hypervisor is the ideal control plane for measurement of a lot of things. Let’s take storage I/O latency as a great example. It doesn’t matter what latency a storage array advertises; what matters is what the VM actually sees. So why not extend the functionality of the hypervisor kernel so that it provides insight into working set data on a per VM basis? That is exactly what PernixData Architect does. By understanding, and presenting storage characteristics such as block sizes in a way never previously possible, Architect understands on a per VM basis the key elements necessary to calculate working set sizes.
PernixData Architect will provide a working set estimation for each individual VM in your vSphere cluster, as well as an estimate for the VMs on a per host basis. The example below shows this individual breakdown.
And below you see the estimate on a per host level.
What is so unique about these estimates is that they factor in both reads and writes. Why is this important? We know that writes place a very different demand on an infrastructure than reads do, so a single number would tell an incomplete story.
PernixData Architect is a standalone product, separate from FVP. For FVP customers who also use Architect, it provides specialized calculations that factor in how FVP handles cached content, and analyzes the activity further to estimate how much flash or RAM would be ideal for each host. The result is a per host recommendation that provides a high and low estimate.
What you can do with this data
Once we’ve established the working set sizes of your workloads, it opens a lot of doors for better design and optimization of an environment. Here are some examples.
- Properly size your top performing tier of persistent storage in a storage array.
- If you are using server side acceleration with a product like PernixData FVP, size the flash and/or RAM on a per host basis correctly to maximize the offload of I/O from an array.
- If you are looking at replicating data to another Data Center, take a look at the writes in the working set estimate to gauge how much bandwidth you might need between sites.
- Learn how much of a caching layer might be needed for your existing Hyper Converged environment.
- Chargeback/showback. This is one more way of conveying who are the heavy consumers of your environment, and would fit nicely into a chargeback/showback arrangement.
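As a rough illustration of the replication example above, the write portion of a working set can be turned into an average bandwidth figure with simple arithmetic. The numbers and the function name `avg_replication_mbps` are hypothetical, and the sketch assumes no compression, deduplication, or protocol overhead. It also gives an average rather than a peak, so bursts within the period would need more.

```python
def avg_replication_mbps(write_bytes, period_seconds):
    """Average megabits per second needed to ship write_bytes in period_seconds."""
    return write_bytes * 8 / period_seconds / 1_000_000


# Hypothetical input: 90 GB of writes observed over an 8 hour duty cycle.
write_bytes = 90 * 10**9
period_seconds = 8 * 3600

mbps = avg_replication_mbps(write_bytes, period_seconds)
print(f"{mbps:.1f} Mbit/s average")
```

Treat the result as a floor for the link between sites, not a sizing recommendation.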
Determining the working set sizes of an environment is critical to its overall operation, but they have been extremely difficult to obtain until now. Architect provides insight and analysis that is not possible in vCenter, or any traditional monitoring system that hooks into vCenter. Providing a detailed understanding of working set sizes is just one feature in Architect to help you make smart, data-driven decisions. Good design equals predictable and consistent performance, and spending those precious IT dollars prudently. Rely on real data, and save the guesswork for your Fantasy Football league.