Rethinking “storage efficiency” in HCI architectures – Part 1

Hyper-converged infrastructure (HCI) can bring several design and operational benefits to the table, adding to the long list of reasons behind its popularity. Yet HCI also introduces new considerations in understanding and measuring the technical costs associated with the architecture. These technical costs can be thought of as a usage “tax” or “overhead” on host resources. The amount attributed to this technical cost can vary quite drastically, and depends heavily on the architecture used, which makes it challenging for an administrator to measure and understand. The architecture used by an HCI solution should not be overlooked, as these technical costs not only influence the performance and consistency of the VMs, but can dramatically impact the density of VMs per host, and ultimately the total cost of ownership.

With HCI, host resources (CPU, memory, and network) are now responsible for an entirely new set of duties typically provided by a storage array in a traditional three-tier architecture. These responsibilities not only include handling VM storage I/O from end to end, but, due to the distributed nature of HCI, hosts will also take part in the storage activity of VMs not local to the host, such as replicated writes, as well as data-at-rest operations and other storage-related services. All of these responsibilities consume host resources. The question is, how much?
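To make "how much" a little more tangible, below is a quick back-of-the-envelope sketch in Python. The numbers (core counts, overhead, vCPU sizing, utilization target) are purely hypothetical placeholders rather than measurements of any particular product; the point is simply to show how CPU consumed by the storage stack flows directly into VM density per host, and therefore cost.

def effective_vm_density(total_cores, storage_overhead_cores, cores_per_vm=2, target_cpu_util=0.7):
    """
    Back-of-the-envelope only: how many VMs fit on a host once the
    storage stack takes its share of CPU. Every input is hypothetical;
    replace them with your own observed numbers.
    """
    usable_cores = (total_cores - storage_overhead_cores) * target_cpu_util
    return int(usable_cores // cores_per_vm)

# A hypothetical 32-core host, comparing a storage layer that consumes
# roughly 4 cores versus one that consumes roughly 10 cores:
print(effective_vm_density(32, 4))    # 9 VMs per host
print(effective_vm_density(32, 10))   # 7 VMs per host

Multiply that difference across a cluster, and the "tax" stops being an abstraction.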

This multi-part series is going to look at the basics of HCI architectures, and how they behave differently with respect to their demands on CPU, memory, and network resources. Operational comparisons are not covered, simply to maintain focus on the intent of the series.

"Storage efficiency" is more than what you think
The term "storage efficiency" is commonly associated with just data deduplication and compression. With hyper-converged infrastructures, this term takes on additional meaning. Storage efficiency in HCI relates to the efficiency of how I/Os are delivered to and from the VM. Efficiency of I/O delivery to and from VMs matter not only from performance and consistency as seen by the VM, but how much resource usage is introduced to the hosts in the cluster. The latter is often never considered, yet extremely important.

HCI Architectures
HCI solutions available in today’s market not only offer different data services, but are built differently, which is just one of the many reasons why it is difficult to generalize the amount of overhead needed to process storage I/O. All HCI solutions will vary (some more than others) in how they provide storage services to the VMs while maintaining resources for guest VM activity. The two basic categories, as illustrated in Figure 1, are:

  • Virtual appliance approach. A VM lives on each host in the cluster, delivering a distributed shared storage plane and processing I/O and other related activities. Depending on the particular HCI solution, this virtual appliance on each host may also be responsible for a number of other duties.
  • Integrated/in-kernel approach. The distributed shared storage system is a part of the hypervisor, where key aspects of the storage system are part of the kernel. This allows virtual machine I/O to traverse the native kernel I/O path on the hosts participating in that I/O activity.


Figure 1. Comparing an I/O write between HCI architectures (simplified for clarity)

HCI solutions that use a VM to process storage I/O on each host run that VM in a context (user space) no different than the application VMs running on the host. In other words, the resources allocated to this virtual appliance to perform system-level storage duties contend with the very VMs it is trying to serve. HCI solutions built into the hypervisor maintain end-to-end control and awareness of the I/O. Since an in-kernel, integrated solution allows I/O to traverse the native kernel I/O path, it uses host resources in the least "costly" way. HCI solutions built into the kernel minimize the amplification of I/Os, and the CPU and memory resources it takes to process those I/Os from end to end. Virtual appliance based HCI solutions will sometimes use devices configured in the hypervisor for direct pass-through (aka “VMDirectPath”) in an attempt to reduce overhead, but many of the fundamental penalties (especially as they relate to CPU cycles) of I/O amplification and context switching along this indirect path remain.
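To illustrate why the I/O path itself matters, here is a deliberately simplified toy model of a single replicated VM write in each architecture. The step lists are assumptions made only for illustration; they are not a trace of any specific HCI product, but they show how the appliance approach inserts user-space hops, each of which costs CPU cycles and context switches.

# Illustrative only: the steps below are assumptions, not product traces.
IN_KERNEL_WRITE = [
    "guest I/O enters the hypervisor",
    "kernel storage stack on the local host",
    "network send to the replica host (kernel)",
    "kernel storage stack on the replica host",
    "acknowledgement back to the guest",
]

APPLIANCE_WRITE = [
    "guest I/O enters the hypervisor",
    "hypervisor hands the I/O to the controller VM (user space)",
    "controller VM processes the I/O, context switch back to kernel",
    "network send to the replica host",
    "replica host hands the I/O to its controller VM (user space)",
    "replica controller VM commits the write, context switch back to kernel",
    "acknowledgement returns through the local controller VM to the guest",
]

def controller_vm_hops(steps):
    # Count steps that cross into or out of a user-space controller VM.
    return sum("controller VM" in step for step in steps)

print(len(IN_KERNEL_WRITE), controller_vm_hops(IN_KERNEL_WRITE))   # 5 steps, 0 controller VM hops
print(len(APPLIANCE_WRITE), controller_vm_hops(APPLIANCE_WRITE))   # 7 steps, 5 controller VM hops

The absolute numbers are meaningless; the shape of the difference is the point.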

Addressing a problem in different ways
Why are there multiple approaches? Manufacturers may state many reasons why they chose a specific approach, and why their approach is superior. More often than not, the decision comes from technical limitations and go-to-market pressures. An HCI vendor may not have access to, or the ability to provide, this functionality natively in the kernel of a hypervisor. A virtual appliance approach is easier to bring to market, and naturally adaptable to different hypervisors, since it is little more than a virtual machine that processes storage I/O.

By way of comparison, those who have full ownership of the hypervisor can integrate this functionality directly into the hypervisor, and when appropriate, build some aspects of it right into the kernel, just as other core functionality is built into the kernel. Resource efficiency, hypervisor feature integration, as well as the contextual awareness and control of I/O types are typically the top reasons why it is beneficial to have a distributed storage mechanism built into the hypervisor.

Do both approaches work? Yes. Do both approaches produce the same result in VM behavior and host resource usage? No. Running the same workloads using HCI solutions with these two different architectures may produce very different results on the VMs, and the hosts that serve them. The degree of impact will depend on the technical cost (in resource usage) of the I/O processing, and other data services provided by a given solution.

This difference often does not show up until numerous real workloads are put on these solutions. Just as with a traditional storage array, every solution is fast when there is little to no load on it. What counts is the behavior under real load, with contending resources; something not always visible with synthetic testing. For HCI environments, the overall “storage efficiency” of a particular HCI solution can be better compared (assuming identical hardware and workloads) by looking at the following in a real HCI environment running production workloads:

  • The average number of active VMs per host when running your real workloads.
  • The performance characteristics of the VMs and hosts when running your real workloads while hosts are busy serving other workloads.

The measurements above take this topic from an occasionally tiresome academic debate, and demonstrate the differences in real world circumstances. Ironically, faster hardware can increase, not reduce, the differences between these architectural approaches to HCI. This is not unlike what often occurs now at the application level, where faster hardware exposes bottlenecks in software/application design that were previously unnoticeable with older, slower hardware.
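If you are already collecting samples from whatever monitoring tool you use, turning them into those two comparison points takes very little work. The sketch below is a hypothetical example only; the sample format and field names are assumptions, and the collection itself is left out. It simply aggregates average active VMs per host and a 95th percentile VM latency while the hosts are under load.

import statistics
from collections import defaultdict

def summarize_hci_efficiency(samples):
    """
    samples: a list of dicts such as
      {"host": "esx01", "active_vms": 38, "vm_latency_ms": [0.9, 1.4, 2.1]}
    collected at regular intervals while running real workloads.
    (The format is a hypothetical one, chosen only for this sketch.)
    """
    per_host_vms = defaultdict(list)
    latencies = []
    for sample in samples:
        per_host_vms[sample["host"]].append(sample["active_vms"])
        latencies.extend(sample["vm_latency_ms"])

    avg_vms_per_host = {host: statistics.mean(v) for host, v in per_host_vms.items()}
    p95_latency_ms = statistics.quantiles(latencies, n=20)[18]   # 95th percentile
    return avg_vms_per_host, p95_latency_ms

Run the same aggregation against two HCI solutions on identical hardware and workloads, and the "storage efficiency" difference becomes a pair of numbers rather than a debate.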

Now that it is clear why "storage efficiency" really means so much more than data services like deduplication and compression, the next post in this series will focus on CPU resources in HCI environments, and what to look out for when observing CPU usage behaviors.

Does the concept of host resource usage interest you? If so, stay tuned for the book, vSphere 6.5 Host Resources Deep Dive by Frank Denneman and Niels Hagoort. It is sure to be a must-have for those interested in the design and optimization of virtualized environments. You can also follow updates from them at @hostdeepdive on Twitter.


Juggling priorities, and my unplanned, but temporary break from blogging

If you have had a blog for long enough, sometimes readers measure your contributions only by the number of recent posts published. If that were an accurate way to measure productivity, then apparently I haven’t been doing anything in the past few months. Quite the contrary, really. Since joining VMware in August of 2016, I’ve had the opportunity to work with great teams on exciting projects. It has been fast paced, educational, and fun.

So what have I been doing anyway…
It is fair to say that VMware vSAN 6.6 is the most significant launch in the history of vSAN. The number of features packed into 6.6 is a testament to the massive focus by R&D to deliver an unprecedented set of new features and enhancements. Part of the effort of any release is rolling out technical content. A significant amount of that load falls on Technical Marketing, and by virtue of being a part of the team, I’ve been right in the thick of it. The list of deliverables is long, but it has been fun to be a part of that process.

The way vSAN is integrated into the hypervisor gives it unique abilities in integration and interoperability. An area of focus for me has been demonstrating practical examples of this integration – in particular, the integration that vSAN has with vRealize Operations and vRealize Log Insight. Connecting the dots between what really happens in data centers and product capabilities is one way to show how, and why, the right type of integration is so important. There are a lot of exciting things coming in this space, so stay tuned.

I’ve also had the chance to join my colleagues, Pete Flecha and John Nicholson on the Virtually Speaking Podcast. In episode 38, we talked a little about storage performance, and in episode 41, we discussed some of the new features of vSAN 6.6. What John and Pete have been able to accomplish with that podcast over the past year is impressive, and the popularity of it speaks to the great content they produce.

Since joining VMware, I also stepped down from my role as a Seattle VMUG leader. It was a fun and educational experience that helped me connect with the community, and appreciate every single volunteer out there. VMUG communities everywhere are run by enthusiasts of technology, and their passion is what keeps it all going. I appreciated the opportunity, and they are in good hands with Damian Kirby, who has taken over leadership duties.

All of these activities, while gratifying, left little time for my normal cadence of posts. I’ve always enjoyed creating no-nonsense, interesting, unique content with a lot of detail. Testing, capturing observations, and investigating issues is fun and rewarding for me, but it is also extremely time consuming. I spent the past 8 years churning out this type of content at a clip of about one post per month. That doesn’t sound like much, but with the level of detail and testing involved, it was difficult to keep up the pace recently. This short reprieve has allowed me to rethink what I want my site to focus on. While much of the content I’m producing these days shows up in other forms and in other locations, I’ll now have the chance to mix up the content out here a bit. Some new posts are in the works, and I hope to pick up the pace soon, if for nothing else, to let everyone know I’m actually doing something.

– Pete


vSAN in cost effective independent environments

Old habits in data center design can be hard to break.  New technologies are introduced that process data faster, and move data more quickly.  Yet all too often, the thought process for data center design remains the same – inevitably constructed and managed in ways that reflect conventional wisdom and familiar practices.  Unfortunately, these common practices are often due to constraints of the technologies that preceded them, rather than aligning current business objectives with new technologies and capabilities.

Historically, no component of an infrastructure dictated design and operation more than storage.  The architecture of traditional shared storage often meant that the storage infrastructure was the oddball of the modern data center.  Given enough capacity, performance, and physical ports on a fabric, a monolithic array could serve up several vSphere clusters, and therein lies the problem.  The storage was not seen or treated by the hypervisor as a clustered resource the way compute was.  This centralized way of storing data invited connectivity by as many hosts as possible in order to justify the associated costs.  Unfortunately, it also invited several problems.  It placed limits on data center design because, in part, it was far too impractical to purchase separate shared storage for every use case that would benefit from an independent environment isolated from the rest of the data center.  As my colleague John Nicholson (blog/twitter) has often said, "you can’t cut your array in half."  It’s a humorous, but cogent, way to describe this highly common problem.

While VMware vSAN has proven to be extremely well suited for converging all applications into the same environment, business requirements may dictate a need for self-contained, independent environments isolated in some manner from the rest of the data center.  In "Cost Effective Independent Environments using vSAN" found on VMware’s StorageHub, I walk through four examples that show how business requirements may warrant a cluster of compute and storage dedicated to a specific purpose, and why vSAN is an ideal solution.  The examples provided are:

  • Independent cluster management
  • Development/Test environments
  • Application driven requirements
  • Multi-purpose Disaster Recovery

Each example listed above details how traditional storage can fall short in delivering results efficiently, then compares how vSAN addresses and solves those specific design and operational challenges.  Furthermore, learn how storage-related controls are moved into the hypervisor using Storage Policy Based Management (SPBM), VMware’s framework that delivers storage performance and protection policies to VMs, and even individual VMDKs, all within vCenter.  SPBM is the common management framework used in vSAN and Virtual Volumes (VVols), and is clearly becoming the way to manage software-defined storage.  Each example wraps up with a number of practical design tips for that specific scenario to get you started in building a better data center using vSAN.

Clustering is an incredibly powerful concept, and vSphere clusters in particular bring capabilities to your virtualized environment that are simply beyond comparison.  With VMware vSAN, the power of clustering resources is taken to the next level, forming the next logical step in the journey of modernizing your environment in preparation for a fully software-defined data center.

This published use case is the first of many more to come, focused on practical scenarios that reflect the common needs of organizations large and small, and on how vSAN can help deliver results quickly and effectively.  Stay tuned!

– Pete

Accommodating for change with Virtual SAN

One of the many challenges to proper data center design is trying to accommodate future changes, and to do so in a practical way. Growth is often the reason behind change, and while that is inherently a good thing, IT budgets often don’t see that same rate of increase. CFOs expect economies of scale to make your environment more cost efficient, and so should you.

Unfortunately, applications are always demanding more resources. The combination of commodity x86 servers and virtualization provided a flexible way to accommodate growth when it came to compute and memory resources, but addressing storage capacity and storage performance was far more difficult. Hyper-converged architectures helped break down this barrier somewhat, but some solutions lacked the flexibility to cope with increasing storage capacity or performance beyond the initial prescribed configurations defined by a vendor. Users need a way to easily increase their HCI storage resources in the middle of a lifecycle without always asking for yet another capital expenditure.

“A customer can have a car painted any color he wants as long as it’s black” — Henry Ford

But wait… it doesn’t always have to be that way. Take a look at my post on Virtual Blocks on Options in scalability with Virtual SAN. See how VSAN allows for a smarter way to approach your evolving resource needs, giving the power of choice back to you in how you scale your environment. Whether you choose to build your own servers using the VMware compatibility guide, go with VSAN Ready Nodes, or select from one of the VxRAIL options available, the principles described in the post remain the same. I hope it sparks a few ideas on how you can apply this flexibility in a strategic way to your own environment.

Thanks for reading…

The success of VSAN, and my move to VMware

For the past few years, I’ve had the opportunity to share with others how to better understand their Data Center workloads, and how to use this knowledge to better serve the needs of their organizations.  As a Technical Marketing Engineer for PernixData, the role allowed me to maintain a pulse on the needs of customers and partners, as well as analyze what others were doing from a competitive standpoint. It was a great way to distinguish industry hyperbole from the solutions people were really interested in, and implementing.

One observation simply couldn’t be ignored. It was clear that many were adopting VMware VSAN – and doing it in a big way. The rate of adoption even seemed to outpace the exceptionally rapid rate at which the product has been maturing. Thinking back to my days on the customer side, it was easy to see why. With the unique traits that come from being built into the hypervisor, it appeals to the sensibilities of the Data Center Administrator, and the CFO. VSAN was resonating with the needs of customers, and doing so in a much more tangible way than official market research numbers could describe.

I wanted to be a part of it.

With that, I’m thrilled to be joining VMware’s Storage and Availability business unit, as a member of their Technical Marketing Team. One of my areas of focus will be VSAN, as well as many other related topics. I’m joining the likes of GS Khalsa, Jase McCarty, Jeff Hunter, John Nicholson, Ken Werneburg, and Pete Flecha. To say it’s an honor to be joining this team is a bit of an understatement.  I’m truly grateful for the opportunity.

A special thanks to all of the great people I worked with at PernixData. An incredibly talented group of people striving to make a difference. The best of luck to each and every one of them. It’s been a truly rewarding experience indeed.

You’ll be able to find my official contributions out on VMware’s Virtual Blocks, as well as in other locations. I’ll continue to post out here at vmPete.com for unofficial content, and other things that continue to interest me.

Onward…

Working set sizes in the Data Center

There is no shortage of mysteries in the data center. These stealthy influencers can undermine the performance and consistency of your environment, while remaining elusive to identify, quantify, and control. Virtualization helped expose some of this information, as it provided an ideal control plane for visibility. But it does not, and cannot, properly expose all of the data necessary to account for these influencers. The hypervisor also has a habit of presenting the data in ways that can be misinterpreted.

One such mystery, as it relates to modern day virtualized data centers, is known as the "working set." This term certainly has historical meaning in the realm of computer science, but the practical definition has evolved to include other components of the Data Center; storage in particular. Many find it hard to define, let alone understand how it impacts their data center, and how to even begin measuring it.

We often focus on what we know, and what we can control. However, a lack of visibility into the influencing factors in the data center does not make them unimportant. Unfortunately, this is how working sets are usually treated. They are often not a part of a data center design exercise because they are completely unknown, and they are rarely written about for the very same reason. This is ironic, considering that every modern architecture deals with some concept of localizing data in order to improve performance: cached content versus its persistent home. How much of it is there? How often is it accessed? These types of questions are critically important to answer.

What is it?
For all practical purposes, a working set refers to the amount of data that a process or workflow uses in a given time period. Think of it as the hot, commonly accessed portion of your overall persistent storage capacity. But that simple explanation leaves a handful of terms that are difficult to qualify and quantify. What is recent? Does "amount" mean reads, writes, or both? And does it count the same data written over and over again, or only new data? Let’s explore this further.

There are several traits of working sets that are worth reviewing.

  • Working sets are driven by the workload, the applications driving the workload, and the VMs that they run on.  Whether the persistent storage is local, shared, or distributed, it really doesn’t matter from the perspective of how the VMs see it.  The size will be largely the same.
  • Working sets always relate to a time period.  However, it’s a continuum.  And there will be cycles in the data activity over time.
  • Working sets comprise both reads and writes.  The amount of each is important to know because reads and writes have different characteristics, and demand different things from your storage system.
  • Working set size refers to an amount, or capacity, but what and how many I/Os it took to make up that capacity will vary due to ever changing block sizes.
  • Data access type may be different.  Is one block read a thousand times, or are a thousand blocks read one time?  Are the writes mostly overwriting existing data, or is it new data?  This is part of what makes workloads so unique.
  • Working set sizes evolve and change as your workloads and data center change.  Like everything else, they are not static.

A simplified, visual interpretation of the data activity that would define a working set might look like the figure below.

[Figure: simplified visual interpretation of data activity defining a working set]

If a working set is always related to a period of time, then how can we ever define it? Well, in fact, you can. A workload often has a period of activity followed by a period of rest. This is sometimes referred to as the "duty cycle." A duty cycle might be the pattern that shows up after a day of activity on a mailbox server, an hour of batch processing on a SQL server, or 30 minutes of compiling code. Looking over a larger period of time, the duty cycles of a VM might look something like the figure below.

[Figure: duty cycles of a VM over a longer period of time]

Working sets can be defined at whatever time increment is desired, but the goal in calculating a working set is to capture, at minimum, one or more duty cycles of each individual workload.
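As a rough sketch of what that calculation looks like in practice, the Python below estimates a working set from an I/O trace by counting the unique blocks touched within a window spanning a duty cycle, keeping reads and writes separate. The trace format is a hypothetical one chosen for the example; a real trace from whatever capture tool you use would need to be parsed into this shape first.

from collections import defaultdict

def working_set_bytes(io_trace, window_start, window_end, block_size=4096):
    """
    Estimate the working set of a workload over one duty cycle.
    io_trace: iterable of (timestamp, op, offset, length) tuples,
              where op is "read" or "write". (Hypothetical format.)
    Returns (read_bytes, write_bytes, total_bytes) of unique blocks touched.
    """
    touched = defaultdict(set)   # op -> set of unique block numbers
    for timestamp, op, offset, length in io_trace:
        if not (window_start <= timestamp < window_end):
            continue
        first_block = offset // block_size
        last_block = (offset + length - 1) // block_size
        touched[op].update(range(first_block, last_block + 1))

    read_bytes = len(touched["read"]) * block_size
    write_bytes = len(touched["write"]) * block_size
    # Blocks touched by either reads or writes, counted once
    total_bytes = len(touched["read"] | touched["write"]) * block_size
    return read_bytes, write_bytes, total_bytes

Note that it is the unique blocks that are counted; a block read a thousand times contributes to the working set exactly once, which is precisely what the classic estimation methods discussed later get wrong.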

Why it matters
Determining working set sizes helps you understand the behaviors of your workloads in order to better design, operate, and optimize your environment. For the same reason you pay attention to compute and memory demands, it is also important to understand storage characteristics, which include working sets. Understanding and accurately calculating working sets can have a profound effect on the consistency of a data center. Have you ever heard about a real workload performing poorly, or inconsistently, on a tiered storage array, hybrid array, or hyper-converged environment? That is because all of them are extremely sensitive to right-sizing the caching layer. Not accurately accounting for the working set sizes of production workloads is a common reason for such issues.
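A quick, admittedly naive calculation shows how unforgiving that sensitivity can be. Assuming (unrealistically) uniform access across the working set, the portion of hot data that spills out of the caching layer is simply whatever does not fit. The numbers below are hypothetical.

def naive_cache_coverage(working_set_gb, cache_gb):
    # Fraction of the active working set that fits in cache,
    # assuming uniform access across it (a simplification).
    return min(1.0, cache_gb / working_set_gb)

# Hypothetical: a 1.6 TB caching tier in front of workloads whose combined
# duty-cycle working set is 2.4 TB.
spill = 1 - naive_cache_coverage(2400, 1600)
print(f"{spill * 100:.0f}% of hot data is served from the capacity tier")   # ~33%

Real caching layers behave better than the uniform-access assumption suggests, but the underlying math is why undersized estimates show up as inconsistent performance.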

Classic methods for calculation
Over the years, this mystery around working set sizes has resulted in all sorts of sad attempts at calculating it. Those attempts have included:

  • Calculate using known (but not very helpful) factors.  These generally consist of looking at some measurement of IOPS over the course of a given time period, perhaps dressed up with a few other factors to make it look neat.  This is terribly flawed, as it assumes one knows all of the various block sizes for a given workload, and that block sizes for a workload are consistent over time.  It also assumes all reads and writes use the same block size, which is also false.  (A worked example follows this list.)
  • Measure working sets defined on a storage array, as a feature of the array’s caching layer.  This attempt often fails because it sits at the wrong location.  It may know what blocks of data are commonly accessed, but there is no context for the VM or workload imparting the demand.  Most of that intelligence about the data is lost the moment the data exits the HBA of the vSphere host.  Lack of VM awareness can even make an accurately guessed cache size on an array insufficient at times, due to cache pollution from noisy neighbor VMs.
  • Take an incremental backup, and look at the amount of changed data.  This sounds logical, but this can be misleading because it will not account for data that is written over and over, nor does it account for reads.  The incremental time period of the backup may also not be representative of the duty cycle of the workload.
  • Guesswork.  You might see "recommendations" that say a certain percentage of your total used storage capacity is hot data, but this is just a more formal way to admit that it’s nearly impossible to determine.  Guess large enough, and the impact of being wrong will be less, but this introduces a number of technical and financial implications on data center design.
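As the worked example promised above, here is how far off the first method can be, using hypothetical numbers. One hour of the same workload at the same IOPS produces wildly different capacity figures depending on the block size you assume, and none of them says how much of that data was unique.

def iops_based_estimate_gb(avg_iops, seconds, assumed_block_kb):
    # The classic (flawed) estimate: IOPS x time x an assumed block size.
    return avg_iops * seconds * assumed_block_kb / (1024 * 1024)

# Hypothetical workload: one hour at a sustained 5,000 IOPS.
print(iops_based_estimate_gb(5000, 3600, 4))    # ~68.7 GB if you assume 4 KB I/Os
print(iops_based_estimate_gb(5000, 3600, 64))   # ~1098.6 GB if the I/Os were 64 KB

A workload that rewrites the same 10 GB all hour long has a working set of roughly 10 GB no matter which answer the formula gives.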

Since working sets are collected against activity that occurs on a continuum, calculating a typical working set with a high level of precision is not only impossible, but largely unnecessary.  When attempting to determine working set size of a workload, the goal is to come to a number that reflects the most typical behavior of a single workload, group of workloads, or a total sum of workloads across a cluster or data center.

A future post will detail approaches that should give a sufficient level of understanding of active working set sizes, and help reduce the potential for negative impacts on data center operation due to poor guesswork.

Thanks for reading