Cracking the mysteries of the Data Center is a bit like space exploration. You think you understand what everything is, and how it all works together, but struggle to understand where fact and speculation intersect. The topic of block sizes, as they relate to storage infrastructures is one such mystery. The term being familiar to some, but elusive enough to remain uncertain as to what it is, or why it matters.
This inconspicuous, but all too important characteristic of storage I/O has often been misunderstood (if not completely overlooked) by well-intentioned Administrators attempting to design, optimize, or troubleshoot storage performance. Much like the topic of Working Set Sizes, block sizes are not of great concern to an Administrator or Architect because of this lack of visibility and understanding. Sadly, myth turns into conventional wisdom – in not only what is typical in an environment, but how applications and storage systems behave, and how to design, optimize, and troubleshoot for such conditions.
Let’s step through this process to better understand what a block is, and why it is so important to understand it’s impact on the Data Center.
What is it?
Without diving deeper than necessary, a block is simply a chunk of data. In the context of storage I/O, it would be a unit in a data stream; a read or a write from a single I/O operation. Block size refers the payload size of a single unit. We can blame a bit of this confusion on what a block is by a bit of overlap in industry nomenclature. Commonly used terms like blocks sizes, cluster sizes, pages, latency, etc. may be used in disparate conversations, but what is being referred to, how it is measured, and by whom may often vary. Within the context of discussing file systems, storage media characteristics, hypervisors, or Operating Systems, these terms are used interchangeably, but do not have universal meaning.
Most who are responsible for Data Center design and operation know the term as an asterisk on a performance specification sheet of a storage system, or a configuration setting in a synthetic I/O generator. Performance specifications on a storage system are often the result of a synthetic test using the most favorable block size (often 4K or smaller) for an array to maximize the number of IOPS that an array can service. Synthetic I/O generators typically allow one to set this, but users often have no idea what the distribution of block sizes are across their workloads, or if it is even possibly to simulate that with synthetic I/O. The reality is that many applications draw a unique mix of block sizes at any given time, depending on the activity.
I first wrote about the impact of block sizes back in 2013 when introducing FVP into my production environment at the time. (See section "The IOPS, Throughput & Latency relationship") FVP provided a tertiary glimpse of the impact of block sizes in my environment. Countless hours with the performance graphs, and using vscsistats provided new insight about those workloads, and the environment in which they ran. However, neither tool was necessarily built for real time analysis or long term trending of block sizes for a single VM, or across the Data Center. I had always wished for an easier way.
Why does it matter?
The best way to think of block sizes is how much of a storage payload consisting in a single unit. The physics of it becomes obvious when you think about the size of a 4KB payload, versus a 256KB payload, or even a 512KB payload. Since we refer to them as a block, let’s use a square to represent their relative capacities.
Throughput is the result of IOPS, and the block size for each I/O being sent or received. It’s not just the fact that a 256KB block has 64 times the amount of data that a 4K block has, it is the amount of additional effort throughout the storage stack it takes to handle that. Whether it be bandwidth on the fabric, the protocol, or processing overhead on the HBAs, switches, or storage controllers. And let’s not forget the burden it has on the persistent media.
This variability in performance is more prominent with Flash than traditional spinning disk. Reads are relatively easy for Flash, but the methods used for writing to NAND Flash can inhibit the same performance results from reads, especially with writes using large blocks. (For more detail on the basic anatomy and behavior of Flash, take a look at Frank Denneman’s post on Flash wear leveling, garbage collection, and write amplification. Here is another primer on the basics of Flash.) A very small number of writes using large blocks can trigger all sorts of activity on the Flash devices that obstructs the effective performance from behaving as it does with smaller block I/O. This volatility in performance is a surprise to just about everyone when they first see it.
Block size can impact storage performance regardless of the type of storage architecture used. Whether it is a traditional SAN infrastructure, or a distributed storage solution used in a Hyper Converged environment, the factors, and the challenges remain. Storage systems may be optimized for different block size that may not necessarily align with your workloads. This could be the result of design assumptions of the storage system, or limits of their architecture. The abilities of storage solutions to cope with certain workload patterns varies greatly as well. The difference between a good storage system and a poor one often comes down to the abilities of it to handle large block I/O. Insight into this information should be a part of the design and operation of any environment.
The applications that generate them
What makes the topic of block sizes so interesting are the Operating Systems, the applications, and the workloads that generate them. The block sizes are often dictated by the processes of the OS and the applications that are running in them.
Unlike what many might think, there is often a wide mix of block sizes that are being used at any given time on a single VM, and it can change dramatically by the second. These changes have profound impact on the ability for the VM and the infrastructure it lives on to deliver the I/O in a timely manner. It’s not enough to know that perhaps 30% of the blocks are 64KB in size. One must understand how they are distributed over time, and how latencies or other attributes of those blocks of various sizes relate to each other. Stay tuned for future posts that dive deeper into this topic.
Traditional methods capable of visibility
The traditional methods for viewing block sizes have been limited. They provide an incomplete picture of their impact – whether it be across the Data Center, or against a single workload.
1. Kernel statistics courtesy of vscsistats. This utility is a part of ESXi, and can be executed via the command line of an ESXi host. The utility provides a summary of block sizes for a given period of time, but suffers from a few significant problems.
- Not ideal for anything but a very short snippet of time, against a specific vmdk.
- Cannot present data in real-time. It is essentially a post-processing tool.
- Not intended to show data over time. vscsistats will show a sum total of I/O metrics for a given period of time, but it’s of a single sample period. It has no way to track this over time. One must script this to create results for more than a single period of time.
- No context. It treats that workload (actually, just the VMDK) in isolation. It is missing the context necessary to properly interpret.
- No way to visually understand the data. This requires the use of other tools to help visualize the data.
The result, especially at scale, is a very labor intensive exercise that is an incomplete solution. It is extremely rare that an Administrator runs through this exercise on even a single VM to understand their I/O characteristics.
2. Storage array. This would be a vendor specific "value add" feature that might present some simplified summary of data with regards to block sizes, but this too is an incomplete solution:
- Not VM aware. Since most intelligence is lost the moment storage I/O leaves a host HBA, a storage array would have no idea what block sizes were associated with a VM, or what order they were delivered in.
- Measuring at the wrong place. The array is simply the wrong place to measure the impact of block sizes in the first place. Think about all of the queues storage traffic must go through before the writes are committed to the storage, and reads are fetched. (It also assumes no caching tiers outside of the storage system exist). The desire would be to measure at a location that takes all of this into consideration; the hypervisor. Incidentally, this is often why an array can show great performance on the array, but suffer in the observed latency of the VM. This speaks to the importance of measuring data at the correct location.
- Unknown and possibly inconsistent method of measurement. Showing any block size information is not a storage array’s primary mission, and doesn’t necessarily provide the same method of measurement as where the I/O originates (the VM, and the host it lives on). Therefore, how it is measured, and how often it is measured is generally of low importance, and not disclosed.
- Dependent on the storage array. If different types of storage are used in an environment, this doesn’t provide adequate coverage for all of the workloads.
The Hypervisor is an ideal control plane to analyze the data. It focuses on the results of the VMs without being dependent on nuances of in-guest metrics or a feature of a storage solution. It is inherently the ideal position in the Data Center for proper, holistic understanding of your environment.
Eyes wide shut – Storage design mistakes from the start
The flaw with many design exercises is we assume we know what our assumptions are. Let’s consider typical inputs when it comes to storage design. This includes factors such as
- Peak IOPS and Throughput.
- Read/Write ratios
- RAID penalties
- Perhaps some physical latencies of components, if we wanted to get fancy.
Most who have designed or managed environments have gone through some variation of this exercise, followed by a little math to come up with the correct blend of disks, RAID levels, and fabric to support the desired performance. Known figures are used when they are available, and the others might be filled in with assumptions. But yet, block sizes, and everything they impact are nowhere to be found. Why? Lack of visibility, and understanding.
If we know that block sizes can dramatically impact the performance of a storage system (as will be shown in future posts) shouldn’t it be a part of any design, optimization, or troubleshooting exercise? Of course it should. Just as with working set sizes, lack of visibility doesn’t excuse lack of consideration. An infrastructure only exists because of the need to run services and applications on it. Let those applications and workloads help tell you what type of storage fits your environment best. Not the other way around.
Is there a better way?
The ideal approach for measuring the impact of block sizes will always include measuring from the location of the hypervisor, as this will provide these measurements in the right way, and from the right location. vscsiStats and vCenter related metrics are an incredible resource to tap into, and will provide the best understanding on impacts of block sizes in a storage system. There may be some time investment to decipher block size characteristics of a workload, but the payoff is generally worth the effort.