My vSphere Home Lab. 2015 edition

The vSphere Home Lab. For some, it is a tool for learning. For others it is a hobby. And for the rest of us, it is a weird addiction rationalized as one of the first two reasons. Home Labs come in all shapes and sizes, and there really is no right or wrong way to create one. Apparently interest in vSphere Home Labs hasn’t waned, as there are now countless resources available online illustrating various designs. At one of our recent Seattle VMUG meetings, we gave a presentation on Home Lab arrangements and ideas. There was great interaction from the audience, and we received several comments afterward on how much they enjoyed the discussion and learning about what others were doing. If you are a VMUG leader and are looking for ideas for presentations, I’d recommend this topic at one of your own local meetings.

Much like a real Data Center, Home Labs are a continual work in progress. Shiny new gear often sits right next to the warts. How quickly the old equipment gets replaced usually correlates to how much time and money you wish to dedicate to the effort. I marvel at some setups by others in the industry. A few of the more recent ones to keep an eye on are the work Erik Bussink does with his high speed networking, and the cool setup Jason Langer has with his half height cabinet and rack mounted hosts, all on 10GbE. Pretty funny considering how many companies are still running 3 hosts with 1GbE networking.

In my conversations with others in the community, I realized I didn’t have a post I could direct someone to when they would ask what I used in my own environment. Well, let me lay it out for you, as of February, 2015.

Primary vSphere cluster
Two hosts currently make up this cluster, and each consists of the following:

  • Lian Li PC-V351B chassis paired with a Scythe SY 1225SL 12L 120mm case fan.
  • SuperMicro MBD-X9SCM-F-O LGA 1155 motherboard with IPMI (a must!)
  • Intel E3-1230 Sandy Bridge 3.2GHz CPU (single socket, 4 physical cores)
  • 32GB RAM
  • Seasonic X series SS-400FL power supply
  • Qty 3: Intel E1G42ETBLK dual-port NICs
  • Mellanox MT25418 dual-port DDR InfiniBand HCA (10Gb per connection)
  • 8GB USB drive (boot)
  • 2TB SATA disk for local storage (testing)
  • Qty 2: SATA-based SSDs (varies with testing)

Management Cluster
At this time, a single host makes up this cluster, but I intend to add a second unit.

  • Intel NUC BOXD54250WYKH1 Intel Core i5-4250U
  • Intel 530 240GB mSATA SSD
  • Crucial 16GB Kit (2x8GB)
  • Extra drive bay (for additional 2.5" SSD if needed)

The ATX style hosts have served quite well over the last 2 1/2 years. They are starting to show their age, but are quiet and power efficient (read: low heat). Unfortunately they max out at just 32GB of RAM, which gets eaten up pretty quickly these days. The chassis started out very empty, but as I added SSDs and spinning disks for additional testing, InfiniBand cards, and the occasional PCIe flash card or storage controller, I found I didn't have much room to spare anymore.

The Intel NUC is an interesting solution. As a vSphere host, its biggest constraints are that it is limited to 16GB of RAM and a single 1GbE NIC. Since these units will serve as my management cluster, that should be fine, and it allows me to be more destructive on the primary two host cluster. They also fit into the small server rack quite nicely. I prefer the D54250WYKH over the traditional Intel D54250WYK model. It's slightly thicker, but allows for an additional internal 2.5" drive. This offers a lot of flexibility if you want to keep some VMs on local storage, or possibly do some limited testing with host based caching. If they ever become too underpowered, they will always find use as a media server or workstation.

Most of my networking needs flow through a Cisco SG300-20. This is a feature rich, layer 3 switch that I've written about in the past (Using the Cisco SG300-20 Layer 3 switch in a home lab). I've used up all 20 ports, and really need another one. However, with other good layer 3 switches now available, and with the possibility of eventually moving my Lab to 10GbE, I've been making do with what I have.

As noted in my post Testing Infiniband in the home lab with PernixData FVP, I introduced InfiniBand as a relatively affordable way to test high speed interconnects between hosts. With only two hosts, I can simply connect them directly and avoid the need for an InfiniBand switch. They are only passing vMotion and PernixData FVP traffic, so there is no need to worry about routing. Adding a third or fourth host gets complex, as I'd have to take the plunge and invest in an IB switch (loud, and not cheap).

Persistent storage comes from a Synology DS1512+ and a Synology DS1514+ NAS unit. Both are 5 bay units, and have a mix of spinning disk and SSDs. The primary difference is that the DS1514+ has four 1GbE ports versus two on the older DS1512+. One unit houses the majority of my Lab VMs and non-lab file storage, while the other is used for experimentation and performance testing. Realistically I only need one Synology unit, but I was able to pick up the newer model at a price I couldn't refuse. My plan is to split lab duties and general storage needs across the two units.

Synology seems to have won the battle of storage in home labs. Those who own them know that while they are a little pricey, they are well worth it, and offer so many benefits beyond just serving up block or file storage for a vSphere cluster.

Battery Backup
My luck with UPS units in the home has not been anything to brag about. It's usually a case of them looking like they work until you really need them. So far the best luck I've had is with the unit I'm currently using, a CyberPower 1500AVR. With the entire lab drawing around 200 watts, there is only about a 25% load on the UPS.
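For anyone curious where that 25% figure comes from, the load percentage is just draw divided by the unit's rated wattage. This is my own back-of-the-napkin math, not anything from a CyberPower spec sheet; the ~900 watt rating below is an assumption for a 1500VA class unit, so check your own model's documentation.

```python
# Back-of-the-napkin UPS load math. The ~900 W rating is an assumed
# figure for a 1500 VA class unit; verify against your model's spec sheet.
def ups_load_percent(draw_watts: float, rated_watts: float) -> float:
    """Load on the UPS as a percentage of its rated wattage."""
    return 100.0 * draw_watts / rated_watts

print(round(ups_load_percent(200, 900)))  # ~22, in the ballpark of 25%
```

Keeping the load low like this also buys more runtime on battery, which matters later when planning graceful shutdowns.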

Server rack
A two shelf wire utility rack from Lowe's fit the bill quite nicely. It is small, affordable ($25), and houses the goofy form factor of the Lian Li ATX chassis just fine. The only problem is that if I add another ATX style host, I may have to come up with a better rack solution.

While I had a good lab environment to test with, up until a few months ago the workstation sitting next to the lab was old, tired, and no longer functional. I found myself not even using it. So I replaced it with an Intel NUC as well. There is a bit of a price premium when buying the NUC, but the form factor, performance, simplicity, and power consumption all make it a no-brainer in my book. The limitations it has as a vSphere host (a single NIC, and 16GB of RAM max) are not an issue when it is used as a workstation. It performs great, and powers a dual monitor setup really well.

What it looks like
Standing at just 35” high, you can see that it is pretty self contained.



The Home Lab Road Map / Wish List
When you have a Home Lab, you have plenty of time to think about what you want next.  The "what you have" is never quite the same as the "what you want."  So here is the path I'll probably be taking:

  • A second Intel NUC to serve as a 2 node Management cluster. (Done.  See here)
  • 10GbE switch.  My primary hesitation on this is cost, and noise. 
  • New hosts.  I'm tempted to go the route of a 2U rack mounted chassis so that I can grow to three or four hosts more efficiently. With SuperMicro offering some motherboards with a built-in 10GbE port, that is pretty enticing.
  • New gateway.  As the lab grows more sophisticated, the network topology looks more and more like a small production environment.  That is why a proper router/firewall is on this wish list.
  • New wireless AP.  Not technically part of the Home Lab, but plays an important role for obvious reasons.  I need a wireless AP that is not prone to memory leaks and manual reboots every three or four days.
  • Affordable PCIe based flash is really making inroads in the enterprise, but it’s still not affordable enough for the home.  I hope this changes, as PCIe avoids so many headaches with flash that runs through a traditional storage controller.

Lessons learned over the years
A few takeaways have come from spending many hours working with my Home Lab.  These reflect personal preferences more than anything, but they might save you some effort along the way as well.

1. The best Home Lab is the one that you use.  For quite some time, I used a nested lab on a burly laptop, in addition to the physical setup.  Ultimately the physical Home Lab won out because it fit more of what I wanted to test and work on.  If my interests were more focused on scripting or workflow automation, perhaps a nested lab would be fine.  But I'm a bit too much of a gear-head, and my job now focuses on performance on top of real hardware.  I also didn't care to power up and power down the entire nested lab each time I wanted to work on the laptop.

2.  While "lab" implies all things experimental, it is common to have a desire for some services to be running all the time.  Perhaps your lab has some responsibilities as a media server.  Or in my case, it also runs my Horizon View environment that I use for remote access.  This makes the idea of tearing down a lab on a whim a bit more complex.  It’s where a Management cluster can come in handy. Having it physically segregated helps to keep things operational when you want to do a complete rebuild, or experiment with a beta version of vSphere.

3.  I stay away from the cheap SSDs.  They have no place in a real Data Center, and aren’t much better in the home.  When it comes to flash, you get what you pay for.  And sometimes, even when you pay, you still don’t get good performing SSDs.  Spend your money wisely.  Buying something multiple times over doesn’t save much money in the end.  And remember, controllers matter too.

4.  Initially I wanted to configure an arrangement that consumed as little power as possible.  Keeping the power down means keeping the heat generated down, and thus the noise.  Since my entire lab sits just an arm's length from where I work, this was important in the beginning, and is important now.  The entire setup draws about 200 watts of power and makes 38dB of noise 3 feet away.  I've refused to add anything loud or hot, and if I'm forced to, the lab will have to be relocated to a new area.

5.  There is always a way to do things a little cheaper.  But consider what your time is worth, and remember the reason why you have a Home Lab in the first place.  That has driven several of my purchasing decisions, and helps remove some of the petty obstacles that can sidetrack the best of us from working on what we intended to.

6.  While some technologies and practices trickle down from production environments to the Home Lab, sometimes the opposite happens.  Two good examples of this might be the use of the VCSA (vSphere 5.5 or later), and letting ESXi run on a USB or MicroSD card.  And that is the beauty of a lab.  It invites experimentation, and filters out what looks good on paper versus what actually works.  Keep an open mind, and use it for what it is good for: making mistakes, and learning.

Thanks for reading

– Pete


Applying Innovation in the Workplace


Those of us in the IT industry need not be reminded that IT is as much a consumer of solutions as it is a provider of services.  Advancing technologies attempt to provide services that are faster, more resilient, and feature rich. Sometimes advancement brings unintended consequences while simultaneously creating even more room to innovate. Exciting for sure, but all of this can become a bit precarious if you hold decision making responsibilities on what technologies best suit your environment. Go too deep on the unproven edge, and risk the well-being of your company, and possibly your career. Arguably more dangerous is to stay too conservative, and risk being a Luddite holding onto unused, outdated, or failed vestiges of your IT past. It is a delicate balance, but rarely does it reward the status quo. There is an IT administrator out there somewhere who still doesn't trust x86 servers, let alone virtualization.

Nobody in this industry is the sole proprietor of innovation, and we are all better off for it. Good ideas are everywhere, and it is fun to see them materialize into functional products. IT departments get a front row seat in seeing how different solutions solve problems. Some ideas are better than others. Others create more problems than they fix. Many companies providing a solution are victims of bad timing, bad marketing, or poor execution. In the continuum of progress and changing market conditions, others fail to acknowledge the change and course correct, or simply lose sight of why they exist in the first place.

Then, there are some solutions that show up as a gem. Perhaps they are transformational to how a problem is solved. Maybe they win you over by their elegance in masking the terribly complex with clean and simple. Maybe the solution has a bit of both. Those who are responsible for running environments get pretty good at recognizing the standouts.

Innovation’s impact on thinking differently
"If I had asked my customers what they wanted they would have said a faster horse" — Henry Ford

It started out as a simple trial of some beta software. It ended up as an integral component of my infrastructure; viewed in the same way as my compute, switchgear, storage, and hypervisor. That is basically the story of how PernixData FVP became a part of my environment. For the next 18 months I would watch daily, hourly, even by the minute as to how my very demanding workloads were improved because of this new approach to solving a common problem. The results were immediate, and obvious. Faster code compiling times. Lower latencies and more predictable performance for all of our applications. All while gaining better visibility to the behavior and needs of our workloads. And of course, storage arrays that were no longer paralyzed by I/O requests. Even the best of slide decks couldn’t convey what I was seeing. I got to see it happen every day, and much like the magic of virtualization in general, it never got old.

It is for that reason that I’ve joined the team at PernixData. I get the chance to help others understand how the PernixData approach can help their environment, and is more than just a faster horse. I’m no longer responsible for my own workloads, but now get to help people better understand their own. Since I’ve always had a passion for virtualization, IT infrastructures, and how real application workloads impact an environment, I think it’s going to be a great fit. I look forward to working with an unbelievably talented group of people. It is quite an honor.

A tip of the hat
I leave an organization that is top notch. Tecplot is a market leader in data visualization, and is routinely voted in the top 100 companies to work for. This doesn’t happen by accident. It comes from great people, great leadership, and has resulted in trusted, innovative products. I would like to thank the ownership group for allowing me the opportunity to be a part of their team, as it has been an absolute pleasure to work there. I’ve learned a lot from smart, principled folks that make up that company, and am better off for it. I leave behind the day to day administrative duties and challenges of a virtualized environment, but I am very excited to join a great team of really smart people who have helped change how challenges in modern IT infrastructures are viewed, and addressed.

Happy New Year.

– Pete

Sustained power outages in the datacenter

Ask any child about a power outage, and you can tell it is a pretty exciting thing. Flashlights. Candles. The whole bit. The excitement is an unexplainable reaction to an inconvenient, if not frustrating event when seen through the eyes of adulthood. When you are responsible for a datacenter of any size, there is no joy that comes from a power outage. Depending on the facility the infrastructure lives in, and the tools put in place to address the issue, it can be a minor inconvenience, or a real mess.

Planning for failure is one of the primary tenets of IT. It touches as much on operational decisions as it does design. Mitigation steps from failure events follow in the wake of the actual design itself, and define if or when further steps need to be taken to become fully operational again. There are some events that require a series of well-defined actions (automated, manual, or somewhere in between) in order to ensure a predictable result. Classic DR scenarios generally come to mind most often, but shoring up steps on how to react to certain events should also include sustained power outages. The amount of good content on the matter is sparse at best, so I will share a few bits of information I have learned over the years.

The Challenges
One of the limitations with a physical design of redundancy when it comes to facility power is, well, the facility. It is likely served by a single utility district, and the customer simply doesn't have options to bring in other power. The building also may have limited or no backup power. Generators may be sized large enough to keep the elevators and a few lights running, but that is about it. Many cannot, or do not, provide power conditioned well enough to be worthy of running expensive equipment. The option to feed PDUs using different circuits from the power closet might also be limited.

Defining the intent of your UPS units is an often overlooked consideration. Are they sized just to provide enough time for a simple graceful shutdown? …And how long is that? Or are they sized to meet some SLA decided upon by management and budget line owners? Those are good questions, but inevitably, if the power is out for long enough, you have to deal with how a graceful shutdown will be orchestrated.

SMBs fall in a particularly risky category, as they often have a set of disparate, small UPS units supplying battery backed power, with no unified management system to orchestrate what should happen in an "on battery" event. It is not uncommon to see an SMB well down the road of virtualization, but their UPS units do not have the smarts to handle information from the items they are powering. Picking the winning number on a roulette wheel might give better odds than figuring out which is going to go first, and which is going to go last.

Not all power outages are a simple power versus no power issue. A few years back our building lost one leg of the three-phase power coming in from the electric vault under the nearby street. This caused a voltage "back feed" on one of the legs, which cut nominal voltage severely. This dirty power/brown-out scenario was one of the worst I've seen. It lasted for 7 very long hours during the middle of the night. While the primary infrastructure was able to be safely shut down, workstations and other devices were toggling off and on due to this scenario. Several pieces of equipment were ruined, but many others ended up worse off than we were.

It’s all about the little mistakes
"Sometimes I lie awake at night, and I ask, 'Where have I gone wrong?'  Then a voice says to me, 'This is going to take more than one night.'" –Charlie Brown, Peanuts [Charles Schulz]

A sequence of little mistakes in an otherwise good plan can kill you. This transcends IT. I was a rock climber for many years, and a single tragic mistake was almost always the result of a series of smaller mistakes. It often stemmed from poor assumptions, bad planning, trivializing variables, or not acknowledging the known unknowns. Don’t let yourself be the IT equivalent to the climber that cratered on the ground.

One of the biggest potential risks is a running VM not fully committing I/Os from its own queues or anywhere in the data path (all the way down to the array controllers) before the batteries fully deplete. When the VMs are properly shutdown before the batteries deplete, you can be assured that all data has been committed, and the integrity of your systems and data remain intact.

So where does one begin? Properly dealing with a sustained outage is recognizing that it is a sequence driven event.

1. Determine what needs to stay on the longest. Oftentimes what matters is not how long a VM or system stays up on battery, but that it is gracefully shut off before a hard power failure. Your UPS units buy you a finite amount of time. It takes more than "hope" to make your systems go down gracefully, and in the correct order.

2. Determine your hardware dependency chain. Work through what is the most logical order of shutdown for your physical equipment, and identify the last pieces of physical equipment that need to stay on. (Your answer better be switches).

3. Determine your software dependency chain. Many systems can be shut down at any time, but many others rely on other services to support their needs. Map it out. Also recognize that hardware can be affected by the lack of availability of software based services (e.g. DNS, SMTP, AD, etc.).

4. Determine what equipment might need a graceful shutdown, and what can drop when the UPS units run dry. Check with each manufacturer for the answers.
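Steps 2 and 3 above are really a dependency graph, and once the dependencies are mapped out, a shutdown order falls out of them mechanically. Here is a minimal sketch of that idea; the system names and dependencies below are hypothetical stand-ins, not a prescription for your environment.

```python
# Sketch of deriving a shutdown order from a dependency map (steps 2-3).
# All system names here are illustrative examples only.
from graphlib import TopologicalSorter

# deps[x] = what x needs running; x must therefore shut down before them.
deps = {
    "app-vms":       {"database", "dns"},
    "database":      {"storage-array", "dns"},
    "vcenter":       {"dns", "storage-array"},
    "dns":           {"switches"},
    "storage-array": {"switches"},
    "switches":      set(),
}

# static_order() yields dependencies first (a valid power-up order);
# reversing it gives a safe shutdown order, with switches going last.
power_up = list(TopologicalSorter(deps).static_order())
shutdown = list(reversed(power_up))
print(shutdown)
```

A nice side effect is that the same map answers the power-up question too; a valid shutdown order, reversed, is a valid power-up order.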

Once you begin to make progress on better understanding the above, then you can look into how you can make it happen.

Making a retrospective work for you
After a sustained power failure has ended, it's not uncommon to simply be grateful that everything came back up without issue. As a result, valuable information on how to improve the process is left on the table. Seize the moment! Take notes during the event so that details can be recalled during a retrospective. After all, the retrospective's purpose is to define what went well and what didn't. Stressful situations can play tricks on memory. Perhaps you couldn't identify power cables easily, or wondered why your Exchange server took a long time to shut down, or didn't know if or when vCenter shut down gracefully. Notes are a great method for capturing that information. In the "dirty power" story above, the UPS power did not last as long as I had anticipated because the server room's dedicated AC unit shut down. The room heated up, and all of the variable speed fans kicked into high gear, draining the power faster than I thought. Lesson learned.

The planning process is served well by mocking up a power failure event on paper. Remember, thinking about it is free, and is a nice way to kick off the planning. Clearly, the biggest challenge around developing and testing power down and power up scenarios is that they have to be tested at some point. How do you test this? Very carefully. In fact, if you have any concerns at all, save it for a lab. Then introduce it into production in such a way that you can statically control or limit the shutdown event to just a few test machines. The only scenario I can imagine on par with a sustained power outage is kicking off a domino-effect workflow that shuts down your entire datacenter.

The run book
Having a plan located only in your head will accomplish only two things.  It will be a guaranteed failure, and it can put your organization's systems and data at risk.  This is why there is a need to define and publish a sustained power outage run book. Sometimes known as a "play chart" in the sports world, it is intended to define a reaction to an event under a given set of circumstances. The purpose is to 1.) vet out the process beforehand, and 2.) avoid "heat of the moment" decisions, made under times of great stress, that end up being the wrong decisions.

The run book also serves as a good planning tool for determining if you have the tools or methods available to orchestrate a graceful, orderly shutdown of VMs and equipment based on the data provided by the UPS units. The run book is not just about graceful power down scenarios, but also the steps required for a successful power-up. Sometimes this is better understood, as an occasional lights out maintenance window may need to occur for storage or firmware updates, replacements, etc. Power-up planning also includes making sure you have some basic services available for the infrastructure as it powers up. For example, see "Using a Synology NAS as an emergency backup DNS server for vSphere" for a few tips on a simple way to serve up DNS to your infrastructure.

And don’t forget to make sure the run book is still accessible when you need it most (when there is no power). :-)

Tools and tips
I’ve stayed away from discussing specific scripts or tools for this because each environment is different, and may have different tools available to them. For instance, I use Emerson-Liebert UPS units, and have a controlling VM that will orchestrate many of the automated shutdown steps of VMs. Using PowerCLI, Python, or bash can be a complementary, or a critical part of a shutdown process. It is up to you. The key is to have some entity that will be able to interpret how much power remains on battery, and how one can trigger event driven actions from that information.
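To make that concrete without prescribing a tool, here is a hypothetical sketch of such an entity: a function that maps remaining battery runtime to staged shutdown actions. The thresholds and stage names are made up for illustration, and wiring in real runtime data from your UPS (SNMP, a vendor CLI, or a management VM like the one I described) is left to you.

```python
# Hypothetical sketch: map remaining UPS runtime to staged shutdown
# actions. Thresholds and stage names are illustrative assumptions only;
# real runtime data would come from your UPS management interface.
def actions_for_runtime(minutes_left: float) -> list:
    """Return the shutdown stages that should have fired at this runtime."""
    stages = [
        (20, "shut down test/dev VMs"),
        (12, "shut down application VMs, then databases"),
        (8,  "shut down vCenter and other management VMs"),
        (4,  "shut down remaining hosts and the storage array"),
    ]
    return [action for threshold, action in stages if minutes_left <= threshold]

# With 10 minutes left, the first two stages should already be underway;
# the switches stay up until the batteries run dry.
print(actions_for_runtime(10))
```

The decision logic is deliberately separated from the polling and the execution, which makes it easy to dry-run on paper (or in a lab) before any real VM is ever touched.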

1. Remember that graceful shutdowns can create a bit of their own CPU and storage I/O storm. It is not as significant as a boot storm upon power up, and is generally only noticeable at the beginning of the shutdown process when all systems are still up, but it can be noticeable.

2. Ask your coworkers or industry colleagues for feedback. Learn about what they have in place, and share some stories about what went wrong, and what went right. It’s good for the soul, and your job security.

3. Focus more on the correct steps, sequence, and procedure, before thinking about automating it. You can’t automate something when you do not clearly understand the workflow.

4. Determine how you are going to make this effort a priority, and important to key stakeholders. Take it to your boss, or management. Yes, you heard me right. It won't ever be addressed until it is given visibility and identified as a risk. It is not about potential self-incrimination. It is about improving the plan of action around these types of events. Help them understand the implications of not handling them in the correct way.

It is a very strange experience to be in a server room that is whisper quiet from a sustained power outage. There is an opportunity to make it a much less stressful experience with a little planning and preparation. Good luck!

– Pete

A look at FVP 2.0’s new features in a production environment

I love a good benchmark as much as the next guy. But success in the datacenter is not solely predicated on the results of a synthetic benchmark, especially those that do not reflect a real workload. This was the primary motivation in upgrading my production environment to FVP 2.0 as quickly as possible. After plenty of testing in the lab, I wanted to see how the new and improved features of FVP 2.0 impacted a production workload. The easiest way to do this is to sit back and watch, then share some screen shots.

All of the images below are from my production code compiling machines running at random points of the day. The workloads will always vary somewhat, so take them as more "observational differences" than benchmark results. Also note that these are much more than the typical busy VM. The code compiling VMs often hit the triple crown in the "difficult to design for" department.

  • Large I/O sizes. (32K to 512K, with most being around 256K)
  • Heavy writes (95% to 100% writes during a full compile)
  • Sustained use of compute, networking, and storage resources during the compiling.

The characteristics of flash under these circumstances can be a surprise to many. Heavy writes with large I/Os can turn flash into molasses, and is not uncommon to have sporadic latencies well above 50ms. Flash has been a boon for the industry, and has changed almost everything for the better. But contrary to conventional wisdom, it is not a panacea. The characteristics of flash need to be taken into consideration, and expectations should be adjusted, whether it be used as an acceleration resource, or for persistent data storage. If you think large I/O sizes do not apply to you, just look at the average I/O size when copying some files to a file server.

One important point is that the comparisons I provide did not include any physical changes to my infrastructure. Unfortunately, my peering network for replica traffic is still using 1GbE, and my blades are only capable of leveraging Intel S3700 SSDs via embedded SAS/SATA controllers. The VMs are still backed by a near end-of-life 1GbE based storage array.

Another item worth mentioning is that due to my workload, my numbers usually reflect worst case scenarios. You may have latencies that are drastically lower than mine. The point being that if FVP can adequately accelerate my workloads, it will likely do even better with yours. Now let’s take a look and see the results.

Adaptive Network Compression
Specific to customers using 1GbE as their peering network, FVP 2.0 offers a bit of relief in the form of Adaptive Network Compression. While there is no way for one to toggle this feature off or on for comparison, I can share what previous observations had shown.

FVP 1.x
Here is an older image of a build machine during a compile. This was in WB+1 mode (replicating to 1 peer). As you can see from the blue line (Observed VM latency), the compounding effect of trying to push large writes across a 1GbE pipe to SATA/SAS based flash devices was not as good as one would hope. The characteristics of flash itself, along with the constraints of 1GbE, were conspiring with each other to make acceleration difficult.



FVP 2.0 using Adaptive Network Compression
Before I show the comparison of effective latencies between 1.x and 2.0, I want to illustrate the workload a bit better. Below is a zoomed in view (about a 20 minute window) showing the throughput of a single VM during a compile job. As you can see, it is almost all writes.


Below shows the relative number of IOPS. Almost all are write IOPS, and again, the low number of IOPS relative to the throughput is an indicator of large I/O sizes. Remember that with 512K I/O sizes, it only takes a couple of hundred IOPS to nearly saturate a 1GbE link – not to mention the problems that flash has with it.
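The arithmetic behind that claim is easy to verify. This ignores Ethernet and storage protocol overhead, which only makes the real saturation point come sooner.

```python
# How many 512K write IOPS does it take to fill a 1GbE link?
# Protocol overhead is ignored, so real-world saturation comes sooner.
LINK_BYTES_PER_SEC = 1_000_000_000 / 8   # 1 Gb/s = 125 MB/s
IO_SIZE_BYTES = 512 * 1024               # 512 KiB per I/O

iops_to_saturate = LINK_BYTES_PER_SEC / IO_SIZE_BYTES
print(round(iops_to_saturate))  # ~238 IOPS -- "a couple of hundred"
```

Run the same math with 4K I/Os and the answer is over 30,000 IOPS, which is why small-block benchmarks tell you so little about a workload like this one.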


Now let’s look at latency on that same VM, during that same time frame. In the image below, the blue line shows that the VM observed latency has now improved to the 6 to 8ms range during heavy writes (ignore the spike on the left, as that was from a cold read). The 6 to 8ms of latency is very close to the effective latency of a WB+0, local flash device only configuration.


Using the same accelerator device (Intel S3700 on embedded Patsburg controllers) as in 1.x, the improvements are dramatic. The "penalty" for the redundancy is greatly reduced to the point that the backing flash may be the larger contributor to the overall latency. What has really been quite an eye opener is how well the compression is helping. In just three business days, it has saved 1.5 TB of data running over the peer network.  (350 GB of savings coming from another FVP cluster not shown)


Distributed Fault Tolerant Memory
If there is one thing that flash doesn’t do well with, it is writes using large I/O sizes. Think about all of the overhead that comes from flash (garbage collection, write amplification, etc.), and that in my case, it still needs to funnel through an overwhelmed storage controller. This is where I was looking forward to seeing how Distributed Fault Tolerant Memory (DFTM) impacted performance in my environment. For this test, I carved out 96GB of RAM on each host (384GB total) for the DFTM Cluster.

Let’s look at a similar build run accelerated using write-back, but with DFTM. This VM is configured for WB+1, meaning that it is using DFTM, but still must push the replica traffic across a 1GbE pipe. The image below shows the effective latency of the WB+1 configuration using DFTM.


The image above shows that using DFTM in a WB+1 mode eliminated some of that overhead inherent with flash, and was able to drop latencies below 4ms with just a single 1GbE link. Again, these are massive 256K and 512K I/Os. I was curious to know how 10GbE would have compared, but didn’t have it in my production environment.

Now, let’s try DFTM in a WB+0 mode, meaning that it has no peer to send replica traffic to. What do the latencies look like then for that same time frame?


If you can’t see the blue line showing the effective (VM observed) latencies, it is because it is hovering quite close to 0 for the entire sampling period. Local acceleration was 0.10ms, and the effective latency to the VM under the heaviest of writes was just 0.33ms. I’ll take that.

Here is another image of when I turned a DFTM accelerated VM from WB+1 to WB+0. You can see what happened to the latency.


Keep in mind that the accelerated performance I show in the images above come from a VM that is living on a very old Dell EqualLogic PS6000e. Just fourteen 7,200 RPM SATA drives that can only serve up about 700 IOPS on a good day.
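That ~700 IOPS figure lines up with the common rule of thumb of roughly 50 random IOPS per 7,200 RPM SATA spindle. A trivial sketch of that estimate (the per-disk figure is an assumption, and it ignores RAID write penalties and controller caching):

```python
# Rough aggregate IOPS estimate for a spindle-based array.
IOPS_PER_SATA_7200 = 50  # rule-of-thumb value for a 7,200 RPM SATA drive

def array_iops(spindles: int, per_disk: int = IOPS_PER_SATA_7200) -> int:
    """Naive aggregate: spindle count times per-disk IOPS."""
    return spindles * per_disk

print(array_iops(14))  # fourteen spindles: ~700 IOPS on a good day
```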

An unintended, but extremely useful benefit of DFTM is to troubleshoot replica traffic that has higher than expected latencies. A WB+1 configuration using DFTM eliminates any notion of latency introduced by flash devices or offending controllers, and limits the possibilities to NICs on the host, or switches. Something I’ve already found useful with another vSphere cluster.

Simply put, DFTM is a clear winner. It addresses all of the things that flash cannot do well. It avoids storage buses, drive controllers, and NAND overhead, and it doesn’t wear out. And it sits closer to the CPU, with more bandwidth, than anything else. But make no mistake, memory is volatile. With the exception of some specific use cases such as non-persistent VDI, or other ephemeral workloads, one should take advantage of the "FT" part of DFTM. Set it to 1 or more peers. You may give back a bit of latency, but the superior performance is perfect for those difficult tier one workloads.

When configuring an FVP cluster, the current implementation limits your selection to a single acceleration type per host. So, if you have flash already installed in your servers, and want to use RAM for some VMs, what do you do? …Make another FVP cluster. Frank Denneman’s post: Multi-FVP cluster design – using RAM and FLASH in the same vSphere Cluster describes how to configure VMs in the same vSphere cluster to use different accelerators. Borrowing those tips, this is how my FVP clusters inside of a vSphere cluster look.


Write Buffer and destaging mechanism
This is a feature not necessarily listed on the bullet points of improvements, but deserves a mention. At Storage Field Day 5, Satyam Vaghani mentioned the improvements with the destaging mechanism. I will let the folks at PernixData provide the details on this, but there were corner cases in which VMs could bump up against some limits of the destager. It was relatively rare, but it did happen in my environment. As far as I can tell, this does seem to be improved.

Destaging visibility has also been improved. Ever since the pre-1.0 beta days, I’ve wanted more visibility on the destaging buffer. After all, we know that all writes eventually have to hit the backing physical datastore (see Effects of introducing write-back caching with PernixData FVP) and can be a factor in design. FVP 2.0 now gives two key metrics: the amount of writes to destage (in MB), and the time to the backing datastore. This will allow you to see whether your backing storage can keep up with your steady state writes. From my early impressions, the current mechanism doesn’t quite capture the metric data at a high enough frequency for my liking, but it’s a good start toward more visibility.
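The relationship those two metrics expose can be sketched as simple arithmetic: if steady-state writes arrive faster than the backing datastore can absorb them, the destage buffer grows without bound; otherwise it drains. The function name and figures below are illustrative, not anything from FVP itself:

```python
# Simplified destager math: backlog drain time given datastore and incoming rates.

def drain_time_seconds(backlog_mb: float, datastore_mbps: float, incoming_mbps: float) -> float:
    """Seconds to empty the destage buffer, or inf if the datastore can't keep up."""
    net = datastore_mbps - incoming_mbps
    return float("inf") if net <= 0 else backlog_mb / net

print(drain_time_seconds(900, 80, 50))  # 900 MB backlog drains in 30 seconds
print(drain_time_seconds(900, 50, 80))  # incoming exceeds datastore rate: backlog grows
```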

Honorable mentions
NFS support is a fantastic improvement. While I don’t have it currently in production, it doesn’t mean that I may not have it in the future. Many organizations use it and love it. And I’m quite partial to it in the old home lab. Let us also not dismiss the little things. One of my favorite improvements is simply the pre-canned 8 hour time window for observing performance data. This gets rid of the “1 day is too much, 1 hour is not enough” conundrum.

There is a common theme to almost every feature evaluation above. The improvements I showcase cannot be adequately displayed or quantified with a synthetic workload. It took real data to appreciate the improvements in FVP 2.0. Although 10GbE remains the ideal, Adaptive Network Compression really buys a lot of time for legacy 1GbE networks. And DFTM is incredible.

The functional improvements to FVP 2.0 are significant. So significant that with an impending refresh of my infrastructure, I am now taking a fresh look at what is actually needed for physical storage on the back end. Perhaps some new compute with massive amounts of PCIe based flash, and RAM to create large tiered acceleration pools. Then backing spindles supporting our capacity requirements, with relatively little data services, and just enough performance to keep up with the steady-state writes.

Working at a software company myself, I know all too well that software is never "complete."  But FVP 2.0 is a great leap forward for PernixData customers.

Using FVP in multi-NIC vMotion environments

In FVP version 1.5, PernixData introduced a nice little feature that allows a user to specify the network to use for all FVP peering/replica traffic. This added quite a bit of flexibility in adapting FVP to a wider variety of environments. It can also come in handy when testing performance characteristics of different network speeds, similar to what I did when testing FVP over Infiniband. While the “network configuration” setting is self-explanatory, and ultra-simple, it is ESXi that makes it a little more adventurous.

VMkernels and rules to abide by. …Sort of.
“In theory there is no difference between theory and practice. In practice, there is.” — Yogi Berra

Under the simplest of arrangements, FVP will use the vMotion network for its replica traffic. If your vMotion works, then FVP works. FVP will also work in a multi-NIC vMotion arrangement. While it can’t use more than one VMkernel, vMotion certainly can. Properly configured, vMotion will use whatever links are available, leaving more opportunity and bandwidth for FVP’s replica traffic. This can be especially helpful in 1GbE environments. Okay, so far, so good. The problem can arise when an ESXi host has multiple VMkernels in the same subnet.

The issues around having multiple VMkernels on a single host in one IP subnet are nothing new. The accepted practice has generally been to stay away from multiple VMkernels in a single subnet, but the lines blur a bit when factoring in each VMkernel’s intended purpose.

  • A VMware Support Insider post states to use only one VMkernel per IP subnet (except for iSCSI storage and vMotion).
  • VMware KB 2007467 states: “Ensure that both VMkernel interfaces participating in the vMotion have the IP address from the same IP subnet.”

The motives for recommending isolation of VMkernels are pretty simple. The VMkernel network stack uses a single routing table to route traffic. Two hosts talking to each other on one subnet with multiple VMkernels may not know which interface to use. The result can be unexpected behavior, and depending on what service is sitting in the same network, even a loss of host connectivity. This behavior can also vary depending on the version of ESXi being used. ESXi 5.0 may act differently than 5.1, and 5.5 changes the game even more with the ability to create custom TCP/IP stacks per VMkernel adapter, which could give each VMkernel its own routing table.
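A toy model helps illustrate the failure mode: with multiple VMkernels in one subnet and a single routing table, the host tends to pick the lowest-numbered vmk for that subnet, regardless of which interface a service was "assigned" to. This is purely illustrative and greatly simplified, not how the VMkernel stack is actually implemented:

```python
# Toy model: which VMkernel receives traffic for a given source subnet.
import ipaddress

def receiving_vmk(vmks: dict[str, str], source_ip: str) -> str:
    """Return the vmk whose subnet contains source_ip; ties go to the lowest vmk number."""
    matches = [name for name, cidr in vmks.items()
               if ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr, strict=False)]
    return min(matches, key=lambda n: int(n.lstrip("vmk")))

# Two vMotion VMkernels in the SAME subnet: vmk1 always wins the tie,
# even if replica traffic was pinned to the network living on vmk5.
host = {"vmk1": "10.0.10.0/24", "vmk5": "10.0.10.0/24"}
print(receiving_vmk(host, "10.0.10.21"))  # vmk1

# Put each VMkernel in its own subnet and the ambiguity disappears.
host_isolated = {"vmk1": "10.0.10.0/24", "vmk5": "10.0.20.0/24"}
print(receiving_vmk(host_isolated, "10.0.20.9"))  # vmk5
```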

So what about FVP?
How does any of this relate to FVP? For me, this initial investigation stemmed from some abnormally high latencies I was seeing on my VMs. This is quite the opposite effect I’m used to having with FVP. As it turns out, when FVP was pinned to my vMotion-2 network, it was correctly sending out of the correct interface on my multi-NIC vMotion setup, but the receiving ESXi host was using the wrong target interface (the vMotion-1 VMkernel on the target host), which caused the latency. Just like other VMkernel behavior, it naturally wanted to always choose the lower vmk number. Configuring FVP to use the vMotion-1 network resolved the issue instantly, as vMotion-1 in my case was using vmk1 instead of vmk5. Many thanks to the support team for noticing the goofy communication path it was taking.

Testing similar behavior with vMotion
While the symptoms showed up in FVP, the cause is an ESXi matter. While not an exact comparison, one can simulate a similar behavior that I was seeing with FVP by doing a little experimenting with vMotion. The experiment simply involves taking an arrangement originally configured for Multi-NIC vMotion, disabling vMotion on the network with the lowest vmk number on both hosts, kicking off a vMotion, and observing the traffic via esxtop. (Warning. Keep this experiment to your lab only).

For the test, two ESXi 5.5 hosts were used, and multi-NIC vMotion was set up in accordance with KB 2007467. One vSwitch. Two VMkernel ports (vMotion-0 & vMotion-1 respectively) in an active/standby arrangement. The uplinks are flopped on the other VMkernel. Below is an example of ESX01:


And both displayed what I’d expect in the routing table.


The tests below will show what the traffic looks like using just one of the vMotion networks, but only where the “vMotion” service is enabled on one of the VMkernel ports.

Test 1: Verify what correct vMotion traffic looks like
First, let’s establish what correct vMotion traffic will look like. This is on a dual NIC vMotion arrangement in which only the network with the lowest numbered vmk is ticked with the “vMotion” service.

The screenshot below is how the traffic looks from the source on ESX01. The green bubble indicates the anticipated/correct VMkernel to be used. Success!


The screenshot below is how traffic looks from the target on ESX02. The green bubble indicates the anticipated/correct VMkernel to be used. Success!


As you can see, the traffic is exactly as expected, with no other traffic occurring on the other VMkernel, vmk2.

Test 2: Verify what incorrect vMotion traffic looks like
Now let’s look at what happens on those same hosts when trying to use only the higher numbered vMotion network. The “vMotion” service was changed on both hosts to the other VMkernel, and both hosts were restarted. What is shown below is how the traffic looks on a dual NIC vMotion arrangement in which the network with the lowest numbered vmk has the “vMotion” service unticked, and the higher numbered vMotion network has the service enabled.

The screenshot below is how the traffic looks from the source on ESX01. The green bubble indicates the anticipated/correct VMkernel to be used. The red bubble indicates the VMkernel it is actually using. Uh oh. Notice how there is no traffic coming from vmk2, where it should be coming from? It’s coming from vmk1, exactly like the first test.


The screenshot below is how traffic looks from the target on ESX02. The green bubble indicates the anticipated/correct VMkernel to be used.


As you can see, under the described test arrangement, ESXi can and may use the incorrect VMkernel on the source when vMotion is disabled on the vMotion network with the lowest VMkernel number, and active on the other vMotion network. It was repeatable with both ESXi 5.0 and ESXi 5.5. The results were consistent in tests with host uplinks connected to the same switch versus two stacked switches. The tests were also consistent using both standard vSwitches and Distributed vSwitches.

The experiment above is just a simple test to better understand how the path from the source to the target can get confused. From my interpretation, it is not unlike what is described in Frank Denneman’s post on why a vMotion network may accidentally use a Management Network. (His other post on Designing your vMotion Network is also a great read, and applicable to the topic here.) Since FVP can only use one specific VMkernel on each host, I believe I was able to simulate the basics of why ESXi was making it difficult for FVP when pinning the FVP replica traffic to the higher numbered vMotion network in my production environment. Knowing this lends itself to the first recommendation below.

A few different ways to configure FVP
After looking at the behavior of all of this, here are a few recommendations on using FVP with your VMkernel ports. Let me be clear that these are my recommendations only.

  • Ideally, create an isolated, non-routable network using a dedicated VLAN with a single VMkernel on each host, and assign only FVP to that network. It can live in whatever vSwitch is most suitable for your environment (the faster the uplinks, the better). This will isolate the peer traffic, ensure it is flowing as designed, and let a multi-NIC vMotion arrangement work by itself. Here is an example of what that might look like:


  • If for some reason you can’t follow the recommendation above (maybe you need to wait on getting a new VLAN provisioned by your network team), use a vMotion network; but if it is a multi-NIC vMotion arrangement, set FVP to run on the vMotion network with the lowest numbered VMkernel. According to, yes, another great post from Frank, this was the default approach for FVP prior to exposing the ability to assign FVP traffic to a specific network.

Remember that if there is ever a need to modify anything related to the VMkernel ports (unticking the “vMotion” configuration box, adding or removing VMkernels), be aware that the routing interface (as seen via esxcfg-route -l) may not change until there is a host restart. You may also find using esxcfg-route -n to view the host’s ARP table handy.

The ability to deliver your FVP traffic to its East-West peers in the fastest, most reliable way will allow you to make the most of what FVP offers. Treat FVP like a first-class citizen in your network, and it will pay off with better performing VMs.

And a special thank you to Erik Bussink and Frank Denneman for confirming my vMotion test results, and my sanity.

Thanks for reading.

– Pete


Using vscsiStats to better visualize storage I/O

As the saying goes, a picture is worth a thousand words. Nowhere is this more true than in the world of data visualization. Raw data has its place, but good visualization methods help translate numbers into a meaningful story, and assist with overcoming the deficiencies of looking at a spreadsheet of raw numbers. A good visual representation of data gives context, establishes relationships between the numbers, and communicates results more clearly, making it easier for you and others to remember. The difference for you as an Administrator can be better approaches to troubleshooting, or an improved ability to make smart design and purchasing decisions.

Virtualization Administrators are faced with digesting performance information quite often. vCenter does a pretty good job of letting the Administrator skip the data collection nonsense, and jump into viewing relevant metrics in an easy to read manner. But the vCenter metrics do not always give a complete view of the information available, and occasionally need a little help when one is trying to better understand key performance indicators.

A different way to use vscsiStats
“Some people see the glass half full. Others see it half empty. I see a glass that’s twice as big as it needs to be.” — George Carlin.

VMware’s vscsiStats is a great tool to collect and view storage I/O data in a different way. It can help to harvest a wealth of information about VMs that can be manipulated in a number of ways. For as good as it is, I believe it suffers a bit in that it is geared toward providing summations of a single sample period of time. One can collect all sorts of great information during a specific period, but it gives you no idea of what happened when, and why. To be truly useful, it needs to handle continuous, adjacent sampling periods.

But fear not, with a little extra effort, vscsiStats can be manipulated to factor in time. Combine those results with an Excel 3D surface chart, and you have some neat new ways to interpret the data. Erik Zandboer has fantastic information on how to leverage vscsiStats to generate multiple sampling periods. Combine this with a nice template he provides, and most of the heavy lifting is done for you already. Having that created already was great for me, as I find that the fun is not in generating the graphs, but interpreting, and learning from them.
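The heavy lifting Erik's approach performs boils down to repeating a vscsiStats histogram capture at fixed intervals and pivoting the results into a grid that Excel can chart as a surface. Here is a Python sketch of that pivot step; the CSV lines and layout below are illustrative of the idea, not the exact vscsiStats output format:

```python
# Sketch: turn repeated vscsiStats histogram dumps (e.g. from
# `vscsiStats -p ioLength -c` every 20 seconds) into a surface-chart grid.
import csv, io

def parse_histogram(csv_text: str) -> dict[int, int]:
    """Map histogram bucket (e.g. I/O size in bytes) -> frequency for one sample."""
    buckets = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 2 and row[0].strip().isdigit():
            buckets[int(row[0])] = int(row[1])
    return buckets

def to_surface(samples: list[str]) -> list[list[int]]:
    """Rows = sample periods, columns = buckets: the grid Excel charts as a 3D surface."""
    parsed = [parse_histogram(s) for s in samples]
    all_buckets = sorted({b for p in parsed for b in p})
    return [[p.get(b, 0) for b in all_buckets] for p in parsed]

sample1 = "4096,120\n32768,80\n"    # period 1: mostly 4K reads
sample2 = "4096,10\n524288,200\n"   # period 2: a burst of 512K I/Os
print(to_surface([sample1, sample2]))
```

Each row of the resulting grid is one sampling period, so time becomes an axis of the chart, which is exactly what plain single-sample vscsiStats output lacks.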

In an effort to see how similar data can look a bit differently using other tools, let us take a look at a production VM running a real code compiling workload. The area in the red bubble is the time period we will be concentrating on. The screen capture below shows the CPU utilization for the 8 vCPU VM.


The screen capture below shows the storage related metrics for the specific VMDK of the VM, such as read and write IOPS, latency, and number of outstanding commands. In this particular case, the VM is being accelerated by PernixData FVP, but I changed the configuration so that it was only accelerating reads via its "write-through" configuration. Write I/Os are limited to the speed of the backing physical infrastructure. I did this to provide some more interesting graphs, as you will see in a bit.


Now it is time to use vscsiStats to look at similar storage related metrics. In this case, vscsiStats sampled the data in 20 second intervals, for a duration of 400 seconds, and reflects the time period within the red bubbles in the screen captures above. It is a relatively short amount of time for observations, but I didn’t want to smooth the data too much by choosing a long sample interval. In the charts below, read related activity is in green, and write related activity is in dark red. Note that on values such as latency, and I/O size, the axis will use a logarithmic scale.

I/O Size
First, let’s take a look at I/O size for reads.


You see from above that read I/Os from this period of time were mostly 4K and 32K in size. Contrast this with the write I/Os that are shown for the same sample period below.


The image above shows a significant number of write I/Os at 32K, 64K, 128K, 256K, and 512K. Notice how much different that looks compared to the read I/Os. Unlike with reads, we know that write I/O sizes tend to have a more significant impact on latency.

Now let’s take a look at latency.


Many of the read I/Os shown above come in at around .5ms to 1ms of latency. Read I/Os can be easier to satisfy, and the latency reflects that. The image below shows many of the writes coming in between 5ms and 15ms or higher. Just like with the other graphs, we get a better understanding of the magnitude of I/Os (z axis) that come in at a given measurement.


Outstanding I/Os
This shows the number of outstanding read I/Os when a new read I/O is issued. As you can see below, the reads are being served pretty fast, with rarely more than 1 or 2 outstanding read I/Os. In an ideal world we would want this to be as low as possible for all reads and writes.


However, you can see that with writes, it is quite a different story. The increased latency, which comes in part from the larger I/O sizes used, drives up the number of outstanding write I/Os waiting. The image shows several points at which the number of outstanding write I/Os surpassed 20. I find the image below visually one of the most impactful.


Sequential versus Random
vscsiStats also demonstrates whether the I/O of a given workload is sequential, or random.


With both reads, and writes, you can see that this particular snippet of a workload is predominately random I/O. Sequential I/O would all be very closely aligned with the ‘0’ value near the middle of the graph.


You can see that from this very small, 6 1/2 minute time period on one VM, the workload demanded different things at different times from the backing storage. Differences that were not readily apparent from the traditional vCenter metrics. Now imagine what other workloads on the same system may look like, or even what other systems may look like. As an aggregate, how might all of these systems be taxing your hosts and storage infrastructures? These are all very good questions with answers specific to each and every environment.

As demonstrated above, using vscsiStats can be a great way to complement other monitoring metrics found in vCenter, and will surely give you a better understanding of the behavior of your virtualized environment.

Thanks for reading.

– Pete

Observations with the Active Memory metric in vSphere

The subject of memory management of Operating Systems in vSphere is an enormously broad and complex topic that has been covered quite well over the years. Even with all of that great information, there are characteristics of some of the metrics given that still seem to befuddle users. One of those metrics, provided to us courtesy of vSphere, is "Active Memory." I hope to provide a few real world examples of why this confusion occurs, and what to look out for in your own environment.

vSphere attempts to interpret how much memory is being actively used by a VM, and displays this in the form of “Active Memory.” The VMkernel bases this estimate on memory pages recently touched by the guest OS for a given sampling period, then displays it as an average for that sampling period (maximums and minimums are exposed with higher logging levels). It is a metric that has proven to be quite controversial. Some have grown frustrated by the perceived inaccuracies of it, but I believe the problem is not in the metric’s accuracy, but in a misunderstanding of how it collects its data, and what it means. Having additional data points to understand the behavior of your workload is a good thing. It is critical to know what the metric really means, and how different Operating Systems and applications may provide different results for it.

There is a wealth of good sources (a few links at the end of this post) on defining what Active Memory is as it relates to vSphere. The two takeaways about the Active Memory metric I like to remember are that 1.) it is a statistical estimate, and 2.) it represents a single sample period. In other words, it has no relationship to previous samplings, and therefore may or may not represent the same memory pages accessed.

The Risk
"We have met the enemy, and he is us."  — Walt Kelly as Pogo

Since Active Memory is a unique metric outside of the paradigm of the OS, translating what it means to you, the application, or the guest OS can be prone to misinterpretation. The risk is interpreting its meaning incorrectly, and perhaps using it as the primary method for right sizing a VM. Interestingly enough, this can lead to both oversized VMs and undersized VMs.

I believe that one thing that gets Administrators off on the wrong foot is vSphere’s own baked-in alarm of "Virtual Machine Memory Usage." This "Usage" metric is a percentage of total available memory for the VM, and is tied to the Active Memory metric in vSphere. It implies that when it is high, the VM is running out of memory, and when it is low, it is performing as designed with no memory issues. I will demonstrate how under certain circumstances, both of these assumptions can be wrong.

Oversizing a VM’s resources is not an uncommon occurrence. You would think spotting these systems might be easy and obvious. That is not always the case.

With respect to memory sizing, let’s do a little experiment. The example below is a bulk file copy (11 gigabytes worth of large and small files) from a Linux machine. The target can be local, or remote. The effect will be similar. We will observe the difference of Active Memory between the small VM (1GB of memory assigned), and the large VM (4GB of memory assigned), and what impacts it may or may not have on performance.

The Active Memory of the smaller Linux VM below


The Active Memory of the larger Linux VM below.


Note how the Active Memory increased on the 4GB Linux VM versus the 1GB Linux VM. This gives the impression that the file copy is using memory for the file copy job, and leaves less for the applications.

Now let us jump into ‘top’ inside the guest OS. It also shows figures that give the impression that the file copy is using most of the memory for the copy job, and may trigger a vCenter Memory usage alarm.


But in this case, top is not telling the entire story either. Let’s take a look at the same resource utilization inside the guest using ‘htop’


Let’s look at utilization inside the guest using "free -m"


So what is going on here? The Linux kernel will allocate memory that isn’t actively used by processes to other tasks, like file system caches. This opportunistic use of memory will not interfere with other spawning processes. As soon as another process spawns, the Linux kernel will free that memory so that it can be used by the application. This is a clever use of resources, but as you can see, it can also give the wrong impression inside the guest (via ‘top’), as well as in vSphere (via Active Memory). One can keep increasing the amount of memory assigned to a VM, and in many cases, this behavior will continue to occur. vSphere’s Active Memory metric does not attempt to distinguish what the activity is, beyond a change in value. In all cases, the memory statistics are not inaccurate, just a different representation of memory usage.
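The distinction ‘free -m’ exposes can be reduced to a single subtraction: memory held by the page cache is effectively reclaimable, so the figure that matters for spawning applications is used minus buffers and cache. The sample values below are illustrative, roughly resembling a 4GB VM mid-file-copy:

```python
# Reclaimable cache vs. truly used memory, in miniature.

def truly_used_mb(total: int, free: int, buffers: int, cached: int) -> int:
    """Memory held by processes once reclaimable cache is excluded."""
    return total - free - buffers - cached

# Looks alarming ("apparent used"), but is mostly reclaimable cache.
total, free, buffers, cached = 3950, 150, 200, 3100
print(f"apparent used: {total - free} MB")
print(f"actually used: {truly_used_mb(total, free, buffers, cached)} MB")
```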

The reason why I chose a bulk file copy as an experiment is that a file copy is largely perceived by the end user as a storage I/O or network I/O matter. The behavior I described will most likely show up in Linux VMs being used as flat-file storage servers (something I see often), but is not limited to just that type of workload. I should also mention that during the testing, the ability for Linux to use memory for some of its file handling tasks was more noticeable when using slow backing storage in comparison to faster storage.

If you are purely a Windows shop, remember that this characteristic will show up with virtual appliances, as they are all Linux VMs. Let’s take a look at that same bulk file copy in Windows, and see how it relates to Active Memory.

The Active Memory of the smaller Windows VM below.


The Active Memory of the larger Windows VM below.


Memory resources inside the guest of the larger Windows VM below.


The Windows Memory Manager seems to handle this same task differently. Semantics aside, when more memory is assigned to a VM, Windows appears to carve out more for this task, but seems to cap its appetite in favor of leaving the remaining memory space for already cached applications and data (seen in the screenshots as “standby” and/or “free”). This is a simple indicator that various Operating Systems handle their memory management differently, which needs to be taken into consideration when observing the Active Memory metric.

Undersizing a VM’s memory can stem from many reasons, but it is most likely to show up on the following types of systems.

  • Server performing multiple roles and not sized accordingly. (e.g. Front end web services with backend databases on the same system, like small SharePoint deployments)
  • VMs right sized according to the Active Memory metric.
  • SQL Servers.
  • Exchange Servers.
  • Servers running one or more Java applications.

With a SQL server, one can easily find a system where the "Active Memory" is quite low. Then, look inside the guest, and you will see that memory utilization is very high; if the system resources were assigned conservatively, the server will act sluggish.


Now look at it inside the guest, and you will see quite high utilization.


A few steps can help this matter.

  • Use the SQL Server Monitoring Tools in Perfmon to better understand the problem. Be warned that you may have to invest significant time in this in order to get the scaling right, interpret, and validate the data correctly. Don’t rely solely on one metric to determine the state. For instance, the "SQL Server Buffer Manager: Buffer Cache Hit Ratio" is supposed to indicate insufficient memory for SQL if the ratio is a low number. However, I’ve seen memory starved systems still show this as a high value.
  • Change SQL’s default configuration for managing memory. The default setting will let SQL absorb all of the memory, and leave little for the rest of the OS or the apps. Set it to a fixed number below the amount assigned to the system. For example, if one had a 12GB SQL server, assign 6GB as the maximum server memory. This will allow for sufficient resources for the server OS and any other applications that run on the system.
  • Document performance monitoring results, then increase the memory assigned to your VM. Then follow up with more performance monitoring to see any measurable results. One could simply increase the memory assigned and forget the other steps, but you’ll be relying completely on anecdotal observations to determine improvement.
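The 'max server memory' sizing idea from the bullets above can be captured as a tiny helper. The 6GB reserve mirrors the 12GB example in the text; treat it as a starting point to validate with Perfmon, not a formula from Microsoft:

```python
# Naive sketch: leave a fixed chunk of the VM's memory for the OS and other apps,
# and hand the rest to SQL as 'max server memory'.

def sql_max_server_memory_gb(vm_memory_gb: int, os_reserve_gb: int = 6) -> int:
    """Candidate 'max server memory' value; never less than 1 GB."""
    return max(vm_memory_gb - os_reserve_gb, 1)

print(sql_max_server_memory_gb(12))  # 6 GB, matching the 12GB server example
```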

Exchange is beginning to act more like SQL with each major release. Much like SQL, Exchange is now quite aggressive in its use of caching. It’s one of the reasons for the dramatic reductions in storage I/O demands over the last three major releases of Exchange. Also like SQL, having plenty of memory assigned will help compensate for slow backend storage. Starving the system of memory will create wildly unpredictable results, as it never has an opportunity to cache what it should.

Java uses its own memory manager, and will need available memory space in the VM for each and every JVM running. Ultimately, the JVM applications will work best when a memory reservation is set, at minimum, to the sum of all JVMs running on that VM. Be mindful of the implications that memory reservations can bring to the table. You can gain more insight into the needs of Java inside the guest by using various tools.
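The sizing guidance above amounts to a simple sum. The heap sizes below are hypothetical -Xmx values, and real JVMs also carry non-heap overhead on top of the heap, so this is a floor, not a target:

```python
# Minimum VM memory reservation = sum of all configured JVM max heaps on that VM.

def min_reservation_mb(jvm_heaps_mb: list[int]) -> int:
    """Floor for the VM-level memory reservation, ignoring non-heap JVM overhead."""
    return sum(jvm_heaps_mb)

print(min_reservation_mb([2048, 1024, 512]))  # 3584 MB for three JVMs
```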

Other observations from a Production environment
A few other notes worth mentioning

1.  Sometimes guest OS paging is monitored as an indicator of not enough memory. However, not all memory inside a guest OS will page when under pressure. If the applications or OS have pinned the memory, you won’t see paging coming from them. One can be starving the app for memory, yet it does not show via guest OS paging.

2.  VMs with larger vCPU counts need a relative increase in memory assigned to the VM. I’ve seen this in my environment: when a VM with a high vCPU count is under tremendous load, not having enough memory will hinder performance. Simply put, more CPU cycles need more memory addresses to work with.

3.  Server memory might not be cheap, but neither is storage, and even fast storage is several orders of magnitude slower than memory. The performance gain from assigning more memory to specific VMs (assuming your hosts/cluster can support it) can be immediate and dramatic. No need to induce paging that could have been avoided.

4.  Assigning more memory to a VM running a poorly designed or inefficient application will likely not help the application, and be a waste of resources. An application may be storage I/O heavy, no matter how much memory you assign it (think Exchange 2003).

One of my first and favorite VMworld breakout sessions, which I attended in 2010, was "Understanding Virtualization Memory Management Concepts" (TA7750, still found online), presented by Kit Colbert. Kit is now the CTO of End User Computing at VMware. I recall sitting in that session, and within the first 5 minutes deciding that: 1.) I knew nothing about memory, especially with a Hypervisor, and 2.) the deep dive was so good, and the content so verbose, that any attempt at taking notes was pointless. I made it a point to attend this session each year that he presented it, as it represents the very best of what VMworld has to offer. Do yourself a favor and watch one of his sessions.

Memory can and will be measured differently by Hypervisors and Guest OSs. The definitions of terms related to memory may be different by the application, the guest OS, and the hypervisor. Understanding your workloads, and the characteristics of the platforms it uses will help you better size your VMs for the balance between optimal performance with a minimal footprint. Monitoring memory in a useful way can also be a time consuming, difficult task that extends well beyond just a simple metric.

Have fun

– Pete

Helpful links
Understanding vSphere Active Memory

Kit Colbert’s 2011 VMworld breakout session – Understanding Virtualized Memory Performance Management  

Monitor Memory Usage in SQL Server

SQL Server on VMware Best Practices guide

VMware KB 1687: Excessive Page Faults Generated by Windows applications

A vSphere & memory related post would not be complete without mention of the venerable "vSphere Clustering Deepdive"

