Comparing Nehalem and Harpertown running vSphere in a production environment

 

The good press that Intel’s Nehalem chip and its underlying architecture have been receiving lately gave me good reason to be excited about the arrival of two Dell M610 blades built on Nehalem.  I really wanted to know how they would stack up against my Dell M600s (running Harpertown chips), so I thought I’d do some side-by-side comparisons in a real-world environment.  It was also an opportunity to put some 8 vCPU VMs to the test under vSphere.

First, a little background information.  The software my company produces runs on just about every version of Windows, Linux, and Unix there is.  We have to compile and validate (exercise) those builds on every single platform.  The majority of our customers run under Windows and Linux, so the ability to virtualize our farm of Windows and Linux build machines was a compelling argument in my case for our initial investment.

Virtualizing build/compiler machines is a great way to take advantage of your virtualized infrastructure.  What seems odd to me, though, is that I never read about others using their infrastructure in this way.  Our build machines are critical to us; ironically, they’d often been running on old leftover systems.  Now that the build machines are virtualized, those leftover physical machines do nothing but exercise and validate the builds.  Unfortunately, we cannot virtualize our exerciser machines, because our validation routines rely on the GPUs in the physical machines’ video cards.

Our Development Team has also invested heavily in Agile and Scrum principles.  One of the hallmarks of that is Test Driven Development (TDD).  Short development cycles and the ability for each developer to compile and test their own changes allow for more aggressive programming, producing more dramatic results.

How does this relate?  Our Developers need build machines that are as fast as possible.  Unlike so many other applications, their compilers actually can use every processor you give them (some better than others, as you will see).  This meant that many Developer machines were being over-spec’d, because we’d use them as a build machine as well as the Developer’s primary workstation.  This worked, but you can imagine the disruption that occurs when a Developer’s machine is scheduled to be upgraded or modified in any way (read: angry Developer gets territorial over their system, even though YOU are the IT guy).  Plus, we typically spent more than necessary on desktop workstations because of the horsepower needed for systems performing dual roles.

Two recent advancements have allowed me to deliver on my promise to leverage our virtualized infrastructure for our build processes: vSphere’s improved co-scheduler (along with support for 8 vCPUs), and Intel’s Nehalem chip.  Let’s see how the improvements pan out.

Hardware tested

  • Dell PowerEdge M600 (Harpertown).  Dual-socket, quad-core Intel E5430 (2.66 GHz).  32GB RAM
  • Dell PowerEdge M610 (Nehalem).  Dual-socket, quad-core Intel X5550 (2.66 GHz).  32GB RAM

 

Software, VMs, and applications tested

  • vSphere Enterprise Plus 4.0 Update 1
  • VM:  Windows XP x64.  2GB RAM.  4 vCPUs.  Visual Studio 2005*
  • VM:  Windows XP x64.  2GB RAM.  8 vCPUs.  Visual Studio 2005*
  • VM:  Ubuntu 8.04 x64.  2GB RAM.  4 vCPUs.  CMake
  • VM:  Ubuntu 8.04 x64.  4GB RAM**.  8 vCPUs.  CMake

*I wanted to test Windows 7 and Visual Studio 2008, which is said to be better at multithreading, but ran out of time.

** The 8 vCPU Linux VM was bumped up to 4GB of RAM to eliminate some swapping errors I was seeing, but it never used more than about 2.5 GB during the build run.

 

Testing scenarios

My goals for testing were pretty straightforward:

  • Compare VMs executing full builds on hosts with Harpertown chips against the same VMs running on hosts with Nehalem chips
  • Compare the performance of builds when I changed the number of vCPUs assigned to a VM
  • Observe how well each compiler on each platform handled multithreading

I limited observations to full build runs, as incremental builds don’t lend themselves well to using multiple threads.
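For reference, here is roughly how a full parallel build gets kicked off on the Linux VMs.  Our real build scripts are more involved, and the paths and job count below are just placeholders, but conceptually it is a clean out-of-source CMake configure followed by a make run with one job per vCPU:

    # hypothetical paths; assumes the project's CMakeLists.txt lives in ~/src/project
    mkdir -p ~/build/project && cd ~/build/project
    cmake ~/src/project        # configure a clean out-of-source build
    make -j8                   # -j matches the number of vCPUs in the VM (8 here, 4 on the 4 vCPU VMs)

The Windows builds go through Visual Studio 2005’s own build system, which, as the numbers below show, does not spread the work across vCPUs nearly as evenly.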

I admit that my testing methods were far from perfect.  I wish I could have sampled more data to come up with more solid numbers, but these were production build systems, and the situation dictated that I not interfere too much with our build processes just for my own observations.  My focus is mostly on CPU performance in real world scenarios.  I monitored other resources such as disk I/O and memory just to make sure they were not inadvertently affecting the results beyond my real world allowances.

The numbers

Each test run shows two graphs.  The line graph shows total CPU utilization as a percentage of what is available to the VM.  The stacked graph shows the number of CPU cycles, in MHz, used by each vCPU.

Each testing scenario shows the time in minutes to complete.

Full build times, in minutes:

                     Windows XP x64    Linux x64
2 vCPU Nehalem            41              N/A
4 vCPU Harpertown         32              38
4 vCPU Nehalem            27              32
8 vCPU Nehalem            32              8.5

 

VM #1  WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005.
Harpertown (E5430)
Full build:  33 minutes

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

VM #2 WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005
Nehalem (X5550)
Full build:  27 minutes

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

VM #3 WinXP64.  8 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem (X5550)
Full build:  32 minutes

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

VM #4 WinXP64.  2 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem (X5550)
Full build:  41 minutes

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

VM #5 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  CMake.
Harpertown (E5430)
Full build:  38 minutes

(no graphs available.  My dog ate ‘em.)

 

VM #6 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  CMake.
Nehalem (X5550)
Full build:  32 minutes

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

VM #7 Ubuntu 8.04 x64.  8 vCPU.  4GB RAM.  CMake.
Nehalem (X5550)
Full build:  8.5 minutes  (note:  disregard first blip of data on chart)

[Graphs: total CPU utilization (%), and per-vCPU usage in MHz (stacked)]

Notice the tremendous multithreading performance of the build process under Ubuntu 8.04 (x64)!  The load is remarkably even across each vCPU and thread, which is best observed on the stacked graphs: the higher the stack, the better the build is using all available vCPUs.  Windows and its compiler were not nearly as good, actually becoming less efficient when I moved from 4 vCPUs to 8 vCPUs.  The build times reflect this.

A few other things I noticed along the way…

Unlike the old E5430 hosts, hyper-threading is available on the X5550 hosts and, according to VMware’s documentation, is recommended.  Whether it actually improves performance is subject to some debate, as found here.

If you want to VMotion VMs between your X5550 and E5430 based hosts, you will need to turn on EVC mode, which you can do in the cluster settings section of vCenter.  According to Intel and VMware, you won’t be dumbing down or hurting the performance of your new hosts.

My Dell M610 blades (Nehalem) arrived with the Virtualization Technology toggle turned off in the BIOS, just as my M600s (Harpertown) did.  Why this is the default is beyond me, especially on a blade.  Remember to enable it before you even start installing vSphere.

For Windows VMs, remember that the desktop OSes are limited to what they see as two physical sockets, and by default ESX presents each vCPU as a single-core processor in its own socket.  To utilize more than 2 vCPUs on those VMs, set the “cpuid.coresPerSocket” option in the VM’s settings.  More details can be found here.
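As an illustration, the relevant entries in the VM’s .vmx file for an 8 vCPU Windows XP x64 VM would look something like this (the value of 4 is an assumption chosen so the 8 vCPUs appear as two 4-core sockets):

    numvcpus = "8"
    cpuid.coresPerSocket = "4"

With those two values the guest sees two sockets of four cores each, which keeps Windows XP x64 within its two-socket limit while still letting it use all 8 vCPUs.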

Conclusion
I’ve observed nice performance gains using the hosts with the Nehalem chips: 15 to 20% from my small samples.  However, my very crude testing has not revealed the improvements noted in various posts suggesting that a single vCPU VM running on a Nehalem chip would be nearly equal to a 2 vCPU VM on a Harpertown chip (see here).  This is not to say that it can’t happen.  I just haven’t seen it yet.

I was impressed by how strong and how even the multithreading of the compilers running on the Linux VM was versus the Windows counterpart.  So were the Developers, who saw the 8.5 minute build time as good as or better than that of any physical system we have in the office.  But make no mistake: if you are running a VM with 8 vCPUs on a host with 8 cores, and it’s able to use all of those 8 vCPUs, you won’t be getting something for nothing.  Your ESX host will be nearly pegged for the times it’s running full tilt, and other VMs will suffer.  This was the reason behind our purchase of additional blades.

2 thoughts on “Comparing Nehalem and Harpertown running vSphere in a production environment”

  1. Very interesting! I was going along and then holy cow, 8.5 minutes on the Linux box! When you get a chance you will have to see how those 2008 boxes do! Also, this goes back to the fundamental question: are more virtual processors better? People always want what their previous physical box had, and it’s hard to explain when less can be more… sometimes 🙂

    1. Pretty interesting, huh? Yes, this utilization of all cores is more the exception than the rule; I’ve never seen that efficiency before. For our source code control server (Subversion), I know that it absolutely runs at its fastest with just 1 vCPU.

      Thanks for the feedback.
