Exchange 2007 on a VM, and the case of the mysterious ISAPI "Deadlock detected" error.


Are you running Exchange 2007 on a VM?  Are you experiencing  odd warning events in the event log that look something like this?

Event Type: Warning
Event Source: W3SVC-WP
Event Category: None
Event ID: 2262
Date:  
Time:  12:28:18 PM
User:  N/A
Computer: [yourserver]
Description:
ISAPI ‘c:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll’ reported itself as unhealthy for the following reason: ‘Deadlock detected’.

If you’ve answered yes to these questions, you’ve almost certainly gone looking for the fix and found other users in the same boat.  They try adjustment after adjustment, from official documentation or otherwise, with no results.

That was me.  …until I ran across this link.

So, as suggested, I added a second vCPU to my Exchange server (running Windows Server 2008 x64 in a vSphere cluster) and started it up.  These specific warning messages in my event log went away completely.  Okay, after several weeks of monitoring, I may have had a couple of warnings here and there, but that’s it.  No more hundreds of warnings every day.
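
If you want hard numbers instead of eyeballing the event log, a quick per-day tally of the warnings makes the before-and-after comparison obvious.  Below is a minimal sketch using pywin32; the log name and the filtering are assumptions based on the event shown above, so adjust for your environment.

    # Sketch: count "Deadlock detected" warnings (source W3SVC-WP, Event ID 2262)
    # per day, to compare the rate before and after adding the second vCPU.
    # Assumes pywin32 is installed; the log name is an assumption (adjust if needed).
    from collections import Counter
    import win32evtlog  # pywin32

    LOG_NAME = "Application"   # assumption: adjust if your warnings land in a different log
    SOURCE = "W3SVC-WP"
    EVENT_ID = 2262

    def day_of(ts):
        # TimeGenerated is a datetime in newer pywin32 builds and a PyTime in older ones.
        try:
            return ts.strftime("%Y-%m-%d")
        except AttributeError:
            return ts.Format("%Y-%m-%d")

    counts = Counter()
    handle = win32evtlog.OpenEventLog(None, LOG_NAME)
    flags = win32evtlog.EVENTLOG_BACKWARDS_READ | win32evtlog.EVENTLOG_SEQUENTIAL_READ

    while True:
        records = win32evtlog.ReadEventLog(handle, flags, 0)
        if not records:
            break
        for rec in records:
            # The low 16 bits hold the event ID; the high bits carry severity/facility.
            if rec.SourceName == SOURCE and (rec.EventID & 0xFFFF) == EVENT_ID:
                counts[day_of(rec.TimeGenerated)] += 1

    win32evtlog.CloseEventLog(handle)

    for day, total in sorted(counts.items()):
        print(day, total)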

As for the official explanation, I don’t have one.  Adding vCPUs to fix problems is not something I want to get in the habit of, but it was an interesting problem with an interesting solution that was worth sharing.

 

Helpful links:

Microsoft’s most closely related KB article on the issue (that didn’t fix anything for me):
http://support.microsoft.com/kb/821268

Application pool recycling:
http://technet.microsoft.com/en-us/library/cc735314(WS.10).aspx

Comparing Nehalem and Harpertown running vSphere in a production environment

 

The good press that Intel’s Nehalem chip and its underlying architecture have been receiving lately gave me good reason to be excited about the arrival of two Dell M610 blades built on the Nehalem chipset.  I really wanted to know how they would stack up against my Dell M600s (running Harpertown chips), so I thought I’d do some side-by-side comparisons in a real-world environment.  It was also an opportunity to put some 8 vCPU VMs to the test under vSphere.

First, a little background information.  The software my company produces runs on just about every version of Windows, Linux, and Unix there is.  We have to compile and validate (exercise) those builds on every single platform.  The majority of our customers run under Windows and Linux, so the ability to virtualize our farm of Windows and Linux build machines was a compelling argument in my case for our initial investment.

Virtualizing build/compiler machines is a great way to take advantage of your virtualized infrastructure, though it seems odd to me that I never read about others using their infrastructure this way.  Our build machines are critical to us, yet ironically they often ran on old leftover systems.  Now that the build machines are virtualized, those leftover physical machines do nothing but exercise and validate the builds.  Unfortunately, we cannot virtualize the exerciser machines themselves, because our validation routines rely on the GPUs in the physical machines’ video cards.

Our Development Team has also invested heavily in Agile and Scrum principles, one hallmark of which is Test Driven Development (TDD).  Short development cycles and the ability for each developer to compile and test their own changes allow for more aggressive programming, producing more dramatic results.

How does this relate?  Our Developers need build machines that are as fast as possible.  Unlike so many other applications, their compilers really can use every processor you give them (some better than others, as you will see).  This meant that many Developer machines were being over-spec’d, because each one doubled as a build machine and the Developer’s primary workstation.  It worked, but you can imagine the disruption when a Developer’s machine was scheduled to be upgraded or modified in any way (read: angry Developer gets territorial over their system, even though YOU are the IT guy).  Plus, we typically spent more on desktop workstations than necessary because of the horsepower these dual-role systems needed.
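
To make “the compilers can use every processor you give them” concrete, a build wrapper can simply ask the guest OS how many vCPUs it sees and scale the parallel job count to match.  This is only a sketch; the build command and source path are hypothetical placeholders, not our actual build system:

    # Sketch: scale the parallel job count to however many vCPUs the VM presents.
    # "make -j N" is a stand-in; substitute whatever switch your compiler or
    # build system uses to control parallelism.
    import os
    import subprocess

    SOURCE_DIR = "/srv/build/product"   # hypothetical checkout location

    def run_full_build():
        jobs = os.cpu_count() or 1      # vCPUs visible to the guest OS
        print(f"Building with {jobs} parallel jobs")
        subprocess.run(["make", f"-j{jobs}", "all"], cwd=SOURCE_DIR, check=True)

    if __name__ == "__main__":
        run_full_build()

Bump the VM from 4 to 8 vCPUs and the same script automatically doubles the job count, which is exactly what the comparisons below exercise.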

Two recent advancements have allowed me to deliver on my promises to leverage our virtualized infrastructure for our build processes.  vSphere’s improved co-scheduler (along with support for 8 vCPUs), and Intel’s Nehalem chip.  Let’s see how the improvements pan out.

Hardware tested

  • Dell PowerEdge M600 (Harpertown).  Dual socket, quad-core Intel E5430 (2.66 GHz).  32GB RAM
  • Dell PowerEdge M610 (Nehalem).  Dual socket, quad-core Intel X5550 (2.66 GHz).  32GB RAM

 

Software, VMs, and applications tested

  • vSphere Enterprise Plus 4.0 Update 1
  • VM:  Windows XP x64.  2GB RAM.  4 vCPUs.  Visual Studio 2005*
  • VM:  Windows XP x64.  2GB RAM.  8 vCPUs.  Visual Studio 2005*
  • VM:  Ubuntu 8.04 x64.  2GB RAM.  4 vCPUs.  CMake
  • VM:  Ubuntu 8.04 x64.  4GB RAM**.  8 vCPUs.  CMake

*I wanted to test Windows 7 and Visual Studio 2008, which is said to be better at multithreading, but I ran out of time.

**The 8 vCPU Linux VM was bumped up to 4GB of RAM to eliminate some swapping errors I was seeing, but it never used more than about 2.5GB during the build run.

 

Testing scenarios

My goals for testing were pretty straightforward:

  • Compare how VMs executing full builds on hosts with Harpertown chips performed against the same VMs running on hosts with Nehalem chips
  • Compare build performance as I changed the number of vCPUs assigned to a VM
  • Observe how well each compiler on each platform handled multithreading

I limited observations to full build runs, as incremental builds don’t lend themselves well to multiple threads.

I admit that my testing methods were far from perfect.  I wish I could have sampled more data to come up with more solid numbers, but these were production build systems, and the situation dictated that I not interfere too much with our build processes just for my own observations.  My focus is mostly on CPU performance in real world scenarios.  I monitored other resources such as disk I/O and memory just to make sure they were not inadvertently affecting the results beyond my real world allowances.

The numbers

Each test run shows two graphs.  The line graph shows total CPU utilization as a percentage of what is available to the VM.  The stacked graph shows the number of CPU cycles, in MHz, used by each vCPU.

Each testing scenario shows the time in minutes to complete.

                    Windows XP x64    Linux x64
2 vCPU Nehalem           41              N/A
4 vCPU Harpertown        32              38
4 vCPU Nehalem           27              32
8 vCPU Nehalem           32              8.5
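
For what it’s worth, the two comparisons I care about fall out of that table with a little arithmetic: how much faster the same VM ran on Nehalem, and what doubling the vCPUs bought on Nehalem.  A quick sketch using the build times above:

    # Quick arithmetic on the measured full-build times (minutes) from the table above.
    times = {
        ("Windows", "4 vCPU Harpertown"): 32,
        ("Windows", "4 vCPU Nehalem"):    27,
        ("Windows", "8 vCPU Nehalem"):    32,
        ("Linux",   "4 vCPU Harpertown"): 38,
        ("Linux",   "4 vCPU Nehalem"):    32,
        ("Linux",   "8 vCPU Nehalem"):    8.5,
    }

    def pct_faster(old, new):
        return (old - new) / old * 100

    for plat in ("Windows", "Linux"):
        harper4 = times[(plat, "4 vCPU Harpertown")]
        nehalem4 = times[(plat, "4 vCPU Nehalem")]
        nehalem8 = times[(plat, "8 vCPU Nehalem")]
        print(f"{plat}: Harpertown -> Nehalem at 4 vCPU: {pct_faster(harper4, nehalem4):.0f}% faster")
        print(f"{plat}: 4 -> 8 vCPU on Nehalem: {nehalem4 / nehalem8:.2f}x speedup")

That works out to roughly a 16% gain from the chip generation alone on both platforms, a 3.8x speedup for Linux going from 4 to 8 vCPUs, and an outright slowdown for Windows doing the same.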

 

VM #1  WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005.
Harpertown chipset (E5430)
Full build:  33 minutes

[Chart 01-tpb004-4vcpu-m600-cpu: total CPU utilization (%)]

[Chart 02-tpb004-4vcpu-m600-cpustacked: per-vCPU CPU usage (MHz), stacked]

VM #2 WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005
Nehalem chipset (x5550)
Full build:  27 minutes

[Chart 01-tpb004-4vcpu-m610-cpu: total CPU utilization (%)]

[Chart 02-tpb004-4vcpu-m610-cpustacked: per-vCPU CPU usage (MHz), stacked]

VM #3 WinXP64.  8 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  32 minutes

[Chart 01-tpb004-8vcpu-m610-cpu: total CPU utilization (%)]

[Chart 02-tpb004-8vcpu-m610-cpustacked: per-vCPU CPU usage (MHz), stacked]

VM #4 WinXP64.  2 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  41 minutes

[Chart 01-tpb004-2vcpu-m610-cpu: total CPU utilization (%)]

[Chart 02-tpb004-2vcpu-m610-cpustacked: per-vCPU CPU usage (MHz), stacked]

VM #5 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Harpertown chipset (E5430)
Full build:  38 minutes

(no graphs available.  My dog ate ‘em.)

 

VM #6 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  32 minutes

[Chart 01-tpb002-4vcpu-m610-cpu: total CPU utilization (%)]

[Chart 02-tpb002-4vcpu-m610-cpustacked: per-vCPU CPU usage (MHz), stacked]

VM #7 Ubuntu 8.04 x64.  8 vCPU.  4GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  8.5 minutes  (note:  disregard first blip of data on chart)

[Chart 01-tpb002-8vcpu-m610-cpu: total CPU utilization (%)]

[Chart 02-tpb002-8vcpu-m610-cpustacked: per-vCPU CPU usage (MHz), stacked]

Notice the tremendous multithreading performance of the build process under Ubuntu 8.04 x64!  Utilization is remarkably even across each vCPU and thread, which is best observed in the stacked graphs: the higher the stack, the better the build is using all available vCPUs.  Windows and its compiler were not nearly as good, actually becoming less efficient when I moved from 4 vCPUs to 8 vCPUs.  The build times reflect this.

A few other things I noticed along the way…

Unlike on the old E5430 hosts, hyperthreading is available on the X5550 hosts and, according to VMware’s documentation, is recommended.  Whether it actually improves performance is subject to some debate, as found here.

If you want to vMotion VMs between your X5550-based and E5430-based hosts, you will need to turn on EVC mode, which you can do in the cluster settings section of vCenter.  According to Intel and VMware, you won’t be dumbing down or hurting the performance of your new hosts.

My Dell M610 blades (Nehalem) had the Virtualization Technology toggle turned off in the BIOS, just like my M600s (Harpertown).  Why this is the default is beyond me, especially on a blade.  Remember to enable it before you even start installing vSphere.

For Windows VMs, remember that the desktop OSes are limited to what they see as two physical sockets.  By default, each vCPU is presented to the guest as a single-core processor in its own socket.  To utilize more than 2 vCPUs on those VMs, set the “cpuid.corespersocket” option in the settings of the VM.  More details can be found here.
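
For reference, that option ends up as a simple key/value line in the VM’s configuration.  Here’s a minimal sketch that patches a powered-off VM’s .vmx file; the datastore path and the value of 4 cores per socket are placeholders, and the key is written exactly as above, so double-check the spelling and case your vSphere build documents.

    # Sketch: add or update the cores-per-socket option in a powered-off VM's .vmx file.
    # The .vmx path and the value "4" are placeholders; the key name is taken from the
    # post, so verify the exact spelling/case your vSphere version expects.
    from pathlib import Path

    VMX_PATH = Path("/vmfs/volumes/datastore1/buildvm/buildvm.vmx")  # hypothetical
    KEY = "cpuid.corespersocket"
    VALUE = "4"   # e.g. 8 vCPUs / 4 cores per socket = 2 sockets seen by the guest

    def set_vmx_option(vmx_path, key, value):
        lines = vmx_path.read_text().splitlines()
        new_line = f'{key} = "{value}"'
        for i, line in enumerate(lines):
            if line.split("=")[0].strip().lower() == key.lower():
                lines[i] = new_line        # update an existing entry
                break
        else:
            lines.append(new_line)         # or append it if it isn't there yet
        vmx_path.write_text("\n".join(lines) + "\n")

    if __name__ == "__main__":
        set_vmx_option(VMX_PATH, KEY, VALUE)

The same key/value pair can also be added through the VM’s advanced configuration parameters in the vSphere Client, which is probably the saner route for a one-off change.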

Conclusion
I’ve observed nice performance gains using the hosts with the Nehalem chips: 15 to 20% from my small samples.  However, my very crude testing has not revealed the improvements noted in various posts, which suggest that a single vCPU VM running on a Nehalem chip would be nearly equal to a 2 vCPU VM on a Harpertown chip (see here).  This is not to say that it can’t happen.  I just haven’t seen it yet.

I was impressed by how good, and how even, the multithreading of the compilers running on the Linux VM was compared to the Windows counterpart.  So were the Developers, who saw the 8.5 minute build time as good as or better than any physical system we have in the office.  But make no mistake: if you are running a VM with 8 vCPUs on a host with 8 cores, and it is able to use all 8 of them, you won’t be getting something for nothing.  Your ESX host will be nearly pegged while it runs full tilt, and other VMs will suffer.  This was the reason behind our purchase of additional blades.

Resource allocation for Virtual Machines

Ever since I started transitioning our production systems to my ESX cluster, I’ve been fascinated by how visible resource utilization has become.  Or to put it another way, how blind I was before.  I’ve also been interested to hear about the natural tendency of many Administrators to over-allocate resources to their VMs.  Why does it happen?  Who’s at fault?  From my humble perspective, it’s a party everyone has shown up to.

  • Developers & Technical writers
  • IT Administrators
  • Software Manufacturers
  • Politics

Developers & Technical Writers
Best practices and installation guides are usually written by Technical Writers for the Software Manufacturer.  They are provided information by whom else?  The Developers.  Someone on the Development team will take a look-see at their Task Manager, or htop in Linux, maybe PerfMon if they have extra time.  They determine (with a less than thorough vetting process) what the requirement should be, and then pass it off to the Technical Writer.  Does the app really need two CPUs, or does that just indicate it’s capable of multithreading?  Or both?  …Or none of the above?  Developers are the group that seems most challenged at understanding the new paradigm of virtualization, yet they are the ones who decide what the requirements are.  Some barely know what it is, or dismiss it as nothing more than a cute toy that won’t work for their needs.  It’s pretty fun to show them otherwise, but frustrating to see their continued suspicion of the technology.

IT Administrators (yep, me included)
Take a look at any installation guide for your favorite (or least favorite) application or OS.  Resource minimums are still written for hardware based provisioning.   Most best practice guides outline memory and CPU requirements within the first few pages.  Going against recommendations on page 2 generally isn’t good IT karma.  It feels as counterintuitive as trying to breathe with your head under water.  Only through experience have I grown more comfortable with the practice.  It’s still tough though.

Software Manufacturers
Virtualization can be a sensitive matter to Software Manufacturers.  Some would prefer that it didn’t exist, and choose to document and license their products accordingly.  Others will insist that resources are resources, and ask why they would ever state that their server application can run with just 768MB of RAM and a single CPU core if there were even a remote possibility of it hurting performance.

Politics
Let’s face it.  How much is Microsoft going to dive into memory recommendations for an Exchange Server when their own virtualization solution does not support some of the advanced memory handling features that VMware supports?  The answer is, they aren’t.  It’s too bad, because their products run so well in a virtual environment.  Politics can also come from within.  IT departments get coerced by management, project teams, or departments, or are just worried about the SLAs of critical services.  They acquiesce to try to keep everyone happy.

What can be done about it?
Rehab for everyone involved.  Too ambitious?  Okay, let’s just try to improve the Installation/Best Practices guides from the Software Manufacturers.

  • Start with two or three sets of minimum requirements: provisioning the application or OS on a physical machine, followed by provisioning on a VM, accommodating a few different hypervisors.
  • Clearly state whether the application is even capable of multithreading.  That would eliminate some confusion over whether you even want to consider two or more vCPUs on a VM.  I suspect many red faces would show up when software manufacturers admit to their customers that they haven’t designed their software to work with more than one core anyway.  But this simple step would help Administrators greatly.
  • For VM based installations, note the RAM threshold below which excessive disk paging will begin to occur.  While the desire is to allocate as few resources as needed, nobody wants disk thrashing to occur.
  • For physical servers, one may have a single server playing a dozen different roles.  Minimums sometimes assume this, and they will throw in a buffer to pad the minimums – just in case.  With a VM, it might be providing just a single role.  Acknowledge that this new approach exists, and adjust your requirements accordingly.

Wishful thinking perhaps, but it would be a start.  Imagine the uproar (and competition) that would occur if a software manufacturer actually spec’d a lower memory or CPU requirement when running under one hypervisor versus another.  …Now I’m really dreaming.

IT Administrators have some say in this too.  Simply put, the IT department is a service provider.  Your staff and the business model are your customers.  As a Virtualization Administrator, you have the ability to assert your expertise on provisioning systems so that the service is provided as efficiently as possible.  Let them define the acceptance criteria for the need they have, and then you deal with how to make it happen.

Real World Numbers
There are many legitimate variables that make it difficult to give one-size-fits-all recommendations on resource requirements, which makes things difficult for those first starting out.  Rather than making suggestions, I decided I would just summarize some of the systems I have virtualized, and what their utilization rates are for a staff of about 50 people, 20 of them being Software Developers.  These are numbers pulled during business hours.  I do not want to imply that these are the best or most efficient settings.  In fact, many of them were “first guess” settings that I plan on adjusting later.  They might offer you a point of reference for comparison, or help in your upcoming deployment.

AD Domain Controller (all roles), DNS, DHCP
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 9%.  Avg CPU used / occasional spike: 2% / 15%.
Comments: So much for my DCs working hard.  2GB is overkill for sure, and I will be adjusting all three of my DCs’ RAM downward.  I thought the chattiness of DCs was more of a burden than it really is.

Exchange 2007 Server (all roles)
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 30%.  Avg CPU used / occasional spike: 50% / 80%.
Comments: Consistently our most taxed VM, but I’m pleasantly surprised by how well this runs.

Print Server, AV server
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 18%.  Avg CPU used / occasional spike: 3% / 10%.
Comments: Sitting as a separate server only because I hate having application servers running as print servers.

Source Code Control Database Server
Windows Server 2003 x64, 1 vCPU, 1GB RAM
Avg RAM used: 14%.  Avg CPU used / occasional spike: 2% / 40%.
Comments: There were fears from our Dev Team that this was going to be inferior to our physical server, and they suggested assigning 2 vCPUs “just in case.”  I said no.  They reported a 25% performance improvement compared to the physical server.  Eventually they might figure out the ol’ IT guy knows what he’s doing.

File Server
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 8%.  Avg CPU used / occasional spike: 4% / 20%.
Comments: Low impact as expected.  Definitely a candidate for reduced resources.

SharePoint Front End Server
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 10%.  Avg CPU used / occasional spike: 15% / 30%.
Comments: Built up, but not yet fully deployed to everyone in the organization.

SharePoint Back End/SQL Server
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 9%.  Avg CPU used / occasional spike: 15% / 50%.
Comments: I will be keeping a close eye on this when it ramps up to full production use.  SharePoint farms are known to be hogs.  I’ll find out soon enough.

SQL Server for project tracking
Windows Server 2003 x64, 1 vCPU, 1.5GB RAM
Avg RAM used: 12%.  Avg CPU used / occasional spike: 4% / 50%.
Comments: Lower than I would have thought.

Code compiling system
Windows XP x64, 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used / occasional spike: 5% / 100%.
Comments: Will spike to 100% CPU usage during compiling (about 20 minutes).  The compilers allow you to tell them how many cores to use.

Code compiling system
Ubuntu 8.10 LTS x64, 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used / occasional spike: 5% / 100%.
Comments: All Linux distros seem to naturally prepopulate more RAM than their Windows counterparts, perhaps with the benefit of doing less paging.

To complicate matters a bit, you might observe different behaviors in some OSes (XP versus Vista/2008 versus Windows 7/2008 R2, or SQL 2005 versus SQL 2008) in their willingness to prepopulate RAM.  Give SQL 2008 4GB of RAM, and it will want to use it even if it isn’t doing much.  You might notice this when looking at relatively idle VMs with different OSes, where some have a smaller memory footprint than others.  At the time of this writing, none of my systems were running Windows 2008 R2, as it wasn’t supported on ESX 3.5 when I was deploying them.

Some of these numbers are a testament to ESX’s/vSphere’s superior memory management and CPU scheduling.  Memory ballooning, swapping, and Transparent Page Sharing all contribute to some pretty startling efficiency.

I have yet to virtualize my CRM, web, mail relay, and miscellaneous servers, so I do not have any good data yet for those types of systems.  Having just upgraded to vSphere in the last few days, the way is also clear for me to assign multiple vCPUs to the code compiling machines (as many as 20 VMs).  The compilers have switches that control exactly how many cores end up being used, and our Development Team needs these builds compiled as fast as possible.  That will be a topic for another post.

Lessons Learned
I love having systems isolated to performing their intended function now.  Who wants Peachtree crashing their email server anyway?  Administrators working in the trenches know that a server serving a single role is easier to manage, far more stable, and doesn’t cross-contaminate other services.  In a virtual environment, it’s worth any additional cost in OS licensing or overhead.

When the transition to virtualizing our infrastructure began, I thought our needs and circumstances would be different than they’ve proven to be.  Others claim extraordinary consolidation ratios with virtualization.  I believed we’d see huge improvements, but that those numbers couldn’t possibly apply to us, because (chest puffed out) we needed real power.  Well, I was wrong, and so far, we really are like everyone else.


Exchange 2007… Better late than never

 

Count me in as one of the many Administrators who finally got around to moving from Exchange 2003 to Exchange 2007.  Yes, it’s almost 2010, and probably just a few months away from the release of Exchange 2010.  I never disagreed with Microsoft’s decision to release Exchange 2007 as a 64-bit-only application.  It’s just that it made it incredibly difficult to find a way to transition when your infrastructure is full of physical 32-bit servers.  I was constantly reminded that my Exchange 2003 server was brittle, a resource hog, and lacked many of the abilities that end users and applications were demanding.  Twenty-five minute reboots and inexplicable behaviors did not instill confidence.  It was time to make the move.

My timing couldn’t have been better.  I have a new virtualized infrastructure powered by Dell blades running VMware ESX 3.5, with a Dell/EqualLogic PS5000 SAN, to help with this transition.  It didn’t take away from the shrewd planning a project of this nature normally requires; it just made it easier.  The other benefit of deploying three-year-old software is that issues, workarounds, and fixes to Exchange, as well as how to make it play nicely with other components (ISA, CRM, etc.), were well documented.

With respect to running Exchange 2007 in a virtual environment, I noticed that information on sizing is difficult to find.  Capacity planning guidelines still reflected deployments on physical servers (read: more is better), and white papers on virtualizing Exchange appeared to be aimed at large enterprise environments only.  My environment is relatively small, and a single server was going to be handling all Exchange roles.  Based on my environment of about 50 users and 100 mailboxes, I sifted through all the material I could find, threw in a few wild guesses, and settled on the following configuration.

  • 1 vCPU
  • 2.5GB of RAM
  • Primary OS resides in VMFS volume on the SAN
  • Using guest OS based iSCSI initiator with MPIO enabled, dedicated NTFS volume for Exchange database. 
  • Using guest OS based iSCSI initiator with MPIO enabled, dedicated NTFS volume for Exchange transaction logs.  

The transition occurred over one weekend, and the remainder of the outstanding issues were cleaned up throughout the week.  A moment of embarrassment occurred when I realized my thorough planning never took into consideration how hard the “move mailbox” function would hit the transaction logs.  I didn’t see it until it was too late: the partition for the transaction logs filled up and the services shut down.  But thanks to the ease of handling storage with the SAN, I was able to create a new LUN, initialize it in the OS, and change a couple of drive letters.  I ran a backup to commit the transaction logs to the database, and I was back in business.  The only puzzle that took me far too long to figure out was getting the AutoDiscover function in Exchange 2007 to work as intended.  It is worthy of its own post at another time.
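
A dirt-simple free-space check on the transaction log volume would have caught that before the services shut down.  Here’s a minimal sketch; the drive letter and the 20% threshold are assumptions for illustration, not values from my environment:

    # Sketch: warn when the Exchange transaction log volume is getting full.
    # The drive letter and threshold are placeholders; schedule it however you
    # like (Task Scheduler, a monitoring agent, etc.).
    import shutil

    LOG_VOLUME = "L:\\"    # hypothetical dedicated transaction log volume
    WARN_BELOW = 0.20      # warn when less than 20% free

    def check_free_space(path, warn_below):
        usage = shutil.disk_usage(path)
        free_ratio = usage.free / usage.total
        if free_ratio < warn_below:
            # Swap the print for an email or event-log alert in real use.
            print(f"WARNING: {path} is down to {free_ratio:.0%} free "
                  f"({usage.free / 2**30:.1f} GB of {usage.total / 2**30:.1f} GB)")
        else:
            print(f"{path}: {free_ratio:.0%} free")

    if __name__ == "__main__":
        check_free_space(LOG_VOLUME, WARN_BELOW)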

As far as performance goes, I couldn’t be happier with it running as a VM.  I truly didn’t know how it would react under the settings I established, and was ready to make whatever changes necessary, but it has been performing very well.  Below are some very simple utilization numbers to give you an idea.

CPU:  Hovers around 50% utilization during the day, and around 15% utilization during off hours.  CPU Ready values range anywhere from 20 to 200 milliseconds.  This VM has just 1 vCPU assigned to it.  I wanted to keep it this way if possible, so that it could use the Fault Tolerance feature of vSphere when we upgrade.  Looks like I get my wish.

RAM:  Typically runs about 50% of the 2.5GB assigned.  No higher than 75% utilization during the busiest time of the day, and 15% utilization during off hours

Network:  Runs about 1.2 MBps.  Spikes of 40 MBps occur only because of some on-host backups occasionally occurring.  Bandwidth utilization is imperceptible during off hours.

Disk I/O:  Hard to gauge, but mostly because the need seems to be so low.  The OS partition coming from the VMFS volume might show 200KBps, but has bounced up to 7MBps on occasion.  No performance data yet on the drives connected via the guest iSCSI initiator.  I think they qualify as “fast” until I find anything that suggests otherwise.

Perception:  You won’t find this on any ESX or OS performance monitor, but its importance shouldn’t be underestimated.  I had about half of the staff comment on how much snappier their Outlook clients and OWA were working for them.  I never have staff stop by and offer random comments telling me how fast something is.  It’s a nice compliment to the improvements in Exchange 2007, and to what it is running on.

No more painfully long restart times for me now.  It’s one of the more overlooked benefits I’ve noticed while moving my systems over to the new infrastructure.  Planned server restarts are a part of responsible management of IT systems, and anything that can be done to reduce the interruption is appreciated.  A two minute restart is always welcome.

It probably goes without saying that I took great joy in decommissioning the old server.  Hopefully this puts me in a good position for when Exchange 2010 comes off the presses.

Virtualization. Making it happen

 

It’s difficult to put into words how exciting, and how overwhelming, the idea of moving to a virtualized infrastructure was for me.  In 12 months, I went from investigating solutions, to presenting our options to senior management, to the procurement process, followed by the design and implementation of the systems, and finally, the transition of our physical machines to a virtualized environment.

It has been an incredible amount of work, but equally satisfying.  The pressure to produce results was even bigger than the investment itself.  With this particular project, I took away a few lessons along the way, some of which had nothing to do with virtualization.  Rather than providing endless technical details in this post, I thought I’d share what I learned that has nothing to do with vSwitches or CPU utilization.

1.  The sell.  I never would have been able to achieve what I achieved without the support of our Management Team.  I’m an IT guy, and do not have a gift for crafty PowerPoint slides or fluid presentation skills.  But there was one slide that hit it out of the park for me.  It showed how much this crazy idea was going to cost, but more importantly, how that compared against what we were going to spend anyway under a traditional environment.  We had delayed server refreshes for a few years, and it was catching up to us.  Without even factoring in the projected growth of the company, the two lines intersected in less than one year.  I’m sure the dozen other slides helped support my proposal, but this one offered the clarity needed to get approval.
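
That crossover slide boils down to very simple arithmetic: a larger up-front spend plus a lower ongoing cost versus a steady stream of physical refreshes.  The figures below are made up purely for illustration (they are not our numbers); the point is how quickly the cumulative lines can cross.

    # Sketch: find the month where cumulative virtualization spend drops below
    # cumulative "business as usual" spend.  All figures are hypothetical.
    VIRT_UPFRONT = 120_000      # blades, SAN, licensing (made-up number)
    VIRT_MONTHLY = 1_500        # ongoing support, power, licensing (made-up)
    PHYSICAL_MONTHLY = 12_000   # ongoing server refreshes and maintenance (made-up)

    def breakeven_month():
        virt, physical = float(VIRT_UPFRONT), 0.0
        month = 0
        while virt > physical:
            month += 1
            virt += VIRT_MONTHLY
            physical += PHYSICAL_MONTHLY
        return month

    print(f"Cumulative costs cross over in month {breakeven_month()}")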

2.  Let go.  I tend to be self-reliant, and I’ve made a habit of leaning on my own skills to get things done.  At a smaller company, you get used to that.  Time simply didn’t allow for that approach on this project.  I needed help, and fast.  I felt very fortunate to establish a great working relationship with Mosaic Technologies.  They provided resources that gave me the knowledge I needed to make good purchasing decisions, then assisted with the high level design.  I had access to a few of the most knowledgeable folks in the industry to help me move forward on the project, minimizing the floundering on my part.  They also helped me sort out what could be done versus real-world recommendations on deployment practices.  It didn’t excuse me from the learning that needed to occur, and making it happen, but it helped speed up the process and apply a virtualization solution to our environment correctly.  There is no way I would have been able to do it in the time frame required without them.

3.  Ditch the notebook.  Consider the way you assemble what you’re learning.  I’ve never needed to gather as much information on a project as this.  I hated not knowing what I didn’t know (take that, Yogi Berra).  I was poring through books, white papers, and blogs to give myself a crash course on a number of different subjects – all at the same time, because they needed to work together.  Because of the enormity of the project, I decided from the outset that I needed to try something different.  This was the first project where I abandoned scratchpads and binders, highlighters (mostly), and printouts.  I documented ALL of my information in Microsoft OneNote.  This was a huge success, which I will describe more in another post.

4.  Tune into RSS feeds.  Virtualization was a great example of a topic that many smart people dedicate their entire focus towards, then are kind enough to post this information on their blogs.  Having feeds come right to your browser is the most efficient way to keep up on the content.  Every day I’d see my listing of feeds for a few dozen or so VMware related blogs I was keeping track of.  It was uncanny how timely, and how applicable some of the information posted was.  Not every bit of information could be unconditionally trusted, but hey, it’s the Internet.

5.  Understand the architecture.  Looking back, I spent an inordinate amount of time in the design phase.  Much of this was trying to fully understand what was being recommended to me by my resources at Mosaic, as well as by other material, and how that compared to other environments.  At times, grass grew faster than the project was moving (exacerbated by other projects getting in the way), but I don’t regret my stubbornness in wanting to understand what I was trying to absorb before moving forward.  We now have a scalable, robust system that helps avoid some of the common mistakes I see come up on user forums.

6.  Don’t be a renegade.  Learn from those who really know what they are doing, and choose proven technologies, while recognizing trends in the fast-moving virtualization industry.  For me there was a higher up front cost to this approach, but time didn’t allow for any experimentation.  It helped me settle on VMware ESX powered by Dell blades, running on a Dell/EqualLogic iSCSI SAN.  That is not a suggestion that a different, or lesser configuration will not work, but for me, it helped expedite my deployment.

7.  Just because you are a small shop doesn’t mean you don’t have to think big.  Much of my design consideration surrounded planning for the future: how the system could scale and change, and how to minimize the headaches that come with those changes.  I wanted my VLANs arranged logically, and address boundaries configured in a way that would make sense for growth.  For a company of about 50 employees and 120 systems, I had never had to deal with this very much.  Thanks to another good friend of mine, whom I’d been corresponding with on a project a few months prior, I was able to get things started on the right foot.  I’ll tell you more about this in a later post.

The results of the project have exceeded my expectations.  It’s working even better than I anticipated, and it has already proven its value when I had a hardware failure occur.  We’ve migrated over 20 of our production systems to the new environment, and will have about 20 more online within about 6 months.  There is a tremendous amount of work yet to be completed, but the benefits are paying for themselves already.