Using OneNote in IT

 

It’s hard to believe that, as an IT administrator, one of my favorite applications is one of the least technical.  Microsoft created an absolutely stellar application in OneNote.  If you haven’t used it, you should.

Most IT Administrators have high expectations of themselves.  Somehow we expect to remember pretty much everything: deployment planning, research, application-specific installation steps and issues, information gathering for troubleshooting, and documenting as-built installations.  You might have information that you work with every day and think “how could I ever forget that?” (you will), along with that obscure, required setting on your old phone system that hasn’t been looked at in years.

The problem is that nobody can remember everything. 

After years of using my share of spiral binders, backs of printouts, and Post-It notes to gather and manage systems and technologies, I’ve realized a few things.  1.)  I can’t read my own writing.  2.)  I never wrote enough down for the information to be valuable.  3.)  What I couldn’t fit on one physical page, I squeezed onto another page that made no sense at all.  4.)  The more I had to do, the more I tried (and failed) to figure out a way to file it.  5.)  These notes eventually became meaningless, even though I knew I kept them for a reason.  I just couldn’t remember why.

Do you want to make a huge change in how you work?   Read on.

OneNote was first adopted by our Sales team several years ago, and while I knew what it was, I never bothered to use it for real IT projects until late in 2007, when a colleague of mine (thanks Glenn if you are reading) suggested that it was working well for him and his IT needs.  Ever since then, I wonder how I ever worked without it.

If you aren’t familiar with OneNote, there isn’t too much to understand.  It’s an electronic Notebook. 

[image]

It’s arranged just as you’d expect a real notebook to be.  The left side represents notebooks, the tabs across the top represent sections or earmarks, and the right side represents the pages in a notebook.  It’s that easy.  Just like its physical counterpart, its free-form formatting allows you to place objects anywhere on a page (goodbye MS Word).

What has surprised me since my experiment with OneNote began is how well it tackles every single need I have in gathering information and mining that data after the fact.  Here are some examples.

Long term projects and Research

What better time to try out a new way of working than on one of the biggest projects I’ve had to tackle in years, right?  Virtualizing my infrastructure was a huge undertaking, and I had what seemed like an infinite amount of information to learn in a very short period of time, across all sorts of subject matters.  In a Notebook called “Virtualization” I had sections that narrowed subject matters down to things like ESX, SAN array, Blades, switchgear, UPS, etc.  Each one of those sections had pages (at least a few dozen for the ESX section, as there was a lot to tackle) devoted to specific topics I needed to learn about, or to keep for reference.  Links, screen captures, etc.  I dumped everything in there, including my deployment steps before, during, and after.

 

Procedures

Our Linux code compiling machines have very specific package installations and settings that need to be set before deployment.  OneNote works great for this.  The no-brainer checkboxes offer nice clarity.

[image]

If you maintain different flavors of Unix or various distributions of Linux, you know how much the syntax can vary.  OneNote helps keep your sanity.  With so many Windows products going the way of Powershell, you’d better have your command line syntax down for that too.
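
For example, here is the kind of one-liner I will paste onto a OneNote page so I never have to reconstruct it from memory.  This is only a sample of the idea (the server name is a placeholder, and it assumes the Exchange 2007 Management Shell), not anything special from my actual notebooks:

# Exchange 2007: list the ten largest mailboxes on a server (SERVERNAME is a placeholder)
Get-MailboxStatistics -Server SERVERNAME |
    Sort-Object TotalItemSize -Descending |
    Select-Object -First 10 DisplayName,TotalItemSize,ItemCount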

This has also worked well with backend installations.  My Installations of VMware, SharePoint, Exchange, etc. have all been documented this way.  It takes just a bit longer, but is invaluable later on.  Below is a capture of part of my cutover plan from Exchange 2003 to Exchange 2007.

[image]

Migrations and Post migration outstanding issues

After big migrations, you have to be on your toes to address issues that are difficult to predict.  OneNote has allowed me to use a simple ISSUE/FIX approach.  So, in an “Apps” notebook, under an “E2007 Migration” section, I might have a page called “Postfix” and it might look something like this.

[image]

You can label these pages “Outstanding issues” or as I did for my ESX 3.5 to vSphere migration, “Postfix” pages.

[image]

As-builts

Those in the Engineering/Architectural world are quite familiar with as-built drawings.  Those are drawings that reflect how things were really built.  Many times in IT, deployment plans and documentation never go further than the day you deploy.  OneNote allows for an easy way to turn that deployment plan into a living copy, or as-built configuration, of the product you just deployed.  Configurations are as dynamic as the technologies that power them.  It’s best to know what sort of monster you created, and how to recreate it if you need to.

 

Daily issues (fire fighting)

Emergencies, impediments, fires, or whatever you’d like to call them, come up all the time.  I’ve found OneNote to be most helpful in two specific areas on this type of task.  I use it as a quick way to gather data on an issue that I can look at later (copying and pasting screenshots and URLs into OneNote), and for comparing the current state of a system against past configurations.  Both ways help me solve the problems more quickly.

Searching text in bitmapped screen captures

One of the really interesting things about OneNote is that you can paste a screen capture of, say, a dialog box into the notebook, and when searching later for a keyword, it will include those bitmaps in the search results!  Below is one of the search results OneNote pulled up when I searched for “KDC.”  This was a screen capture sitting in OneNote.  Neat.

[image]

 

Goodbye Browser Bookmarks

How much time have you spent trying to organize your web browser bookmarks or favorites, only to never look at them again, or to wonder why you bookmarked them in the first place?  It’s an exercise in futility.  No more!  Toss them all away.  Paste those links into the various locations in OneNote (wherever the subject matter is applicable), enter a brief little description on top of each one, and you can always find them later when searching.

 

Summary

I won’t ever go without using OneNote for projects large or small again.  It is right next to my email as my most used application.  OneNote users tend to be a loyal bunch, and after a few years of using it, I can see why.  At about $80 retail, you can’t go wrong.  And, lucky for you, it will be included in all versions of Office 2010.

Additional Links

New features coming in OneNote 2010
http://blogs.msdn.com/descapa/archive/2009/07/15/overview-of-onenote-2010-what-s-new-for-you.aspx

Using OneNote with SharePoint
http://blogs.msdn.com/mcsnoiwb/archive/2008/12/03/onenote-and-sharepoint-the-basics.aspx 

Interesting tips and tricks with OneNote
http://blogs.msdn.com/onenotetips/

Comparing Nehalem and Harpertown running vSphere in a production environment

 

The good press that Intel’s Nehalem chip and underlying architecture has been receiving lately gave me pretty good reason to be excited for the arrival of two Dell M610 blades based on the Nehalem chipset.  I really wanted to know how they were going to stack up against my Dell M600’s (running Harpertown chips).  So I thought I’d do some side-by-side comparisons in a real world environment.  It was also an opportunity to put some 8 vCPU VMs to the test under vSphere.

First, a little background information.  The software my company produces runs on just about every version of Windows, Linux, and Unix there is.  We have to compile and validate (exercise) those builds on every single platform.  The majority of our customers run under Windows and Linux, so the ability to virtualize our farm of Windows and Linux build machines was a compelling argument in my case for our initial investment.

Virtualizing build/compiler machines is a great way to take advantage of your virtualized infrastructure.  What seems odd to me, though, is that I never read about others using their infrastructure in this way.  Our build machines are critical to us.  Ironically, they’d often been running on old leftover systems.  Now that they are virtualized, we let those physical machines do nothing but exercise and validate the builds.  Unfortunately, we cannot virtualize our exerciser machines because of our reliance on GPUs from the physical machines’ video cards in our validation routines.

Our Development Team has also invested heavily in Agile and Scrum principles.  One of the hallmarks of that is Test Driven Development (TDD).  Short development cycles, and the ability for each developer to compile and test their changes, allow for more aggressive programming, producing more dramatic results.

How does this relate?  Our Developers need build machines that are as fast as possible.  Unlike so many other applications, their compilers actually can use every processor you give them (some better than others, as you will see).  This meant that many Developer machines were being over-spec’d, because we’d use them as a build machine as well as the Developer’s primary workstation.  This worked, but you can imagine the disruption that occurs when a Developer’s machine is scheduled to be upgraded or modified in any way (read: angry Developer gets territorial over their system, even though YOU are the IT guy).  Plus, we typically spent more on desktop workstations than necessary because of the horsepower needed for these systems performing dual roles.

Two recent advancements have allowed me to deliver on my promises to leverage our virtualized infrastructure for our build processes.  vSphere’s improved co-scheduler (along with support for 8 vCPUs), and Intel’s Nehalem chip.  Let’s see how the improvements pan out.

Hardware tested

  • Dell PowerEdge M600 (Harpertown).  Dual chip, quad core Intel E5430 (2.66 GHz).  32GB RAM
  • Dell PowerEdge M610 (Nehalem).  Dual chip, quad core Intel X5550 (2.66 GHz).  32GB RAM

 

Software, VMs, and applications tested

  • vSphere Enterprise Plus 4.0 Update 1
  • VM:  Windows XP x64.  2GB RAM.  4 vCPUs.  Visual Studio 2005*
  • VM:  Windows XP x64.  2GB RAM.  8 vCPUs.  Visual Studio 2005*
  • VM:  Ubuntu 8.04 x64.  2GB RAM.  4 vCPUs.  Cmake
  • VM:  Ubuntu 8.04 x64.  4GB RAM**.  8 vCPUs.  Cmake

*I wanted to test Windows 7 and Visual Studio 2008, which is said to be better at multithreading, but ran out of time.

** 8vCPU Linux VM was bumped up to 4GB of RAM to eliminate some swapping errors I was seeing, but it never used more than about 2.5 GB during the build run.

 

Testing scenarios

My goals for testing were pretty straightforward:

  • Compare full build times for VMs running on hosts with Harpertown chips against the same VMs running on hosts with Nehalem chips
  • Compare build performance when I changed the number of vCPUs assigned to a VM
  • Observe how well each compiler on each platform handled multithreading

I limited observations to full build runs, as incremental builds don’t lend well to using multiple threads. 

I admit that my testing methods were far from perfect.  I wish I could have sampled more data to come up with more solid numbers, but these were production build systems, and the situation dictated that I not interfere too much with our build processes just for my own observations.  My focus is mostly on CPU performance in real world scenarios.  I monitored other resources such as disk I/O and memory just to make sure they were not inadvertently affecting the results beyond my real world allowances.
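
For what it’s worth, the timing itself needed nothing fancy.  Below is a minimal sketch of how a full build’s wall-clock time could be captured from inside a Windows VM with PowerShell; the build script name is a hypothetical placeholder, not our actual build harness:

# Time a full build run and report the elapsed minutes (build-full.cmd is a hypothetical wrapper script)
$elapsed = Measure-Command { & cmd.exe /c "build-full.cmd" }
"Full build took {0:N1} minutes" -f $elapsed.TotalMinutes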

The numbers

Each test run shows two graphs.  The line graph shows total CPU utilization as a percentage of what is available to the VM.  The stacked graph shows the number of CPU cycles, in MHz, used by each vCPU.

Each testing scenario shows the time in minutes to complete.

  Full build time (minutes)   Windows XP x64    Linux x64
  2 vCPU Nehalem                    41             N/A
  4 vCPU Harpertown                 32             38
  4 vCPU Nehalem                    27             32
  8 vCPU Nehalem                    32             8.5

 

VM #1  WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005.
Harpertown chipset (E5430)
Full build:  33 minutes

 01-tpb004-4vcpu-m600-cpu

 02-tpb004-4vcpu-m600-cpustacked

VM #2 WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005
Nehalem chipset (x5550)
Full build:  27 minutes

01-tpb004-4vcpu-m610-cpu

 02-tpb004-4vcpu-m610-cpustacked

VM #3 WinXP64.  8 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  32 minutes

01-tpb004-8vcpu-m610-cpu

02-tpb004-8vcpu-m610-cpustacked

VM #4 WinXP64.  2 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  41 minutes

01-tpb004-2vcpu-m610-cpu

 02-tpb004-2vcpu-m610-cpustacked

VM #5 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Harpertown chipset (E5430)
Full build:  38 minutes

(no graphs available.  My dog ate ‘em.)

 

VM #6 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  32 minutes

01-tpb002-4vcpu-m610-cpu

02-tpb002-4vcpu-m610-cpustacked

VM #7 Ubuntu 8.04 x64.  8 vCPU.  4GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  8.5 minutes  (note:  disregard first blip of data on chart)

01-tpb002-8vcpu-m610-cpu

02-tpb002-8vcpu-m610-cpustacked

Notice the tremendous multithreading performance of the build process under Ubuntu 8.04 (x64)!  It is remarkably even across each vCPU and thread, which is best observed on the stacked graphs: the higher the stack, the better the build is using all of the vCPUs available.  Windows and its compiler were not nearly as good, actually becoming less efficient when I moved from 4 vCPUs to 8 vCPUs.  The build times reflect this.

A few other things I noticed along the way…

Unlike the old E5430 hosts, hyperthreading is available on the X5550 hosts, and according to VMware’s documentation, it is recommended.  Whether it actually improves performance is subject to some debate, as found here.

If you want to vMotion VMs between your X5550 and your E5430 based hosts, you will need to turn on EVC mode in vCenter.  You can do this in the cluster settings section of vCenter.  According to Intel and VMware, you won’t be dumbing down or hurting the performance of your new hosts.

My Dell M610 blades (Nehalem) had the Virtualization Technology toggle turned off in the BIOS.  This was the same as my M600’s (Harpertown).  Why this is the default is beyond me, especially on a blade.  Remember to set that before you even start installing vSphere.

For Windows VMs, remember that the desktop OSes are limited to what they see as two physical sockets.  By default, ESX presents each vCPU to the guest as one processor in its own socket.  To utilize more than 2 vCPUs on those VMs, set the “cpuid.coresPerSocket” option in the settings of the VM.  More details can be found here.
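
As a point of reference only (check VMware’s documentation for your build before relying on it), the parameter is added while the VM is powered off, either in the .vmx file or via the Configuration Parameters dialog under the VM’s advanced options, and the value is the number of cores reported per virtual socket.  For example, to make an 8 vCPU VM appear as 2 sockets with 4 cores each:

cpuid.coresPerSocket = "4"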

Conclusion
I’ve observed nice performance gains using the hosts with the Nehalem chips; 15 to 20% from my small samples.  However, my very crude testing has not revealed improvements as noted in various posts suggesting that a single vCPU VM running on a Nehalem chip would be nearly equal to a 2 vCPU VM on a Harpertown chip (see here).  This is not to say that it can’t happen.  I just haven’t seen it yet.

I was impressed by how well, and how evenly, the compilers multithread on a Linux VM versus their Windows counterpart.  So were the Developers, who saw the 8.5 minute build time as good or better than any physical system we have in the office.  But make no mistake: if you are running a VM with 8 vCPUs on a host with 8 cores, and it’s able to use all 8 of those vCPUs, you won’t be getting something for nothing.  Your ESX host will be nearly pegged while it’s running full tilt, and other VMs will suffer.  This was the reason behind our purchase of additional blades.

Side effects of upgrading VM’s to Virtual Hardware 7 in vSphere

 

New Years day, 2010 was a day of opportunity for me.  With the office closed, I jumped on the chance of upgrading my ESX cluster from 3.5 to vSphere 4.0 U1.  Those in IT know that a weekend flanked by a holiday is your friend; offering a bit more time to resolve problems, or back out entirely.  The word on the street was the upgrade to vSphere was a smooth one, but I wanted extra time, just in case.  The plan was laid out, the prep work had been done, and I was ready to pull the trigger.  The steps fell into 4 categories.

1.  Upgrade vCenter 
2.  Upgrade ESX hosts 
3.  Upgrade VM Tools and virtual hardware 
4.  Tidy up by taking care of licensing and VCB, and confirming that everything works as expected.

Pretty straightforward, right?  Well, in fact it was.  VMware should be commended for the fine job they did in making the transition  relatively easy.  I didn’t even need the 3 day weekend to do it.  Did I overlook anything?  Of course.  In particular, I didn’t pay attention to how the Virtual Hardware 7 upgrade was going to affect my systems at the application layer.

At the heart of the matter is that when the Virtual Hardware upgrade is made, VMware will effortlessly disable and hide those old NICs, add new NICs in there, and then reassign the addressing info that was on the old cards.  It’s pretty slick, but it does cause a few problems.

  1. Static IP addressing gets transferred over correctly, but the other NIC settings do not.  Do you have all but IPv4 disabled (e.g. Client for Microsoft networks, QoS, etc.) for your iSCSI connections to your SAN?  Do you have NetBIOS over TCP/IP shut off as well?  Well, after the Hardware 7 upgrade, all of those services will be turned on.  Do you have IPv6 disabled on your LAN NIC?  (no, it’s not a commentary on my dislike of IPv6, but there are many legitimate reasons to do this).  That will be turned back on.
  2. NIC binding order will reset itself to an order you probably do not want.  These affect services in a big way, especially when you factor in side effect #1.  (Please note that none of my systems were multi-homed on the LAN side.  The additional NIC’s for each VM were simply for iSCSI based storage access using a guest iSCSI initiator.)
  3. Guest iSCSI initiator settings *may* be different.  A few of the most common reactions I saw were that the “Discovery” tab had duplicate entries, and the “Volumes and Devices” tab no longer had the drive letter of the guest initiated drive.  That entry is needed so that services depending on the volume don’t jump the gun and start before the disk is available.
  4. Duplicate reverse DNS records.  I stumbled upon this after the update based on some errors I was seeing.  Many mysteries can occur with orphaned, duplicate reverse DNS records.  Get rid of ’em as soon as you see them.  It won’t hurt to check your WINS service and clear that out as well, and keep an eye on those machines with DHCP reservations.
  5. In Microsoft’s Operating Systems, the network configuration subsystem generates a Globally Unique Identifier (GUID) for each NIC that is partially based on the MAC address of the NIC.  This GUID may or may not be used in applications like Exchange, CRM, SharePoint, etc.  When the NIC changes, the GUID changes.  …and services may break.

Items 1 through 4 are pretty easy to handle – even easier when you know what’s coming.  Item #5 is a total wildcard.

What’s interesting about this is that it created the kinds of problems that in many ways are the most problematic for Administrators; where you think it’s running fine, but it’s not.  Most things work long enough to make your VM snapshots no longer relevant, if you plan on the quick fix. 

Now, in hindsight, I see that some of this was documented, as much of this type of thing comes up in P2V, and V2V conversions.  However, much of it was not.  My hope is to save someone else a little heartache.  Here is what I did after each VM was upgraded to Virtual Hardware 7.

All VMs

Removing old NICs

  1. Open a command shell using the “Run as administrator” option.  In the shell, type set devmgr_show_nonpresent_devices=1, then hit Enter
  2. Type start devmgmt.msc then hit Enter
  3. Click View > Show hidden devices
  4. Expand Network Adapter tree
  5. Right click grayed out NICs, and click uninstall
  6. Click View > Show hidden devices to untoggle.
  7. Exit out of application
  8. type set devmgr_show_nonpresent_devices=0, then hit Enter.

Change binding order of new NICs

  1. Right click on Network, then click Properties > Manage Network Connections
  2. Rename NICs to "LAN" "iSCSI-1" and "iSCSI-2" or whatever you wish.
  3. Change binding order to have LAN NIC at the top of the list
  4. Disable IPV6 on LAN NIC
  5. For iSCSI NICs, disable all but TCP/IPv4.  Verify the static IP (with no gateway or DNS servers), verify “Register this connection’s address in DNS” is unchecked, and disable NetBIOS over TCP/IP (a scripted sketch of these settings follows below).
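
If you have more than a couple of VMs to touch, the iSCSI NIC settings in step 5 can also be applied from PowerShell via WMI.  This is only a rough sketch based on my own assumptions (it matches the adapter by a placeholder iSCSI address, and you will still want to confirm the results in the GUI):

# Find the NIC configuration that owns the iSCSI address (10.10.0.193 is a placeholder)
$nic = Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = 'True'" |
    Where-Object { $_.IPAddress -contains "10.10.0.193" }

# 2 = disable NetBIOS over TCP/IP on this adapter
$nic.SetTcpipNetbios(2)

# Do not register this connection's address in DNS
$nic.SetDynamicDNSRegistration($false, $false)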

Verify/Reset iSCSI initiator settings

  1. Open iSCSI initiator
  2. Verify that in the Discovery tab, just one entry is in there; x.x.x.x:3260, Default Adapter, Default IP address
  3. Verify Targets and favorite Targets tab
  4. On Volumes and Devices tab, click on "Autoconfigure" to repopulate entries (clears up mangled entries on some).
  5. Click OK, and restart machine.

 DNS and DHCP

  1. Remove duplicate reverse lookup records for systems upgraded to Virtual Hardware 7
  2. For systems that have DHCP reserved addresses, jump into your DHCP manager, and modify as needed.

Exchange 2007 Server

Exchange seemed operational at first, but the more I looked at the event logs, the more I realized I needed to do some cleanup.

After performing the tasks under the “All VMs” fix, most issues went away.  However, one stuck around.  Because of the GUID issue, if your Exchange Server is running the transport server role, it will flag you with Event ID: 205 errors.  It is still looking for the old GUID.  Here is what to do.

First, determine status of the NICs

[PS] C:\Windows\System32>get-networkconnectioninfo

Name : Intel(R) PRO/1000 MT Network Connection #4
DnsServers : {192.168.0.9, 192.168.0.10}
IPAddresses : {192.168.0.20}
AdapterGuid : 5ca83cae-0519-43f8-adfe-eefca0f08a04
MacAddress : 00:50:56:8B:5F:97

Name : Intel(R) PRO/1000 MT Network Connection #5
DnsServers : {}
IPAddresses : {10.10.0.193}
AdapterGuid : 6d72814a-0805-4fca-9dee-4bef87aafb70
MacAddress : 00:50:56:8B:13:3F

Name : Intel(R) PRO/1000 MT Network Connection #6
DnsServers : {}
IPAddresses : {10.10.0.194}
AdapterGuid : 564b8466-dbe2-4b15-bd15-aafcde21b23d
MacAddress : 00:50:56:8B:2C:22

Then get the transport server info

[PS] C:\Windows\System32>get-transportserver | ft -wrap Name,*DNS*

[image]

Then, set the transport server info correctly

set-transportserver SERVERNAME -ExternalDNSAdapterGuid 5ca83cae-0519-43f8-adfe-eefca0f08a04

set-transportserver SERVERNAME -InternalDNSAdapterGuid 5ca83cae-0519-43f8-adfe-eefca0f08a04
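
After setting both values, it’s worth re-running the earlier command to confirm the transport server is now pointing at the GUID of the LAN NIC (and keeping an eye on the event log for any further Event ID 205 entries):

[PS] C:\Windows\System32>get-transportserver | ft -wrap Name,*DNSAdapterGuid*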

 

SharePoint Server

Thank heavens we are just in the pre-deployment stages for SharePoint.  Our SharePoint Consultant asked what I did to mess up the Central Administration site, as it could no longer be accessed (the other sites were fine, however).  After a bizarre series of errors, I thought it would be best to restore it from snapshot and test what was going on.  The virtual hardware upgrade definitely broke the connection, but so did removing the existing NIC and adding another one.  As of now, I can’t say for certain that it is in fact a problem with the NIC GUID, but it sure seems to be.  My only working solution in the time allowed was to keep the server at Hardware level 4, and build up a new SharePoint Front End Server.

One might question why, with the help of vCenter, a MAC address can’t simply be forced on the server.  Even if the last 12 characters of the GUID (which represent the MAC address) come out the same, the first part is still different.  It makes sense, because the new device is different.  The applications care about the GUID as a whole, not just the MAC address.

Here is how you can find the GUID of the system’s NIC in question.  Run this BEFORE you perform a virtual hardware upgrade, and save it for future reference in case you run into problems.  Also make note of where it exists in the registry.  It’s not a solution to the issue I had with SharePoint, but it’s worth knowing about.

C:\Users\administrator.DOMAINNAME>net config rdr
Computer name                        \\SERVERNAME
Full Computer name                   SERVERNAME.domainname.lan
User name                            Administrator

Workstation active on
        NetbiosSmb (000000000000)
        NetBT_Tcpip_{56BB9E44-EA93-43C3-B7B3-88DD478E9F73} (0050568B60BE)

Software version                     Windows Server (R) 2008 Standard

Workstation domain                   DOMAINNAME
Workstation Domain DNS Name          domainname.lan
Logon domain                         DOMAINNAME

COM Open Timeout (sec)               0
COM Send Count (byte)                16
COM Send Timeout (msec)              250
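
A quick PowerShell/WMI alternative for capturing the same information (the NIC GUID shows up as the SettingID property) looks something like the sketch below; the script linked at the end of this post does much the same thing:

# List each IP-enabled NIC with its GUID (SettingID) and MAC address
Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = 'True'" |
    Select-Object Description, SettingID, MACAddress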

 

 In hindsight…

Let me be clear that there is really not much that VMware can do about this.  The same troubles would occur on a physical machine if you needed to change out network cards.  The difference is that it’s not done as easily, as often, or as transparently as on a VM.

If I were to do it over again (and surely I will the day VMware releases another major upgrade), I would have done a few things differently.

  1. Note all existing application errors and warnings on each server prior to the upgrade, just so I don’t have to ponder whether the warning I’m staring at existed before the upgrade.
  2. Note those GUID’s before you upgrade.  You could always capture it after restoring from a snapshot if you do run into problems, but save yourself a little time and get this down on paper ahead of time.
  3. Take the virtual hardware upgrade slowly.  After everything else went pretty smoothly, I was in a “get it done” mentality.  Although the results were not catastrophic, I could have done better at minimizing the issues.
  4. Keep the snapshots around at least the length of your scheduled maintenance window.  It’s not a get-out-of-jail card, but if you have the weekend or off-hours to experiment, it offers you a good tool to do so.

This has also helped me decide to take a very conservative approach to implementing the new VMXNET3 NIC driver on existing VMs.  I might simply update my templates and only deploy it on new systems, or on systems that don’t run services that rely on the NIC’s GUID.

One final note.  “GUID” can be many things depending on the context, and may be referenced in different ways (UUID, SID, etc.).  Not all GUIDs are NIC GUIDs.  The term is used quite loosely across various subject matters.  What does this mean to you?  It means searching the net can be pretty painful at times.

Interesting links:

A simple powershell script to get the NIC GUID
http://pshscripts.blogspot.com/2008/07/get-guidps1.html

Resource allocation for Virtual Machines

Ever since I started transitioning our production systems to my ESX cluster, I’ve been fascinated by how visible resource utilization has become.  Or to put it another way, how blind I was before.  I’ve also been interested to hear about the natural tendency of many Administrators to over-allocate resources to their VMs.  Why does it happen?  Who’s at fault?  From my humble perspective, it’s a party everyone has shown up to.

  • Developers & Technical writers
  • IT Administrators
  • Software Manufacturers
  • Politics

Developers & Technical Writers
Best practices and installation guides are usually written by Technical Writers for that Software Manufacturer.  They are provided information by whom else?  The Developers.  Someone on the Development team will take a look-see at their Task Manager or htop in Linux, maybe PerfMon if they have extra time.  They determine (with a less than thorough vetting process) what the requirement should be, and then pass it off to the Technical Writer.  Does the app really need two CPUs, or does that just indicate it’s capable of multithreading?  Or both?  …Or none of the above?  Developers are the group that seems most challenged at understanding the new paradigm of virtualization, yet they are the ones who decide what the requirements are.  Some barely know what it is, or dismiss it as nothing more than a cute toy that won’t work for their needs.  It’s pretty fun to show them otherwise, but frustrating to see their continued suspicions of the technology.

IT Administrators (yep, me included)
Take a look at any installation guide for your favorite (or least favorite) application or OS.  Resource minimums are still written for hardware based provisioning.   Most best practice guides outline memory and CPU requirements within the first few pages.  Going against recommendations on page 2 generally isn’t good IT karma.  It feels as counterintuitive as trying to breathe with your head under water.  Only through experience have I grown more comfortable with the practice.  It’s still tough though.

Software Manufacturers
Virtualization can be a sensitive matter to Software Manufacturers.  Some would prefer that it didn’t exist, and choose to document and license their products accordingly.  Others will insist that resources are resources, and why would they ever state that their server application can run with just 768MB of RAM and a single CPU core if there was even a remote possibility of it hurting performance?

Politics
Let’s face it.  How much is Microsoft going to dive into memory recommendations for an Exchange Server when their own virtualization solution does not support some of the advanced memory handling features that VMware supports?  The answer is, they aren’t.  It’s too bad, because their products run so well in a virtual environment.  Politics can also come from within.  IT departments get coerced by management, project teams, or departments, or are just worried about SLAs for critical services.  They acquiesce to try to keep everyone happy.

What can be done about it?
Rehab for everyone involved.  Too ambitious?  Okay, let’s just try to improve the Installation/Best Practices guides from the Software Manufacturers.

  • Start with two or three sets of minimums for requirements.  Provisioning the application or OS on a physical machine, followed by provisioning on a VM accommodating a few different hypervisors.
  • Clearly state whether the application is even capable of multithreading.  That would eliminate some confusion about whether you even want to consider two or more vCPUs on a VM.  I suspect many red faces would show up when software manufacturers admit to their customers that they haven’t designed their software to work with more than one core anyway.  But this simple step would help Administrators greatly.
  • For VM based installations, note the low threshold of RAM at which unnecessary amounts of disk paging will begin to occur.  While the desire is to allocate as few resources as needed, nobody wants disk thrashing to occur.
  • For physical servers, one may have a single server playing a dozen different roles.  Minimums sometimes assume this, and they will throw in a buffer to pad the minimums – just in case.  With a VM, it might be providing just a single role.  Acknowledge that this new approach exists, and adjust your requirements accordingly.

Wishful thinking perhaps, but it would be a start.  Imagine the uproar (and competition) that would occur if a software manufacturer actually spec’d a lower memory or CPU requirement when running under one hypervisor versus another?  …Now I’m really dreaming.

IT Administrators have some say in this too.  Simply put, the IT department is a service provider.  Your staff and the business model are your customers.  As a Virtualization Administrator, you have the ability to assert your expertise on provisioning systems so that they provide a service as efficiently as possible.  Let them define the acceptance criteria for the need they have, and then you deal with how to make it happen.

Real World Numbers
There are many legitimate variables that make it difficult to give one size fits all recommendations on resource requirements.  This makes it difficult for those first starting out.  Rather than making suggestions, I decided I would just summarize some of my systems I have virtualized, and how the utilization rates are for a staff of about 50 people, 20 of them being Software Developers.  These are numbers pulled during business hours.  I do not want to imply that these are the best or most efficient settings.  In fact, many of them were  “first guess” settings that I plan on adjusting later.  They might offer you a point of reference for comparison, or help in your upcoming deployment.

For each system below: configuration, average % of RAM used, average % of CPU used with occasional spike, and comments.

AD Domain Controller (all roles), DNS, DHCP
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 9%.  Avg CPU used / occasional spike: 2% / 15%
So much for my DCs working hard.  2GB is overkill for sure, and I will be adjusting all three of my DCs’ RAM downward.  I thought the chattiness of DCs was more of a burden than it really is.

Exchange 2007 Server (all roles)
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 30%.  Avg CPU used / occasional spike: 50% / 80%
Consistently our most taxed VM, but I’m pleasantly surprised by how well this runs.

Print Server, AV server
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 18%.  Avg CPU used / occasional spike: 3% / 10%
Sitting as a separate server only because I hate having application servers running as print servers.

Source Code Control Database Server
Windows Server 2003 x64, 1 vCPU, 1GB RAM
Avg RAM used: 14%.  Avg CPU used / occasional spike: 2% / 40%
There were fears from our Dev Team that this was going to be inferior to our physical server, and they suggested the idea of assigning 2 vCPUs “just in case.”  I said no.  They reported a 25% performance improvement compared to the physical server.  Eventually they might figure out the ol’ IT guy knows what he’s doing.

File Server
Windows Server 2008 x64, 1 vCPU, 2GB RAM
Avg RAM used: 8%.  Avg CPU used / occasional spike: 4% / 20%
Low impact as expected.  Definitely a candidate to reduce resources.

SharePoint Front End Server
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 10%.  Avg CPU used / occasional spike: 15% / 30%
Built up, but not fully deployed to everyone in the organization.

SharePoint Back End/SQL Server
Windows Server 2008 x64, 1 vCPU, 2.5GB RAM
Avg RAM used: 9%.  Avg CPU used / occasional spike: 15% / 50%
I will be keeping a close eye on this when it ramps up to full production use.  SharePoint farms are known to be hogs.  I’ll find out soon enough.

SQL Server for project tracking
Windows Server 2003 x64, 1 vCPU, 1.5GB RAM
Avg RAM used: 12%.  Avg CPU used / occasional spike: 4% / 50%
Lower than I would have thought.

Code compiling system
Windows XP x64, 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used / occasional spike: 5% / 100%
Will spike to 100% CPU usage during compiling (20 min.).  The compilers allow for telling them how many cores to use.

Code compiling system
Ubuntu 8.04 LTS x64, 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used / occasional spike: 5% / 100%
All Linux distros seem to naturally prepopulate more RAM than their Windows counterparts, at the benefit perhaps of doing less paging.

To complicate matters a bit, you might observe different behaviors among OSes (XP versus Vista/2008 versus Windows 7/2008 R2, or SQL 2005 versus SQL 2008) in their willingness to prepopulate RAM.  Give SQL 2008 4GB of RAM, and it will want to use it even if it isn’t doing much.  You might notice this when looking at relatively idle VMs with different OSes, where some have a smaller memory footprint than others.  At the time of this writing, none of my systems were running Windows 2008 R2, as it wasn’t supported on ESX 3.5 when I was deploying them.

Some of these numbers are a testament to ESX’s/vSphere’s superior memory management and CPU scheduling.  Memory ballooning, swapping, and Transparent Page Sharing all contribute to pretty startling efficiency.

I have yet to virtualize my CRM, web, mail relay, and miscellaneous servers, so I do not have any good data yet for those types of systems.  Having just upgraded to vSphere in the last few days, this also clears the way for me to assign multiple vCPUs to the code compiling machines (as many as 20 VMs).  The compilers have switches that can toggle exactly how many cores end up being used, and our Development Team needs these builds compiled as fast as possible.  That will be a topic for another post.

Lessons Learned
I love having systems isolated to performing their intended function now.  Who wants Peachtree crashing their Email server anyway?  Those administrators working in the trenches know that a server that is serving up a single role is easy to manage, way more stable, and doesn’t cross contaminate other services.  In a virtual environment, it’s  worth any additional costs in OS licensing or overhead.

When the transition to virtualizing our infrastructure began, I thought our needs, and our circumstances would be different than they’ve proven to be.  Others claim extraordinary consolidation ratios with virtualization.  I believed we’d see huge improvements, but those numbers wouldn’t possibly apply to us, because (chests puffed out) we needed real power.  Well, I was wrong, and so far, we really are like everyone else.


Discovering AutoDiscover in Exchange 2007

 

In my post “Exchange 2007… Better late than never” I mentioned that one of the post-deployment difficulties I faced was getting the “AutoDiscover” function to behave the way it was designed.  For those unfamiliar with the feature, it allows for automated discovery and configuration of various connectivity methods to an Exchange Server.  Exchange MAPI clients, Exchange HTTP/RPC clients, and mobile devices using ActiveSync can all use AutoDiscover in some form or another.

While it wasn’t critical for the transition itself, AutoDiscover was vital for our future deployments of “Outlook Anywhere” and “ActiveSync.”  I figured I’d skim over a few TechNet articles and blog postings and be quickly onto the next project.  That began my long, ugly journey getting AutoDiscover to work.

It became clear that the ingredients for AutoDiscover to work correctly were a properly configured ISA Server, SSL certificates, namespace/DNS accommodations, and of course, Exchange.  What was really interesting about this particular project was that I was dealing with very mature products, yet I never ran across so much contradicting information on how to make it work.  Perhaps some of that stems from so many valid topologies and configurations, or possibly big changes between the RTM versions of Exchange and ISA and their first service packs.  Still, it seemed odd.  I sifted through postings from desperate IT Administrators in similar situations who had no more hair to pull out.  You could sense the defeat in their words.  Now I understand.

One guideline mentioned quite often was the need for a special SSL certificate that allows more than one FQDN to be assigned to it.  You’ll see it referred to as a Unified Communications Certificate (UCC or UC) or a Subject Alternative Name (SAN) certificate.  The purpose is the same, but the names and the references are different.  While UC certificates are not technically a requirement, it is best to think of them that way.  For AutoDiscover, the names needed on a UC cert would look something like:

mypubliccompanyname.com
autodiscover.mypubliccompanyname.com
mail.mypubliccompanyname.com
internalmailservername
internalmailservername.myprivatelanname.lan
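
Exchange 2007 can generate the certificate request for you from the Management Shell.  Purely as a sketch (the names are the placeholders from the list above, and your CA may want different subject details):

# Generate a UC/SAN certificate request covering the names above (all names are placeholders)
New-ExchangeCertificate -GenerateRequest `
    -SubjectName "CN=mail.mypubliccompanyname.com" `
    -DomainName mypubliccompanyname.com,autodiscover.mypubliccompanyname.com,mail.mypubliccompanyname.com,internalmailservername,internalmailservername.myprivatelanname.lan `
    -PrivateKeyExportable $true -Path c:\certrequest.txt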

I went with a UC cert from DigiCert, but any of the larger commercial CAs should work.  However, a word of warning: Exchange doesn’t like self-signed certificates, and many mobile phones have trouble with private certificates as well as those from smaller commercial CAs.  You should be fine if you run Certificate Services internally (or so I’m told), and your namespace checks out okay.  Don’t forget to look at your ISA server and make sure you are running SP1 or later, due to limitations in how the RTM version handled UC certificates.

Speaking of namespaces, time for a thorn in my side to come back and sting me.  My internal namespace is not a name that we own (a legacy issue I should have taken care of long ago).  Certificate Authorities will not issue standard or UC SSL certificates to names you do not own for obvious reasons, even if the references are private.  Fortunately, I was able to work around this by making absolutely sure the simple name was used in any Exchange configuration settings that usually accepted the internal FQDN.  Disaster averted.

Now for the dirt on how I was able to make it work.  My as-built  design is modeled somewhat after Jason Jones’ method of Publishing Exchange 2007 Services with ISA 2006.  Following the construct of:

  • Not using the existing listener created for OWA; instead, creating a separate listener for Outlook Anywhere (OA)/AutoDiscover, and binding the UC cert to that listener.  Using HTTP authentication with Integrated/Windows Auth (aka NTLM).  This provides HTTP/Integrated auth from the client to the FW, then basic auth from the FW to the Exchange server.
  • Allowing the ISA server to utilize Kerberos constrained delegation (KCD) by way of changes in AD.
  • Creating a single Publishing rule for OA , where KCD is used.
  • Setting internal and external URLs to their respective internal and external locations (internalmailservername and autodiscover.mycompanyname.com)

After configuring it as above, AutoDiscover worked internally, but not externally.  I kept getting failures with the /rpc directory when testing internally (via test-outlookwebservices) and externally (via testexchangeconnectivity.com).  I found a post that gave me the missing piece of the puzzle, and modified my configuration per the recommendations at http://forums.isaserver.org/m_2002041377/mpage_2/key_/tm.htm:

  • Create a 2nd Publishing rule for OA, sitting on top of primary OA publishing rule.
    • Only /rpc/* is published
    • Auth Delegation is set to "No Delegation, but client may authenticate directly"
    • Set to "all users" instead of "authenticated users"
    • Changing the "EXPR" OutlookProvider to msstd:mail.mycompanyname.com so that the certificate mutual authentication test passes (the cmdlet form is sketched just below).
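
For reference, the EXPR change and the internal test are both one-liners in the Exchange Management Shell.  Treat this as a sketch with placeholder names rather than a copy/paste recipe:

# Point the EXPR Outlook provider at the name on the UC certificate (placeholder name)
Set-OutlookProvider EXPR -CertPrincipalName "msstd:mail.mycompanyname.com"

# Re-test AutoDiscover and the web services from inside the network
Test-OutlookWebServices | Format-List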

Under the conditions described above, Outlook Anywhere with Autodiscover functions as desired.

As Jason Jones put it best, “The reason for the need of a separate listener is that Windows Authentication (NTLM) and Forms Based Authentication (FBA) are mutually exclusive. It is not possible to use a single web listener for all Exchange 2007 publishing and achieve transparent authentication within Outlook anywhere.”  Thus the need to create a dedicated listener to be used exclusively for Outlook Anywhere and associated services.

I did have to make one other adjustment that is rarely brought up in AutoDiscover deployment scenarios.  We know that AutoDiscover wants to look at your TLD name (e.g. yourpubliccompanyname.com) when doing its discovery process.  However, you may have simply had an “A” record for “yourpubliccompanyname.com” pointing to your web server to catch those users who forget to type “www” before your domain name.  It’s also not far-fetched to assume you had an SSL certificate bound to that web server as well.  This is exactly what I had, so I had to make the following changes:

1.  Have our ISP (or whomever has authoritative control on the DNS zone file for “mypubliccompanyname.com”) change the “A” record from my public web server IP address, to my autodiscover address.

2.  In ISA, add a new “DENY (REDIRECT)” rule for mypubliccompanyname.com that does, well, a deny and a redirect to www.mypubliccompanyname.com.  This sits right above the web publishing rule for www.mypubliccompanyname.com.

The original setup was a carryover from an earlier time.  The configuration above is the way I should have set it up.  Nice to do a little cleanup along the way.

I can’t tell you how relieved I was in getting this to work, no matter how many hoops I had to jump through.  I also have a complete set of as-built notes in case I need to recreate or debug the existing configuration.  It’s been stable since, but I have a feeling I’ll be looking at this again as soon as we transition to Exchange 2010. 

Other helpful links:

Microsoft Exchange Remote Connectivity Analyzer
https://www.testexchangeconnectivity.com/

Publishing Exchange 2007 Services with ISA Server 2006…
http://blog.msfirewall.org.uk/2008/07/publishing-exchange-2007-services-with.html

Technet white paper:  Exchange 2007 Autodiscover service:
http://technet.microsoft.com/en-us/library/bb332063.aspx

Generating SSL certificates for Exchange 2007 and ISA 2006:
http://www.isaserver.org/tutorials/Generating-SSL-Certificates-Exchange-2007-ISA-Server-2006.html 

Dr. Tom Shinder’s guides on Publishing Exchange 2007 OWA, activeSync, and RPC/HTTP using ISA 2006:
http://www.isaserver.org/tutorials/Publishing-Exchange-2007-OWA-Exchange-ActiveSync-RPCHTTP-using-2006-ISA-Firewall-Part1.html

A bulk discount on Tylenol.  …You’ll need it.
http://www.costco.com

Living with ISA 2006 and the ISA Firewall client

 

One of my big projects in 2008 was making the transition from my old firewall to a new solution.  I’ve had 18 months or so to work with ISA and the workstations running the Firewall Client software, and thought I’d share my experiences.

First, a little background.  The network I inherited long ago was protected by a Watchguard firewall.  At the time, it was a moderately capable stateful packet inspection (SPI) unit that performed what was asked of it: ingress filtering with a little protection from a few application layer proxies.  But times had changed, and communication sessions had become more sophisticated.  Exploits were getting more creative and difficult to defend against because they were occurring high up at the application layer.  Like many SPI firewalls, its ability to intelligently control outbound traffic was limited.

My acceptance criteria included better protection at the application layer, as well as close integration with my Active Directory based infrastructure.  I also needed a firewall that would help me get a handle on outbound traffic.  ISA 2006 was the answer.  I chose a Celestix MsA4000i appliance running ISA to simplify the hardware procurement and deployment process.

During my implementation planning, I had the opportunity to talk at length with Richard Hicks, a Senior Engineer for Celestix Networks.  Celestix makes a fine product line of security solution appliances running ISA, and Richard (a recent MVP award winner) had excellent insight into ISA implementations, large and small.  I give him credit for helping me translate the functional requirements I was used to with my old firewall, while giving practical recommendations on how ISA performs those same functions, and policy design and implementation.

One of the unique traits of ISA is the variety of methods internal clients can use to communicate through it.

  • SecureNAT.  The most basic of the three, and uses ISA as the gateway/router for traditional perimeter based protection.  Used when a default gateway is assigned to the client.
  • Web Proxy Client.  Generally called upon when there are web based requests such as HTTP and FTP calls, etc. 
  • Firewall client.  An optional piece of the ISA solution that runs on Windows clients, and extends the functionality of ISA in ways that cannot be matched by other solutions.

None of these are mutually exclusive; they can all be run at the same time.  Unfortunately, this flexibility can hinder your intentions.  If you want to restrict outbound communication to authenticated access only, running SecureNAT will compromise that ability.  The solution?  Run all non-server systems without a default gateway, to force the client to use the web proxy client or the firewall client.  In the event that the target is beyond your LAN, the firewall client will handle the routing.

The easiest transition would have been using SecureNAT for the initial deployment, but there was an opportunity for monumental improvements if I attempted to go without it.  Am I glad I took this extra step?  Yes!  Some of the highlights have been:

  • Outbound connections limited to authenticated users only.  If an outbound connection is made, I can see which user is requesting it.  Logging provides meaningful data now.
  • True egress control.  Connections initiated from the inside can finally be controlled.  Once everything was up and running, it was fascinating to see what was initiating outbound connections.
  • Forces compliance of application related restrictions.  IM and P2P applications specialize in working their way around firewalls.  The combination of the web proxy, and the firewall client with no SecureNAT helps achieve this.
  • Suppression of malware.   The combination of allowing only authenticated outbound access, along with utilizing an automated malware blacklist database helped control users who had a knack of making a mess out of their PCs.

The results of the improved security stance were impressive.  So was the amount of complaining from end users.  They were furious.  I had angry developers shutting off the firewall client software on their PCs.  It made them feel good until they realized that shutting down the firewall client gave them less access, not more.  They made claims that BitTorrent was a necessary part of their job, and found it insulting that outbound SSH sessions were not allowed to any host on the Internet.  They didn’t like that their non-domain joined test machines (or unapproved personal laptops) required a username and password before they could access the Internet.  Their complaints went straight to the top of the organization, as did my explanations.  Security won out, and policies stood without change.

There were some hiccups along the way.  Most deployment related problems were fixed, while others forced some changes in how we worked.  The ISA community is an active one, but the move to workstations running the ISA firewall client without a default gateway made finding answers much more difficult.  Some of the obstacles I ran into were:

  • Lack of support for CIFS traversing across network segments.  The firewall client cannot handle this alone, and needs a default gateway.
  • Vista and later workstations need a static route added for remote targets that were not web based.  This can be added via DHCP (option 121, but don’t try to add it via the DHCP snap-in in Vista, otherwise it won’t work).  Thanks to some assistance from Richard Hicks and Microsoft for ultimately explaining the reason behind the inconsistent behavior between XP and Vista.  More info can be found here: http://tmgblog.richardhicks.com/2009/01/10/dns-resolver-behavior-in-windows-vista/ 
  • Building up a healthy list of domains that will be allowed to have anonymous outbound access.  OS and application update domains and mirrors are good examples of this.
  • Older Outlook clients (2003) wouldn’t talk to the internal Exchange Server using its MAPI connection until the following tweak was made:  http://www.isaserver.org/articles/2004olpop3smtp.html
  • Web services that use SSL, but do not run over port 443 had to be accommodated for.  http://www.isaserver.org/articles/2004tunnelportrange.html
  • Browser proxy configurations in *nix workstations may not be enough.  For those workstations, leave a default gateway.

As you can see from the links I provide, I found www.isaserver.org invaluable during my implementation.  It attracts some of the brightest and the best in the security world who contribute articles, and to community forums.  It’s a great resource for any ISA administrator. 

My biggest annoyances in using the firewall client are small, but still worth mentioning.

  • The virtual black hole that occurs on the sockets of the workstation running the firewall client.  Trying to debug via traditional methods is nearly impossible.  It simplifies the number of connections from the client, but it’s hard to tell the contents of the connection.
  • The name.  “Firewall Client” implies that it is some application that protects a workstation like ZoneAlarm, Norton, or the Windows Firewall.  A simple name change would eliminate this confusion to newer users, and some IT guys not familiar with ISA.

If I were to do it over again, I would have given more notice on what changes would be occurring, and why.  I had previous verbal green lights from management to restrict things like P2P and IM sessions, and our written IT policies had already reflected these restrictions.  I just never had the capability to do so.  I warned staff, but apparently not enough.  I had to do a healthy amount of explaining, which was fine, because I had the technical reasons and the business case on my side.

I look forward to the next version of ISA (Threat Management Gateway, or TMG) and the steps it takes to improve upon the Firewall Client component.  Recommended reading on using the Firewall Client in ISA 2004 and 2006 can be found below.

Firewall Client
http://www.isaserver.org/tutorials/Understanding-ISA-Firewall-Client-Part1.html

http://www.isaserver.org/articles/2004firewallclient.html

http://www.isaserver.org/tutorials/Understanding_and_installing_ISA_Firewall_Clients.html

http://www.isaserver.org/tutorials/ISA_Clients__Part_2_SecureNAT_and_Web_Proxy_Client.html

Database of malware domains that can be imported directly into ISA
http://www.malwaredomains.com/

A special thanks to Richard Hicks from Celestix, and my good friend Glenn Barnas from Inno-Tech, who provided invaluable information when I needed it most.

Exchange 2007… Better late than never

 

Count me in as one of the many Administrators that finally got around to moving from Exchange 2003 to Exchange 2007.   Yes, it’s almost 2010, and probably just a few months away from the release of Exchange 2010.  I never disagreed with Microsoft’s decision to release Exchange 2007 as a 64 bit only application.  It’s just that it made it incredibly difficult to find a way to transition when your infrastructure is full of physical 32 bit servers.  I was constantly reminded that my Exchange 2003 server was brittle, a resource hog, and lacked many of the abilities that the end users and applications were demanding.  Twenty-five minute reboots, and inexplicable behaviors did not instill confidence.    It was time to make the move.

My timing couldn’t have been better.  I have a new virtualized infrastructure powered by some Dell blades running VMware ESX 3.5, using a Dell/EqualLogic PS5000 SAN to help with this transition.  It didn’t take away from the normal shrewd planning necessary on a project of this nature; it just made it easier.  The other benefit of deploying 3 year old software is that issues, workarounds, and fixes to Exchange, as well as how to make it play nicely with other components (ISA, CRM, etc.), were well documented.

With respect to running Exchange 2007 in a virtual environment, I noticed information on sizing is difficult to find.  Capacity planning guidelines still reflected deployments with physical servers (read: more is better), and white papers on virtualizing Exchange appeared to have the intended audience of large enterprise environments only.  My environment is relatively small, and a single server was going to be handling all Exchange roles.  Based on my environment of about 50 users and 100 mailboxes, I sifted through all the material I could find, threw in a few wild guesses, and decided with the following configuration.

  • 1 vCPU
  • 2.5GB of RAM
  • Primary OS resides in VMFS volume on the SAN
  • Using guest OS based iSCSI initiator with MPIO enabled, dedicated NTFS volume for Exchange database. 
  • Using guest OS based iSCSI initiator with MPIO enabled, dedicated NTFS volume for Exchange transaction logs.  

The transition occurred over one weekend, while the remainder of the outstanding issues were cleaned up throughout the week.  A moment of embarrassment occurred when I realized my thorough planning never took into consideration how hard the “move mailbox” function would hit the transaction logs.  I didn’t see it until it was too late.  The partition for the transaction logs filled up and the services shut down.  But, thanks to the ease of handling storage with the SAN, I was able to create a new LUN, initialize it in the OS, and change a couple of drive letters.  I ran a backup to commit the transaction logs to the database, and I was back in business.  The only puzzle that took me far too long to figure out was getting the AutoDiscover function in Exchange 2007 to work as intended.  It is worthy of its own post at another time.

As far as performance goes, I couldn’t be happier with it running as a VM.  I truly didn’t know how it would react under the settings I established, and was ready to make whatever changes necessary, but it has been performing very well.  Below are some very simple utilization numbers to give you an idea.

CPU:  Hovers around 50% utilization during the day, and around 15% utilization during off hours.  CPU Ready values range anywhere from 20 to 200 milliseconds.  This VM has just 1 vCPU assigned to it.  I wanted to keep it this way if possible, so that it could use the “Fault Tolerant” feature of vSphere when we upgrade.  Looks like I get my wish.

RAM:  Typically runs about 50% of the 2.5GB assigned.  No higher than 75% utilization during the busiest time of the day, and 15% utilization during off hours.

Network:  Runs about 1.2 MBps.  Spikes of 40 MBps occur only during occasional on-host backups.  Bandwidth utilization is imperceptible during off hours.

Disk I/O:  Hard to gauge, mostly because the demand seems to be so low.  The OS partition coming from the VMFS volume might show 200KBps, but has bounced up to 7MBps on occasion.  No performance data yet on the drives connected via the guest iSCSI initiator.  I think they qualify as “fast” until I find anything that suggests otherwise.
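
If you’re wondering where numbers like these come from, a quick sketch: esxtop on the ESX service console can log samples in batch mode to a CSV for later review in Windows Perfmon or Excel.  The interval and sample count below are just an example, not what I actually ran.

    # Capture a sample every 5 seconds, 720 times (about an hour), to a CSV
    esxtop -b -d 5 -n 720 > exch-perf.csv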

Perception:  You won’t find this on any ESX or OS performance monitor, but its importance cannot be overstated.  About half of the staff commented on how much snappier their Outlook clients and OWA were working for them.  I never have staff stop by and offer random comments telling me how fast something is.  It’s a nice compliment to the improvements in Exchange 2007, and to what it is running on.

No more painfully long restart times for me now.  It’s one of the more overlooked benefits I’ve noticed while moving my systems over to the new infrastructure.  Planned server restarts are a part of responsible management of IT systems, and anything that reduces that interruption is appreciated.  A two-minute restart is always welcome.

It probably goes without saying that I took great joy in decommissioning the old server.  Hopefully this will put me in a good position when Exchange 2010 comes off the presses.

Virtualization. Making it happen

 

It’s difficult to put into words how exciting, and how overwhelming, the idea of moving to a virtualized infrastructure was for me.  In 12 months, I went from investigating solutions, to presenting our options to senior management, to the procurement process, followed by the design and implementation of the systems.  And finally, to transitioning our physical machines to a virtualized environment.

It has been an incredible amount of work, but equally satisfying.  The pressure to produce results was even bigger than the investment itself.  I’ve taken away a few lessons along the way, some of which had nothing to do with virtualization.  So rather than providing endless technical details in this post, I thought I’d share what I learned that has nothing to do with vSwitches or CPU utilization.

1.  The sell.  I never would have been able to achieve what I achieved without the support of our Management Team.  I’m an IT guy, and do not have a gift for crafty PowerPoint slides or fluid presentation skills.  But there was one slide that hit it out of the park for me.  It showed how much this crazy idea was going to cost, but more importantly, how that compared against what we were going to spend anyway under a traditional environment.  We had delayed server refreshes for a few years, and it was catching up to us.  Without even factoring in the projected growth of the company, the two lines intersected in less than one year.  I’m sure the dozen other slides helped support my proposal, but this one offered the clarity needed to get approval.

2.  Let go.  I tend to be self-reliant, and have made a habit of leaning on my own skills to get things done.  At a smaller company, you get used to that.  Time simply didn’t allow for that approach on this project.  I needed help, and fast.  I was fortunate to establish a great working relationship with Mosaic Technologies.  They provided resources that gave me the knowledge I needed to make good purchasing decisions, then assisted with the high-level design.  I had access to a few of the most knowledgeable folks in the industry to help me move forward on the project, which minimized the floundering on my part.  They also helped me sort out what could be done versus real-world recommendations on deployment practices.  It didn’t excuse me from the learning that needed to occur, or from making it happen, but it sped up the process and helped me apply a virtualization solution to our environment correctly.  There is no way I would have been able to do it in the time frame required without them.

3.  Ditch the notebook.  Consider the way you assemble what you’re learning.  I’ve never needed to gather as much information on a project as this one.  I hated not knowing what I didn’t know (take that, Yogi Berra).  I was poring through books, white papers, and blogs to give myself a crash course on a number of different subjects – all at the same time, because they needed to work together.  Because of the enormity of the project, I decided from the outset that I needed to try something different.  This was the first project where I abandoned scratchpads, binders, highlighters (mostly), and printouts.  I documented ALL of my information in Microsoft OneNote.  This was a huge success, which I will describe more in another post.

4.  Tune into RSS feeds.  Virtualization is a great example of a topic that many smart people dedicate their entire focus to, and then are kind enough to post what they know on their blogs.  Having feeds come right to your browser is the most efficient way to keep up on the content.  Every day I’d see my list of feeds for the few dozen or so VMware-related blogs I was keeping track of.  It was uncanny how timely, and how applicable, some of the information posted was.  Not every bit of information could be unconditionally trusted, but hey, it’s the Internet.

5.  Understand the architecture.  Looking back, I spent an inordinate amount of time in the design phase.  Much of this was trying to fully understand what was being recommended to me by my resources at Mosaic, as well as by other material, and how that compared to other environments.  At times grass grew faster than the project was moving (exacerbated by other projects getting in the way), but I don’t regret my stubbornness in understanding what I was trying to absorb before moving forward.  We now have a scalable, robust system that avoids some of the common mistakes I see come up on user forums.

6.  Don’t be a renegade.  Learn from those who really know what they are doing, and choose proven technologies, while recognizing trends in the fast-moving virtualization industry.  For me there was a higher up-front cost to this approach, but time didn’t allow for any experimentation.  It helped me settle on VMware ESX powered by Dell blades, running on a Dell/EqualLogic iSCSI SAN.  That is not a suggestion that a different or lesser configuration will not work, but for me, it helped expedite the deployment.

7.  Just because you are a small shop doesn’t mean you don’t have to think big.  Much of my design work surrounded planning for the future: how the system could scale and change, and how to minimize the headaches that come with those changes.  I wanted my VLANs arranged logically, and address boundaries configured in a way that would make sense for growth.  For a company of about 50 employees and 120 systems, I had never had to deal with this very much.  Thanks to another good friend of mine, whom I’d been corresponding with on a project a few months prior, I was able to get things started on the right foot.  I’ll tell you more about this in a later post.

The results of the project have exceeded my expectations.  It’s working even better than I anticipated, and has already proven its value when a hardware failure occurred.  We’ve migrated over 20 of our production systems to the new environment, and will have about 20 more online within about six months.  There is a tremendous amount of work yet to be completed, but the benefits are paying for themselves already.

It’s all about the name

Every once in a while you run into a way of doing things that makes you wonder why you ever did it any other way.  For me, that was using DNS aliases to reference servers and the services they provide.  I use them whenever possible.

Many years ago I had a catastrophic server failure.  Looking back, it was a fascinating series of events that you would think could never happen, but it did.  This server happened to be the primary storage server for our development team, and was a staple of our development system.  Its full server name was hardcoded in mount points and symbolic links on other *nix systems, as well as in drive mappings from Windows machines connecting to it via Samba.  Its name was buried in countless scripts owned by the Development and QA teams.  Once the new hardware came in, provisioning a new server was relatively easy.  Getting everything functioning again in spite of those broken links was not.  Other factors prevented me from using the old approach, which was naming the new server the same name as the old server.  So I knew there had to be a better way.  There was: using DNS aliases (CNAME records) on your internal DNS servers to decouple the server name itself from the service it provides.  This practice helps you design your server infrastructure for change.
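
As a minimal sketch (the DNS server and real host names here are made up for illustration; the zone borrows the mycompany.lan example used later in this post), creating an alias on a Windows DNS server is a one-liner with dnscmd, or a few clicks in the DNS management console:

    # Create "fileserv" as an alias (CNAME) pointing at the real server
    dnscmd dc01 /RecordAdd mycompany.lan fileserv CNAME srv-nas-003.mycompany.lan.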

Good candidates for aliasing are:

  • NTP/time servers (time sync is automatic for domain-joined machines, but not for non-joined machines, *nix systems, and network devices – see the sketch after this list)
  • Email servers (primary email servers, as well as mail relay servers)
  • Source code control servers
  • Document management, wikis, or collaboration servers
  • Critical workstations/servers that perform source code compiling and/or validation testing.
  • Network devices and OOB management cards.  I can’t remember what the FQDNs of my switches are.  Can you?
  • Log servers.
  • File servers and their respective share names or NFS exports (e.g. \\infostore\sales and infostore:/exports/sales, respectively)
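
For example, a non-domain-joined Windows machine can be pointed at a time alias instead of at a specific domain controller (the alias name below is just an example), so the setting never has to change when the DCs do:

    # Point the Windows time service at the alias, not at a real server name
    w32tm /config /manualpeerlist:timeserver.mycompany.lan /syncfromflags:manual /update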

The practice is particularly interesting on file servers.  If you start out with one file server that contains shares for your applications, your files, and your user home directories, you could have UNC paths that reference different aliases, all pointing at the very same server:

  • \\appserv\applications
  • \\fileserv\operations
  • \\userserv\joesmith

Now, when you need to move user home directories over to a new server, or bring up a new server to perform that role, you just move the data, turn up the share name, and change the alias.
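
A minimal sketch of those last two steps, assuming a hypothetical new server named srv-nas-004 now holds the home directories (share, alias, and zone names follow the examples above):

    # On the new server: turn up the share name for the moved data
    net share joesmith=D:\Home\joesmith /remark:"Home directory"

    # On the DNS server: repoint the alias from the old server to the new one
    dnscmd dc01 /RecordDelete mycompany.lan userserv CNAME /f
    dnscmd dc01 /RecordAdd mycompany.lan userserv CNAME srv-nas-004.mycompany.lan.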

Of course, there are some things that aliasing can’t be used for, or doesn’t work well with.

  • DNS clients that need to refer to DNS servers require IP addresses, and can’t use aliases
  • Some Windows services that use complex authentication methods.
  • Services relying on SSL certificates that expect to see the real name, not the alias (e.g. Exchange URL references).
  • Windows Server 2003 and earlier do not accept connections to file shares via an alias out of the box.  They will support only \\realservername\sharename by default.  You will need to add a registry value to disable strict name checking (see the sketch below).  More info can be found here:  http://support.microsoft.com/kb/281308
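
A sketch of the registry change that KB article describes, run on the file server itself (restart the Server service, or reboot, for it to take effect):

    # Allow the server to answer SMB connections made to an alias name
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters" /v DisableStrictNameChecking /t REG_DWORD /d 1 /f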

Most recently, I made the transition from Exchange 2003 to Exchange 2007.  Usually a project like that has pages of carefully planned steps for the cut-over: what needs to be changed, and when.  What I didn’t have to worry about this time was all of the internal hosts that reference the mail server by its DNS alias, mailserver.mycompany.lan.  It took just one easy step to change the CNAME reference from the old server name to the new server name, and that was it.  The same thing occurred when I transitioned to new Domain Controllers a few months ago, which serve as the internal time servers for all of our systems and devices.

What’s most surprising is that this practice is not followed in IT environments as often as you’d think.  There might be an occasional alias here and there, but not a calculated effort to help transitions to new servers and reduce downtime.  Whether you are doing planned server transitions or recovering from a server failure, this is a practice that is guaranteed to help in almost any situation.

An introduction of sorts…

There are thousands of great blogs out there, with extremely smart people contributing all sorts of great information. This may not be one of them. Let me explain.

Every IT Administrator who is a staff of one or two knows that your strength isn’t in knowing every nuance of one particular thing, but rather in the breadth of knowledge needed to make everything work together. The dramatic shifting of gears that has to occur in my job on a daily basis is not unique, but no less surprising. From deploying a new virtualized infrastructure one day, to figuring out why some SQL buried in our CRM doesn’t work the next, to getting all of our *nix systems to play nicely with our Windows systems. It never ends. I used to think that the lack of absolute expertise in one specific thing was a hindrance. Now I see it as a strength.

I’ve gotten to stand on the shoulders of many, and would be foolish to think I’ve been able to accomplish everything on my own. This includes mentors, colleagues, solution providers, Management teams who trusted my opinion, and those unsung heroes who figured out some registry entry that needed to be changed, and chose to write about it so that I could get some sleep.

So this is to all of those men and women who are perfectly capable of setting up multiple VLANs, but find themselves fixing the photocopier because… well, nobody else can.