How I use Dell/EqualLogic’s SANHQ in my environment

 

One of the benefits of investing in Dell/EqualLogic’s SAN solutions is the number of great tools included with the product at no extra charge.  I’ve written in the past about leveraging their AutoSnapshot Manager for VM and application-consistent snapshots and replicas.  Another tool that deserves a few words is SAN HeadQuarters (SANHQ). 

SANHQ allows for real-time and historical analysis of your EqualLogic arrays.  Many EqualLogic users are well versed with this tool and may not find anything here that they didn’t already know.  But I’m surprised at how many are not.  So, what better way to help those unfamiliar with SANHQ than to describe how it helps me in my environment.

While the tool itself is “optional” in the sense that you don’t need to deploy it to use the EqualLogic arrays, it is an easy (and free) way to expose the powers of your storage infrastructure.  If you want to see what your storage infrastructure is doing, do yourself a favor and run SANHQ.   

Starting up the application, you might find something like this:

image

You’ll find an interesting assortment of graphs and charts that help you decipher what is going on with your storage.  Take a few minutes and do a little digging.  There are various ways it can help you do your job better.

 

Monitoring

Sometimes good monitoring is downright annoying.  It’s like the alarm clock next to your bed; it’s difficult to overlook, but that’s the point.  SANHQ has proven to be an effective tool for proactive monitoring and alerting of my arrays.  While none of these warnings are ever fun, its biggest value is that it can help prevent those larger, much more serious problems, which always seem to be a series of small issues thrown together.  Here are some examples of how it has acted as the canary in the coal mine in my environment.

  • When I had a high number of TCP retransmits after changing out my SAN Switchgear, it was SANHQ that told me something was wrong.  EqualLogic Support helped me determine that my new switchgear wasn’t handling jumbo frames correctly. 
  • When I had a network port go down on the SAN, it was SANHQ that alerted me via email.  A replacement network cable fixed the problem, and the alarm went away.
  • If replication across groups is unable to occur, you’ll be notified right away that replication isn’t running.  The reasons for this can be many, but SANHQ usually gives you the first sign that something is up.  This works across physical topologies where your target may be at another site.
  • Under maintenance scenarios, you might find the need to pause replication on a volume, or on the entire group.  SANHQ will do a nice job of reminding you, at a regular interval, that replication is still not running.

 

Analysis and Planning

SANHQ will allow you to see performance data at the group level, by storage pools, volumes, or volume collections.  One of the first things I do when spinning up a VM that uses guest attached volumes is to jump into SANHQ and see how those guest attached volumes are running.  How are the average IOPS?  What about latencies and queue depth?  All of those can be found easily in SANHQ, and can help put your mind at ease if you are concerned about your new virtualized Exchange or SQL servers.  Here is a screenshot of a 7-day history for a SQL Server with guest attached volumes, driving our SharePoint backend services.

image

The same can be done, of course, for VMFS volumes.  This information will complement existing data one gathers from vCenter to understand if there are performance issues with a particular VMFS volume.

Oftentimes monitoring and analysis isn’t about absolute numbers, but rather about letting the user see changes relative to previous conditions.  This is especially important for the IT generalist who doesn’t have the time or know-how for deep-dive storage analysis, or a dedicated Storage Administrator to analyze the data.  This is where the tool really shines.  For whatever type of data you are looking at, you can easily choose a timeline of the last hour, 8 hours, 1 day, 7 days, 30 days, etc.  The anomalies, if there are any, will stand out. 
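The "relative to previous conditions" idea is easy to sketch outside of SANHQ too.  Here is a minimal, hypothetical Python example (the metric values and the threshold are invented, and have nothing to do with SANHQ's internals) that flags any window of samples whose average deviates sharply from the longer-term baseline:

```python
# Hypothetical sketch of "relative change" monitoring: compare each
# window of samples against the long-term baseline and flag outliers.
# Values and threshold are invented; SANHQ's logic is its own.

def flag_anomalies(samples, window, threshold=2.0):
    """Return start indices of windows whose mean exceeds the
    overall baseline mean by more than `threshold` times."""
    baseline = sum(samples) / len(samples)
    flagged = []
    for i in range(0, len(samples) - window + 1, window):
        window_mean = sum(samples[i:i + window]) / window
        if baseline and window_mean > threshold * baseline:
            flagged.append(i)
    return flagged

# Seven days of hourly latency samples (ms), flat except one spike.
latency = [5.0] * 168
latency[100:104] = [60.0, 75.0, 70.0, 55.0]
print(flag_anomalies(latency, window=4))   # [100]
```

The point isn't the code; it's that a flat baseline makes even a modest spike obvious, which is exactly what the timeline views give you at a glance.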

image

Simply click on the timeline that you want, and the historical data of the group, member, volume, etc. will show up below.

image

I find analyzing individual volumes (when they are VMFS volumes) and volume collections (when they are guest attached volumes) the most helpful in making sure that there are not any hotspots in I/O.  It can help you determine if a VM might be better served in a VMFS volume that hasn’t been demanding as much I/O as the one it’s currently in.

It can also play a role in future procurement.  Those 15k SAS drives may sound like a neat idea, but does your environment really need them when you decide to add storage?  Thinking about VDI?  It can be used to help determine I/O requirements.  Recently, I was on the phone with a friend of mine, Tim Antonowicz.  Tim is a Senior Solutions Architect from Mosaic Technology who has done a number of successful VDI deployments (and who recently started a new blog).  We were discussing the possibility of VDI in my environment, and one of the first things he asked of me was to pull various reports from SANHQ so that he could understand our existing I/O patterns.  It wasn’t until then that I noticed all of the great storage analysis offerings that any geek would love.  There are a number of canned reports that can be saved as PDF, HTML, CSV, or other formats to your liking.

image

Replication Monitoring

The value of SANHQ went way up for me when I started replication.  It will give you summaries of each replicated volume.

image

If you click on an individual volume, it will help you see transfer sizes and replication times of the most recent replicas.  It also separates inbound replica data from outbound replica data.

image

While the times and the transfer rates will be skewed somewhat if you have multiple replicas running (as I do), it is a great example of how you can understand patterns in changed data on a specific volume.  The volume captured above is where one of my Domain Controllers lives.  As you can see, it’s pretty consistent, and doesn’t change much, as one would expect (probably not much more than the swap file inside the VM, but that’s another story).  Other kinds of replicated data will fluctuate more widely.  This is your way to see it.

 

Running SANHQ

SANHQ will live happily on a standalone VM.  It doesn’t require much, but it does need direct access to your SAN, and uses SNMP.  Once installed, SANHQ can be run directly on that VM, or the client-only application can be installed on your workstation for a little more convenience.  If you are replicating data, you will want SANHQ to connect to both the source site and the target site for the most effective use of the tool.

Improvements?  Sure, there are a number of things that I’d love to see:

  • Alarms for performance thresholds.
  • Threshold templates that you could apply to a volume (VMFS or native) that would help you understand the numbers (green = good, red = bad).
  • The ability to schedule reports, and define how and where they are posted.
  • Free pool space activity warnings (important if you choose to keep replica reserves low and leverage free pool space).
  • Array diagnostic dumps directly from SANHQ.
  • Programmatic access for scripting.

Improvements like these could make a useful product indispensable in a production environment.

Replication with an EqualLogic SAN; Part 5

 

Well, I’m happy to say that replication to my offsite facility is finally up and running now.  Let me share with you the final steps to get this project wrapped up. 

You might recall that in my previous offsite replication posts, I had a few extra challenges.  We were a single-site organization, so in order to get replication up and running, an infrastructure at a second site needed to be designed and put in place.  My topology still reflects what I described in the first installment, but simple pictures don’t describe the work of getting this set up.  It was certainly a good exercise in keeping my networking skills sharp.  My appreciation for the folks who specialize in complex network configurations and address management has been renewed.  They probably seldom hear words of thanks for, say, that well-designed subnetting strategy.  They are an underappreciated bunch for sure.

My replication has been running for some time now, but this was all within the same internal SAN network.  While other projects prevented me from completing this sooner, it gave me a good opportunity to observe how replication works.

Here is the way my topology looks fully deployed.

image

Most colocation facilities or datacenters give you about 2 square feet to move around in (only a slight exaggeration), so it’s not the place you want to be contemplating reasons why something isn’t working.  It’s also no fun realizing you don’t have the remote access you need to make the necessary modifications, and you won’t, or can’t, drive to the CoLo.  My plan for getting this second site running was simple: build up everything locally (switchgear, firewalls, SAN, etc.) and change my topology at my primary site to emulate the 2nd site.

Here is the way it was running while I worked out the kinks.

image

All replication traffic occurs over TCP port 3260.  Both locations had to have accommodations for this.  I also had to ensure I could manage the array living offsite.  Testing this out with the modified infrastructure at my primary site allowed me to verify traffic was flowing correctly.
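Before trusting the arrays to tell you, a quick sanity check that TCP 3260 is actually reachable through both firewalls can save a trip to the CoLo.  A minimal sketch in Python (the target address is a placeholder, and this is just a generic TCP connect test, not an EqualLogic tool):

```python
# Generic TCP connect test for the replication port (3260). The host
# shown is a placeholder; this only confirms routing/firewall rules,
# it says nothing about the health of the arrays themselves.
import socket

def port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical target group IP):
# print(port_open("10.1.1.10"))
```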

The steps taken to get two SAN replication partners transitioned from a single network to two networks (onsite) were:

  1. Verify that all replication is running correctly when the two replication partners are in the same SAN Network
  2. You will need a way to split the feed from your ISP, so if you don’t have one already, place a temporary switch at the primary site on the outside of your existing firewall.  This will allow you to emulate the physical topology of the real site, while having the convenience of all of the equipment located at the primary site. 
  3. After the 2nd firewall (destined for the CoLo) is built and configured, place it on that temporary switch at the primary site.
  4. Place something (a spare computer perhaps) on the SAN segment of the 2nd firewall so you can test basic connectivity (to ensure routing is functioning, etc) between the two SAN networks. 
  5. Pause replication on both ends, and take the target array and its switchgear offline. 
  6. Plug the target array’s Ethernet ports to the SAN switchgear for the second site, then change the IP addressing of the array/group so that it’s running under the correct net block.
  7. Re-enable replication and run test replicas, starting with the Group Manager, then ASM/VE, then ASM/ME.

It would be crazy not to take this one step at a time, as you learn a little at each step, and can identify issues more easily.  Step 3 introduced the most problems, because traffic has to traverse routers that are also secure gateways.  Not only does one have to consider a couple of firewalls, you now run into other considerations that may be undocumented.  For instance:

  • ASM/VE replication occurs courtesy of vCenter, but ASM/ME replication is configured inside the VM.  Sure, it’s obvious, but so obvious it’s easy to overlook.  That means any topology changes will require adjustments in each VM that utilizes guest attached volumes.  You will need to re-run the “Remote Setup Wizard” to adjust the IP address of the target group that you will be replicating to.
  • ASM/ME also uses a VSS control channel to talk with the array.  If you changed the target array’s group and interface IP addresses, you will probably need to adjust what IP range will be allowed for VSS control.
  • Not so fast, though.  VMs that use guest iSCSI initiated volumes typically have those iSCSI-dedicated virtual network cards set with no default gateway.  You never want to enter more than one default gateway in this sort of situation.  The proper way to handle this is to add a persistent static route.  This needs to be done before you run the Remote Setup Wizard above.  Fortunately the method to do this hasn’t changed for at least a decade.  Just type in

route -p add [destinationnetwork] [subnetmask] [gateway] [metric]

  • Certain kinds of traffic that pass almost without a trace across a layer 2 segment show up right away when pushed through very sophisticated firewalls whose default stance is to deny all unless explicitly allowed.  Fortunately, Dell puts out a nice document on their EqualLogic arrays.
  • If possible, it will be easiest to configure your firewalls with route relationships between the source SAN and the target SAN.  It may complicate your rulesets (NAT relationships are a little more intelligent when it comes to rulesets in TMG), but it simplifies how each node is seeing each other.  This is not to say that NAT won’t work, but it might introduce some issues that wouldn’t be documented.

Step 7 exposed an unexpected issue: terribly slow replicas.  Slow even though the traffic wasn’t even going across a WAN link.  We’re talking VERY slow, as in 1/300th the speed I was expecting.  The good news is that this problem had nothing to do with the EqualLogic arrays.  It was an upstream switch that I was using to split my feed from my ISP.  The temporary switch was not negotiating correctly, and was causing packet fragmentation.  Once that switch was replaced, all was good.

The other strange issue was that even though replication was running great in this test environment, I was getting errors with VSS.  ASM/ME at startup would indicate “No control volume detected.”  Even though replicas were running, the replicas couldn’t be accessed, used, or managed in any way.  After a significant amount of experimentation, I eventually opened up a case with Dell Support.  Running out of time to troubleshoot, I decided to move the equipment offsite so that I could meet my deadline.  Well, when I came back to the office, VSS control magically worked.  I suspect that the array simply needed to be restarted after I had changed the IP addressing assigned to it. 

My CoLo facility is an impressive site.  Located in the Westin Building in Seattle, it is also where the Seattle Internet Exchange (SIX) is located.  Some might think of it as another insignificant building in Seattle’s skyline, but it plays an important part in efficient peering for major Service Providers.  Much of the building has been converted from a hotel to a top tier, highly secure datacenter and a location in which ISPs get to bridge over to other ISPs without hitting the backbone.  It has dedicated water and power supplies, full facility fail-over, and elevator shafts that have been remodeled to provide nothing but risers for all of the cabling.  Having a CoLo facility that is also an Internet Exchange Point for your ISP is a nice combination.

Since I emulated the offsite topology internally, I was able to simply plug in the equipment and turn it on, with confidence that it would work.  It did. 

My early measurements on my feed to the CoLo are quite good.  Since the replication times include buildup and teardown of the sessions, sustained throughput is better measured on larger replicas.  The early numbers show that my 30 Mbps circuit is translating to replication rates in the neighborhood of 10 to 12 GB per hour (205 MB per minute, or 3.4 MB per second).  If multiple jobs are running at the same time, the rate will be affected by the other replication jobs, but the overall throughput appears to be about the same.  Also affecting speeds will be other traffic coming to and from our site.
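The circuit math checks out.  Here is a quick back-of-the-envelope in Python using the numbers above (the utilization figure at the end is my own derivation, not a SANHQ statistic):

```python
# Back-of-the-envelope check of the observed replication rate against
# the raw line rate of the circuit.

LINE_RATE_MBPS = 30                              # raw circuit, megabits/sec
theoretical_mb_per_sec = LINE_RATE_MBPS / 8      # 3.75 MB/sec ceiling
observed_mb_per_sec = 3.4                        # measured via SANHQ
observed_gb_per_hour = observed_mb_per_sec * 3600 / 1000

print(round(theoretical_mb_per_sec, 2))          # 3.75
print(round(observed_gb_per_hour, 1))            # 12.2
print(round(observed_mb_per_sec / theoretical_mb_per_sec, 2))  # 0.91
```

Roughly 90% of the theoretical line rate, sustained; the rest is protocol overhead and session buildup/teardown.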

There is still a bit of work to do.  I will monitor the resources, and tweak the scheduling to minimize the overlap on the replication jobs.  In past posts, I’ve mentioned that I’ve been considering the idea of separating the guest OS swap files from the VMs, in an effort to reduce the replication size.  Apparently I’m not the only one thinking about this, as I stumbled upon this article.  It’s interesting, but a fair amount of work.  Not sure if I want to go down that road yet.

I hope this series helped someone with their plans to deploy replication.  Not only was it fun, but it is a relief to know that my data, and the VMs that serve up that data, are being automatically replicated to an offsite location.

Exchange 2007 on a VM, and the case of the mysterious ISAPI deadlock detected error.

 

 

Are you running Exchange 2007 on a VM?  Are you experiencing odd warning events in the event log that look something like this?

Event Type: Warning
Event Source: W3SVC-WP
Event Category: None
Event ID: 2262
Date:  
Time:  12:28:18 PM
User:  N/A
Computer: [yourserver]
Description:
ISAPI ‘c:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll’ reported itself as unhealthy for the following reason: ‘Deadlock detected’.

If you’ve answered yes to these questions, you’ve most certainly looked for the fix, and found other users in the same boat.  They try and try to fix the issue with adjustments from official documentation or otherwise, with no results.

That was me.  …until I ran across this link.

So, as suggested, I added a 2nd vCPU to my Exchange server (running on Windows Server 2008 x64, in a vSphere cluster), and started it up.  These specific warning messages in my event log went away completely.  Okay, after several weeks of monitoring, I may have had a couple of warnings here and there.  But that’s it.  No longer the hundreds of warnings every day.

As for the official explanation, I don’t have one.  Adding vCPUs to fix problems is not something I want to get in the habit of, but it was an interesting problem, with an interesting solution that was worth sharing.

 

Helpful links:

Microsoft’s most closely related KB article on the issue (that didn’t fix anything for me):
http://support.microsoft.com/kb/821268

Application pool recycling:
http://technet.microsoft.com/en-us/library/cc735314(WS.10).aspx

Restoring an Exchange 2007 mailbox using EqualLogic’s ASM/ME

 

 

Ask most IT Administrators in small to medium sized organizations to recover an Exchange mailbox, and you’ll get responses like, “How important is it to recover it?” and “How much of it is gone?”  You might even get the slightly patronizing “Oh, well, you were close to your storage quota anyway.”  This is IT-speak for “I don’t want to recover it” (labor intensive), “I’m not sure if I can recover it” (it didn’t work the last time they tried), or “I can’t recover it, and don’t really want to tell you that.”  Trust me, I’ve been there.

The recovery process has ranged anywhere from non-existent (Exchange 4.0), to supposedly easy (according to the glossy ad of the 3rd party solution you might have purchased, but could never get to work correctly), to cautiously doable but incomplete, with later versions of Exchange.  As each year passes, I’m always hoping that technology can make the process easier, without pushing through another big purchase.

I got my wish.  Technology has indeed improved the process.  I recently had a user mailbox’s contents vanish.  Who knows what happened, but I had to get the mailbox back fast, so I had to familiarize myself with the process again.  This time, since my Exchange 2007 Server is virtualized, and the Exchange databases and logs reside on guest attached volumes, I was able to take advantage of EqualLogic’s “AutoSnapshot Manager Microsoft Edition” or ASM/ME. 

ASM/ME allowed me to easily recover an Exchange Storage Group, then mount it as a Recovery Storage Group (RSG).  From that point I could restore just that single mailbox on top of the existing mailbox.  What a refreshing discovery to see how simple the process has become.  And it just worked.  No weird errors to investigate, no tapes to fiddle with.  It was a complete solution for a mailbox recovery scenario.  The best part of all was that it’s a function available free of charge to any EqualLogic user who is using the Host Integration Toolkit (HITKit) on their Exchange server. 

Here is how you do it.

1.  On the Exchange Server, open up ASM/ME, highlight the smart copy collection that you’d like to recover.  I say “Collection” because I want to recover the volume that has the DB on it and the volume that has the Transaction Logs on it at the same time.

2.  Select the option to “Create Recovery Storage Group”

3.  Select the desired Storage Group

4.  It will prompt for two drive letters not currently in use.  These will represent the location of the restored volumes.  So if the Exchange databases are on E: and the Transaction Logs are on F:, it might prompt you to use “G:” and “H:” respectively.

5.  It will complete with the following message:
image

6.  Close out of ASM/ME, and launch “Database Recovery Management” in the Toolbox section of the Exchange Management Console.  This leads to the “Exchange Troubleshooting Assistant” in the Exchange Management Console (EMC).

7.  Run through the restoration process.  It will restore the selected mailbox on top of the existing mailbox.

8.  Once it is complete, as the dialog above instructs, you will need to dismount and log off the smart copy collection set with ASM/ME after the RSG is removed.

The process was fast, and worked the very first time without error.  I’d still prefer to never have to recover a mailbox, but it is nice to know that now, thanks to ASM/ME and the Exchange Database Recovery Management tool, it’s really easy to do.

 

Helpful Links

A more detailed guide on using RSGs in Exchange:
http://www.msexchange.org/tutorials/Working-Recovery-Storage-Groups-Exchange-2007.html 

Working with a Recovery Storage Group in the Exchange Management Shell (EMS) instead of the EMC:
http://technet.microsoft.com/en-us/library/bb125197(EXCHG.80).aspx

Replication with an EqualLogic SAN; Part 4

 

If you had asked me 6+ weeks ago how far along my replication project would be on this date, I would have thought I’d be basking in the glory of success, and admiring my accomplishments.

…I should have known better.

Nothing like several IT emergencies unrelated to this project to turn one’s itinerary into garbage.  A failed server (an old physical storage server that I don’t have room for on my SAN), a tape backup autoloader that tanked, some Exchange Server and Domain Controller problems, and a host of other odd things that I don’t even want to think about.  It’s easy to overlook how much work it takes just to keep an IT infrastructure from losing ground from the day before.  At times, it can make you wonder how any progress is made on anything.

Enough complaining for now.  Let’s get back to it.  

 

Replication Frequency

For my testing, all of my replication is set to occur just once a day.  This is to keep it simple, and to help me understand what needs to be adjusted when my offsite replication is finally turned up at the remote site.

I’m not overly anxious to turn up the frequency even if the situation allows.  Some pretty strong opinions exist on how best to configure the frequency of the replicas: do a little bit with a high frequency, or a lot with a low frequency.  What I do know is this: it is a terrible feeling to lose data, and one of the more overlooked ways to lose it is for bad data to overwrite your good data on the backups before you catch it in time to stop it.  Tapes, disk, simple file cloning, or fancy replication; the principle is the same, and so is the result.  Since the big variable is retention period, I want to see how much room I have to play with before I decide on frequency.  My purpose of offsite replication is disaster recovery… not to make a disaster bigger.

 

Replication Sizes

The million dollar question has always been how much changed data, as perceived from the SAN, will occur for a given period of time on typical production servers.  It is nearly impossible to know this until one is actually able to run real replication tests.  I certainly had no idea.  This would be a great feature for Dell/EqualLogic to add to their solution suite: a way for a storage group to run in a simulated replication mode, where it simply collects statistics that would accurately reflect the amount of data that would be replicated during the test period.  What a great feature for those looking into SAN-to-SAN replication.

Below are my replication statistics for a 30 day period, where the replicas were created once per day, after the initial seed replica was created.

Average data per day per VM

  • 2 GB for general servers (service based)
  • 3 GB for servers with guest iSCSI attached volumes.
  • 5.2 GB for code compiling machines

Average data per day for guest iSCSI attached data volumes

  • 11.2 GB for Exchange DB and Transaction logs (for a 50GB database)
  • 200 MB for a SQL Server DB and Transaction logs
  • 2 GB for SharePoint DB and Transaction logs

The replica sizes for the VMs were surprisingly consistent.  Our code compiling machines had larger replica sizes, as they temporarily write some data to the VMs during their build processes.

The guest iSCSI attached data volumes naturally varied more with day-to-day activities.  Weekdays had larger amounts of replicated data than weekends.  This was expected.

Some servers, and how they generate data, may stick out like sore thumbs.  For instance, our source code control server uses a crude (but important) form of application-layer backup.  The result is that for 75 GB worth of repositories, it would generate 100+ GB of changed data that it would want to replicate.  If the backup mechanism (which is a glorified file copy and package dump) is turned off, the amount of changed data is down to a very reasonable 200 MB per day.  This is a good example of how we will have to change our practices to accommodate replication.
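With per-item averages like these, it becomes easy to rough out a daily replication budget.  A small sketch in Python (the VM counts and the assumed sustained WAN rate are hypothetical; only the per-item sizes come from the 30-day measurements above):

```python
# Rough daily replication budget built from the measured averages.
# VM counts and the assumed WAN rate are made-up example values.

daily_change_gb = {
    "general server VM":        (2.0, 10),   # (GB/day, count)
    "VM with guest iSCSI vols": (3.0,  4),
    "code compiling VM":        (5.2,  3),
    "Exchange DB + logs":       (11.2, 1),
    "SQL DB + logs":            (0.2,  1),
    "SharePoint DB + logs":     (2.0,  1),
}

total_gb = sum(size * count for size, count in daily_change_gb.values())
wan_gb_per_hour = 12.0   # assumed sustained replication rate

print(round(total_gb, 1))                     # GB of changed data per day
print(round(total_gb / wan_gb_per_hour, 1))   # hours on the wire per day
```

Run the same arithmetic against your own counts and circuit, and you'll quickly see whether a daily replication window is realistic.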

 

Decreasing the amount of replicated data

Up to this point, the only step taken to reduce the amount of replicated data is the adjustment made in vCenter to move the VMs’ swap files onto another VMFS volume that will not be replicated.  That of course only affects the VMs’ swap files – not the paging files inside the guest OS.  I suspect that a healthy amount of the changed data on the VMs is their OS paging files.  The amount of changed data on those VMs looked suspiciously similar to the amount of RAM assigned to the VM.  There typically is some correlation between how much RAM an OS has to run with and the size of the page file.  This is pure speculation at this point, but certainly worth looking into.

The next logical step would be to figure out what could be done to reconfigure VMs to place their paging/swap files in a different, non-replicated location.  Two issues come to mind when I think about this step. 

1.)  This adds an unknown amount of complexity (for deploying and restoring) to the systems running.  You’d have to be confident in the behavior of each OS type when it comes to restoring from a replica where it expects to see a page file in a certain location, but does not.  How scalable this approach is would also need to be considered.  It might be okay for a few machines, but how about a few hundred?  I don’t know.

2.)  It is unknown how much of a payoff there will be.  If the amount of data per VM gets reduced by, say, 80%, then that would be pretty good incentive.  If it’s more like 10%, then not so much.  It’s disappointing that there seems to be only marginal documentation on making such changes.  I will look to test this when I have some time, and report anything interesting that I find along the way.

 

The fires… unrelated, and related

One of the first problems to surface recently was a set of issues with my 6224 switches.  These were the switches that I put in place of our 5424 switches to provide better expandability.  Well, something wasn’t configured correctly, because the retransmit ratio was high enough that SANHQ actually notified me of the issue.  I wasn’t about to overlook this, and reported it to the EqualLogic Support Team immediately.

I was able to get these numbers under control by reconfiguring the NICs on my ESX hosts to talk to the SAN with standard frames.  Not a long-term fix, but for the sake of the stability of the network, the most prudent step for now.

After working with the 6224’s, they do seem to behave noticeably differently than the 5424’s.  They are more difficult to configure, and the suggested configurations from the Dell documentation seemed more convoluted and contradictory.  Multiple documents and deployment guides had inconsistent information.  Technical Support from Dell/EqualLogic has been great in helping me determine what the issue is.  Unfortunately some of the potential fixes can be very difficult to execute.  A firmware update on a stacked set of 6224’s will result in the ENTIRE stack rebooting, so you have to shut down virtually everything if you want to update the firmware.  The ultimate fix for this would be a revamp of the deployment guides (or let’s try just one deployment guide) for the 6224’s that nullifies any previous documentation.  By way of comparison, the 5424 switches were, and are, very easy to deploy. 

The other issue that came up was some unexpected behavior regarding replication and its use of free pool space.  I don’t have any empirical evidence to tie the two together, but this is what I observed.

During this past month in which I had an old physical storage server fail on me, there was a moment where I had to provision what was going to be a replacement for that box, as I wasn’t even sure if the old physical server was going to be recoverable.  Unfortunately, I didn’t have a whole lot of free pool space on my array, so I had to trim things up a bit to squeeze it on there.  Once I did, I noticed all sorts of weird behavior.

1.  Since my replication jobs (with ASM/ME and ASM/VE) leverage the free pool space for the creation of the temporary replica/snapshot that is created on the source array, this caused problems.  The biggest one was that my Exchange server would completely freeze during its ASM/ME snapshot process.  Perhaps I had this coming to me, because I deliberately configured it to use free pool space (as opposed to a replica reserve) for its replication.  How it behaved caught me off guard, and made it interesting enough for me to never want to cut it close on free pool space again.

2.  ASM/VE replica jobs also seemed to behave oddly with very little free pool space.  Again, this was self-inflicted because of my configuration settings.  It left me desiring a feature that would allow you to set a threshold so that in the event of x amount of free pool space remaining, replication jobs would simply not run.  This goes for ASM/VE and ASM/ME.

Once I recovered that failed physical system, I was able to remove the VM I had set aside for emergency turn-up.  That brought my free pool space back up over 1 TB, and all worked well from that point on. 

 

Timing

Lastly, one subject came up that doesn’t show up in any deployment guide I’ve seen.  The timing of all this protection shouldn’t be overlooked.  One wouldn’t want to stack several replication jobs on top of each other when they use the same free pool space but haven’t had time to complete.  Other snapshot jobs, replicas, consistency checks, traditional backups, etc. should be well coordinated to keep overlap to a minimum.  If you are limited on resources, you may also be able to use timing to your advantage.  For instance, set your daily replica of your Exchange database to occur at 5:00am, and your daily snapshot to occur at 5:00pm.  That way, you have reduced your maximum loss period from 24 hours to 12 hours, just by offsetting the times.
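The offset math generalizes to any number of daily jobs: your worst-case loss window is simply the largest gap between recovery points.  A small sketch in Python:

```python
# Worst-case data-loss window for daily protection jobs: the largest
# gap (in hours) between consecutive recovery points on a 24h clock.

def max_loss_hours(start_times):
    """start_times: daily job start hours, e.g. [5, 17]."""
    times = sorted(start_times)
    gaps = [times[i + 1] - times[i] for i in range(len(times) - 1)]
    gaps.append(24 - times[-1] + times[0])   # gap wrapping past midnight
    return max(gaps)

print(max_loss_hours([5]))       # replica at 5:00am only -> 24
print(max_loss_hours([5, 17]))   # add a 5:00pm snapshot  -> 12
```

Evenly spacing the jobs minimizes the worst case; four evenly spread recovery points per day gets you down to a 6-hour window.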

Replication with an EqualLogic SAN; Part 3

 

In parts one and two of my journey in deploying replication between two EqualLogic PS arrays, I described some of the factors that came into play on how my topology would be designed, and the preparation that needed to occur to get to the point of testing the replication functions. 

Since my primary objective of this project was to provide offsite protection of my VMs and data in the event of a disaster at my primary facility, I’ve limited my tests to validating that the data is recoverable from or at the remote site.  The logistics of failing over to a remote site (via tools like Site Recovery Manager) are way outside the scope of what I’m attempting to accomplish right now.  That will certainly be a fun project to work on some day, but for now, I’ll be content with knowing my data is replicating offsite successfully.

With that out of the way, let the testing begin…

 

Replication using Group Manager 

Just like snapshots, replication using the EqualLogic Group Manager is pretty straightforward.  However, in my case, using this mechanism would not produce file-system-consistent snapshots or replicas of VM datastores; it would only be reliable for data that was not being accessed, or for VMs that were turned off.  So for the sake of brevity, I’m going to skip these tests.

 

ASM/ME Replica creation.

My ASM/ME replication tests will simulate how I plan on replicating the guest attached volumes within VMs.  Remember, these are replicas of the guest attached volumes  only – not of the VM. 

On each VM where I have guest attached volumes and the HITKit installed (Exchange, SQL, file servers, etc.) I launched ASM/ME to configure and create the new replicas.  I’ve scheduled them to occur at a time separate from the daily snapshots.

image

As you can see, there are two different icons used; one represents snapshots, and the other replicas.  Each snapshot and replica will show that the guest attached volumes (in this case, “E:\” and “F:\”) have been protected using the Exchange VSS writer.  The two drives are captured together because I created the job from a “Collection,” which makes the most sense for Exchange and SQL systems that have DB files and transaction log data you’d want to capture at the exact same time.  For the time being, I’m just letting them run once a day to collect some data on replication sizes.  ASM/ME is where recovery tasks would be performed on the guest attached volumes.

A tip for those who are running ASM/ME for Smartcopy snapshots or replication: define in your schedules a “keep count” number of snapshots or replicas that falls within the amount of snapshot reserve you have for that volume.  Otherwise, ASM/ME may take a very long time to start the console and reconcile the existing smart copies, and you will also find those old snapshots in the “broken” container of ASM/ME.  The startup delay can be so long it almost looks as if the application has hung, but it has not, so be patient.  (By the way, ASM/VE version 2.0, which should be used to protect your VMs, does not have any sort of “keep count” mechanism.  Let’s keep our fingers crossed for that feature in version 3.0.)
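As a back-of-the-envelope check (my own illustrative numbers, not an EqualLogic formula), you can estimate a safe keep count from the volume size, the reserve percentage, and a guess at how much data changes between smart copies:

```python
def safe_keep_count(volume_gb, reserve_pct, avg_change_gb_per_copy):
    """Rough estimate of how many smart copies fit within the
    snapshot reserve.  Purely illustrative: actual snapshot growth
    depends on the real change rate between copies."""
    reserve_gb = volume_gb * reserve_pct / 100.0
    return int(reserve_gb // avg_change_gb_per_copy)

# A 500 GB volume with a 100% reserve and ~25 GB of change per copy
# has room for roughly 20 copies; a keep count safely below that
# avoids old snapshots landing in ASM/ME's "broken" container.
print(safe_keep_count(500, 100, 25))  # 20
```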

 

ASM/ME Replica restores

Working with replicas using ASM/ME is about as easy as it gets.  Just highlight the replica, and click on “Mount as read-only.”  Unlike a snapshot, you do not have the option to “restore” over the existing volume when it’s a replica.

image

ASM/ME will ask for a drive letter to assign to the cloned replica.  Once it’s mounted, you may do with the data as you wish.  Note that it will be in a read-only state.  This can be changed later if needed.

When you are finished with the replica, you can click on the “Unmount and Resume Replication…”

image

ASM/ME will ask you if you want to keep the replica around after you unmount it.  To keep it, uncheck the box next to “Delete snapshot from the PS Series group…”

 

ASM/VE replica creation

ASM/VE replication, which will be the tool I use to protect my VMs, took a bit more time to set up correctly due to the way ASM/VE likes to work.  I somehow missed the fact that you need a second ASM/VE server running at the target/offsite location for the ASM/VE server at the primary site to communicate with.  ASM/VE also seems to be hyper-sensitive to the version of Java installed on the ASM/VE servers.  Don’t get too anxious about updating to the latest version of Java.  Stick with a version recommended by EqualLogic.  I’m not sure what that officially would be, but I have been told by Tech Support that version 1.6 Update 18 is safe.

Unlike creating Smartcopy snapshots in ASM/VE, you cannot use the “Virtual Machines” view in ASM/VE to create Smartcopy replicas.  Only Datastores, Datacenters, and Clusters support replicas.  In my case, I will use the “Datastores” view to create replicas.  Since I made the adjustments to where my VMs were placed in the datastores (see part 2, under “Preparing VMs for Replication”), it is still clear which VMs will be replicated. 

image

After creating a Smartcopy replica of one of the datastores, I went to see how it looked.  In ASM/VE it appeared to complete successfully, and SANHQ also seemed to indicate a successful replica.  ASM/VE then gave a message of “contacting ASM peer” in the “replica status” column.  I’ve seen this occur right after kicking off a replication job, but on successful jobs, it disappears shortly.  If it doesn’t disappear, this can be a configuration issue (user accounts used to establish the connection, due to known issues with ASM/VE 2.0), or caused by Java.

 

ASM/VE replica restores

At first, ASM/VE Smartcopy replicas didn’t make much sense to me, especially when it came to restores.  Perhaps I was attempting to think of them as a long-distance snapshot, or expecting that they would behave in the same way as ASM/ME replicas.  They work a bit differently than that.  It’s not complicated, just different.

To work with the Smartcopy replica, you must first log into the ASM/VE server at the remote site.  From there, click on “Replication” > “Inbound Replicas,” highlighting the replica from the datastore you are interested in.  It will then present you with the options “Failover from replica” and “Clone from replica.”  If you attempt to do this from the ASM/VE server at the primary site, these options never present themselves.  It makes sense to me after the fact, but it took me a few tries to figure that out.  For my testing purposes, I’m focusing exclusively on “Clone from replica.”  The EqualLogic documentation has good information on when each option can be used.

When choosing “Clone from Replica,” it will present a checkbox for “Register new virtual machines.”  In my case, I uncheck this box, as my remote site will have just a few hosts running ESXi, and will not have a vCenter server to contact.

image

 

Once it is complete, access will need to be granted to the remote host on which you want to mount the volume.  This can be accomplished by logging into the Group Manager of the target/offsite SAN group, selecting the cloned volume, and entering CHAP credentials, the IP address of the remote host, or the iSCSI initiator name. 

image

 

Jump over to the vSphere client for the remote host, and under “Configuration” > “Storage Adapters,” right-click on your iSCSI software adapter, and select “Rescan.”  When complete, go to “Configuration” > “Storage” and you will notice that the volume does NOT show up.  Click “Add Storage” > “Disk/LUN.”

image

 

When a datastore is recognized as a snapshot, it will present you with the following options.  See http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf for more information on which option to choose.

image

 

Once completed, the datastore that was replicated to the remote site, and cloned so it could be made available to the remote ESX/i host, should now be visible in “Datastores.” 

image

From there, just browse the datastore, drilling down to the folder of the VM you wish to turn up; highlight and right-click the .vmx file, and select “Add to inventory.”  Your replicated VM should now be ready for you to power up.

If you are going to clone a VM replica living on the target array to a datastore, you will need one additional step if any of the VMs have guest attached volumes using the guest iSCSI initiator.  At the target location, open Group Manager, drill down to “Replication Partners” > “[partnername],” and highlight the “Inbound” tab.  Expand the volume(s) associated with that VM.  Highlight the replica you want, then click on “Clone replica.”

image

This will allow you to reattach a guest attached volume to that VM.  Remember that I’m using the cloning feature simply to verify that my VMs and data are replicating as they should.  Turning up systems for offsite use is a completely different ballgame, and not my goal – for right now anyway.

Depending on how you have your security and topology set up, and how connected your offsite ESX host is, the test VM you just turned up at the remote site may have the ability to contact Active Directory at your primary site, or guest attached volumes at your primary site.  This can cause problems for obvious reasons, so be careful not to let either one happen.  

 

Summary

While demonstrating some of these capabilities recently to the company, the audience (Developers, Managers, etc.) was very impressed, but their questions reminded me of just how little they understood the new model of virtualization and shared storage.  This can be especially frustrating for Software Developers, who generally assume there isn’t anything in IT that they don’t understand or know about.  They walked away impressed, and confused.  Mission accomplished.

Now that I’ve confirmed that my data and VMs are replicating correctly, I’ll be building up some of my physical topology so that the offsite equipment has something to hook up to.  That will give me a chance to collect some statistics on replication, which I will share in the next post.

Replication with an EqualLogic SAN; Part 1

 

Behind every great virtualized infrastructure is a great SAN to serve everything up.  I’ve had the opportunity to work with the Dell/EqualLogic iSCSI array for a while now, taking advantage of all the benefits that the iSCSI based SAN array offers.  One feature I haven’t been able to use is the built-in replication feature.  Why?  I only had one array, and I didn’t have a location offsite to replicate to.

I suppose the real “part 1” of my replication project was selling the idea to the Management Team.  When it came to protecting our data and the systems that help generate that data, it didn’t take long for them to realize it wasn’t a matter of what we could afford, but how much we could afford to lose.  Having a building less than a mile away burn to the ground also helped the proposal.  On to the fun part; figuring out how to make all of this stuff work.

Of the many forms of replication out there, the most obvious one for me to start with was native SAN-to-SAN replication.  Why?  Well, it’s built right into the EqualLogic PS arrays, with no additional components to purchase, and no license keys or fees to unlock features.  Other solutions exist, but it was best for me to start with the one I already had.

For companies with multiple sites, replication using EqualLogic arrays seems pretty straightforward.  For a company with nothing more than a single site, a few more steps need to occur before data replication can begin.

 

Decision:  Colocation, or hosting provider

One of the first decisions that had to be made was whether we wanted our data replicated to a colocation facility (CoLo) with equipment that we owned and controlled, or to a hosting provider that could provide native PS array space and replication abilities.  Most hosting providers charge by metering the amount of data replicated, in one form or another.  Accurately estimating your replication costs assumes you have a really good understanding of how much data will be replicated; unfortunately, that is difficult to know until you start replicating.  The pricing models of these hosting providers reminded me too much of a cab fare; never knowing what you are going to pay until you get the big bill at the end.  A CoLo with equipment that we owned fit our current and future objectives much better.  We wanted fixed costs, and the ability to eventually host some critical services at the CoLo (web, ftp, mail relay, etc.), so it was an easy decision for us.

Our decision was to go with a CoLo facility located in the Westin Building in downtown Seattle.  Commonly known as the Seattle Internet Exchange (SIX), this is an impressive facility, not only in its physical infrastructure, but in how it provides peered interconnects directly from one ISP to another.  Our ISP uses this facility, so it worked out well to have our CoLo there as well.

 

Decision:  Bandwidth

Bandwidth requirements for our replication were, and still are, unknown, but I knew our bonded T1’s probably weren’t going to be enough, so I started exploring other options for higher speed access.  The first thing to check was whether we qualified for Metro-E or “Ethernet over Copper” (award winner for the dumbest name ever).  Metro-E removes the element of T-carrier lines along with any proprietary signaling, and provides point-to-point connections at Layer 2, instead of Layer 3.  We were not close enough to the carrier’s central office to get adequate bandwidth, and even if we were, it probably wouldn’t scale up to our future needs.

Enter QMOE, or Qwest Metro Optical Ethernet.  This solution feeds Layer 2 Ethernet to our building via fiber, offering high bandwidth and low latency that can be scaled easily.

Our first foray using QMOE is running a 30 Mbps point-to-point feed to our CoLo, uplinked to the Internet.  If we need more later, there is no need to add or change equipment.  Just have them turn up the dial, and bill you accordingly.
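For rough planning, it’s easy to estimate how long a given amount of changed data would take to cross that circuit.  A quick sketch (the 80% efficiency factor is just my assumption for real-world protocol overhead, not a measured number):

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Rough time to push replica data over a WAN link.
    efficiency is a guess at real-world protocol overhead."""
    bits = data_gb * 8 * 1000**3              # decimal GB -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600

# Pushing 50 GB of changed data over a 30 Mbps circuit at ~80%
# efficiency takes roughly 4.6 hours.
print(round(transfer_hours(50, 30), 1))  # 4.6
```

Numbers like this make it clear why metered hosting-provider pricing was so hard to estimate up front: the cost hinges entirely on a change rate you don’t know yet.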

 

Decision:  Topology

Topology planning has been interesting, to say the least.  The best decision here depends on the use-case, and let’s not forget, what’s left in the budget. 

Two options immediately presented themselves.

1.  Replication data from our internal SAN would be routed (Layer 3) to the SAN at the CoLo.

2.  Replication data  from our internal SAN would travel by way of a VLAN to the SAN at the CoLo.

If my only need was to send replication data to the CoLo, I could take advantage of that Layer 2 connection and send replication data directly to the CoLo without it being routed.  This would mean bypassing any routers/firewalls in place, and running to the CoLo on its own VLAN.

The QMOE network is built on Cisco equipment, so in order to utilize any VLANing from the CoLo to the primary facility, you must have Cisco switches that support their VLAN Trunking Protocol (VTP).  I don’t have the proper equipment for that right now.

In my case, here is a very simplified illustration as to how the two topologies would look:

Routed Topology

image

 

Topology using VLANs

image

Routing the traffic may introduce more overhead and reduce effective throughput.  This is where a WAN optimization solution could come into play.  These solutions (SilverPeak, Riverbed, etc.) appear to be extremely good at improving effective throughput across many types of WAN connections.  They must, of course, sit at the correct spot in the path to the destination.  The units are often priced by bandwidth, and while they are very effective, they are also quite an investment.  They work at Layer 3, and must sit between the source and a router at both ends of the communication path; something that wouldn’t exist on a Metro-E circuit where VLANing was used to transmit replicated data.

The result is that, for right now, I have chosen to go with a routed arrangement with no WAN optimization.  This does not differ much from a traditional WAN circuit, other than my latencies should be much better.  The next step, if our needs are not sufficiently met, would be to invest in a couple of Cisco switches, then send replication data over its own VLAN to the CoLo, similar to the illustration above.

 

The equipment

My original SAN array is an EqualLogic PS5000e connected to a couple of Dell PowerConnect 5424 switches.  My new equipment closely mirrors this, but is slightly better: an EqualLogic PS6000e and two PowerConnect 6224 switches.  Since both items will scale a bit better, I’ve decided to swap the existing array and switches for the new equipment.

 

Some Lessons learned so far

If you are changing ISPs, and your old ISP has authoritative control of your DNS zone files, make sure your new ISP has the zone file EXACTLY the way you need it.  Then confirm it one more time.  Spelling errors and omissions in DNS zone files don’t work out very well, especially when you factor in the time it takes for corrections to propagate through the net.  (Usually up to 72 hours, but it can feel like a lifetime when your customers can’t get to your website.) 

If you are going to go with a QMOE or Metro-E circuit, be mindful that you might have to force the external interface on your outermost equipment (in our case, the firewall/router, but it could be a managed switch as well) to negotiate at 100 Mbps full duplex.  Auto-negotiation apparently doesn’t work too well on many Metro-E implementations, and can cause fragmentation that will reduce your effective throughput by quite a bit.  This is exactly what we saw.  Fortunately it was an easy fix.

 

Stay tuned for what’s next…

Comparing Nehalem and Harpertown running vSphere in a production environment

 

The good press that Intel’s Nehalem chip and underlying architecture has been receiving lately gave me pretty good reason to be excited for the arrival of two Dell M610 blades based on the Nehalem chipset.  I really wanted to know how they were going to stack up against my Dell M600’s (running Harpertown chips).  So I thought I’d do some side-by-side comparisons in a real world environment.  It was also an opportunity to put some 8 vCPU VMs to the test under vSphere.

First, a little background information.  The software my company produces runs on just about every version of Windows, Linux, and Unix there is.  We have to compile and validate (exercise) those builds on every single platform.  The majority of our customers run under Windows and Linux, so the ability to virtualize our farm of Windows and Linux build machines was a compelling argument in my case for our initial investment.

Virtualizing build/compiler machines is a great way to take advantage of your virtualized infrastructure.  What seems odd to me, though, is that I never read about others using their infrastructure in this way.  Our build machines are critical to us, yet ironically, they often ran on old leftover systems.  Now that the build machines are virtualized, we let those physical machines do nothing but exercise and validate the builds.  Unfortunately, we cannot virtualize our exerciser machines because our validation routines rely on the GPUs of the physical machines’ video cards. 

Our Development Team has also invested heavily in Agile and Scrum principles.  One of the hallmarks of that is Test Driven Development (TDD).  Short development cycles, and the ability for each developer to compile and test their changes, allow for more aggressive programming, producing more dramatic results.

How does this relate?  Our Developers need build machines that are as fast as possible.  Unlike so many other applications, their compilers actually can use every processor you give them (some better than others, as you will see).  This meant that many Developer machines were being over-spec’d, because we’d use each one as a build machine as well as the Developer’s primary workstation.  This worked, but you can imagine the disruption that occurs when a Developer’s machine is scheduled to be upgraded or modified in any way (read: angry Developer gets territorial over their system, even though YOU are the IT guy).  Plus, we typically spent more for desktop workstations than necessary because of the horsepower needed for these dual-role systems.

Two recent advancements have allowed me to deliver on my promises to leverage our virtualized infrastructure for our build processes.  vSphere’s improved co-scheduler (along with support for 8 vCPUs), and Intel’s Nehalem chip.  Let’s see how the improvements pan out.

Hardware tested

  • Dell PowerEdge M600 (Harpertown).  Dual chip, quad core Intel E5430 (2.66 GHz).  32GB RAM
  • Dell PowerEdge M610 (Nehalem).  Dual chip, quad core Intel X5550 (2.66 GHz).  32GB RAM

 

Software, VMs, and applications tested

  • vSphere Enterprise Plus 4.0 Update 1
  • VM:  Windows XP x64.  2GB RAM.  4 vCPUs.  Visual Studio 2005*
  • VM:  Windows XP x64.  2GB RAM.  8 vCPUs.  Visual Studio 2005*
  • VM:  Ubuntu 8.04 x64.  2GB RAM.  4 vCPUs.  Cmake
  • VM:  Ubuntu 8.04 x64.  4GB RAM**.  8 vCPUs.  Cmake

*I wanted to test Windows 7 and Visual Studio 2008, which is said to be better at multithreading, but ran out of time.

** The 8 vCPU Linux VM was bumped up to 4GB of RAM to eliminate some swapping errors I was seeing, but it never used more than about 2.5 GB during the build run.

 

Testing scenarios

My goals for testing were pretty straightforward:

  • Compare VMs executing full builds on hosts with Harpertown chips against the same VMs running on hosts with Nehalem chips
  • Compare build performance when I changed the number of vCPUs assigned to a VM
  • Observe how well each compiler on each platform handled multithreading

I limited observations to full build runs, as incremental builds don’t lend themselves well to multiple threads. 

I admit that my testing methods were far from perfect.  I wish I could have sampled more data to come up with more solid numbers, but these were production build systems, and the situation dictated that I not interfere too much with our build processes just for my own observations.  My focus is mostly on CPU performance in real world scenarios.  I monitored other resources such as disk I/O and memory just to make sure they were not inadvertently affecting the results beyond my real world allowances.

The numbers

Each test run shows two graphs.  The line graph shows total CPU utilization as a percentage of what is available to the VM.  The stacked graph shows the number of CPU cycles, in MHz, used by each vCPU. 

Each testing scenario shows the time in minutes to complete.

                     Windows XP64   Linux x64
  2 vCPU Nehalem          41           N/A
  4 vCPU Harpertown       32           38
  4 vCPU Nehalem          27           32
  8 vCPU Nehalem          32           8.5
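Working from the table above, the relative improvements are easy to compute.  A quick sanity check (build times in minutes, straight from the table):

```python
def speedup_pct(old_minutes, new_minutes):
    """Percent reduction in build time."""
    return round((old_minutes - new_minutes) / old_minutes * 100, 1)

# Harpertown -> Nehalem at 4 vCPUs:
print(speedup_pct(32, 27))   # Windows: 15.6 (% faster)
print(speedup_pct(38, 32))   # Linux:   15.8 (% faster)
# Linux 4 -> 8 vCPUs on Nehalem scales dramatically:
print(speedup_pct(32, 8.5))  # 73.4 (% faster)
```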

 

VM #1  WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005.
Harpertown chipset (E5430)
Full build:  33 minutes

 01-tpb004-4vcpu-m600-cpu

 02-tpb004-4vcpu-m600-cpustacked

VM #2 WinXP64.  4 vCPU.  2GB RAM.  Visual Studio 2005
Nehalem chipset (x5550)
Full build:  27 minutes

01-tpb004-4vcpu-m610-cpu

 02-tpb004-4vcpu-m610-cpustacked

VM #3 WinXP64.  8 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  32 minutes

01-tpb004-8vcpu-m610-cpu

02-tpb004-8vcpu-m610-cpustacked

VM #4 WinXP64.  2 vCPU.  2GB RAM.  Visual Studio 2005.
Nehalem chipset (x5550)
Full build:  41 minutes

01-tpb004-2vcpu-m610-cpu

 02-tpb004-2vcpu-m610-cpustacked

VM #5 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Harpertown chipset (E5430)
Full build:  38 minutes

(no graphs available.  My dog ate ‘em.)

 

VM #6 Ubuntu 8.04 x64.  4 vCPU.  2GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  32 minutes

01-tpb002-4vcpu-m610-cpu

02-tpb002-4vcpu-m610-cpustacked

VM #7 Ubuntu 8.04 x64.  8 vCPU.  4GB RAM.  Cmake.
Nehalem chipset (x5550)
Full build:  8.5 minutes  (note:  disregard first blip of data on chart)

01-tpb002-8vcpu-m610-cpu

02-tpb002-8vcpu-m610-cpustacked

Notice the tremendous multithreading performance of the build process under Ubuntu 8.04 (x64)!  It is remarkably even across each vCPU and thread, which is best observed in the stacked graphs: the higher the stack, the better the VM is using all of its available vCPUs.  Windows and its compiler were not nearly as good, actually becoming less efficient when I moved from 4 vCPUs to 8 vCPUs.  The build times reflect this.

A few other things I noticed along the way…

Unlike the old E5430 hosts, hyper-threading is possible on the X5550 hosts, and according to VMware’s documentation, is recommended.  Whether it actually improves performance is subject to some debate, as found here.

If you want to VMotion VMs between your X5550 and E5430 based hosts, you will need to turn on EVC mode in vCenter.  You can do this in the cluster settings section of vCenter.  According to Intel and VMware, you won’t be dumbing down or hurting the performance of your new hosts. 

My Dell M610 blades (Nehalem) had the Virtualization Technology toggle turned off in the BIOS.  This was the same on my M600s (Harpertown).  Why this is the default is beyond me, especially on a blade.  Remember to set it before you even start installing vSphere.

For Windows VMs, remember that the desktop OSes are limited to what they see as two physical sockets.  By default, ESX presents each vCPU as one single-core processor in its own socket.  To utilize more than 2 vCPUs on those VMs, set the “cpuid.coresPerSocket” option in the settings of the VM.  More details can be found here.
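For example, this .vmx fragment (the values are illustrative; edit the VM’s settings or .vmx file while it is powered off) would present 8 vCPUs as 2 sockets of 4 cores each, keeping a desktop OS within its 2-socket limit:

```
numvcpus = "8"
cpuid.coresPerSocket = "4"
```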

Conclusion
I’ve observed nice performance gains using the hosts with the Nehalem chips; 15 to 20% from my small samples.  However, my very crude testing has not revealed improvements as noted in various posts suggesting that a single vCPU VM running on a Nehalem chip would be nearly equal to a 2 vCPU VM on a Harpertown chip (see here).  This is not to say that it can’t happen.  I just haven’t seen it yet.

I was impressed by how good, and how evenly distributed, the multithreading of the compilers running on a Linux VM was versus the Windows counterpart.  So were the Developers, who saw the 8.5 minute build time as good as or better than any physical system we have in the office.  But make no mistake: if you are running a VM with 8 vCPUs on a host with 8 cores, and it’s able to use all 8 vCPUs, you won’t be getting something for nothing.  Your ESX host will be nearly pegged while it’s running full tilt, and other VMs will suffer.  This was the reason behind our purchase of additional blades.

Side effects of upgrading VM’s to Virtual Hardware 7 in vSphere

 

New Year’s Day, 2010 was a day of opportunity for me.  With the office closed, I jumped at the chance to upgrade my ESX cluster from 3.5 to vSphere 4.0 U1.  Those in IT know that a weekend flanked by a holiday is your friend, offering a bit more time to resolve problems, or back out entirely.  The word on the street was that the upgrade to vSphere was a smooth one, but I wanted extra time, just in case.  The plan was laid out, the prep work had been done, and I was ready to pull the trigger.  The steps fell into 4 categories.

1.  Upgrade vCenter 
2.  Upgrade ESX hosts 
3.  Upgrade VM Tools and virtual hardware 
4.  Tidy up by taking care of licensing, VCB, and confirm that everything works as expected.

Pretty straightforward, right?  Well, in fact it was.  VMware should be commended for the fine job they did in making the transition relatively easy.  I didn’t even need the 3-day weekend to do it.  Did I overlook anything?  Of course.  In particular, I didn’t pay attention to how the Virtual Hardware 7 upgrade was going to affect my systems at the application layer.

At the heart of the matter is that when the Virtual Hardware upgrade is made, VMware will effortlessly disable and hide the old NICs, add new NICs, and then reassign the addressing info that was on the old cards.  It’s pretty slick, but it does cause a few problems.

  1. Static IP addressing gets transferred over correctly, but the other NIC settings do not.  Do you have all but IPv4 disabled (e.g. Client for Microsoft Networks, QoS, etc.) for your iSCSI connections to your SAN?  Do you have NetBIOS over TCP/IP shut off as well?  After the Hardware 7 upgrade, all of those services will be turned back on.  Do you have IPv6 disabled on your LAN NIC?  (No, this isn’t a commentary on my dislike of IPv6; there are many legitimate reasons to do this.)  That will be turned back on too.
  2. NIC binding order will reset itself to an order you probably do not want.  This affects services in a big way, especially when you factor in side effect #1.  (Please note that none of my systems were multi-homed on the LAN side.  The additional NICs for each VM were simply for iSCSI based storage access using a guest iSCSI initiator.)
  3. Guest iSCSI initiator settings *may* be different.  The most common issues I saw were duplicate entries on the “Discovery” tab, and the “Volumes and Devices” tab no longer having the drive letter of the guest initiated drive.  That drive letter entry is necessary so that some services do not jump the gun and start before the volume is available.
  4. Duplicate reverse DNS records.  I stumbled upon this after the update, based on some errors I was seeing.  Many mysteries can occur with orphaned, duplicate reverse DNS records.  Get rid of ‘em as soon as you see them.  It won’t hurt to check your WINS service and clear that out as well, and keep an eye on those machines with DHCP reservations.
  5. In Microsoft’s operating systems, the network configuration subsystem generates a Globally Unique Identifier (GUID) for each NIC that is partially based on the MAC address of the NIC.  This GUID may or may not be used in applications like Exchange, CRM, Sharepoint, etc.  When the NIC changes, the GUID changes… and services may break.

Items 1 through 4 are pretty easy to handle – even easier when you know what’s coming.  Item #5 is a total wildcard.

What’s interesting about this is that it created the kinds of problems that are, in many ways, the most problematic for Administrators: where you think everything is running fine, but it’s not.  Most things work long enough to make your VM snapshots no longer relevant, if you were planning on the quick fix. 

Now, in hindsight, I see that some of this was documented, as much of this type of thing comes up in P2V and V2V conversions.  However, much of it was not.  My hope is to save someone else a little heartache.  Here is what I did after each VM was upgraded to Virtual Hardware 7.

All VM’s

Removing old NICs

  1. Open a command shell with the "Run as administrator" option.  In the shell, type set devmgr_show_nonpresent_devices=1, then hit Enter
  2. Type start devmgmt.msc, then hit Enter
  3. Click View > Show hidden devices
  4. Expand the Network adapters tree
  5. Right-click each grayed-out NIC, and click Uninstall
  6. Click View > Show hidden devices again to toggle it off
  7. Exit out of the application
  8. Type set devmgr_show_nonpresent_devices=0, then hit Enter

Change binding order of new NICs

  1. Right-click on Network, then click Properties > Manage Network Connections
  2. Rename the NICs to "LAN", "iSCSI-1", and "iSCSI-2", or whatever you wish
  3. Change the binding order to put the LAN NIC at the top of the list
  4. Disable IPv6 on the LAN NIC
  5. For the iSCSI NICs, disable all but TCP/IPv4.  Verify the static IP (with no gateway or DNS servers), verify "Register this connection's addresses in DNS" is unchecked, and disable NetBIOS over TCP/IP

Verify/Reset iSCSI initiator settings

  1. Open the iSCSI initiator
  2. Verify that the Discovery tab has just one entry: x.x.x.x:3260, Default Adapter, Default IP address
  3. Verify the Targets and Favorite Targets tabs
  4. On the Volumes and Devices tab, click "Autoconfigure" to repopulate the entries (this clears up mangled entries on some systems)
  5. Click OK, and restart the machine

DNS and DHCP

  1. Remove duplicate reverse-lookup records for systems upgraded to Virtual Hardware 7
  2. For systems with DHCP-reserved addresses, jump into your DHCP manager and modify the reservations as needed.

Exchange 2007 Server

Exchange seemed operational at first, but the more I looked at the event logs, the more I realized I needed to do some cleanup.

After performing the tasks under the “All VMs” section above, most issues went away.  However, one stuck around.  Because of the GUID issue, if your Exchange server is running the transport server role, it will flag you with Event ID 205 errors; it is still looking for the old GUID.  Here is what to do.

First, determine the status of the NICs:

[PS] C:\Windows\System32>get-networkconnectioninfo

Name : Intel(R) PRO/1000 MT Network Connection #4
DnsServers : {192.168.0.9, 192.168.0.10}
IPAddresses : {192.168.0.20}
AdapterGuid : 5ca83cae-0519-43f8-adfe-eefca0f08a04
MacAddress : 00:50:56:8B:5F:97

Name : Intel(R) PRO/1000 MT Network Connection #5
DnsServers : {}
IPAddresses : {10.10.0.193}
AdapterGuid : 6d72814a-0805-4fca-9dee-4bef87aafb70
MacAddress : 00:50:56:8B:13:3F

Name : Intel(R) PRO/1000 MT Network Connection #6
DnsServers : {}
IPAddresses : {10.10.0.194}
AdapterGuid : 564b8466-dbe2-4b15-bd15-aafcde21b23d
MacAddress : 00:50:56:8B:2C:22

Then get the transport server info

[PS] C:\Windows\System32>get-transportserver | ft -wrap Name,*DNS*

[screenshot: Get-TransportServer output showing the stale DNS adapter GUIDs]

Then, set the transport server info correctly

set-transportserver SERVERNAME -ExternalDNSAdapterGuid 5ca83cae-0519-43f8-adfe-eefca0f08a04

set-transportserver SERVERNAME -InternalDNSAdapterGuid 5ca83cae-0519-43f8-adfe-eefca0f08a04

 

SharePoint Server

Thank heavens we are just in the pre-deployment stages for SharePoint.  Our SharePoint consultant asked what I did to mess up the Central Administration site, as it could no longer be accessed (the other sites were fine, however).  After a bizarre series of errors, I thought it best to restore it from snapshot and test what was going on.  The virtual hardware upgrade definitely broke the connection, but so did removing the existing NIC and adding another one.   As of now, I can’t confirm that it is in fact a problem with the NIC GUID, but it sure seems to be.   My only working solution in the time allowed was to keep the server at virtual hardware level 4 and build up a new SharePoint front-end server. 

One might question why, with the help of vCenter, a MAC address can’t simply be forced on the server.   Even if the last 12 characters of the GUID (representing the MAC address) carry over, the first part is different.  That makes sense, because the new device is a different device.  The applications care about the GUID as a whole, not just the MAC address.

Here is how you can find the GUID of the NIC in question.  Run this BEFORE you perform a virtual hardware upgrade, and save it for future reference in case you run into problems.  Also make note of where it exists in the registry.  It’s not a solution to the issue I had with SharePoint, but it’s worth knowing about.

C:\Users\administrator.DOMAINNAME>net config rdr
Computer name                        \\SERVERNAME
Full Computer name                   SERVERNAME.domainname.lan
User name                            Administrator

Workstation active on
        NetbiosSmb (000000000000)
        NetBT_Tcpip_{56BB9E44-EA93-43C3-B7B3-88DD478E9F73} (0050568B60BE)

Software version                     Windows Server (R) 2008 Standard

Workstation domain                   DOMAINNAME
Workstation Domain DNS Name          domainname.lan
Logon domain                         DOMAINNAME

COM Open Timeout (sec)               0
COM Send Count (byte)                16
COM Send Timeout (msec)              250
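If you are capturing this before an upgrade across many servers, the GUID (and the MAC next to it) can be pulled out of that output with a simple regular expression. A sketch in Python, assuming the NetBT_Tcpip_{GUID} (MAC) line format shown above:

```python
import re

# Matches lines like: NetBT_Tcpip_{56BB9E44-...-88DD478E9F73} (0050568B60BE)
NETBT_RE = re.compile(
    r"NetBT_Tcpip_\{([0-9A-Fa-f-]{36})\}\s*\(([0-9A-Fa-f]{12})\)"
)

def nic_guid_and_mac(net_config_output: str):
    """Return (guid, mac) parsed from 'net config rdr' output, or None."""
    m = NETBT_RE.search(net_config_output)
    return (m.group(1), m.group(2)) if m else None
```

Saving that pair per server gives you the "before" state to compare against after the upgrade.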

 

In hindsight…

Let me be clear that there is really not much VMware can do about this.  The same troubles would occur on a physical machine if you needed to change out network cards.  The difference is that on a physical machine it isn’t done as easily, as often, or as transparently as on a VM.

If I were to do it over again (and surely I will, the day VMware releases another major upgrade), I would do a few things differently.

  1. Note all existing application errors and warnings on each server prior to the upgrade, so I don’t have to wonder whether the warning I’m staring at existed before the upgrade.
  2. Note those GUIDs before you upgrade.  You could always capture them after restoring from a snapshot if you do run into problems, but save yourself a little time and get them down on paper ahead of time.
  3. Take the virtual hardware upgrade slowly.  After everything else went pretty smoothly, I was in a “get it done” mentality.  Although the results were not catastrophic, I could have done a better job of minimizing the issues.
  4. Keep the snapshots around for at least the length of your scheduled maintenance window.  A snapshot is not a get-out-of-jail-free card, but if you have the weekend or off-hours to experiment, it is a good tool for doing so.

This has also helped me decide to take a very conservative approach to rolling out the new VMXNET3 NIC driver to existing VMs.  I might simply update my templates and deploy it only on new systems, or on systems that don’t run services that rely on the NIC’s GUID.

One final note.  “GUID” can mean many things depending on the context, and may be referenced in different ways (UUID, SID, etc.).  Not all GUIDs are NIC GUIDs; the term is used quite loosely across subject areas.   What does this mean to you?  It makes searching the net pretty painful at times.

Interesting links:

A simple powershell script to get the NIC GUID
http://pshscripts.blogspot.com/2008/07/get-guidps1.html

Resource allocation for Virtual Machines

Ever since I started transitioning our production systems to my ESX cluster, I’ve been fascinated by how visible resource utilization has become.  Or, to put it another way, how blind I was before.  I’ve also been interested to hear about the natural tendency of many administrators to over-allocate resources to their VMs.  Why does it happen?  Who’s at fault?  From my humble perspective, it’s a party everyone has shown up to.

  • Developers & Technical writers
  • IT Administrators
  • Software Manufacturers
  • Politics

Developers & Technical Writers
Best practices and installation guides are usually written by technical writers for the software manufacturer.  And who provides them their information?  The developers.  Someone on the development team takes a look at Task Manager (or htop in Linux), maybe PerfMon if they have extra time.  They determine, with a less-than-thorough vetting process, what the requirement should be, and pass it off to the technical writer.  Does the app really need two CPUs, or does that just indicate it’s capable of multithreading?  Or both?  …Or none of the above?  Developers seem to be the group most challenged by the new paradigm of virtualization, yet they are the ones who decide what the requirements are.  Some barely know what it is, or dismiss it as nothing more than a cute toy that won’t work for their needs.  It’s pretty fun to show them otherwise, but frustrating to see their continued suspicion of the technology.

IT Administrators (yep, me included)
Take a look at any installation guide for your favorite (or least favorite) application or OS.  Resource minimums are still written for hardware-based provisioning.   Most best-practice guides outline memory and CPU requirements within the first few pages.  Going against the recommendations on page 2 generally isn’t good IT karma; it feels as counterintuitive as trying to breathe with your head under water.  Only through experience have I grown more comfortable with the practice.  It’s still tough, though.

Software Manufacturers
Virtualization can be a sensitive matter for software manufacturers.  Some would prefer that it didn’t exist, and choose to document and license their products accordingly.  Others will insist that resources are resources; why would they ever state that their server application can run with just 768MB of RAM and a single CPU core if there were even a remote possibility of it hurting performance?

Politics
Let’s face it: how much is Microsoft going to dive into memory recommendations for an Exchange server when their own virtualization solution does not support some of the advanced memory-handling features that VMware supports?  The answer is, they aren’t.  It’s too bad, because their products run so well in a virtual environment.  Politics can also come from within.  IT departments get coerced by management, project teams, or departments, or are just worried about the SLAs of critical services.  They acquiesce to try to keep everyone happy.

What can be done about it?
Rehab for everyone involved.  Too ambitious?  Okay, let’s just try to improve the installation and best-practice guides from the software manufacturers.

  • Start with two or three sets of minimum requirements: one for provisioning the application or OS on a physical machine, and others for provisioning it as a VM under a few different hypervisors.
  • Clearly state whether the application is even capable of multithreading.  That would eliminate some confusion over whether to even consider two or more vCPUs for a VM.  I suspect many red faces would show up when software manufacturers admit to their customers that they haven’t designed their software to use more than one core anyway, but this simple step would help administrators greatly.
  • For VM-based installations, note the RAM threshold below which excessive disk paging begins to occur.   While the goal is to allocate as few resources as needed, nobody wants disk thrashing.
  • For physical servers, a single box may play a dozen different roles.  Minimums sometimes assume this, and the guides pad them with a buffer just in case.  A VM might be providing just a single role.  Acknowledge that this new approach exists, and adjust the requirements accordingly.

Wishful thinking, perhaps, but it would be a start.  Imagine the uproar (and competition) that would occur if a software manufacturer actually spec’d a lower memory or CPU requirement when running under one hypervisor versus another.  …Now I’m really dreaming.

IT administrators have some say in this too.  Simply put, the IT department is a service provider; your staff and the business are your customers.  As a virtualization administrator, you have the ability to assert your expertise in provisioning systems to provide a service, and to do so as efficiently as possible.  Let them define the acceptance criteria for their need, and then you deal with how to make it happen.

Real World Numbers
There are many legitimate variables that make it difficult to give one-size-fits-all recommendations on resource requirements, which makes things difficult for those first starting out.  Rather than making suggestions, I decided to simply summarize some of the systems I have virtualized, and their utilization rates for a staff of about 50 people, 20 of them software developers.  These numbers were pulled during business hours.  I do not want to imply that these are the best or most efficient settings; in fact, many of them were “first guess” settings that I plan on adjusting later.  But they might offer you a point of reference for comparison, or help in your upcoming deployment.

For each system below: server/function; OS and VM configuration; average % of RAM used; average % of CPU used (with occasional spike); and comments.

AD Domain Controller (all roles), DNS, DHCP
Windows Server 2008 x64; 1 vCPU, 2GB RAM
Avg RAM used: 9%.  Avg CPU used: 2% (occasional spike: 15%).
So much for my DCs working hard.  2GB is overkill for sure, and I will be adjusting all three of my DCs’ RAM downward.  I thought the chattiness of DCs was more of a burden than it really is.

Exchange 2007 Server (all roles)
Windows Server 2008 x64; 1 vCPU, 2.5GB RAM
Avg RAM used: 30%.  Avg CPU used: 50% (occasional spike: 80%).
Consistently our most taxed VM, but I’m pleasantly surprised by how well it runs.

Print Server, AV Server
Windows Server 2008 x64; 1 vCPU, 2GB RAM
Avg RAM used: 18%.  Avg CPU used: 3% (occasional spike: 10%).
Sitting as a separate server only because I hate having application servers doubling as print servers.

Source Code Control Database Server
Windows Server 2003 x64; 1 vCPU, 1GB RAM
Avg RAM used: 14%.  Avg CPU used: 2% (occasional spike: 40%).
There were fears from our dev team that this would be inferior to our physical server, and they suggested assigning 2 vCPUs “just in case.”  I said no.  They reported a 25% performance improvement over the physical server.  Eventually they might figure out the ol’ IT guy knows what he’s doing.

File Server
Windows Server 2008 x64; 1 vCPU, 2GB RAM
Avg RAM used: 8%.  Avg CPU used: 4% (occasional spike: 20%).
Low impact, as expected.  Definitely a candidate for reduced resources.

SharePoint Front End Server
Windows Server 2008 x64; 1 vCPU, 2.5GB RAM
Avg RAM used: 10%.  Avg CPU used: 15% (occasional spike: 30%).
Built up, but not yet fully deployed to everyone in the organization.

SharePoint Back End/SQL Server
Windows Server 2008 x64; 1 vCPU, 2.5GB RAM
Avg RAM used: 9%.  Avg CPU used: 15% (occasional spike: 50%).
I will be keeping a close eye on this when it ramps up to full production use.  SharePoint farms are known to be hogs; I’ll find out soon enough.

SQL Server for project tracking
Windows Server 2003 x64; 1 vCPU, 1.5GB RAM
Avg RAM used: 12%.  Avg CPU used: 4% (occasional spike: 50%).
Lower than I would have thought.

Code compiling system
Windows XP x64; 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used: 5% (occasional spike: 100%).
Spikes to 100% CPU during compiles (about 20 minutes).  The compilers allow you to tell them how many cores to use.

Code compiling system
Ubuntu 8.10 LTS x64; 1 vCPU, 1GB RAM
Avg RAM used: 35%.  Avg CPU used: 5% (occasional spike: 100%).
All Linux distros seem to prepopulate more RAM than their Windows counterparts, perhaps with the benefit of doing less paging.

To complicate matters a bit, you might observe different behaviors among OSes (XP versus Vista/2008 versus Windows 7/2008 R2, or SQL 2005 versus SQL 2008) in their willingness to prepopulate RAM.  Give SQL 2008 4GB of RAM and it will want to use it, even if it isn’t doing much.   You might notice this when looking at relatively idle VMs running different OSes, where some have a smaller memory footprint than others.   At the time of this writing, none of my systems were running Windows Server 2008 R2, as it wasn’t supported on ESX 3.5 when I was deploying them.
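Numbers like those in the table above also lend themselves to a mechanical first pass at spotting over-provisioned VMs. A toy Python sketch (the thresholds are my own arbitrary picks, not a recommendation, and the sample data is a subset of the table):

```python
# (vm name, avg % RAM used, avg % CPU used) -- a subset of the table above
observations = [
    ("AD Domain Controller", 9, 2),
    ("Exchange 2007", 30, 50),
    ("Print/AV server", 18, 3),
    ("File Server", 8, 4),
]

def downsize_candidates(obs, ram_pct=15, cpu_pct=10):
    """VMs whose average RAM and CPU usage both sit under the thresholds."""
    return [name for name, ram, cpu in obs if ram < ram_pct and cpu < cpu_pct]
```

Anything this flags is just a candidate for a closer look in your performance charts, not an automatic resize.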

Some of these numbers are a testament to ESX’s/vSphere’s superior memory management and CPU scheduling.  Memory ballooning, swapping, and Transparent Page Sharing all contribute to pretty startling efficiency.

I have yet to virtualize my CRM, web, mail relay, and miscellaneous servers, so I do not have good data yet for those types of systems.  Having just upgraded to vSphere in the last few days, this also clears the way for me to assign multiple vCPUs to the code-compiling machines (as many as 20 VMs).  The compilers have switches that control exactly how many cores are used, and our development team needs these builds compiled as fast as possible.  That will be a topic for another post.

Lessons Learned
I love having systems isolated to their intended function now.  Who wants Peachtree crashing their email server anyway?  Administrators working in the trenches know that a server with a single role is easier to manage, far more stable, and doesn’t cross-contaminate other services.  In a virtual environment, it’s worth any additional cost in OS licensing or overhead.

When the transition to virtualizing our infrastructure began, I thought our needs and circumstances would be different than they’ve proven to be.  Others claim extraordinary consolidation ratios with virtualization.  I believed we’d see huge improvements, but those numbers couldn’t possibly apply to us, because (chest puffed out) we needed real power.  Well, I was wrong, and so far we really are like everyone else.

Helpful links