Reworking my PowerConnect 6200 switches for my iSCSI SAN

It sure is easy these days to get spoiled with the flexibility of virtualization and shared storage.  Optimization, maintenance, fail-over, and other adjustments are so much easier than they used to be.  However, there is an occasional reminder that some things are still difficult to change.  For me, that reminder was the switches I use for my SAN.

One of the many themes I kept hearing at this year's Dell Storage Forum (a great experience, I must say) throughout several of the breakout sessions I went to was "get your SAN switches configured correctly."  A nice reminder of something I was all too aware of already; my Dell PowerConnect 6224 switches had not been configured correctly since the day they replaced my slightly less capable (but rock solid) PowerConnect 5424's.  I returned from the forum committed to getting my switchgear updated and configured the correct way.  Now for the tough parts…  What does "correct" really mean when it comes to the 6200 series switches?  And why didn't I take care of this a long time ago?  Here are just a few reasons (excuses, really).

  • At the time of initial deployment, I had difficulty tracking down documentation written specifically for configuring the 6224's with iSCSI.  Eventually, I did my best to interpret the configuration settings of the 5424's, and apply the same principles to the 6224's.  Unfortunately, the 6224's are a different animal than the 5424's, and that showed up after I placed them into production – a task that I regretfully rushed.
  • When I deployed them into production, the current firmware was the 2.x generation.  It was my understanding after the deployment that the 2.x firmware on the 6200 series definitely had growing pains.  I also had the unfortunate timing that the next major revision came out shortly after I put them into production.
  • I had two stacked 6224 switches running my production SAN environment (a setup that was quite common among those I asked at the Dell Storage Forum). While experimenting with settings might be fun in a lab, it is no fun (and very serious business) when the switches are running a production environment. I wanted to make adjustments just once, but had difficulty confirming the settings.
  • When firmware needs to be updated (a conclusion to an issue I was reporting to Technical Support), it is going to take down the entire stack.  This means that you'd better have everything that uses the SAN powered off, unless you like living dangerously.  Major firmware updates will also require the boot code in each switch to be updated.  A true "lights out" maintenance window that required everything to be shut down.  The humble little 5424's LAG'd together didn't have that problem.
  • The 2.x to 3.x firmware update also required the boot code to be updated.  However, you simply couldn’t run an “update bootcode” command.  The documentation made this very clear.  The PowerConnect Technical Support Team indicated that the two versions ran different algorithms to unpack the contents, which was the reason for yet another exception to the upgrade process. 

One of the many best practices recommended at the Forum was to stack the switches instead of LAGing them.  Stack, stack, stack was drilled into everyone’s head.  The reasons are very good, and make a lot of sense.

  • A stacking module in many ways extends the circuitry of a single switch, so the stack doesn't have to honor, or be limited by, traditional Ethernet.
  • Managing one switch manages them all.
  • Better, more scalable bandwidth between switches
  • No messing around with LAGs

But here lies the conundrum of many administrators who are responsible for production environments.  While stacked 6224's offer redundancy against hardware failure, they offer no redundancy when it comes to maintenance.  These stacked switches are seen as one logical unit, and may be your weakest link when it comes to maintenance of your virtualized infrastructure.  Interestingly enough, when inquiring further about effective strategies for updating under this topology, I observed two things: many other users were stuck with this very same dilemma, and the answers provided weren't too exciting.  There were generally three answers offered for this design decision:

  • Plan for a “lights out” maintenance window.
  • Buy another set of two switches, stack those, then trunk the two stacks together via 10GbE.
  • Buy better switches. 

See why I wasn’t too excited about my options?

Decision time.  I knew I’d suffer a bit of downtime updating the firmware and revamping the configuration no matter what I did.  Do I stack them as recommended, only to be faced with the same dilemma on the next firmware upgrade?  Or do I LAG the switches together so that I avoid this upgrade fiasco in the future?  LAG’ing is not perfect either, and the more arrays I add (as well as the inter-array traffic increasing with new array features), the more it might compound some of the limitations of LAGs. 

What option won out?  I decided to give stacking ONE more try.  I had to keep my eye on my primary objective: correcting my configuration by way of a firmware upgrade, and building up a simple, pristine configuration from scratch.  The idea was that the configuration would initially contain the minimum set of modifications to get the switches working according to best practices.  Then, I could build off of that configuration in the future.  Also influencing my decision was finding out that the recommended settings for LAGs apparently change frequently.  For instance, just recently, the recommended flow control setting for the port channel in a LAG was changed.  These are the types of things I wanted to stay away from.  But with that said, I will continue to keep the option of LAGing them open, for the sole reason that it offers the flexibility for maintenance without shutting down the entire cluster.

So here were my minimum desired results for the switch stack after the upgrade and reconfiguration.  Pretty straightforward.

  • Management traffic on its own VLAN (VLAN 10) on port 1 (for uplinking) and port 2 (for local access).
  • iSCSI traffic on its own VLAN (VLAN 100), on all ports excluding the management ports.
  • Essentially no traffic on the Default VLAN
  • Recommended global and port specific settings (flow control, spanning tree, jumbo frames, etc.) for iSCSI traffic endpoint connections
  • iSCSI traffic that was available to be routed through my firewall (for replication).

My configuration rework assumed the successful boot code and firmware upgrade to version 3.2.1.3.  I pondered a few different ways to speed this process up, but ultimately just followed the very good steps provided with the documentation for the firmware.  They were clear, and accurate.

By the way, on June 20th, 2011, Dell released their very latest firmware update (thank you, RSS feed), 3.2.1.3 A23.  This now includes their "Auto Detection" of ports for iSCSI traffic.  Even though the name implies a feature that might be helpful, the documentation did not provide enough information, so I decided to configure manually as originally planned.

For those who might be in the same boat as I was, here are the exact steps I took to build up a pristine configuration after updating the firmware and boot code.  The configuration below was definitely a combined effort by the folks from the EqualLogic and PowerConnect Teams, and me poring over a healthy amount of documentation.  It was my hope that this combined effort would eliminate some of the contradictory information I found in previous best practices articles, forum threads, and KB articles that assumed earlier firmware.  I'd like to thank them for being tolerant of my attention to detail, and of my desire to get this right the first time.  You'll see that the rebuild steps are very simple.  Getting confirmation on this was not.

Step 1:  Reset the switch to defaults (make a backup of your old config, just in case)
enable
delete startup-config
reload

 
Step 2:  When prompted, follow the setup wizard in order to establish your management IP, etc. 
 
Step 3:  Put the switch into admin and configuration mode.
enable
configure

 
Step 4:  Establish Management Settings
hostname [yourstackhostname]
enable password [yourenablepassword]
spanning-tree mode rstp
flowcontrol

 
Step 5: Add the appropriate VLAN IDs to the database and setup interfaces.
vlan database
vlan 10
vlan 100
exit
interface vlan 1
exit
interface vlan 10
name Management
exit
interface vlan 100
name iSCSI
exit
ip address vlan 10
 
Step 6: Create an Etherchannel Group for Management Uplink
interface port-channel 1
switchport mode access
switchport access vlan 10
exit
NOTE: Because the switches are stacked, port one on each switch will be configured in this channel-group, which can then be connected to your core switch or an intermediate switch for management access. Port two on each switch can be used if you need to plug a laptop into the management VLAN, etc.
 
Step 7: Configure/assign Port 1 as part of the management channel-group:
interface ethernet 1/g1
switchport access vlan 10
channel-group 1 mode auto
exit
interface ethernet 2/g1
switchport access vlan 10
channel-group 1 mode auto
exit
 
Step 8: Configure Port 2 as Management Access Switchports (not part of the channel-group):
interface ethernet 1/g2
switchport access vlan 10
exit
interface ethernet 2/g2
switchport access vlan 10
exit
 
Step 9: Configure Ports 3-24 as iSCSI access Switchports
interface range ethernet 1/g3-1/g24
switchport access vlan 100
no storm-control unicast
spanning-tree portfast
mtu 9216
exit
interface range ethernet 2/g3-2/g24
switchport access vlan 100
no storm-control unicast
spanning-tree portfast
mtu 9216
exit
NOTE:  Binding the xg1 and xg2 interfaces into a port-channel is not required for stacking. 
 
Step 10: Exit from Configuration Mode
exit
 
Step 11: Save the configuration!
copy running-config startup-config

Step 12: Back up the configuration
copy startup-config tftp://[yourTFTPip]/conf.cfg

In hindsight, the most time consuming aspect of all of this was trying to confirm the exact settings for the 6224's in an iSCSI SAN.  Running a close second was shutting down all of my VMs, ESX hosts, and anything else that connected to the SAN switchgear.  The upgrade and the rebuild were relatively quick and trouble-free.  I'm thrilled to have this behind me now, and I hope that by passing this information along, you too will have a very simple working example to build your configuration from.  As for the 6224's, they are working fine now.  I will continue to keep my fingers crossed that Dell will eventually provide a way to update firmware on a stacked set of 6200 series switches without a lights out maintenance window.

Zero to 32 Terabytes in 30 minutes. My new EqualLogic PS4000e

Rack it up, plug it in, and away you go.  Those are basically the steps needed to expand a storage pool by adding another PS array using the Dell/EqualLogic architecture.  A few weeks ago I took delivery of a new PS4000e to complement my PS6000e at my primary site.  The purpose of this additional array was really simple.  We needed raw storage capacity.  My initial proposal and deployment of my virtualized infrastructure a few years ago was a good one, but I deliberately did not include our big flat-file storage servers in the initial scope of storage space requirements.  There was plenty to keep me occupied between the initial deployment and now.  It allowed me to get most of my infrastructure virtualized, and gave the skeptics who thought all of this new-fangled technology was too good to be true a chance to buy in.  Since that time, storage prices have fallen, and larger drive sizes have become available.  Delaying the purchase aligned well with "just-in-time" purchasing principles, and also gave me an opportunity to address the storage issue in the correct way.   At first, I thought all of this was a subject matter not worthy of writing about.  After all, EqualLogic makes it easy to add storage.  But that only addresses part of the problem.  Many of you face the same dilemma regardless of what your storage solution is: user facing storage growth.

Before I start rambling about my dilemma, let me clarify what I mean by a few terms I'll be using: "user facing storage" and "non user facing storage."

  • User Facing Storage is simply the storage that is presented to end users via file shares (in Windows) and NFS mounts (in Linux).  User facing storage is waiting there, ready to be sucked up by an overzealous end user. 
  • Non User Facing Storage is the storage occupied by the servers themselves, and the services they provide.  Most end users generally have no idea how much space a server reserves for, say, SQL databases or transaction logs (nor should they!)  Non user facing storage needs are easier to anticipate and manage because this storage is only exposed to system administrators. 

Which array…

I decided to go with the PS4000e because of the value it returns, and how it addresses my specific need.  If I had been targeting VDI or storage for other I/O intensive services, I would have opted for one of the other offerings in the EqualLogic lineup.  I virtualized the majority of my infrastructure on one PS6000e with 16, 1TB drives in it, but it wasn't capable of the raw capacity that we now needed to virtualize our flat-file storage.  While the effective number of 1GbE ports is cut in half on the PS4000e as compared to the PS6000e, I have not been able to gather any usage statistics against my traditional storage servers that suggest the throughput of the PS4000e will not be sufficient.  The PS4000e allowed me to trim a few dollars off of my budget line estimates, and may work well at our CoLo facility if we ever need to demote it.

I chose to create a storage pool so that I could keep my volumes that require higher performance on the PS6000, and have the dedicated storage volumes on the PS4000.  I will do the same for when I eventually add other array types geared for specific roles, such as VDI.

Truth be told, we all know that 16, 2 terabyte drives do not equal 32 terabytes of real world space.  The RAID50 penalty knocks that down to about 21TB.  Cut that by about half for average snapshot reserves, and it's more like 11TB.  Keeping a little bit of free pool space available is always a good idea, so let's just say it effectively adds 10TB of full-fledged enterprise class storage.  This adds to my effective storage space of 5TB on my PS6000.  Fantastic.  …but wait, one problem.  No, several problems.
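
If it helps to see that math in one place, here is a rough back-of-envelope sketch in Python using the same approximations as above (the RAID50 and snapshot reserve figures are my estimates, not exact EqualLogic numbers):

# Rough usable-capacity estimate for the new PS4000e, using the same
# approximations as the paragraph above (estimates, not exact figures).
drives = 16
drive_tb = 2.0
raw_tb = drives * drive_tb                        # 32 TB raw
raid50_usable_tb = 21.0                           # approx. after RAID50 parity and spares
after_snap_reserve_tb = raid50_usable_tb * 0.5    # ~50% average snapshot reserves
effective_tb = 10.0                               # keep a little free pool space in hand

print(f"Raw: {raw_tb:.0f} TB")
print(f"After RAID50: ~{raid50_usable_tb:.0f} TB")
print(f"After snapshot reserves: ~{after_snap_reserve_tb:.0f} TB")
print(f"Effective, with free pool space held back: ~{effective_tb:.0f} TB")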

The Dilemma

Turning up the new array was the easy part.  In less than 30 minutes, I had it mounted, turned on, and configured to work with my existing storage group.  Now for the hard part: figuring out how to utilize the space in the most efficient way.  User facing storage is a wildcard; do it wrong and you'll pay for it later.  While I didn't know the answer, I did know some things that would help me come to an educated decision.

  • If I migrate all of the data on my remaining physical storage servers (two of them, one Linux, and one Windows) over to my SAN, it will consume virtually all of my newly acquired storage space.
  • If I add a large amount of user-facing storage, and present that to end users, it will get sucked up like a vacuum.
  • If I blindly add large amounts of great storage at the primary site without careful thought, I will not have enough storage at the offsite facility to replicate to.
  • Large volumes (2TB or larger) not only run into technical limitations, but are difficult to manage.  At that size, there may also be a co-mingling of data that is not necessarily business critical.  Doling out user facing storage in large volumes is easy to do.  It will come back to bite you later on.
  • Manipulating the old data in the same volume as new data does not bode well for replication and snapshots, which look at block changes.  Breaking them into separate volumes is more effective.
  • Users will not take the time or the effort to clean up old data.
  • If data retention policies are in place, users will generally be okay with them after a substantial amount of complaining. It's not too different from the complaining you might hear when there are no data retention policies, but you have no space.  Pick your poison.
  • Your users will not understand data retention policies if you do not understand them.  Time for a plan.

I needed a way to compartmentalize some of the data so that it could be identified as "less important" and then perhaps live on less important storage.  By "less important storage" I mean a part of the SAN that is not replicated, or in a worst case scenario, even some old decommissioned physical servers, where the data resides for a defined amount of time before it is permanently archived and removed from the probationary location.

The Solution (for now)

Data Lifecycle management.  For many this means some really expensive commercial package.  That might be the way to go for you too.  To me, this is really nothing more than determining what data is important and what isn't, and having a plan to help automate the demotion, or retirement, of that data.  However, there is a fundamental problem with this approach.  Who decides what's important?  What are the thresholds?  Last accessed time?  Last modified time?  What are the ramifications of cherry-picking files from a directory structure because they exceed policy thresholds?  What is this going to break?  How easy is it to recover data that has been demoted?  There are a few steps that I needed to take to accomplish this.

1.  Poor man's storage tiering.  If you are out of SAN space, re-provision an old server.  The purpose of this will be to serve up volumes that can be linked to the primary storage location through symbolic links.  These volumes can then be backed up at a less frequent interval, as the data would be considered less important.  If you eventually have enough SAN storage space, these could be easily moved onto the SAN, but in a less critical role, or on a SAN array that has larger, slower disks.

2.  Breaking up large volumes.  I'm convinced that giant volumes do nothing for you when it comes to understanding and managing the contents.  Turning larger blobs into smaller blobs also serves another very important role.  It allows the intelligence of the EqualLogic solutions to do their work on where the data should live in a collection of arrays.  A storage group that consists of, say, an SSD based array, a PS6000, and a PS4000 can effectively store each volume on the array that best suits the demand.

3.  Automating the process.  This will come in two parts: a.) deciding on structure, policies, etc. and b.) making or using tools to move the files from one location to another.  On the Linux side, this could mean anything from a bash script to something written in Python, with cron to schedule the occurrence.  In Windows, you could leverage PowerShell, VBScript, or batch files.  This could be as simple, or as complex, as your needs require.  However, if you are like me, you have limited time to tinker with scripting.  If there is something turn-key that does the job, go for it.  For me, that is an affordable little utility called "TreeSize Pro."  It gives you not only the ability to analyze the contents of NTFS volumes, but can easily automate the pruning of this data to another location.
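
To give a sense of what the scripting side of that automation might look like, here is a minimal sketch in Python.  The share paths and age threshold are hypothetical placeholders, and a real tool (TreeSize Pro, or a hardened script) would handle errors, logging, open files, and symbolic links far more gracefully:

import os
import shutil
import time

# Minimal sketch: demote files not modified in N days from a primary share
# to a "less important" location, preserving the relative directory layout.
SOURCE = "\\\\fileserver\\projects"           # primary (replicated) storage - placeholder
TARGET = "\\\\oldserver\\projects-demoted"    # demoted (non-replicated) storage - placeholder
MAX_AGE_DAYS = 4 * 365                        # e.g. untouched for 4+ years

cutoff = time.time() - MAX_AGE_DAYS * 86400

for root, dirs, files in os.walk(SOURCE):
    for name in files:
        src = os.path.join(root, name)
        if os.path.getmtime(src) < cutoff:
            rel = os.path.relpath(src, SOURCE)
            dst = os.path.join(TARGET, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)             # a symlink could be left behind here if needed
            print(f"Demoted {rel}")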

4.  Monitoring the result.  This one is easy to overlook, but you will need to monitor the fruits of your labor, and make sure it is doing what it should be doing: maintaining available storage space on critical storage devices.  There are a handful of nice scripts that have been written for both platforms that help you monitor free storage space at the server level.
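
As an example of the kind of lightweight check I mean, here is a minimal sketch in Python.  The volume paths and thresholds are hypothetical; in practice you would schedule it (Task Scheduler or cron) and have it send mail rather than just print:

import shutil

# Minimal sketch: warn when free space on a critical volume drops below a threshold.
VOLUMES = {"E:\\": 500, "F:\\": 250}    # path -> minimum free GB (placeholders)

for path, min_free_gb in VOLUMES.items():
    free_gb = shutil.disk_usage(path).free / 1024**3
    status = "OK" if free_gb >= min_free_gb else "WARNING"
    print(f"{status}: {path} has {free_gb:.1f} GB free (threshold {min_free_gb} GB)")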

The result

The illustration below helps demonstrate how this would work. 

image

As seen below, once a system is established to automatically move and house demoted data, you can more effectively use storage on the SAN.

image

Separation anxiety…

In order to make this work, you will have to work hard to make sure that all of this is pretty transparent to the end user.  If you have data that has complex external references, you will want to preserve the integrity of the data that relies on those dependent files.  Hey, I never said this was going to be easy. 

A few things worth remembering…

If 17 years in IT, and a little observation of human nature, has taught me one thing, it is that we all undervalue our current data, and overvalue our old data.  You see it time and time again.  Storage runs out, and there are cries for running down to the local box store and picking up some $99 hard drives.  What needs to reside on them is mission critical (hence the undervaluing of the new data).  Conversely, efforts to have users clean up old data from 10+ years ago had users hiding files in special locations, even though the files had not been modified, or even accessed, in 4+ years.  All of this of course lives on enterprise class storage.  An all too common example of overvaluing old data.

Tip.  Remember your Service Level Agreements.  It is common in IT to not only have SLAs for systems and data, but for one's position.  These without doubt are tied to one another.  Make sure that one doesn't compromise the other.  Stop-gap measures to accommodate more storage will trigger desperate, "affordable" solutions (e.g. adding cheap non-redundant drives in an old server somewhere).  Don't do it!  All of those arm-chair administrators in your organization will be nowhere to be found when those drives fail, and you are left to clean up the mess.

Tip.  Don’t ever thin provision user facing storage.  Fortunately, I was lucky to be clued into this early on, but I could only imagine the well intentioned administrator who wanted to present a nice amount of storage space to the user, only to find it sucked up a few days later.  Save the thin provisioning for non user facing storage (servers with SQL databases and transaction logs, etc.)

Tip.  If you are presenting proposals to management, or general information updates to users, I would suggest quoting only the amount of effective, usable space that will be added.  In other words, don't say you are adding 32TB to your storage infrastructure, when in fact, it is closer to 10TB.  Say that it is 10TB of extremely sophisticated, redundant enterprise class storage that you can "bet the business" on.  Its scalability, flexibility and robustness are needed for the 24/7 environments we insist upon.  It will just make it easier that way.

Tip.  It may seem unnecessary to you, but continue to show off snapshots, replication, and other unique aspects of SAN storage if you still have those who doubt the power of this kind of technology – especially when they see the cost per TB.  Repeat to them how long it would take (if it is even possible) to protect that same data under traditional storage.  Do everything you can to help those who approve these purchases.  More than likely, they won't be as impressed by, say, how quick a snapshot is, but rather shocked at how poorly traditional storage can be protected.

You may have noticed I do not have any rock-solid answers for managing the growth and sustainability of user facing data.  Situations vary, but the factors that help determine that path for a solution are quite similar.  Whether you decide on a turn-key solution, or choose to demonstrate a little ingenuity in times of tight budgets, the topic is one that you will probably have to face at some point.

 

How I use Dell/EqualLogic’s SANHQ in my environment

 

One of the benefits of investing in Dell/EqualLogic's SAN solutions is the number of great tools included with the product, at no extra charge.  I've written in the past about leveraging their AutoSnapshot Manager for VM and application consistent snapshots and replicas.  Another tool that deserves a few words is SAN HeadQuarters (SANHQ). 

SANHQ allows for real-time and historical analysis of your EqualLogic arrays.  Many EqualLogic users are well versed with this tool, and may not find anything here that they didn't already know.  But I'm surprised to hear how many are not.  So, what better way to help those unfamiliar with SANHQ than to describe how it helps me in my environment.

While the tool itself is “optional” in the sense that you don’t need to deploy it to use the EqualLogic arrays, it is an easy (and free) way to expose the powers of your storage infrastructure.  If you want to see what your storage infrastructure is doing, do yourself a favor and run SANHQ.   

Starting up the application, you might find something like this:

image

You’ll find an interesting assortment of graphs, and charts that help you decipher what is going on with your storage.  Take a few minutes and do a little digging.  There are various ways that it can help you do your job better.

 

Monitoring

Sometimes good monitoring is downright annoying.  It's like your alarm clock next to the bed; it's difficult to overlook, but that's the point.  SANHQ has proven to be an effective tool for proactive monitoring and alerting on my arrays.  While some of these warnings are never fun, its biggest value is that it can help prevent those larger, much more serious problems, which always seem to be a series of small issues thrown together.  Here are some examples of how it has acted as the canary in the coal mine for me in my environment.

  • When I had a high number of TCP retransmits after changing out my SAN Switchgear, it was SANHQ that told me something was wrong.  EqualLogic Support helped me determine that my new switchgear wasn’t handling jumbo frames correctly. 
  • When I had a network port go down on the SAN, it was SANHQ that alerted me via email.  A replacement network cable fixed the problem, and the alarm went away.
  • If replication across groups is unable to occur, you'll get notified right away that replication isn't running.  The reasons for this can be many, but SANHQ usually gives you the first sign that something is up.  This works across physical topologies where your target may be at another site.
  • Under maintenance scenarios, you might find the need to pause replication on a volume, or on the entire group.  SANHQ will do a nice job of reminding you that it’s still not replicating, and will bug you at a regular interval that it’s still not running.

 

Analysis and Planning

SANHQ will allow you to see performance data at the group level, or by storage pools, volumes, or volume collections.  One of the first things I do when spinning up a VM that uses guest attached volumes is to jump into SANHQ and see how those guest attached volumes are running.  How are the average IOPS?  What about latencies and queue depth?  All of those can be found easily in SANHQ, and can help put your mind at ease if you are concerned about your new virtualized Exchange or SQL servers.  Here is a screenshot of a 7 day history for a SQL server with guest attached volumes, driving our SharePoint backend services.

image

The same can be done, of course, for VMFS volumes.  This information will complement existing data one gathers from vCenter to understand if there are performance issues with a particular VMFS volume.

Oftentimes monitoring and analysis isn't about absolute numbers, but rather about allowing the user to see changes relative to previous conditions.  This is especially important for the IT generalist who doesn't have the time or the know-how for deep dive storage analysis, or a dedicated Storage Administrator to analyze the data.  This is where the tool really shines.  For whatever type of data you are looking at, you can easily choose a timeline of the last hour, 8 hours, 1 day, 7 days, 30 days, etc.  The anomalies, if there are any, will stand out. 

image

Simply click on the Timeline that you want, and the historical data of the Group, member, volume, etc will show up below.

image

I find analyzing individual volumes (when they are VMFS volumes) and volume collections (when they are guest attached volumes) the most helpful in making sure that there are not any hotspots in I/O.  It can help you determine if a VM might be better served in a VMFS volume that hasn’t been demanding as much I/O as the one it’s currently in.

It can also play a role in future procurement.  Those 15k SAS drives may sound like a neat idea, but does your environment really need that when you decide to add storage?  Thinking about VDI?  It can be used to help determine I/O requirements.  Recently, I was on the phone with a friend of mine, Tim Antonowicz.  Tim is a Senior Solutions Architect from Mosaic Technology who has done a number of successful VDI deployments (and who recently started a new blog).  We were discussing the possibility of VDI in my environment, and one of the first things he asked of me was to pull various reports from SANHQ so that he could understand our existing I/O patterns.  It wasn’t until then that I noticed all of the great storage analysis offerings that any geek would love.  There are a number of canned reports that can be saved out as a pdf, html, csv, or other format to your liking.

image

Replication Monitoring

The value of SANHQ went way up for me when I started replication.  It will give you summaries of each volume replicated.

image

If you click on an individual volume, it will help you see transfer sizes and replication times of the most recent replicas.  It also separates inbound replica data from outbound replica data.

image

While the times and the transfer rates will be skewed somewhat if you have multiple replicas running (as I do), it is a great example of how you can understand patterns in changed data on a specific volume.  The volume captured above is where one of my Domain Controllers lives.  As you can see, it's pretty consistent, and doesn't change much, as one would expect (probably not much more than the swap file inside the VM, but that's another story).  Other kinds of replicated data will fluctuate more widely.  This is your way to see it.

 

Running SANHQ

SANHQ will live happily on a standalone VM.  It doesn't require much, but it does need direct access to your SAN, and uses SNMP.  Once installed, SANHQ can be run directly on that VM, or the client-only application can be installed on your workstation for a little more convenience.  If you are replicating data, you will want SANHQ to connect to both the source site and the target site for the most effective use of the tool.

Improvements?  Sure, there are a number of things that I'd love to see.  Setting alarms for performance thresholds.  Threshold templates that you could apply to a volume (VMFS or native) that would help you understand the numbers (green = good, red = bad).  The ability to schedule reports, and define how and where they are posted.  Free pool space activity warnings (important if you choose to keep replica reserves low and leverage free pool space).  Array diagnostics dumps directly from SANHQ.  Programmatic access for scripting.  Improvements like these could make a useful product indispensable in a production environment.

Replication with an EqualLogic SAN; Part 5

 

Well, I’m happy to say that replication to my offsite facility is finally up and running now.  Let me share with you the final steps to get this project wrapped up. 

You might recall that in my previous offsite replication posts, I had a few extra challenges.  We were a single site organization, so in order to get replication up and running, an infrastructure at a second site needed to be designed and put in place.  My topology still reflects what I described in the first installment, but simple pictures don't convey the work of getting this set up.  It was certainly a good exercise in keeping my networking skills sharp.  My appreciation for the folks who specialize in complex network configurations and address management has been renewed.  They probably seldom hear words of thanks for, say, that well designed subnetting strategy.  They are an underappreciated bunch for sure.

My replication has been running for some time now, but this was all within the same internal SAN network.  While other projects prevented me from completing this sooner, it gave me a good opportunity to observe how replication works.

Here is the way my topology looks fully deployed.

image

Most colocation facilities or datacenters give you about 2 square feet to move around in (only a slight exaggeration), so it's not the place you want to be contemplating reasons why something isn't working.  It's also no fun realizing you don't have the remote access you need to make the necessary modifications, and you don't, or can't, drive to the CoLo.  My plan for getting this second site running was simple.  Build up everything locally (switchgear, firewalls, SAN, etc.) and change my topology at my primary site to emulate the 2nd site.

Here is the way it was running while I worked out the kinks.

image

All replication traffic occurs over TCP port 3260.  Both locations had to have accommodations for this.  I also had to ensure I could manage the array living offsite.  Testing this out with the modified infrastructure at my primary site allowed me to verify traffic was flowing correctly.

The steps taken to get two SAN replication partners transitioned from a single network to two networks (onsite) were:

  1. Verify that all replication is running correctly when the two replication partners are in the same SAN Network
  2. You will need a way to split the feed from your ISP, so if you don’t have one already, place a temporary switch at the primary site on the outside of your existing firewall.  This will allow you to emulate the physical topology of the real site, while having the convenience of all of the equipment located at the primary site. 
  3. After the 2nd firewall (destined for the CoLo) is built and configured, place it on that temporary switch at the primary site.
  4. Place something (a spare computer perhaps) on the SAN segment of the 2nd firewall so you can test basic connectivity (to ensure routing is functioning, etc) between the two SAN networks. 
  5. Pause replication on both ends, then take the target array and its switchgear offline. 
  6. Plug the target array's Ethernet ports into the SAN switchgear for the second site, then change the IP addressing of the array/group so that it's running under the correct net block.
  7. Re-enable replication and run test replicas.  Start with the Group Manager, then ASM/VE, then ASM/ME.

It would be crazy not to take one step at a time on this, as you learn a little at each step, and can identify issues more easily.  Step 3 introduced the most problems, because traffic has to traverse routers that are also secure gateways.  Not only does one have to consider a couple of firewalls, you now run into other considerations that may be undocumented.  For instance:

  • ASM/VE replication occurs courtesy of vCenter.  But ASM/ME replication is configured inside the VM.  Sure, it's obvious, but so obvious it's easy to overlook.  That means any topology changes will require adjustments in each VM that utilizes guest attached volumes.  You will need to re-run the "Remote Setup Wizard" to adjust the IP address of the target group that you will be replicating to.
  • ASM/ME also uses a VSS control channel to talk with the array.  If you changed the target array’s group and interface IP addresses, you will probably need to adjust what IP range will be allowed for VSS control.
  • Not so fast, though.  VMs that use guest iSCSI initiated volumes typically have those iSCSI dedicated virtual network cards set with no default gateway.  You never want to enter more than one default gateway in this sort of situation.  The proper way to handle this is to add a persistent static route.  This needs to be done before you run the Remote Setup Wizard above.  Fortunately the method to do this hasn't changed for at least a decade.  Just type in

route -p add [destinationnetwork] mask [subnetmask] [gateway] metric [metric]

  • Certain kinds of traffic that pass almost without a trace across a layer 2 segment show up right away when being pushed through very sophisticated firewalls whose default stance is to deny everything unless explicitly allowed.  Fortunately, Dell puts out a nice document on their EqualLogic arrays.
  • If possible, it will be easiest to configure your firewalls with route relationships between the source SAN and the target SAN.  It may complicate your rulesets (NAT relationships are a little more intelligent when it comes to rulesets in TMG), but it simplifies how the nodes see each other.  This is not to say that NAT won't work, but it might introduce issues that wouldn't be documented.

Step 7 exposed an unexpected issue: terribly slow replicas.  Slow even though the traffic wasn't even going across a WAN link.  We're talking VERY slow, as in 1/300th the speed I was expecting.  The good news is that this problem had nothing to do with the EqualLogic arrays.  It was an upstream switch that I was using to split my feed from my ISP.  The temporary switch was not negotiating correctly, and was causing packet fragmentation.  Once that switch was replaced, all was good.

The other strange issue was that even though replication was running great in this test environment, I was getting errors with VSS.  ASM/ME at startup would indicate "No control volume detected."  Even though replicas were running, they couldn't be accessed, used, or managed in any way.  After a significant amount of experimentation, I eventually opened up a case with Dell Support.  Running out of time to troubleshoot, I decided to move the equipment offsite so that I could meet my deadline.  Well, when I came back to the office, VSS control magically worked.  I suspect that the array simply needed to be restarted after I had changed the IP addressing assigned to it. 

My CoLo facility is an impressive site.  Located in the Westin Building in Seattle, it is also where the Seattle Internet Exchange (SIX) is located.  Some might think of it as another insignificant building in Seattle’s skyline, but it plays an important part in efficient peering for major Service Providers.  Much of the building has been converted from a hotel to a top tier, highly secure datacenter and a location in which ISP’s get to bridge over to other ISP’s without hitting the backbone.  Dedicated water and power supplies, full facility fail-over, and elevator shafts that have been remodeled to provide nothing but risers for all of the cabling.  Having a CoLo facility that is also an Internet Exchange Point for your ISP is a nice combination.

Since I emulated the offsite topology internally, I was able to simply plug in the equipment, and turn it on, with the confidence that it will work.  It did. 

My early measurements on my feed to the CoLo are quite good.  Since the replication times include buildup and teardown of the sessions, larger replicas give a more accurate measurement of sustained throughput.  The early numbers show that my 30Mbps circuit is translating to replication rates in the neighborhood of 10 to 12GB per hour (205MB per min, or 3.4MB per sec.).  If multiple jobs are running at the same time, the rate of each will be affected by the other replication jobs, but the overall throughput appears to be about the same.  Also affecting speeds will be other traffic coming to and from our site.
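
For what it's worth, here is the back-of-envelope conversion behind those numbers, as a small Python sketch (treating MB and GB loosely, the same way the figures above do):

# Rough conversion between the observed replication rate and the circuit size.
mb_per_min = 205                          # observed replication rate from above
mb_per_sec = mb_per_min / 60              # ~3.4 MB/sec
gb_per_hour = mb_per_sec * 3600 / 1024    # ~12 GB/hour

circuit_mbps = 30
circuit_mb_per_sec = circuit_mbps / 8     # ~3.75 MB/sec theoretical ceiling

print(f"{mb_per_sec:.1f} MB/sec, or about {gb_per_hour:.0f} GB/hour")
print(f"Roughly {mb_per_sec / circuit_mb_per_sec:.0%} of the 30Mbps circuit's theoretical capacity")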

There is still a bit of work to do.  I will monitor the resources, and tweak the scheduling to minimize the overlap on the replication jobs.  In past posts, I’ve mentioned that I’ve been considering the idea of separating the guest OS swap files from the VM’s, in an effort to reduce the replication size.  Apparently I’m not the only one thinking about this, as I stumbled upon this article.  It’s interesting, but a nice amount of work.  Not sure if I want to go down that road yet.

I hope this series helped someone with their plans to deploy replication.  Not only was it fun, but it is a relief to know that my data, and the VM’s that serve up that data, are being automatically replicated to an offsite location.

Exchange 2007 on a VM, and the case of the mysterious Isapi deadlock detected error.

 

 

Are you running Exchange 2007 on a VM?  Are you experiencing  odd warning events in the event log that look something like this?

Event Type: Warning
Event Source: W3SVC-WP
Event Category: None
Event ID: 2262
Date:  
Time:  12:28:18 PM
User:  N/A
Computer: [yourserver]
Description:
ISAPI ‘c:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll’ reported itself as unhealthy for the following reason: ‘Deadlock detected’.

If you’ve answered yes to these questions, you’ve most certainly looked for the fix, and found other users in the same boat.  They try and try to fix the issue with adjustments in official documentation or otherwise, with no results.

That was me.  …until I ran across this link.

So, as suggested, I added a 2nd vCPU to my Exchange server (running on Windows Server 2008 x64, in a vSphere cluster), and started it up.  These specific warning messages in my event log went away completely.  Okay, after several weeks of monitoring, I may have had a couple of warnings here and there.  But that's it.  No longer the hundreds of warnings every day.

As for the official explanation, I don't have one.  Adding vCPUs to fix problems is not something I want to get in the habit of, but it was an interesting problem, with an interesting solution that was worth sharing.

 

Helpful links:

Microsoft’s most closely related KB article on the issue (that didn’t fix anything for me):
http://support.microsoft.com/kb/821268

Application pool recycling:
http://technet.microsoft.com/en-us/library/cc735314(WS.10).aspx

Restoring an Exchange 2007 mailbox using EqualLogic’s ASM/ME

 

 

Ask most IT Administrators in small to medium sized organizations to recover an Exchange mailbox, and you'll get responses like, "How important is it to recover it?" and "How much of it is gone?"  You might even get the slightly patronizing, "Oh, well, you were close to your storage quota anyway."  This is IT-speak for "I don't want to recover it" (labor intensive), "I'm not sure if I can recover it" (it didn't work the last time they tried), or "I can't recover it, and don't really want to tell you that."  Trust me, I've been there.

The recovery process has ranged anywhere from non-existent (Exchange 4.0), to supposedly easy (according to the glossy ad of the 3rd party solution you might have purchased, but could never get to work correctly), to cautiously doable but an incomplete solution with later versions of Exchange.  As each year passes, I'm always hoping that technology can make the process easier, without pushing through another big purchase.

I got my wish.  Technology has indeed improved the process.  I recently had a user mailbox’s contents vanish.  Who knows what happened, but I had to get the mailbox back fast, so I had to familiarize myself with the process again.  This time, since my Exchange 2007 Server is virtualized, and the Exchange databases and logs reside on guest attached volumes, I was able to take advantage of EqualLogic’s “AutoSnapshot Manager Microsoft Edition” or ASM/ME. 

ASM/ME allowed me to easily recover an Exchange Storage Group, then mount it as a Recovery Storage Group (RSG).  From that point I could restore just that single mailbox on top of the existing mailbox.  What a refreshing discovery to see how simple the process has become.  And that it just worked.  No weird errors to investigate, no tapes to fiddle with.  It was a complete solution for a mailbox recovery scenario.  The best part of all was that it's a function available free of charge to any EqualLogic user who is using the Host Integration Toolkit (HITKit) on their Exchange server. 

Here is how you do it.

1.  On the Exchange Server, open up ASM/ME, highlight the smart copy collection that you’d like to recover.  I say “Collection” because I want to recover the volume that has the DB on it and the volume that has the Transaction Logs on it at the same time.

2.  Select option to “Create Recover Storage Group”

3.  Select the desired Storage Group

4.  It will prompt for two drive letters not currently in use.  These will represent the location of the restored volumes.  So if the Exchange databases are on E: and the Transaction Logs are on F:, it might prompt you to use "G:" and "H:" respectively.

5.  It will complete with the following message
image

6.  Close out of ASM/ME, and launch the "Database Recovery Management" tool in the Toolbox section of the Exchange Management Console.  This leads to the "Exchange Troubleshooting Assistant" in the Exchange Management Console (EMC).

7.  Run through the restoration process.  It will restore the selected mailbox on top of the existing mailbox.

8.  Once it is complete, as the dialog above instructs, you will need to dismount and logoff the smart copy collection set with ASM/ME after the RSG is removed.

The process was fast, and worked the very first time without error.  I'd still prefer to never have to recover a mailbox, but it is nice to know that now, thanks to ASM/ME and the Exchange Database Recovery Management tool, it's really easy to do.

 

Helpful Links

A more detailed guide on using RSGs in Exchange:
http://www.msexchange.org/tutorials/Working-Recovery-Storage-Groups-Exchange-2007.html 

Working with a Recovery Storage group in the Exchange Management Shell (EMS) instead of the EMC.
http://technet.microsoft.com/en-us/library/bb125197(EXCHG.80).aspx

Replication with an EqualLogic SAN; Part 4

 

If you had asked me 6+ weeks ago how far along my replication project would be on this date, I would have thought I’d be basking in the glory of success, and admiring my accomplishments.

…I should have known better.

Nothing like several IT emergencies unrelated to this project to turn one's itinerary into garbage.  A failed server (an old physical storage server that I don't have room on my SAN for), a tape backup autoloader that tanked, some Exchange Server and Domain Controller problems, and a host of other odd things that I don't even want to think about.  It's easy to overlook how much work it takes just to keep an IT infrastructure from losing ground from the day before.  At times, it can make you wonder how any progress is made on anything.

Enough complaining for now.  Let's get back to it.

 

Replication Frequency

For my testing, all of my replication is set to occur just once a day.  This is to keep it simple, and to help me understand what needs to be adjusted when my offsite replication is finally turned up at the remote site.

I'm not overly anxious to turn up the frequency even if the situation allows.  Some pretty strong opinions exist on how best to configure the frequency of the replicas.  Do a little bit with a high frequency, or a lot with a low frequency.  What I do know is this.  It is a terrible feeling to lose data, and one of the more overlooked ways to lose data is for bad data to overwrite your good data on the backups before you catch it in time to stop it.  Tapes, disk, simple file cloning, or fancy replication; the principle is the same, and so is the result.  Since the big variable is retention period, I want to see how much room I have to play with before I decide on frequency.  My purpose of offsite replication is disaster recovery.  …not to make a disaster bigger.

 

Replication Sizes

The million dollar question has always been how much changed data, as perceived from the SAN, will occur in a given period of time on typical production servers.  It is nearly impossible to know this until one is actually able to run real replication tests.  I certainly had no idea.  This would be a great feature for Dell/EqualLogic to add to their solution suite.  Have a way for a storage group to run in a simulated replication mode where it simply collects statistics that would accurately reflect the amount of data that would be replicated during the test period.  What a great feature for those looking into SAN to SAN replication.

Below are my replication statistics for a 30 day period, where the replicas were created once per day, after the initial seed replica was created.

Average data per day per VM

  • 2 GB for general servers (service based)
  • 3 GB for servers with guest iSCSI attached volumes.
  • 5.2 GB for code compiling machines

Average data per day for guest iSCSI attached data volumes

  • 11.2 GB for Exchange DB and Transaction logs (for a 50GB database)
  • 200 MB for a SQL Server DB and Transaction logs
  • 2 GB for SharePoint DB and Transaction logs

The replica sizes for the VMs were surprisingly consistent.  Our code compiling machines had larger replica sizes, as they write some data temporarily to the VMs during their build processes.

The guest iSCSI attached data volumes naturally varied more from day-to-day activities.  Weekdays had larger amounts of replicated data than weekends.  This was expected.

Some servers, and how they generate data, may stick out like sore thumbs.  For instance, our source code control server uses a crude (but important) form of application layer backup.  The result is that for 75 GB worth of repositories, it would generate 100+ GB of changed data that it would want to replicate.  If the backup mechanism (which is a glorified file copy and package dump) is turned off, the amount of changed data drops to a very reasonable 200 MB per day.  This is a good example of how we will have to change our practices to accommodate replication.

 

Decreasing the amount of replicated data

Up to this point, the only step taken to reduce the amount of replicated data is the adjustment made in vCenter to move the VMs' swap files onto another VMFS volume that will not be replicated.  That of course only affects the VMs' swap files – not the paging files inside each guest OS.  I suspect that a healthy amount of the changed data on the VMs is their OS paging files.  The amount of changed data on those VMs looked suspiciously similar to the amount of RAM assigned to the VM.  There typically is some correlation between how much RAM an OS has to run with and the size of its page file.  This is pure speculation at this point, but certainly worth looking into.

The next logical step would be to figure out what could be done to reconfigure VM’s to perhaps place their paging/swap files in a different, non-replicated location.   Two issues come to mind when I think about this step. 

1.)  This adds an unknown amount of complexity (for deploying, and restoring) to the systems running.  You’d have to be confident in the behavior of each OS type when it comes to restoring from a replica where it expects to see a page file in a certain location, but does not.  How scalable this approach is would also need to be asked.  It might be okay for a few machines, but how about a few hundred?  I don’t know.

2.)  It is unknown as to how much of a payoff there will be.  If the amount of data per VM gets reduced by say, 80%, then that would be pretty good incentive.  If it’s more like 10%, then not so much.  It’s disappointing that there seems to be only marginal documentation on making such changes.  I will look to test this when I have some time, and report anything interesting that I find along the way.

 

The fires… unrelated, and related

Among the first problems to surface recently were issues with my 6224 switches.  These were the switches that I put in place of our 5424 switches to provide better expandability.  Well, something wasn't configured correctly, because the retransmit ratio was high enough that SANHQ actually notified me of the issue.  I wasn't about to overlook this, and reported it to the EqualLogic Support Team immediately.

I was able to get these numbers under control by reconfiguring the NICs on my ESX hosts to talk to the SAN with standard frames.  Not a long term fix, but for the sake of the stability of the network, the most prudent step for now.

After working with the 6224's, they do seem to behave noticeably differently than the 5424's.  They are more difficult to configure, and the suggested configurations from the Dell documentation seemed more convoluted and contradictory.  Multiple documents and deployment guides had inconsistent information.  Technical Support from Dell/EqualLogic has been great in helping me determine what the issue is.  Unfortunately some of the potential fixes can be very difficult to execute.  Firmware updates on a stacked set of 6224's will result in the ENTIRE stack rebooting, so you have to shut down virtually everything if you want to update the firmware.  The ultimate fix for this would be a revamp of the deployment guides (or better yet, just one deployment guide) for the 6224's that nullifies any previous documentation.  By way of comparison, the 5424 switches were, and are, very easy to deploy. 

The other issue that came up was some unexpected behavior regarding replication and its use of free pool space.  I don't have any empirical evidence to tie these two together, but this is what I observed.

During this past month in which I had an old physical storage server fail on me, there was a moment where I had to provision what was going to be a replacement for this box, as I wasn’t even sure if the old physical server was going to be recoverable.  Unfortunately, I didn’t have a whole lot of free pool space on my array, so I had to trim things up a bit, to get it to squeeze on there.  Once I did, I noticed all sorts of weird behavior.

1.  Since my replication jobs (with ASM/ME and ASM/VE) leverage free pool space for the temporary replica/snapshot that is created on the source array, this caused problems.  The biggest one was that my Exchange server would completely freeze during its ASM/ME snapshot process.  Perhaps I had this coming to me, because I deliberately configured it to use free pool space (as opposed to a replica reserve) for its replication.  How it behaved caught me off guard, and made it interesting enough for me to never want to cut it close on free pool space again.

2.  ASM/VE replica jobs also seemed to behave oddly with very little free pool space.  Again, this was self inflicted because of my configuration settings.  It left me desiring a feature that would allow you to set a threshold so that, with only x amount of free pool space remaining, replication jobs would simply not run.  This goes for ASM/VE and ASM/ME.

Once I recovered that failed physical system, I was able to remove the VM I had set aside for emergency turn up.  That brought my free pool space back up over 1TB, and all worked well from that point on. 

 

Timing

Lastly, one subject came up that doesn't show up in any deployment guide I've seen.  The timing of all this protection shouldn't be overlooked.  One wouldn't want to stack several replication jobs on top of each other that use the same free pool space but haven't had time to replicate.  Other snapshot jobs, replicas, consistency checks, traditional backups, etc. should be well coordinated to keep overlap to a minimum.  If you are limited on resources, you may also be able to use timing to your advantage.  For instance, set your daily replica of your Exchange database to occur at 5:00am, and your daily snapshot to occur at 5:00pm.  That way, you have reduced your maximum loss period from 24 hours to 12 hours, just by offsetting the times.