Replication with an EqualLogic SAN; Part 2
May 13, 2010
In part 1 of this series, I outlined the decisions made in order to build a replicated environment. On to the next step: racking up the equipment, migrating my data, and laying some groundwork for testing replication.
While waiting for the new equipment to arrive, I wanted to take care of a few things first:
1. Update my existing PS5000E array to the latest firmware. This has never been a problem, other than the times that I’ve forgotten to log in as the default ‘grpadmin’ account (the only account allowed to do firmware updates). The process is slick, with no perceived interruption.
2. Map out how my connections should be hooked up on the switches. Redundant switches can only be redundant if you plug everything in the correct way.
3. IP addressing. It’s all too easy to just randomly assign IP addresses to a SAN. It may be its own isolated network, but in the spirit of “design as if you know it’s going to change,” it is worth observing good addressing practices. My SAN is on a /24 net block, but I assign IP addresses so that they respect potential subnet boundaries within that range. This lets me subnet or VLAN them down (e.g. to /28) later on, and it simplifies rule sets on my ISA server, which can then be based on address boundaries rather than a scattering of addresses.
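To make the idea of respecting address boundaries concrete, here is a minimal Python sketch using the standard `ipaddress` module. The 10.10.0.0/24 network and the role names are assumptions made up for illustration, not the addressing used on this SAN. It carves the /24 into /28 blocks and reports which role a given address would fall under.

```python
import ipaddress

# Hypothetical SAN network; substitute your own /24.
SAN_NET = ipaddress.ip_network("10.10.0.0/24")

# Split the /24 into sixteen /28 blocks of 16 addresses each.
BLOCKS = list(SAN_NET.subnets(new_prefix=28))

# Hypothetical role assignments, one /28 block per kind of device.
ROLES = {
    "arrays": BLOCKS[0],
    "esx_hosts": BLOCKS[1],
    "guest_initiators": BLOCKS[2],
}


def role_of(address):
    """Return the role whose /28 block contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for role, block in ROLES.items():
        if ip in block:
            return role
    return None


if __name__ == "__main__":
    for role, block in ROLES.items():
        print(f"{role:18} {block}  ({block.num_addresses} addresses)")
```

If devices of one kind stay inside their own block, a later firewall rule or VLAN can refer to a single prefix instead of a list of individual addresses. Keep in mind that once a block really is subnetted, its first and last addresses become the network and broadcast addresses.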
Preparing the new array
Once the equipment arrived, it made the most sense to get the latest firmware on the new array first. The quickest way is to set it up temporarily using the “initialize PS series array” feature in the “Remote Setup Wizard” of the EqualLogic HITKit on a machine that can access the array. Make it its own group, update the firmware, then reset the array to the factory defaults. After completing the update and typing “reset,” up comes the most interesting confirmation prompt you’ll ever see. Instead of “Reset this array to factory defaults? [Y/N]” where a “Y” or “N” is required, the prompt is “Reset this array to factory defaults? [n/DeleteAllMyDataNow]”. You can’t say that isn’t clear, and I applaud EqualLogic for it. Wiping a SAN array clean is serious stuff, and definitely should be harder than typing a “Y” after the word “reset.”
After the unit was reset, I was ready to join it to the existing group temporarily so that I could evacuate all of the data from the old array and have it placed on the new array. I plugged all of the array ports into the SAN switches, and turned it on. Using the Remote Setup Wizard, I initialized the array, joined it to the group, then assigned and activated the rest of the NICs. To migrate all of the data from one array to another, highlight the member with the data on it, then click on “Delete Member.” Perhaps EqualLogic will revisit this term. “Delete” implies way too many things that don’t relate to this task.
The process of migrating data chugs along nicely. VMs and end users are none the wiser. Once it is complete, the old array removes itself from the group and resets itself to the factory defaults. It’s really impressive. The speed and simplicity of the process also gave me confidence for when we need to add more storage.
When the old array was back to its factory defaults, I initialized it again and set it up as a new member in a new group. This new group would be used for some preliminary replication testing, and will eventually live at the offsite location.
As for how this process compares with competing products, I’m the wrong guy to ask. I’ve had zero experience with Fibre Channel SANs, or with iSCSI SANs from other vendors. But what I can say is that it was easy, and fast.
After configuring replication between the two groups, which consisted of setting up a few shared passwords between the two groups and enabling replication on each volume, I was ready to try it out... almost.
Snapshots and replication
It’s worth taking a step back to review a few things on snapshots and how the EqualLogic handles them. Replicas appear to work in a similar (but not identical) manner to snapshots, so many of the same principles apply. Remember that snapshots can be made in several ways.
1. The most basic are snapshots created in the EqualLogic Group Manager. These do exactly as they say, making a snapshot of the volume. The problem is that they are not file-system consistent for VM datastores, and would only be suitable for datastores in which all of the VMs were turned off at the time the snapshot was made.
2. To protect VMs, “Autosnapshot Manager VMware Edition” (ASM/VE) provides the ability to create a point-in-time snapshot, leveraging vCenter through VMware’s API. It then does some nice tricks to make this an independent snapshot (well, of the datastore anyway) that you see in the EqualLogic Group Manager, under each respective volume.
3. For VMs with guest iSCSI attached drives, there is “Autosnapshot Manager Microsoft Edition” (ASM/ME). This great tool is installed with the Host Integration Toolkit (HITKit). It makes application aware snapshots by taking advantage of the Microsoft Volume Shadow Copy Service Provider. This is key for protecting SQL databases, Exchange databases, and even flat-file storage residing on guest attached drives. It ensures that all I/O is flushed when the snapshot is created. I’ve grown quite partial to this type of snapshot, as it’s nearly instant, causes no interruption to the end users or services, and provides easy recoverability. The downside is that it can only protect data on drives attached through the VM’s guest iSCSI initiator, and it needs a VSS writer specific to the application (e.g. Exchange, SQL) in order to talk to it correctly. You cannot protect the VM itself with this type of snapshot. Also, vCenter is generally unaware of these types of guest attached drives, so VCB backups and other apps that rely on vCenter won’t include these types of volumes.
So just as I use ASM/ME for smartcopy snapshots of my guest attached drives, and ASM/VE for my VM snapshots, I will use these tools in a similar way to create VM and application aware replicas of the VMs and the data.
ASM/VE tip: Smartcopy snapshots using ASM/VE give the option to “Include PS series volumes accessed by guest iSCSI initiators.” I do not use this option for a few very good reasons, and rely completely on ASM/ME for properly capturing guest attached volumes.
Default replication settings in EqualLogic Group Manager
When one first configures a volume for replication, some of the EqualLogic defaults are set very generously. The two settings to look out for are the “Total replica reserve” and the “Local replication reserve.” These very conservative settings can chew up a lot of the free space on your SAN. Assuming you have a decent amount of free space in your storage pool, and you stagger your replication to occur at various times of the day, you can reduce the “Local replication reserve” down to its minimum, then click the checkbox for “allow temporary use of free pool space.” This will minimize the impact of enabling replication on your array.
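To get a feel for how much pool space a reserve setting can tie up, here is a back-of-the-envelope Python sketch. It assumes the reserve is expressed as a percentage of the volume size. The volume sizes and percentages are made-up inputs, not values read from Group Manager, so plug in whatever your own volumes show.

```python
def reserve_gb(volume_gb, reserve_pct):
    """Space set aside for a reserve given as a percentage of volume size."""
    return volume_gb * reserve_pct / 100


def total_local_reserve_gb(volume_sizes_gb, reserve_pct):
    """Sum of the local replication reserve across a list of volumes."""
    return sum(reserve_gb(size, reserve_pct) for size in volume_sizes_gb)


if __name__ == "__main__":
    # Hypothetical volume sizes in GB.
    volumes = [500, 750, 1000]
    # Illustrative percentages only; check Group Manager for the real values.
    for pct in (100, 5):
        total = total_local_reserve_gb(volumes, pct)
        print(f"local reserve at {pct}%: {total:.1f} GB")
```

Running the numbers across all replicated volumes makes it easier to judge whether lowering the reserve and allowing temporary use of free pool space is a safe trade for a given pool.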
Preparing VM’s for replication
There were a few things I needed to do to prepare my VMs to be replicated. I wasn’t going to tackle all optimization techniques at this time, but thought it best to get some of the easy things out of the way first.
1. Reconfigure VMs so that the swap file is NOT in the same directory as the other VM files. (This is the swap file for the VM at the hypervisor level, not to be confused with the guest OS swap file.) First I created a volume in the EqualLogic Group Manager that would be dedicated to VM swap files, then made sure it was visible to each ESX host. Then, simply configure the swap location at the cluster level in vCenter, followed by changing the setting on each ESX host. The final step is to power off and power on each VM. (A restart/reboot will not work for this step.) Once this is completed, you’ve eliminated a sizeable amount of data that doesn’t need to be replicated.
2. Revamp datastores to reflect good practices with ASM/VE. (I’d say “best practices” but I’m not sure if they exist, or if these qualify as such). This is a step that takes into consideration how ASM/VE works, and how I use ASM/VE. I’ve chosen to make my datastores reflect how my VM’s are arranged in vCenter. Below is a screenshot in vCenter of the folders that contain all of my VMs.
Each folder has VMs in it that reside in just one particular datastore. So for instance, the “Prodsystems-Dev” folder has a half dozen VMs exclusively for our Development team. These all reside in one datastore called VMFS05DS. When a scheduled snapshot of a vCenter folder (e.g. “Prodsystems-Dev”) runs using ASM/VE, it will only hit the VMs in that vCenter folder, and the single datastore that they reside on. If it is not done this way, an ASM/VE snapshot of a folder containing VMs that reside in different datastores will generate snapshots in each datastore. This becomes terribly confusing to administer, especially when trying to recover a VM.
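The one-folder-to-one-datastore rule is easy to check against an inventory export. Below is a minimal Python sketch that flags any folder whose VMs span more than one datastore. The inventory list is hypothetical: only “Prodsystems-Dev” and VMFS05DS come from the setup described here, and the other names are invented for the example. How the list gets exported from vCenter is left out.

```python
from collections import defaultdict

# Hypothetical inventory rows: (VM name, vCenter folder, datastore).
# Only "Prodsystems-Dev" and "VMFS05DS" come from the post; the rest is invented.
INVENTORY = [
    ("dev-build01", "Prodsystems-Dev", "VMFS05DS"),
    ("dev-test01", "Prodsystems-Dev", "VMFS05DS"),
    ("app01", "Example-Folder", "EXAMPLE-DS1"),
    ("app02", "Example-Folder", "EXAMPLE-DS2"),
]


def datastores_by_folder(inventory):
    """Map each folder to the set of datastores its VMs live on."""
    folders = defaultdict(set)
    for _vm, folder, datastore in inventory:
        folders[folder].add(datastore)
    return dict(folders)


def folders_spanning_datastores(inventory):
    """Return the folders whose VMs sit on more than one datastore."""
    return {
        folder: sorted(stores)
        for folder, stores in datastores_by_folder(inventory).items()
        if len(stores) > 1
    }


if __name__ == "__main__":
    for folder, stores in folders_spanning_datastores(INVENTORY).items():
        print(f"{folder} spans {len(stores)} datastores: {', '.join(stores)}")
```

Any folder this reports would produce snapshots in several datastores when ASM/VE runs against it, which is exactly the situation the layout is meant to avoid.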
Since I recreated many of my volumes and datastores, I also jumped on the opportunity to make these new datastores with a 4MB block size instead of the default 1MB block size. Not really necessary in my situation, but based on the link here, it seems like a good idea.
Once the volumes and the datastores were created and sized the way I desired, I used the Storage vMotion function in vCenter to move each VM into the appropriate datastore to mimic my arrangement of folders in vCenter. Because I’m sizing my datastores for a functional purpose, I have a mix of large and small datastores. I probably would have made these the same size if it weren’t for how ASM/VE works.
The datastores are in place, and now mimic the arrangement of folders of VM’s in vCenter. Now I’m ready to do a little test replication. I’ll save that for the next post.
Michael Ellerbeck has some great posts on his experiences with EqualLogic, replication, Dell switches, and optimization. A lot of good links within the posts.
The Dell/EqualLogic Document Center has some good overview documents on how these components work together. Lots of pretty pictures.