May 31, 2010
In parts one and two of my journey in deploying replication between two EqualLogic PS arrays, I described some of the factors that came into play on how my topology would be designed, and the preparation that needed to occur to get to the point of testing the replication functions.
Since my primary objective for this project was to provide offsite protection of my VMs and data in the event of a disaster at my primary facility, I’ve limited my tests to validating that the data is recoverable from or at the remote site. The logistics of failing over to a remote site (via tools like Site Recovery Manager) are well outside the scope of what I’m attempting to accomplish right now. That will certainly be a fun project to work on some day, but for now, I’ll be content with knowing my data is replicating offsite successfully.
With that out of the way, let the testing begin…
Replication using Group Manager
Just like snapshots, replication using the EqualLogic Group Manager is pretty straightforward. However, in my case, this mechanism would not produce file-system consistent snapshots or replicas of VM datastores, and would only be reliable for data that was not being accessed, or for VMs that were turned off. So for the sake of brevity, I’m going to skip these tests.
ASM/ME replica creation
My ASM/ME replication tests will simulate how I plan on replicating the guest attached volumes within VMs. Remember, these are replicas of the guest attached volumes only – not of the VM.
On each VM where I have guest attached volumes and the HITKit installed (Exchange, SQL, file servers, etc.) I launched ASM/ME to configure and create the new replicas. I’ve scheduled them to occur at a time separate from the daily snapshots.
As you can see, there are two different icons: one represents snapshots and the other represents replicas. Each snapshot and replica shows that the guest attached volumes (in this case, “E:\” and “F:\”) have been protected using the Exchange VSS writer. Both drives are captured because I created the job from a “Collection,” which makes the most sense for Exchange and SQL systems that have database files and transaction logs you’d want to capture at the exact same time. For the time being, I’m just letting them run once a day to collect some data on replication sizes. ASM/ME is also where recovery tasks would be performed on the guest attached volumes.
A tip for those who are running ASM/ME for Smartcopy snapshots or replication: in your schedules, define a “keep count” of snapshots or replicas that fits within the snapshot reserve you have for that volume. Otherwise, ASM/ME may take a very long time to start the console and reconcile the existing smart copies, and you will also find those old snapshots in the “broken” container of ASM/ME. The startup delay can be so long that it almost looks as if the application has hung, but it has not, so be patient. (By the way, ASM/VE version 2.0, which should be used to protect your VMs, does not have any sort of “keep count” mechanism. Let’s keep our fingers crossed for that feature in version 3.0.)
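As a rough illustration of sizing a keep count against the snapshot reserve, here is a minimal Python sketch. The reserve size, the average space each smart copy consumes, and the headroom fraction are all hypothetical inputs, not values from my arrays, and real consumption will vary with the daily change rate on the volume.

```python
def max_keep_count(reserve_gb, avg_copy_gb, headroom=0.2):
    """Estimate how many smart copies fit in the snapshot reserve.

    reserve_gb  -- snapshot reserve for the volume (hypothetical figure)
    avg_copy_gb -- average reserve space one smart copy consumes (assumed)
    headroom    -- fraction of the reserve left free as a safety margin
    """
    if reserve_gb <= 0 or avg_copy_gb <= 0:
        raise ValueError("reserve_gb and avg_copy_gb must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_gb = reserve_gb * (1 - headroom)
    return int(usable_gb // avg_copy_gb)


# Example with made-up numbers: 100 GB reserve, roughly 6 GB per smart copy.
keep_count = max_keep_count(100, 6)
print("Suggested keep count:", keep_count)
```

Treat the result as an upper bound for the schedule rather than a target. A busy day of changes can make a single smart copy much larger than the average.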
ASM/ME replica restores
Working with replicas using ASM/ME is about as easy as it gets. Just highlight the replica, and click on “Mount as read-only.” Unlike a snapshot, you do not have the option to “restore” over the existing volume when it’s a replica.
ASM/ME will ask for a drive letter to assign that cloned replica to. Once it’s mounted, you may do with the data as you wish. Note that it will be in a read only state. This can be changed later if needed.
When you are finished with the replica, you can click on “Unmount and Resume Replication…”
ASM/ME will ask you if you want to keep the replica around after you unmount it. To keep it, uncheck the box next to “Delete snapshot from the PS Series group…”
ASM/VE replica creation
ASM/VE replication, which will be the tool I use to protect my VMs, took a bit more time to set up correctly due to the way that ASM/VE likes to work. I somehow missed the fact that a second ASM/VE server needs to run at the target/offsite location for the ASM/VE server at the primary site to communicate with. ASM/VE also seems to be hyper-sensitive to the version of Java installed on the ASM/VE servers. Don’t be too eager to update to the latest version of Java. Stick with a version recommended by EqualLogic. I’m not sure what that officially would be, but I have been told by Tech Support that version 1.6 Update 18 is safe.
Unlike creating Smartcopy snapshots in ASM/VE, you cannot use the “Virtual Machines” view in ASM/VE to create Smartcopy replicas. Only Datastores, Datacenters, and Clusters support replicas. In my case, I will use the “Datastores” view to create replicas. Since I made adjustments to where my VMs were placed in the datastores (see part 2, under “Preparing VMs for Replication”), it will still be clear which VMs will be replicated.
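Because replicas are created per datastore rather than per VM, it can help to keep a simple record of which VMs live where. The following is only a sketch of that bookkeeping in Python. The datastore and VM names are hypothetical, and nothing here talks to ASM/VE or vCenter.

```python
def replication_coverage(placement, replicated_datastores):
    """Split VMs into those on replicated datastores and those that are not.

    placement             -- dict mapping datastore name -> list of VM names
    replicated_datastores -- names of datastores that have a replica schedule
    """
    replicated = set(replicated_datastores)
    covered, uncovered = [], []
    for datastore, vms in placement.items():
        target = covered if datastore in replicated else uncovered
        target.extend(vms)
    return sorted(covered), sorted(uncovered)


# Hypothetical placement of VMs across datastores.
placement = {
    "DS-Replicated-01": ["exchange01", "sql01"],
    "DS-Replicated-02": ["fileserver01"],
    "DS-Local-01": ["test-vm01"],
}
covered, uncovered = replication_coverage(
    placement, ["DS-Replicated-01", "DS-Replicated-02"]
)
print("Replicated:", covered)
print("Not replicated:", uncovered)
```

A VM that gets moved to a non-replicated datastore silently drops out of offsite protection, so a list like this is worth rechecking after any storage migration.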
After creating a Smartcopy replica of one of the datastores, I went to see how it looked. In ASM/VE it appeared to complete successfully, and SANHQ also seemed to indicate a successful replica. ASM/VE then gave a message of “contacting ASM peer” in the “replica status” column. I’ve seen this occur right after I kicked off a replication job, but on successful jobs, it will disappear shortly. If it doesn’t disappear, it can point to a configuration issue (the user accounts used to establish the connection, due to known issues with ASM/VE 2.0), or to a problem caused by Java.
ASM/VE replica restores
At first, ASM/VE Smartcopy replicas didn’t make much sense to me, especially when it came to restores. Perhaps I was attempting to think of them as a long distance snapshot, or that they might behave in the same way as ASM/ME replicas. They work a bit differently than that. It’s not complicated, just different.
To work with the Smartcopy replica, you must first log into the ASM/VE server at the remote site. From there, click on “Replication” > “Inbound Replicas” and highlight the replica from the datastore you are interested in. It will then present you with the options of “Failover from replica” and “Clone from replica.” If you attempt to do this from the ASM/VE server at the primary site, these options never present themselves. It makes sense to me after the fact, but it took me a few tries to figure that out. For my testing purposes, I’m focusing exclusively on “Clone from replica.” The EqualLogic documentation has good information on when each option can be used.
When choosing “Clone from Replica” it will have a checkbox for “Register new virtual machines.” In my case, I uncheck this box, as my remote site will have just a few hosts running ESXi, and will not have a vCenter server to contact.
Once it is complete, access will need to be granted for the remote host in which you will want to try to mount the volume. This can be accomplished by logging into the Group Manager of the target/offsite SAN group, selecting the cloned volume, and entering CHAP credentials, the IP address of the remote host, or the iSCSI initiator name.
Jump right on over to the vSphere client for the remote host, and under “Configuration” > “Storage Adapters” right click on your iSCSI software adapter, and select “Rescan.” When complete, go to “Configuration” > “Storage” and you will notice that the volume does NOT show up. Click “Add Storage” > “Disk/LUN.”
When a datastore is recognized as a snapshot, it will present you with the following options. See http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf for more information on which option to choose.
Once completed, the datastore that was replicated to the remote site, and then cloned so it could be made available to the remote ESX/i host, should now be visible in “Datastores.”
From there just browse the Datastore, drilling down to the folder of the VM you wish to turn up, highlight and right click the .vmx file, and select “Add to inventory.” Your replicated VM should now be ready for you to power up.
If you are going to be cloning a VM replica living on the target array to a datastore, you will need to do one additional step if any of the VMs have guest attached volumes using the guest iSCSI initiator. At the target location, open up Group Manager, and drill down to “Replication Partners” > “[partnername]” and highlight the “Inbound” tab. Expand the volume(s) that are associated with that VM. Highlight the replica that you want, then click on “Clone replica.”
This will allow you to reattach a guest attached volume to that VM. Remember that I’m using the cloning feature simply to verify that my VMs and data are replicating as they should. Turning up systems for offsite use is a completely different ballgame, and not my goal – for right now anyway.
Depending on how you have your security and topology set up, and how connected your ESX host is offsite, your test VM you just turned up at the remote site may have the ability to contact Active Directory at your primary site, or guest attached volumes at your primary site. This can cause problems for obvious reasons, so be careful to not let either one of those happen.
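One way to sanity-check a test VM before powering it up is to list the addresses it is configured to talk to (DNS servers, domain controllers, iSCSI target portals) and flag any that fall inside primary-site subnets. The sketch below uses Python’s standard `ipaddress` module. Every subnet and address in it is hypothetical, and it only compares addresses. It does not test actual network reachability.

```python
import ipaddress


def primary_site_hits(addresses, primary_subnets):
    """Return the addresses that fall inside any primary-site subnet."""
    networks = [ipaddress.ip_network(subnet) for subnet in primary_subnets]
    hits = []
    for address in addresses:
        ip = ipaddress.ip_address(address)
        if any(ip in network for network in networks):
            hits.append(address)
    return hits


# Hypothetical primary-site subnets (LAN and iSCSI SAN network).
primary_subnets = ["10.1.0.0/24", "192.168.50.0/24"]

# Hypothetical addresses configured inside the cloned test VM.
configured = ["10.1.0.10", "10.2.0.5", "192.168.50.20"]

risky = primary_site_hits(configured, primary_subnets)
print("Addresses pointing back at the primary site:", risky)
```

Anything this flags is a candidate for disconnecting the test VM’s virtual NIC, or for placing it on an isolated port group, before it is powered on.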
While demonstrating some of these capabilities recently to the company, the audience (Developers, Managers, etc.) was very impressed with the demonstration, but their questions reminded me of just how little they understood the new model of virtualization, and shared storage. This can be especially frustrating for Software Developers, who generally consider that there isn’t anything in IT that they don’t understand or know about. They walked away impressed, and confused. Mission accomplished.
Now that I’ve confirmed that my data and VMs are replicating correctly, I’ll be building up some of my physical topology so that the offsite equipment has something to hook up to. That will give me a chance to collect some statistics on replication, which I will share in the next post.