Tips for using Dell’s updated EqualLogic Host Integration Tools – VMware Edition (HIT/VE)

Ever since my series of posts on replication with a Dell EqualLogic SAN, I’ve had a lot of interest from other users wondering how I actually use the built-in tools provided by Dell EqualLogic to protect my environment.  This is one of the reasons why I’ve written so much about ASM/ME, ASM/LE, and SANHQ.  Well, it’s been a while since I’ve touched on any information about ASM/VE, and since I’ve updated my infrastructure to vSphere 5.0 and the HIT/VE 3.1, I thought I’d share a few pointers that have helped me work with this tool in my environment.

The first generation of HIT/VE was really nothing more than a single tool referred to as “Auto-Snapshot Manager / VMware Edition” or ASM/VE.  A lot has changed, as it is now part of a larger suite of VMware-centric tools from EqualLogic called the Host Integration Tools / VMware Edition, or HIT/VE.  This consists of the following: EqualLogic Auto-Snapshot Manager, EqualLogic Datastore Manager, and the EqualLogic Virtual Desktop Deployment Utility.  HIT/VE is one of three Host Integration toolsets; the others are HIT/ME and HIT/LE, for Microsoft and Linux respectively.

With HIT/VE 3.0, Dell EqualLogic thankfully transitioned to an appliance/plug-in model.  This reduced overhead and complexity, and removed some of the quirks of the older implementations.  Because I had been lagging behind in updating vSphere, I was still using 2.x up until recently, and skipped right over 3.0 to 3.1.  Surprisingly, many of the same practices that served me well with the older version adapt quite well to the new one.

Let me preface this by saying that these are just my suggestions, based on personal use of all versions of the HIT over the past 3 years.  Just as with any solution, there are a number of different ways to achieve the same result.  The information provided may or may not align with best practices from Dell, or your own practices.  But the tips I provide have stood up to the rigors of a production environment, and have actually worked in real recovery scenarios.  Whatever decisions you make should complement your larger protection strategies, as this is just one piece of the puzzle.

Tips for configuring and working with the HIT/VE appliance

1.  The initial configuration will ask for registration in vCenter (configuration item #8 on the appliance).  You may only register one HIT/VE appliance in vCenter.

2.  The HIT/VE appliance was designed to integrate with vCenter, but it also offers flexible access.  After the initial configuration, you can verify and modify settings in the respective ASM appliances by browsing directly to their IP address, FQDN, or DNS alias name.  Type in: http://[applianceFQDN] or, for the Auto-Snapshot Manager, http://[applianceFQDN]/vmsnaptool.html

3.  Configuration of the storage management network on the appliance is optional, and depending on your topology, may not be needed.

4.  When setting up replication partners, ASM will ask for a “Server URL.”  This implies you should enter an “http://” or “https://” address, but a true URL as it implies will not work.  Just enter the IP address or FQDN without the http:// prefix.

5.  After you have configured your HIT/VE appliances, run back through and double check the settings.  I had two of them mysteriously reset some DNS configuration during the initial deployment.  It’s been fine since that time.  It might have been my mistake (twice), but it might not.

6. For just regular (local) SmartCopies, create one HIT/VE appliance.  Have the appliance sit in its own small datastore.  Make sure you do not protect this volume via ASM. Dell warns you about this.  For environments where replication needs to occur, set up a second HIT/VE appliance at the remote site.  The same rules apply there.

7.  Log files on the appliance are accessible via Samba.  I didn’t discover this until I was working through the configuration and some issues I was running into.  What a pleasant way to pull the log data off of the appliance.  Nice work!

Tips for ASM/VE

8.  Just as I learned and recommended in 2.x, the most important suggestion I have for successfully utilizing ASM/VE in your environment is to arrange vCenter folders to represent the contents of your datastores.  Include in the name some indication of the referencing volume/datastore (seen in the screen capture below, where “103” refers to a datastore called VMFS103).  The reason for this is so that you can keep your SmartCopy snapshots straight during creation.  If you don’t do this, when you make a SmartCopy of a folder containing VM’s that reside in multiple datastores, you will see SAN snapshots in each one of those volumes, but they won’t necessarily have captured all of the data correctly.  You will get really confused, and confusion is the last thing you need when trying to understand the what and how of recovering systems or data.

[Screenshot: vCenter folders named to indicate their backing datastore, e.g. “103” for VMFS103]

9.  Schedule or manually create SmartCopy Snapshots by folder.  Schedule or manually create SmartCopy Replicas by datastore.  Replicas cannot be created by vCenter folder.  This strategy has been the most effective for me, but if you don’t feel like re-arranging your folders in vCenter, you could schedule or manually create SmartCopy Snapshots by datastore as well.

10.  Do not schedule or create SmartCopies by individual machine.  This will get confusing (see above), and may interfere with your planning of snapshot retention periods.  If you want to protect a system against some short term step (e.g. installing a service pack), just use a hypervisor snapshot, and remove it when complete.

11.  ASM/VE 2.x was limited to making SmartCopies of VM’s that had all of their vmdk files in the same location.  3.x does not have this limitation.  This offers up quite a bit of flexibility if you have VM’s with independent vmdks in other datastores.

12. Test, and document.  Create a couple of small volumes, each large enough to hold 2 test VM’s.  Make a SmartCopy of the vCenter folder where those VM’s reside.  Do a few more SmartCopies, then attempt a restore.  Test.  Add a vmdk in another datastore to one of the VM’s, then test again.  This is the best way to not only understand what is going on, but to have no surprises or trepidation when you have to do it for real.  It is especially important to understand how the other VM’s in the same datastore will behave, how VM’s with multiple vmdks in different datastores will act, and what a “restore by rollback” is.  And while you’re at it, make a OneNote or Word document outlining the exact steps for recovery, and what to expect.  Create one for local SmartCopies, and another for remote replicas.  This helps you think clearly in the heat of the moment.  Your goal is to make things better with a restore, not worse.  Oh, and if you can’t find the time to document the process, don’t worry, I’m sure the next guy who replaces you will find the time.

13.  Set snapshot and replication retention numbers in ASM/VE.  This much-needed feature was added in the 3.0 version.  Tweak each snapshot reserve to a level that you feel comfortable with, and that also matches up against your overall protection policies.  There will be some tuning for each volume so that you can offer the protection times needed without allocating too much space to snapshot reserves.  ASM will only be able to manage the snapshots that it creates, so if you have some older snaps of a particular datastore, you may need to do a little cleanup work.
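
To put some rough numbers behind that tuning, here is a minimal sketch of the back-of-the-napkin math I use as a starting point.  All of the figures (volume size, change rate, headroom) are hypothetical examples, not Dell guidance; actual snapshot space consumption depends on how much changed data overlaps between snapshots, so treat this as a first estimate to refine.

```python
# Rough starting estimate for sizing an EqualLogic snapshot reserve.
# All numbers below are hypothetical examples, not recommendations.

def estimate_reserve_pct(volume_gb, daily_change_gb, days_to_keep, headroom=1.25):
    """Estimate snapshot reserve as a percent of volume size.

    Assumes the worst case, where each day's changed blocks are unique,
    so retained snapshots consume roughly change-rate * days.  The
    headroom factor pads for bursts (patch days, reindexing, etc.).
    """
    reserve_gb = daily_change_gb * days_to_keep * headroom
    return 100.0 * reserve_gb / volume_gb

# A 500 GB datastore changing ~10 GB/day, keeping 7 days of snapshots:
print(f"{estimate_reserve_pct(500, 10, 7):.0f}% reserve")  # ~18%
```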

14.  Watch the frequency!!!  The only thing worse than not having a backup of a system or data is having several bad copies of it, and realizing that the last good one just purged itself out.  A great example of this is something going wrong on a Friday night that you don’t notice until mid-day on Monday, when your high frequency SmartCopies only had room for two days’ worth of changed data.  With ASM/VE, I tend to prefer very modest frequencies.  Once a day is fine with me on many of my systems.  Most of the others that I like to have more frequent SmartCopies of have the actual data on guest attached volumes anyway.  Early on in my use, I had a series of events that were almost disastrous, all because I was overzealous on the frequency, but not mindful enough of the retention.  Don’t be a victim of the ease of cranking up the frequency at the expense of retention.  This is something you’ll never find in a deployment or operations guide, and it applies to all aspects of data protection.
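
Here is that Friday-night scenario as a quick worked example, with hypothetical numbers: the arithmetic just multiplies the SmartCopy interval by the retention count and compares that window to how long a problem can go unnoticed over a weekend.

```python
# How far back can I actually recover?  interval * retention = window.
hours_per_copy = 4        # aggressive schedule: a SmartCopy every 4 hours
copies_retained = 12      # retention keeps only the 12 most recent copies

window_hours = hours_per_copy * copies_retained   # 48 hours of coverage

# Something corrupts data Friday at 9 PM; nobody notices until Monday noon.
hours_unnoticed = 63      # Fri 21:00 -> Mon 12:00

if hours_unnoticed > window_hours:
    print("Last good SmartCopy already purged; every retained copy is bad.")
else:
    print(f"A good copy still exists, with {window_hours - hours_unnoticed}h to spare.")
```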

15.  If you are creating both SmartCopy snapshots and SmartCopy replicas, use your scheduling as an opportunity to shrink the window of vulnerability.  Instead of running the once-a-day replica immediately after the once-a-day snapshot, split the difference so that the replica runs in between the last SmartCopy snapshot and the next one.

16.  Keep your SmartCopy and replica frequencies and schedules as simple as possible.  If you can’t understand them, who will?  Perhaps start with a frequency of just once a day for all of your datastores, then go from there.  You might find that once a day works for 99% of your systems.  I’ve found that most of the data I need to protect at more frequent intervals lives on guest attached volumes anyway, and I schedule those via ASM/ME to meet my needs.

17.  For SmartCopy snapshots, I tend to schedule them so that there is only one job on one datastore at a time, with the next one scheduled, say, 5 minutes afterward.  For SmartCopy replicas, if you choose to use free pool space instead of replica reserve (as I do), you might want to offset those more, so that the replica has time to fully complete and the space held by the invisible local replica can be reclaimed before the next job.  Generally this isn’t too much of an issue, unless you are really tight on space.
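
Here is a minimal sketch of the kind of staggered schedule tips 15 and 17 describe, using hypothetical datastore names and start times: one once-a-day SmartCopy job per datastore, 5 minutes apart, with each replica placed half an interval (12 hours) after its snapshot so it always runs midway between one snapshot and the next.

```python
from datetime import datetime, timedelta

# Hypothetical once-a-day schedule: snapshots staggered 5 minutes apart,
# replicas offset by half the 24-hour interval so each replica lands
# midway between one SmartCopy snapshot and the next.
datastores = ["VMFS101", "VMFS102", "VMFS103", "VMFS104"]
first_snap = datetime(2011, 11, 1, 22, 0)    # 10:00 PM nightly
stagger = timedelta(minutes=5)               # one job at a time
half_interval = timedelta(hours=12)          # split the difference

for i, ds in enumerate(datastores):
    snap = first_snap + i * stagger
    replica = snap + half_interval
    print(f"{ds}: snapshot {snap:%H:%M}, replica {replica:%H:%M}")
```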

18.  The SmartCopy defaults have been changed a bit since ASM/VE 2.x.  There is no need to tick any of the checkboxes such as “Perform virtual machine memory dump” and “Set created PS Series snapshots online.”  In fact, I would untick “Include PS Series volumes accessed by guest iSCSI initiators.”  More info on why below.

19.  ASM/VE still gives you the option to snapshot volumes that are attached to a VM via guest iSCSI initiators.  In general, don’t do it.  Why?  If you choose to use this option for Microsoft based VM’s, it will indeed make a snapshot, giving you the impression that all is well, but these snapshots are not coordinated with the VSS writer inside the VM, so they are not truly application consistent snapshots of the guest volumes.  Sure, they might work, but they might not.  They may also interfere with your retention policies in ASM/ME.  Do you really want to take that chance with your Exchange or SQL databases, or flat file storage?  If you think flat file storage isn’t important to quiesce, remember that source code control systems like Subversion typically use the file system, not a database.  It is my position that the only time you should use this option is when you are protecting a Linux VM with guest attached volumes.  Linux has no equivalent to VSS, so you get a free pass on using this option.  However, because this option is a per-job setting, you’ll want to separate Windows based VM’s with guest volumes from Linux based VM’s with guest volumes.  If you wanted to avoid that, you could just rely on a crash consistent copy of that Linux guest attached volume via a scheduled snapshot in the Group Manager GUI.  So the moral of the story is this: to protect guest attached volumes in VM’s running Windows, rely entirely on ASM/ME to create a SmartCopy SAN snapshot of those volumes.

20.  If you need to cherry-pick a file off of a snapshot, or look at an old registry setting, consider restoring or cloning to another volume, and make sure that the restored VM does not have any direct access to the network that the primary system is running on.  Having a special portgroup in vCenter just for this purpose works nicely.  Many times this method is the least disruptive to your environment.

21.  I still like to have my DC’s in individual datastores, on their own, and create SmartCopy schedules that do not occur simultaneously.  I found that in practice, our very sensitive automated code-compiling system, which has dozens (if not hundreds) of active ssh sessions, ran into less interference this way than when I initially had the DC’s in one datastore, or intertwined in datastores with other VMs.  Depending on the number of DCs you have, you might be able to group a couple together, perhaps splitting off the DC running the PDC emulator role into a separate datastore.  Be aware that the SmartCopy for your DC should be considered a way to protect the system, not AD.  More info is in my post about protecting Active Directory here.

Tips for Datastore Manager

22.  The Datastore Manager in vCenter is one of my favorite new ways to view my PS Group.  Not only do I get a quick check on how my datastores look (limiting the view to just VMFS volumes), but it also shows which volumes have replicas in flight.  It has quickly become one of my most used items in vCenter.

23.  Use the ACL policies feature in Datastore Manager.  With the new integration between vCenter and the Group Manager, you can easily create volumes.  The ACL policies feature in the HIT/VE is a way for you to save a predetermined set of ACLs for your hosts (CHAP, IP, or IQN).  While I personally prefer using IQNs, any combination of the three will work.  Having an ACL policy is a great way to provision access to a volume quickly.  If you are using manually configured multi-pathing, it is important to note that creating datastores this way will use a default path selection policy of “VMware Fixed.”  You will need to manually change that to “VMware Round Robin.”  I am told that if you are using the EqualLogic Multipathing Extension Module (MEM), this will be set to the proper setting automatically.  I don’t know that for sure, because MEM hasn’t been released for vSphere 5.0 as of this writing.

24.  VMFS5 offers some really great features, but many of them are only available if they were natively created (not upgraded from VMFS3).  If you choose to recreate them by doing a little juggling with Storage vMotion (as I am), remember that this might wreak havoc on your replication, as you will need to re-seed the volumes.  But if you can, you are exposed to many great features of VMFS5.  You might also use this as an opportunity to revisit your datastores and re-arrange if necessary.

25.  If you are going to redo some of your volumes from scratch (to take full advantage of VMFS5), if they are replicated, redo the volumes with the highest change rate first.  They’re already pushing a lot of data through your pipe, so you might as well get them taken care of first.  And who knows, your replicas might be improved with the new volume.

Hopefully this gives you a few good ideas for your own environment.  Thanks for reading.

Upgrading to vSphere 5.0 by starting from scratch. …well, sort of.

It is never any fun getting left behind in IT.  Major upgrades every year or two might not be a big deal if you only had to deal with one piece of software, but take a look at most software inventories, and you’ll see possibly dozens of enterprise level applications and supporting services that all contribute to the chaos.  It can be overwhelming for just one person to handle.  While you may be perfectly justified in holding off on specific upgrades, there still seems to be a bit of guilt around doing so.  You might have ample business and technical factors to support such decisions, and a well-crafted message providing clear reasons to stakeholders.  Still, the business and political pressures ultimately win out, and you find yourself addressing the more customer/user facing application upgrades before the behind-the-scenes tools that power it all.

That is pretty much where I stood with my virtualized infrastructure.  My last major upgrade was to vSphere 4.0.  Sure, I had visions of keeping up with every update and patch, but a little time passed, and several hundred distractions later, I found myself left behind.  When vSphere 4.1 came out, I also had every intention of upgrading.  However, I was one of the legions of users who had a vCenter server running on a 32bit OS, and that complicated matters a little bit.  I looked at the various publications and posts on the upgrade paths and experiences.  Nothing seemed quite as easy as I was hoping for, so I did what came easiest to my already packed schedule: nothing.  I wondered just how many Administrators found themselves in the same predicament: not touching an aging, albeit perfectly fine, running system.

My ESX 4.0 cluster served my organization well, but times change, and so do needs.  A few things came up to kick-start the desire to upgrade.

  • I needed to deploy a pilot VDI project, fast.  (more about this in later posts)
  • We were a victim of our own success with virtualization, and I needed to squeeze even more power and efficiency out of our investment in our infrastructure.

Both are pretty good reasons to upgrade, and while I would have loved to do my typical due diligence on every possible option, I needed a fast track.  My move to vSphere 5.0 was really just a prerequisite of sorts to my work with VDI. 

But how should I go about an upgrade?

Do I update my 4.0 hosts to the latest update that would be eligible for an upgrade path to 5.0, and if so, how much work would that be?  Should I transition to a new vCenter server, migrating the database, then run a mixed environment of ESX hosts running with different versions?  What sort of problems would that introduce?  After conferring with a trusted colleague of mine who always seems to have pragmatic sensibilities when it comes to virtualization, I decided which option was going to be the best for me.  I opted not to do any upgrade, and simply transition to a pristine new cluster.  It looked something like this:

  • Take a host (either new, or by removing an existing one from the cluster), and build it up with ESXi 5.0.
  • Build up a new 64bit VM for running a brand new vCenter, and configure as needed.
  • Remove one VM at a time from the old cluster: power it down, remove it from inventory, and add it to the new cluster.
  • Once enough VM’s have been removed, take another host, remove it from the old cluster, rebuild it with ESXi 5.0, and add it to the new cluster.
  • Repeat until finished.

For me, the decision to start from scratch won out.  Why?

  • I could build up a pristine vCenter server, with a database that wasn’t going to carry over any unwanted artifacts of my previous installation.
  • I could easily set up the new vCenter to emulate my old settings.  Folders, EVC settings, resource pools, etc.
  • I could transition or build up my supporting VM’s or appliances to my new infrastructure to make sure they worked before committing to the transition.
  • I could afford a simple restart of each VM as I transitioned it to a new cluster.  I used this as an opportunity to update the VMware Tools when added to the new inventory.
  • I was willing to give up historical data in my old vSphere 4.0 cluster for the sake of simplicity of the plan and cleanliness of the configuration.
  • Predictability.  I didn’t have to read a single white paper or discussion thread on database migrations or troubles with DSNs.
  • I have a well documented ESX host configuration that is not terribly complex, and easy to recreate across 6 hosts.
  • I just happened to have purchased an additional blade and license of ESX, so it was an ideal time to introduce it to my environment.
  • I could get my entire setup working, then get my licensing figured out after it’s all complete.

You’ll notice that one option similar to this approach would have been to simply remove a host full of running VM’s from the existing cluster, and add it to the new cluster.  This may have been just as good of a plan, as it would have avoided the need to manually shut down and remove each VM one at a time during the transition.  However, I would have needed to run a mix of ESX 4.0 and 5.0 hosts in the new cluster, and I didn’t want to carry anything over from the old setup.  I would have needed to upgrade or rebuild the host anyway, and I had to restart each VM to make sure it was running the latest tools.  If for nothing other than clarity of mind, my approach seemed best for me.

Prior to beginning the transition, I needed to update my Dell EqualLogic firmware to 5.1.2.  A collection of very nice improvements made this a worthwhile upgrade on its own, but it was also a requirement for what I wanted to do.  While the upgrade itself went smoothly, it did re-introduce an issue or two.  The folks at Dell EqualLogic are aware of this, and are hopefully working to address it in their next release.  The combination of the firmware upgrade and vSphere 5 allowed me to use the latest and greatest tools from EqualLogic, primarily the Host Integration Tools VMware Edition (HIT/VE) and the storage integration in vSphere thanks to VASA.  However, as of this writing, EqualLogic does not have a full production release of their Multipathing Extension Module (MEM) for vSphere 5.0.  The EPA (early production access) version was just released, but I’ll probably wait for the full release of MEM before I apply it to the hosts in the cluster.

While I was eager to finish the transition, I didn’t want to prematurely create any problems.  I took a page from my own lessons learned during my upgrade to ESX 4.0, and exercised some restraint when it came to updating the Virtual Hardware for each VM to version 8.  My last update of Virtual Hardware levels caused some unexpected results, as I shared in “Side effects of upgrading VM’s to Virtual Hardware 7 in vSphere.”  Apparently, I wasn’t the only one who ran into issues, because that has statistically been my most popular post of all time.  The abilities of Virtual Hardware 8 powered VMs are pretty neat, but I’m in no rush to make any virtual hardware changes to some of my key production systems, especially those noted.

So, how did it work out?  The actual process completed without a single major hang-up, and I am thrilled with the result.  The irony here is that even though vSphere provides most of the intelligence behind my entire infrastructure, and does things that are mind-bogglingly cool, it was so much easier to upgrade than, say, SharePoint, AD, Exchange, or some other enterprise software.  Great technologies are great because they work like you think they should.  No exception here.  If you are considering a move to vSphere 5.0, and are a little behind on your old infrastructure, this upgrade approach might be worth considering.

Now, onto that little VDI project…

Resources

A great resource on setting up SQL 2008 R2 for vCenter
How to Install Microsoft SQL Server 2008 R2 for VMware vCenter 5

Installing vCenter 5 Best Practices
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2003790

A little VMFS 5.0 info
http://www.yellow-bricks.com/2011/07/13/vsphere-5-0-what-has-changed-for-vmfs/

Information on the EqualLogic Multipathing Extension Module (MEM), and if you are an EqualLogic customer, why you should care.
https://whiteboardninja.wordpress.com/2011/02/01/equallogic-mem-and-vstorage-apis/