Ever since my series of posts on replication with a Dell EqualLogic SAN, I’ve had a lot of interest from other users wondering how I actually use the built-in tools provided by Dell EqualLogic to protect my environment. This is one of the reasons why I’ve written so much about ASM/ME, ASM/LE, and SANHQ. Well, it’s been a while since I’ve touched on any information about ASM/VE, and since I’ve updated my infrastructure to vSphere 5.0 and the HIT/VE 3.1, I thought I’d share a few pointers that have helped me work with this tool in my environment.
The first generation of HIT/VE was really nothing more than a single tool referred to as “Auto-Snapshot Manager / VMware Edition” or ASM/VE. A lot has changed: it is now part of a larger suite of VMware-centric tools from EqualLogic called the Host Integration Tools / VMware Edition, or HIT/VE. The suite consists of the EqualLogic Auto-Snapshot Manager, the EqualLogic Datastore Manager, and the EqualLogic Virtual Desktop Deployment Utility. HIT/VE is one of three Host Integration toolsets; the others are HIT/ME and HIT/LE, for Microsoft and Linux respectively.
Starting with HIT/VE 3.0, Dell EqualLogic thankfully transitioned to an appliance/plug-in model. This reduced overhead and complexity, and removed some of the quirks of the older implementations. Because I had been lagging behind in updating vSphere, I was still using 2.x until recently, and skipped right over 3.0 to 3.1. Surprisingly, many of the practices that served me well with the older version adapt quite well to the new one.
Let me preface this by saying that these are just my suggestions, based on personal use of all versions of the HIT over the past 3 years. As with any solution, there are a number of different ways to achieve the same result. The information provided may or may not align with best practices from Dell, or with your own practices. But the tips I provide have stood up to the rigors of a production environment, and have actually worked in real recovery scenarios. Whatever decisions you make should complement your larger protection strategy, as this is just one piece of the puzzle.
Tips for Configuring and working with the HIT/VE appliance
1. The initial configuration will ask for registration in vCenter (configuration item #8 on the appliance). You may only register one HIT/VE appliance in vCenter.
2. The HIT/VE appliance was designed to integrate with vCenter, but it also gives you the flexibility to access it directly. After the initial configuration, you can verify and modify settings on the respective ASM appliances by browsing directly to their IP address, FQDN, or DNS alias. Type in http://[applianceFQDN], or for the Auto-Snapshot Manager, http://[applianceFQDN]/vmsnaptool.html
3. Configuration of the storage management network on the appliance is optional, and depending on your topology, may not be needed.
4. When setting up replication partners, ASM will ask for a “Server URL.” The label implies you should enter an “http://” or “https://” prefix, but don’t. Just enter the IP address or FQDN on its own; a true URL, as the label suggests, will not work.
5. After you have configured your HIT/VE appliances, run back through and double-check the settings. On two of mine, some DNS settings mysteriously reset during the initial deployment. They’ve been fine since then. It might have been my mistake (twice), but it might not.
6. For regular (local) SmartCopies, create one HIT/VE appliance. Place the appliance in its own small datastore, and make sure you do not protect that volume via ASM; Dell warns you about this. For environments where replication needs to occur, set up a second HIT/VE appliance at the remote site. The same rules apply there.
7. Log files on the appliance are accessible via Samba. I didn’t discover this until I was working through the configuration and some issues I was running into. What a pleasant way to pull the log data off the appliance. Nice work!
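Tips 2 and 4 both come down to getting the address right. The small Python sketch below shows one way to reduce whatever gets pasted in (say, a full URL copied from a browser) to the bare IP address or FQDN that the “Server URL” field expects, and to build the two browser addresses from tip 2. The helper names and the example hostnames are hypothetical; this is not part of any Dell tool.

```python
from urllib.parse import urlsplit


def normalize_server_address(value: str) -> str:
    """Reduce input to a bare IP address or FQDN (hypothetical helper).

    ASM's "Server URL" field wants the host only, so any http:// or
    https:// scheme and any trailing path or slash are removed.
    """
    value = value.strip()
    if "://" in value:
        # urlsplit puts the host (and port, if any) in .netloc
        value = urlsplit(value).netloc or value
    # Drop anything after the first slash, e.g. "host/" or "host/vmsnaptool.html"
    return value.split("/", 1)[0]


def appliance_urls(host: str) -> dict:
    """Browser addresses for a HIT/VE appliance, per tip 2."""
    host = normalize_server_address(host)
    return {
        "appliance": f"http://{host}",
        "auto_snapshot_manager": f"http://{host}/vmsnaptool.html",
    }


# Hypothetical appliance names; substitute your own.
print(normalize_server_address("https://hitve02.example.local/"))
# prints hitve02.example.local
print(appliance_urls("hitve01.example.local")["auto_snapshot_manager"])
# prints http://hitve01.example.local/vmsnaptool.html
```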
Tips for ASM/VE
8. Just as I learned and recommended in 2.x, the most important suggestion I have for successfully using ASM/VE in your environment is to arrange vCenter folders to represent the contents of your datastores. Include in each folder name some indication of the volume/datastore it refers to (seen in the screen capture below, where “103” refers to a datastore called VMFS103). The reason for this is so that you can keep your SmartCopy snapshots straight during creation. If you don’t do this, when you make a SmartCopy of a folder containing VMs that reside in multiple datastores, you will see SAN snapshots in each one of those volumes, but they won’t necessarily have captured all of the data correctly. You will get really confused, and confusion is not what you need when you are working out what to recover and how.
9. Schedule or manually create SmartCopy snapshots by folder. Schedule or manually create SmartCopy replicas by datastore; replicas cannot be created by vCenter folder. This strategy has been the most effective for me, but if you don’t feel like re-arranging your folders in vCenter, you could schedule or manually create SmartCopy snapshots by datastore as well.
10. Do not schedule or create SmartCopies by individual machine. This will get confusing (see above), and may interfere with your planning of snapshot retention periods. If you want to protect a system against a short-term change (e.g. installing a service pack), just use a hypervisor snapshot, and remove it when complete.
11. ASM/VE 2.x was limited to making SmartCopies of VMs that had their vmdk files all in the same location. 3.x does not have this limitation, which offers quite a bit of flexibility if you have VMs with independent vmdks in other datastores.
12. Test, and document. Create a couple of small volumes, each large enough to hold two test VMs. Make a SmartCopy of the vCenter folder where those VMs reside. Do a few more SmartCopies, then attempt a restore. Test. Add a vmdk in another datastore to one of the VMs, then test again. This is the best way not only to understand what is going on, but to have no surprises or trepidation when you have to do it for real. It is especially important to understand how the other VMs in the same datastore will behave, how VMs with multiple vmdks in different datastores will act, and what a “restore by rollback” is. And while you’re at it, make a OneNote or Word document outlining the exact steps for recovery, and what to expect. Create one for local SmartCopies, and another for remote replicas. This saves you from having to think clearly in the heat of the moment. Your goal is to make things better with a restore, not worse. Oh, and if you can’t find the time to document the process, don’t worry, I’m sure the next guy who replaces you will find the time.
13. Set snapshot and replication retention numbers in ASM/VE. This much-needed feature was added in version 3.0. Tweak each snapshot reserve to a level that you feel comfortable with, and that also matches your overall protection policies. There will be some tuning for each volume so that you can offer the protection times needed without allocating too much space to snapshot reserves. ASM can only manage the snapshots that it creates, so if you have some older snaps of a particular datastore, you may need to do a little cleanup work.
14. Watch the frequency!!! The only thing worse than not having a backup of a system or data is having several bad copies of it, and realizing that the last good one just purged itself out. A great example of this is something going wrong on a Friday night. Perhaps you don’t notice it until midday Monday. But your high-frequency SmartCopies only had room for two days’ worth of changed data. With ASM/VE, I tend to prefer very modest frequencies. Once a day is fine with me on many of my systems. Most of the others that I like to have more frequent SmartCopies of keep their actual data on guest attached volumes. Early on in my use, I had a series of events that were almost disastrous, all because I was overzealous on the frequency but not mindful enough of the retention. Don’t let the ease of cranking up the frequency come at the expense of retention. This is something you’ll never find in a deployment or operations guide, and it applies to all aspects of data protection.
15. If you are creating both SmartCopy snapshots and SmartCopy replicas, use your scheduling as an opportunity to shrink the window of vulnerability. Instead of running a once-a-day replica right after a once-a-day snapshot, split the difference so that the replica runs midway between the last SmartCopy snapshot and the next one.
16. Keep your SmartCopy and replica frequencies and scheduling as simple as possible. If you can’t understand it, who will? Perhaps start with a frequency of just once a day for all of your datastores, then go from there. You might find that once a day works for 99% of your systems. I’ve found that most of the data I need to protect at more frequent intervals lives on guest attached volumes anyway, and I schedule those via ASM/ME to meet my needs.
17. For SmartCopy snapshots, I tend to schedule them so that only one job runs against one datastore at a time, with the next one scheduled, say, 5 minutes afterward. For SmartCopy replicas, if you choose to use free pool space instead of replica reserve (as I do), you might want to offset them more, so that each replica has time to complete fully and the space held by the invisible local replica can be reclaimed for the next job. Generally this isn’t much of an issue unless you are really tight on space.
18. The SmartCopy defaults have changed a bit since ASM/VE 2.x. There is no need to tick checkboxes such as “Perform virtual machine memory dump” or “Set created PS Series snapshots online.” In fact, I would untick “Include PS Series volumes accessed by guest iSCSI initiators.” More on why below.
19. ASM/VE still gives you the option to snapshot volumes that are attached to the VM via guest iSCSI initiators. In general, don’t do it. Why? If you choose this option for Microsoft-based VMs, it will indeed make a snapshot, giving you the impression that all is well, but the snapshot is not coordinated with the VSS writers inside the VM, so it is not a truly application-consistent snapshot of the guest volumes. Sure, it might work, but it might not. It may also interfere with your retention policies in ASM/ME. Do you really want to take that chance with your Exchange or SQL databases, or flat file storage? If you think flat file storage isn’t important to quiesce, remember that source code control systems like Subversion typically use file systems, not a database. It is my position that the only time you should use this option is when you are protecting a Linux VM with guest attached volumes. Linux has no equivalent to VSS, so you get a free pass on using this option. However, because this option is a per-job setting, you’ll want to separate Windows-based VMs with guest volumes from Linux-based VMs with guest volumes. If you want to avoid that, you could simply rely on a crash-consistent copy of the Linux guest attached volume via a scheduled snapshot in the Group Manager GUI. So the moral of the story is this: to protect the guest attached volumes of VMs running Windows, rely entirely on ASM/ME to create SmartCopy SAN snapshots of them.
20. If you need to cherry-pick a file off of a snapshot, or look at an old registry setting, consider restoring or cloning to another volume, and make sure that the restored VM has no direct access to the network the primary system is running on. A special portgroup in vCenter just for this purpose works nicely. This method can often be the least disruptive to your environment.
21. I still like to keep each of my DCs in its own datastore, and to create SmartCopy schedules that do not run simultaneously. In practice, I found that our very sensitive automated code compiling system, which has dozens (if not hundreds) of active ssh sessions, ran into less interference this way than when I initially had the DCs in one datastore, or intertwined in datastores with other VMs. Depending on the number of DCs you have, you might be able to group a couple together, perhaps splitting off the DC running the PDC emulator role into a separate datastore. Beware that the SmartCopy of your DC should be considered only a way to protect the system, not AD. More info is in my post about protecting Active Directory.
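The folder-per-datastore rule in tip 8 is easy to audit if you can list which datastores each VM’s files live on. The Python sketch below assumes a hypothetical inventory shaped as folder, then VM, then datastores (built by hand or from whatever vCenter export you already use; nothing here calls a VMware API) and reports every folder whose VMs touch more than one datastore. Those are the folders whose SmartCopies would scatter snapshots across several volumes.

```python
def folders_spanning_datastores(inventory: dict) -> dict:
    """Return {folder: sorted datastores} for folders whose VMs use more than one datastore.

    inventory is a hypothetical mapping: {folder: {vm_name: [datastore, ...]}}.
    A VM with vmdks in several datastores lists all of them.
    """
    flagged = {}
    for folder, vms in inventory.items():
        datastores = set()
        for vm_datastores in vms.values():
            datastores.update(vm_datastores)
        if len(datastores) > 1:
            flagged[folder] = sorted(datastores)
    return flagged


# Hypothetical folders and VMs; "103-Web" follows the naming scheme from tip 8.
inventory = {
    "103-Web": {"web01": ["VMFS103"], "web02": ["VMFS103"]},
    "Misc": {"app01": ["VMFS103"], "db01": ["VMFS104", "VMFS105"]},
}
print(folders_spanning_datastores(inventory))
# prints {'Misc': ['VMFS103', 'VMFS104', 'VMFS105']}
```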
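To put numbers on the Friday-night example in tip 14: with evenly spaced SmartCopies, the oldest restore point you can count on is the interval between copies times one less than the number you retain. The Python sketch below checks whether a copy taken before an incident is still around when the problem is noticed. It is a back-of-the-envelope model that assumes evenly spaced copies and treats the retention count as the only limit; in practice a snapshot reserve that fills up (see tip 13) can purge copies sooner. The dates and retention values are hypothetical.

```python
from datetime import datetime, timedelta


def guaranteed_lookback(interval: timedelta, retained: int) -> timedelta:
    """Worst-case reach of the oldest retained SmartCopy.

    Right after a new copy is taken, the oldest of `retained` copies is
    only (retained - 1) intervals old.
    """
    if retained < 1:
        raise ValueError("retained must be at least 1")
    return interval * (retained - 1)


def good_copy_survives(incident: datetime, noticed: datetime,
                       interval: timedelta, retained: int) -> bool:
    """True if a copy taken before the incident is still kept when it is noticed."""
    return guaranteed_lookback(interval, retained) >= noticed - incident


# Hypothetical weekend: trouble starts Friday 22:00, noticed Monday 12:00 (62 hours later).
friday_night = datetime(2011, 11, 4, 22, 0)
monday_noon = datetime(2011, 11, 7, 12, 0)

# Hourly copies with room for 48 (about two days): the last good copy is gone.
print(good_copy_survives(friday_night, monday_noon, timedelta(hours=1), 48))  # False
# Daily copies with 7 retained: still covered.
print(good_copy_survives(friday_night, monday_noon, timedelta(days=1), 7))  # True
```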
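Tips 15 and 17 are simple schedule arithmetic. The Python sketch below, assuming evenly spaced once-a-day jobs, staggers snapshot start times per datastore by a small gap, and starts replicas half an interval later with a wider gap so each one has time to finish. The datastore names, the date, and the 30-minute replica gap are hypothetical placeholders; size the replica gap from how long your replicas actually take. The script only prints times for you to enter in the ASM/VE scheduler and does not talk to ASM.

```python
from datetime import datetime, timedelta


def stagger(datastores, first_start: datetime, gap: timedelta) -> dict:
    """Give each datastore its own start time, `gap` apart, so jobs do not start together."""
    return {ds: first_start + i * gap for i, ds in enumerate(datastores)}


def plan(datastores, snapshot_start: datetime, interval: timedelta,
         snapshot_gap: timedelta, replica_gap: timedelta):
    """Snapshots staggered by snapshot_gap; replicas start half an interval later
    (midway between one snapshot run and the next), staggered by the wider replica_gap."""
    snapshots = stagger(datastores, snapshot_start, snapshot_gap)
    replicas = stagger(datastores, snapshot_start + interval / 2, replica_gap)
    return snapshots, replicas


snapshots, replicas = plan(["VMFS101", "VMFS102", "VMFS103"],
                           datetime(2011, 11, 7, 22, 0), timedelta(days=1),
                           timedelta(minutes=5), timedelta(minutes=30))
for ds in snapshots:
    print(ds, snapshots[ds].strftime("%H:%M"), replicas[ds].strftime("%H:%M"))
# prints:
# VMFS101 22:00 10:00
# VMFS102 22:05 10:30
# VMFS103 22:10 11:00
```

Keeping the snapshot gap short and the replica gap wider mirrors tip 17: snapshots complete quickly, while a replica using free pool space needs time to finish before the next one starts.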
Tips for Datastore Manager
22. The Datastore Manager in vCenter is one of my favorite new ways to view my PS Group. Not only do you get a quick check on how your datastores look (the view is limited to just VMFS volumes), but it also shows which volumes have replicas in flight. It has quickly become one of my most-used items in vCenter.
23. Use the ACL policies feature in Datastore Manager. With the new integration between vCenter and the Group Manager, you can easily create volumes from within vCenter. The ACL policies feature in HIT/VE lets you save a predetermined set of ACLs for your hosts (CHAP, IP, or IQN). While I personally prefer using IQNs, any combination of the three will work. Having an ACL policy is a great way to provision access to a volume quickly. If you are using manually configured multipathing, note that datastores created this way will use a default path selection policy of “VMware Fixed.” You will need to change that manually to “VMware Round Robin.” I am told that if you are using the EqualLogic Multipathing Extension Module (MEM), this will be set properly. I don’t know that for sure, because MEM hasn’t been released for vSphere 5.0 as of this writing.
24. VMFS5 offers some really great features, but many of them are only available on datastores that were created natively as VMFS5 (not upgraded from VMFS3). If you choose to recreate them by doing a little juggling with Storage vMotion (as I am), remember that this might wreak havoc on your replication, as you will need to re-seed the volumes. But if you can, you gain access to many great features of VMFS5. You might also use this as an opportunity to revisit your datastores and re-arrange them if necessary.
25. If you are going to rebuild some of your replicated volumes from scratch (to take full advantage of VMFS5), redo the ones with the highest change rate first. They’re already pushing a lot of data through your pipe, so you might as well get them taken care of first. And who knows, your replicas might improve with the new volume.
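The ordering in tip 25 is just a sort by change rate. As a Python sketch, assuming you have a rough figure for each replicated volume’s typical daily change however you measure it (the volume names and numbers here are made up):

```python
def reseed_order(daily_change_gb: dict) -> list:
    """Volumes sorted so the highest change rate is rebuilt and re-seeded first."""
    return sorted(daily_change_gb, key=daily_change_gb.get, reverse=True)


# Hypothetical average daily change per volume, in GB.
change = {"VMFS101": 4.0, "VMFS102": 12.5, "VMFS103": 0.8}
print(reseed_order(change))  # ['VMFS102', 'VMFS101', 'VMFS103']
```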
Hopefully this gives you a few good ideas for your own environment. Thanks for reading.