Tips for using Dell’s updated EqualLogic Host Integration Tools – VMware Edition (HIT/VE)

Ever since my series of posts on replication with a Dell EqualLogic SAN, I’ve had a lot of interest from other users wondering how I actually use the built-in tools provided by Dell EqualLogic to protect my environment.  This is one of the reasons why I’ve written so much about ASM/ME, ASM/LE, and SANHQ.  Well, it’s been a while since I’ve touched on any information about ASM/VE, and since I’ve updated my infrastructure to vSphere 5.0 and the HIT/VE 3.1, I thought I’d share a few pointers that have helped me work with this tool in my environment.

The first generation of HIT/VE was really nothing more than a single tool referred to as “Auto-Snapshot Manager / VMware Edition,” or ASM/VE.  A lot has changed, as it is now part of a larger suite of VMware-centric tools from EqualLogic called the Host Integration Tools / VMware Edition, or HIT/VE.  This consists of the following: EqualLogic Auto-Snapshot Manager, EqualLogic Datastore Manager, and the EqualLogic Virtual Desktop Deployment Utility.  HIT/VE is one of three Host Integration toolsets, the others being HIT/ME and HIT/LE for Microsoft and Linux respectively.

Starting with HIT/VE 3.0, Dell EqualLogic thankfully transitioned toward an appliance/plug-in model.  This reduced overhead and complexity, and removed some of the quirks of the older implementations.  Because I had been lagging behind in updating vSphere, I was still using 2.x until recently, and skipped right over 3.0 to 3.1.  Surprisingly, many of the same practices that served me well with the older version carry over quite well to the new one.

Let me preface this by saying that these are just my suggestions, based on personal use of all versions of the HIT over the past 3 years.  Just as with any solution, there are a number of different ways to achieve the same result.  The information provided may or may not align with best practices from Dell, or your own practices.  But the tips I provide have stood up to the rigors of a production environment, and have actually worked in real recovery scenarios.  Whatever decisions you make should complement your larger protection strategies, as this is just one piece of the puzzle.

Tips for configuring and working with the HIT/VE appliance

1.  The initial configuration will ask for registration in vCenter (configuration item #8 on the appliance).  You may only register one HIT/VE appliance in vCenter.

2.  The HIT/VE appliance was designed to integrate with vCenter, but it also offers flexibility of access.  After the initial configuration, you can verify and modify settings in the respective ASM appliances by browsing directly to their IP address, FQDN, or DNS alias name.  You may type in http://[applianceFQDN] or, for the Auto-Snapshot Manager, http://[applianceFQDN]/vmsnaptool.html

3.  Configuration of the storage management network on the appliance is optional, and depending on your topology, may not be needed.

4.  When setting up replication partners, ASM will ask for a “Server URL.”  This implies you should enter an “http://” or “https://” address, but don’t.  Just enter the IP address or FQDN without the http:// prefix; a true URL, as the field implies, will not work.

5.  After you have configured your HIT/VE appliances, run back through and double check the settings.  I had two of them mysteriously reset some DNS configuration during the initial deployment.  It’s been fine since that time.  It might have been my mistake (twice), but it might not.

6. For just regular (local) SmartCopies, create one HIT/VE appliance.  Have the appliance sit in its own small datastore.  Make sure you do not protect this volume via ASM. Dell warns you about this.  For environments where replication needs to occur, set up a second HIT/VE appliance at the remote site.  The same rules apply there.

7.  Log files on the appliance are accessible via Samba.  I didn’t discover this until I was working through the configuration and some issues I was running into.  What a pleasant way to pull the log data off of the appliance.  Nice work!

Tips for ASM/VE

8.  Just as I learned and recommended with 2.x, the most important suggestion I have for successfully utilizing ASM/VE in your environment is to arrange vCenter folders to represent the contents of your datastores.  Include in the folder name some indication of the referencing volume/datastore (seen in the screen capture below, where “103” refers to a datastore called VMFS103).  The reason for this is so that you can keep your SmartCopy snapshots straight during creation.  If you don’t do this, when you make a SmartCopy of a folder containing VMs that reside in multiple datastores, you will see SAN snapshots in each one of those volumes, but they didn’t necessarily capture all of the data correctly.  You will get really confused, and confusion is not what you need when trying to understand the what and how of recovering systems or data.

image

9.  Schedule or manually create SmartCopy snapshots by folder.  Schedule or manually create SmartCopy replicas by datastore.  Replicas cannot be created by vCenter folder.  This strategy has been the most effective for me, but if you don’t feel like re-arranging your folders in vCenter, you could schedule or manually create SmartCopy snapshots by datastore as well.

10.  Do not schedule or create SmartCopies by individual machine.  This will get confusing (see above), and may interfere with your planning of snapshot retention periods.  If you want to protect a system against some short-term step (e.g. installing a service pack), just use a hypervisor snapshot, and remove it when complete.

11.  ASM/VE 2.x was limited to making smart copies of VMs that had all of their vmdk files in the same location.  3.x does not have this limitation.  This offers quite a bit of flexibility if you have VMs with independent vmdks in other datastores.

12. Test, and document.  Create a couple of small volumes, each large enough to hold two test VMs.  Make a SmartCopy of the vCenter folder where those VMs reside.  Do a few more SmartCopies, then attempt a restore.  Test.  Add a vmdk in another datastore to one of the VMs, then test again.  This is the best way to not only understand what is going on, but to have no surprises or trepidation when you have to do it for real.  It is especially important to understand how the other VMs in the same datastore will behave, how VMs with multiple vmdks in different datastores will act, and what a “restore by rollback” is.  And while you’re at it, make a OneNote or Word document outlining the exact steps for recovery, and what to expect.  Create one for local SmartCopies, and another for remote replicas.  This keeps you from having to think clearly in the heat of the moment.  Your goal is to make things better with a restore, not worse.  Oh, and if you can’t find the time to document the process, don’t worry, I’m sure the next guy who replaces you will find the time.

13.  Set snapshot and replication retention numbers in ASM/VE.  This much needed feature was added to the 3.0 version.  Tweak each snapshot reserve to a level that you feel comfortable with, and that also matches up against your overall protection policies.  There will be some tuning for each volume so that you can offer the protection times needed, without allocating too much space to snapshot reserves.  ASM will only be able to manage the snapshots that it creates, so if you have some older snaps of a particular datastore, you may need to do a little cleanup work.

14.  Watch the frequency!!!  The only thing worse than not having a backup of a system or its data is having several bad copies of it, and realizing that the last good one just purged itself out.  A great example of this is something going wrong on a Friday night that you don’t notice until mid-day on Monday, while your high-frequency SmartCopies only had room for two days’ worth of changed data.  With ASM/VE, I tend to prefer very modest frequencies.  Once a day is fine with me on many of my systems.  Most of the others that I like to have more frequent SmartCopies of keep their actual data on guest attached volumes.  Early on in my use, I had a series of events that were almost disastrous, all because I was overzealous on the frequency, but not mindful enough of the retention.  Don’t be a victim of the ease of cranking up the frequency at the expense of retention.  This is something you’ll never find in a deployment or operations guide, and it applies to all aspects of data protection.

15.  If you are creating SmartCopy snapshots and SmartCopy replicas, use your scheduling as an opportunity to shrink the window of vulnerability.  Instead of running the once-a-day replica immediately after the once-a-day snapshot, split the difference so that the replica runs in between the last SmartCopy snapshot and the next one.

16.  Keep your SmartCopy and replica frequencies and scheduling as simple as possible.  If you can’t understand it, who will?  Perhaps start with a frequency of just once a day for all of your datastores, then go from there.  You might find that once a day works for 99% of your systems.  I’ve found that most of the data I need to protect at more frequent intervals lives on guest attached volumes anyway, and I schedule those via ASM/ME to meet my needs.

17.  For SmartCopy snapshots, I tend to schedule them so that there is only one job on one datastore at a time, with the next one scheduled, say, 5 minutes afterward.  For SmartCopy replicas, if you choose to use free pool space instead of replica reserve (as I do), you might want to offset those more, so that the replica has time to fully complete and the space held by the invisible local replica can be reclaimed before the next job.  Generally this isn’t too much of an issue, unless you are really tight on space.

18.  The SmartCopy defaults have changed a bit since ASM/VE 2.x.  There is no need to tick checkboxes such as “Perform virtual machine memory dump” and “Set created PS Series snapshots online.”  In fact, I would untick “Include PS Series volumes accessed by guest iSCSI initiators.”  More info on why below.

19.  ASM/VE still gives you the option to snapshot volumes that are attached to a VM via guest iSCSI initiators.  In general, don’t do it.  Why?  If you choose to use this option for Microsoft-based VMs, it will indeed make a snapshot, giving you the impression that all is well, but these snapshots are not coordinated with the internal VSS writers inside the VM, so they are not truly application-consistent snapshots of the guest volumes.  Sure, they might work, but they might not.  They may also interfere with your retention policies in ASM/ME.  Do you really want to take that chance with your Exchange or SQL databases, or flat-file storage?  If you think flat-file storage isn’t important to quiesce, remember that source code control systems like Subversion typically use the file system, not a database.  It is my position that the only time you should use this option is if you are protecting a Linux VM with guest attached volumes.  Linux has no equivalent to VSS, so you get a free pass on using this option.  However, because this option is a per-job setting, you’ll want to separate Windows-based VMs with guest volumes from Linux-based VMs with guest volumes.  If you wanted to avoid that, you could just rely on a crash-consistent copy of that Linux guest attached volume via a scheduled snapshot in the Group Manager GUI.  So the moral of the story is this:  to protect the guest attached volumes in VMs running Windows, rely entirely on ASM/ME to create a SmartCopy SAN snapshot of those volumes.

20.  If you need to cherry-pick a file off of a snapshot, or look at an old registry setting, consider restoring or cloning to another volume, and make sure that the restored VM does not have any direct access to the network that the primary system is running on.  Having a special portgroup in vCenter just for this purpose works nicely.  Many times this method is the least disruptive to your environment.

21.  I still like to have my DCs in individual datastores, on their own, and create SmartCopy schedules that do not occur simultaneously.  I found in practice that our very sensitive automated code-compiling system, which has dozens (if not hundreds) of active ssh sessions, ran into less interference this way than when I initially had the DCs in one datastore, or intertwined in datastores with other VMs.  Depending on the number of DCs you have, you might be able to group a couple together, perhaps splitting off the DC running the PDC emulator role into a separate datastore.  Be aware that the SmartCopy of your DC should be considered a way to protect the system, not AD.  More info on my post about protecting Active Directory here.

Tips for Datastore Manager

22.  The Datastore Manager in vCenter is one of my favorite new ways to view my PS Group.  Not only does it give me a quick check on how my datastores look (limiting the view to just VMFS volumes), but it also shows which volumes have replicas in flight.  It has quickly become one of my most used items in vCenter.

23.  Use the ACL policies feature in Datastore Manager.  With the new integration between vCenter and the Group Manager, you can easily create volumes.  The ACL policies feature in the HIT/VE is a way for you to save a predetermined set of ACLs for your hosts (CHAP, IP, or IQN).  While I personally prefer using IQNs, any combination of the three will work.  Having an ACL policy is a great way to provision access to a volume quickly.  If you are using manually configured multi-pathing, it is important to note that creating datastores this way will use a default pathing of “VMware Fixed.”  You will need to manually change that to “VMware Round Robin.”  I am told that if you are using the EqualLogic Multipathing Extension Module (MEM), this will be set properly for you.  I don’t know that for sure, because MEM hasn’t been released for vSphere 5.0 as of this writing.
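If you would rather flip that path selection policy from the command line than from the vSphere Client, something along these lines should work on ESXi 5.0.  This is just a generic esxcli sketch, not from the Dell documentation, and the naa identifier below is a placeholder you would swap for your own EqualLogic volume’s device ID:

# list devices so you can find the naa ID of the EqualLogic volume
esxcli storage nmp device list

# change that device's path selection policy to Round Robin
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR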

24.  VMFS5 offers some really great features, but many of them are only available if the volume was natively created (not upgraded from VMFS3).  If you choose to recreate them by doing a little juggling with Storage vMotion (as I am), remember that this might wreak havoc on your replication, as you will need to re-seed the volumes.  But if you can, you get access to the many great features of VMFS5.  You might also use this as an opportunity to revisit your datastores and re-arrange them if necessary.

25.  If you are going to redo some of your volumes from scratch (to take full advantage of VMFS5), if they are replicated, redo the volumes with the highest change rate first.  They’re already pushing a lot of data through your pipe, so you might as well get them taken care of first.  And who knows, your replicas might be improved with the new volume.

Hopefully this gives you a few good ideas for your own environment.  Thanks for reading.

Using the Dell EqualLogic HIT for Linux

 

I’ve been a big fan of Dell EqualLogic Host Integration Tools for Microsoft (HIT/ME), so I was looking forward to seeing how the newly released HIT for Linux (HIT/LE) was going to pan out.  The HIT/ME and HIT/LE offer unique features when using guest attached volumes in your VM’s.  What’s the big deal about guest attached volumes?  Well, here is why I like them.

  • It keeps the footprint of the VM really small.  The VM can easily fit in your standard VMFS volumes.
  • Portable/replaceable.  Often times, systems serving up large volumes of unstructured data are hard to update.  Having the data as guest attached means that you can easily prepare a new VM presenting the data (via NFS, Samba, etc.), and cut it over without anyone knowing – especially when you are using DNS aliasing.
  • Easy and fast data recovery.  My “in the trenches” experience with guest attached volumes in VMs running Microsoft OSs (and EqualLogic’s HIT/ME) has proven that recovering data off of guest attached volumes is just easier – whether you recover it from snapshot or replica, clone it for analysis, etc. 
  • Better visibility of performance.  Thanks to the independent volume(s), one can easily see in SANHQ what the requirements of that data volume are. 
  • More flexible protection.  With guest attached volumes, it’s easy to crank up the frequency of snapshot and replica protection on just the data, without interfering with the VM that is serving up the data.
  • Efficient, tunable MPIO. 
  • Better utilization of space.  If you wanted to serve up a 2TB volume of storage using a VMDK, more than likely you’d have a 2TB VMFS volume, and something like a 1.6TB VMDK file to accommodate hypervisor snapshots.  With a native volume, you would be able to use the entire 2TB of space. 

The one “gotcha” about guest attached volumes is that they aren’t visible via the vCenter API, so commercial backup applications that rely on vCenter’s visibility of these volumes won’t be able to back them up.  If you use these commercial applications for protection, you may want to determine whether guest attached volumes are a good fit, and if so, find alternate ways of protecting the volumes.  Others might contend that because the volumes aren’t seen by vCenter, one is making things more complex, not less.  I understand the reason for thinking this way, but my experience with them has proven quite the contrary.

Motive
I wasn’t trying out the HIT/LE because I ran out of things to do.  I needed it to solve a problem.  I had to serve up a large amount (several terabytes) of flat-file storage for our Software Development Team.  In fact, this was just the first of several large pools of storage that I needed to serve up.  It would have been simple enough to deploy a typical VM with a second large VMDK, but managing such an arrangement would be more difficult.  If you are ever contemplating deployment decisions, remember that simplicity and flexibility of management should trump simplicity of deployment if it’s a close call.  Guest attached volumes align well with the “design as if you know it’s going to change” concept.  I knew from my experience working with guest attached volumes on Windows VMs that they were very agile, and offered a tremendous amount of flexibility.

But wait… you might be asking, “If I’m doing nothing but presenting large amounts of raw storage, why not skip all of this and use Dell’s new EqualLogic FS7500 Multi-Protocol NAS solution?”  Great question!  I had the opportunity to see the FS7500 NAS head unit at this year’s Dell Storage Forum.  The FS7500 turns the EqualLogic block based storage accessible only on your SAN network into CIFS/NFS storage presentable to your LAN.  It is impressive.  It is also expensive.  Right now, using VM’s to present storage data is the solution that fits within my budget.  There are some downfalls (Samba not supporting SMB2), but for the most part, it falls in the “good enough” category.

I had visions of this post focusing on the performance tweaks and the unique abilities of the HIT/LE.  After implementing it, I was reminded that it is indeed a 1.0 product.  There were enough gaps in the deployment information that I felt it necessary to describe exactly how I made the HIT for Linux work.  IT generalists, who I suspect make up a significant portion of the Dell EqualLogic customer base, have learned to appreciate their philosophy of “if you can’t make it easy, don’t add the feature.”  Not everything can be made intuitive, however, especially the first time around.

Deployment Assumptions 
The scenario and instructions are for a single VM that will be used to serve up a single large volume for storage. It could serve up many guest attached volumes, but for the sake of simplicity, we’ll just be connecting to a single volume.

  • VM with 3 total vNICs.  One used for LAN traffic, and the other two, used exclusively for SAN traffic.  The vNIC’s for the SAN will be assigned to the proper vswitch and portgroup, and will have static IP addresses.  The VM name in this example is “testvm”
  • A single data volume in your EqualLogic PS group, with an ACL that allows for the guest VM to connect to the volume using CHAP, IQN, or IP addresses.  (It may be easiest to first restrict it by IP address, as you won’t be able to determine your IQN until the HIT is installed).  The native volume name in this example is “nfs001” and the group IP address is 10.1.0.10
  • Guest attached volume will be automatically connected at boot, and will be accessible via NFS export.  In this example I will be configuring the system so that the volume is available via the “/data1” directory.
  • OS used will be RedHat Enterprise Linux (RHEL) 5.5. 
  • EqualLogic’s HIT 1.0

Each step below that starts with the word “VERIFICATION” is not a necessary step, but it helps you understand the process, and will validate your findings.  For brevity, I’ve omitted some of the output of these commands.

Deploying and configuring the HIT for Linux
Here we go…

Prepping for Installation

1.     Verify installation of EqualLogic prerequisites (via rpm -q [pkgname]).  If not installed, run yum install [pkgname]

openssl                     (0.9.8e for RHEL 5.5)

libpcap                     (0.9.4 for RHEL 5.5)

iscsi-initiator-utils       (6.2.0.871 for RHEL 5.5)

device-mapper-multipath     (0.4.7 for RHEL 5.5)

python                      (2.4 for RHEL 5.5)

dkms                        (1.9.5 for RHEL 5.5)

(dkms is not part of the RedHat repo.  Download it from http://linux.dell.com/dkms/ or via the “Extra Packages for Enterprise Linux” (EPEL) repository.  I chose the Dell website because it was a newer version.  Simply download and execute the RPM.)
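If you’d like to check all of the prerequisites in one pass, a quick loop like this works.  It’s just my own shortcut, not from the Dell documentation, and dkms will only install via yum if you’ve added the Dell or EPEL repository:

# check each prerequisite; install anything that is missing
for pkg in openssl libpcap iscsi-initiator-utils device-mapper-multipath python dkms; do
    rpm -q $pkg || yum -y install $pkg
done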

 

2.     Snapshot Linux machine so that if things go terribly wrong, it can be reversed

 

3.     Shut down the VM, and add the NICs for guest iSCSI access

Make sure to choose iSCSI network when adding to VM configuration

After startup, manually specify Static IP addresses and subnet mask for both.  (No default gateway!)

Activate the NICs, and reboot
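For reference, the ifcfg files for those two SAN NICs end up looking something like this on RHEL 5.5.  The address below is a placeholder on my example SAN subnet (the group IP in this walkthrough is 10.1.0.10); use your own addressing:

# /etc/sysconfig/network-scripts/ifcfg-eth1  (repeat for eth2 with its own address)
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.1.0.21
NETMASK=255.255.255.0
ONBOOT=yes
# note: no GATEWAY= line -- the SAN NICs get no default gateway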

 

4.     Power up, then add the following lines to /etc/sysctl.conf  (for RHEL 5.5)

net.ipv4.conf.all.arp_ignore = 1

net.ipv4.conf.all.arp_announce = 2
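To apply those settings without waiting for a reboot, and to confirm they stuck, you can run:

sysctl -p

sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_announce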

 

5.     Establish NFS and related daemons to automatically boot

chkconfig portmap on

chkconfig nfs on

chkconfig nfslock on
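A quick way to confirm those services are now set to start at boot:

chkconfig --list | egrep 'portmap|nfs'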

 

6.     Establish the directory which the iSCSI device will be mounted to (and which will ultimately be exported).  In this example, the device will mount to a directory called “eql2tbnfs001” in the /mnt directory. 

mkdir /mnt/eql2tbnfs001

 

7.     Make a symbolic link called “data1” in the root of the file system.

ln -s /mnt/eql2tbnfs001 /data1 
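VERIFICATION:  Confirm the mount point and the symbolic link are in place (my own sanity check, not in the Dell docs)

ls -ld /mnt/eql2tbnfs001 /data1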

 

Installation and configuration of the HIT

8.     Verify that the latest HIT Kit for Linux is being used for installation.  (V1.0.0 as of 9/2011)

 

9.     Import public key

      Download the public key from the EqualLogic support site (under HIT for Linux), and place it in /tmp/

Add key:

rpm --import RPM-GPG-KEY-DELLEQL (the docs show the file name in lower case, but the file is upper case)

 

10.  Run installation

yum localinstall equallogic-host-tools-1.0.0-1.el5.x86_64.rpm

 

Note:  After the HIT is installed, you may get the IQN (for restricting volume access in the EqualLogic Group Manager) by typing the following:

cat /etc/iscsi/initiatorname.iscsi

 

11.  Run eqltune (verbose).  (Tip:  you may want to capture the results to a file for future reference and analysis.)

            eqltune -v

 

12.  Make adjustments based on eqltune results.  (Items listed below were mine.  Yours may be different)

 

            NIC Settings

   Flow Control. 

ethtool -A eth1 autoneg off rx on tx on

ethtool -A eth2 autoneg off rx on tx on

 

(add the above lines to /etc/rc.d/rc.local to make persistent)
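If you want to append those lines without opening an editor, a small here-document does the trick.  This is just my own shortcut; double-check that the interface names match yours:

cat >> /etc/rc.d/rc.local <<'EOF'
# eqltune recommendation: turn off flow control autonegotiation, enable rx/tx pause
ethtool -A eth1 autoneg off rx on tx on
ethtool -A eth2 autoneg off rx on tx on
EOF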

 

There may be a suggestion to use jumbo frames by increasing the MTU size from 1500 to 9000.  This has been omitted from the instructions, as it requires proper configuration of jumbos from end to end.  If you are uncertain, keep standard frames for the initial deployment.

 

   iSCSI Settings

   (make a backup of /etc/iscsi/iscsid.conf before making changes)

 

      Change node.startup to manual.

   node.startup = manual

 

      Change FastAbort to the following:

   node.session.iscsi.FastAbort = No

 

      Change initial_login_retry to the following:

   node.session.initial_login_retry_max = 12

 

      Change number of queued iSCSI commands per session

   node.session.cmds_max = 1024

 

      Change device queue depth

   node.session.queue_depth = 128
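Before moving on, it’s worth confirming the edits landed in /etc/iscsi/iscsid.conf.  A simple check of my own:

egrep 'node.startup|FastAbort|initial_login_retry_max|cmds_max|queue_depth' /etc/iscsi/iscsid.conf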

 

13.  Re-run eqltune -v to see if the changes took effect

In my case, all changes took effect except the NIC settings added to the rc.local file.  That appeared to be a syntax error in the EqualLogic documentation provided; it has been corrected in the instructions above.

 

14.  Run command to view and modify MPIO settings

rswcli -mpio-parameters

 

This returned the following (which seems to be good for now):

Processing mpio-parameters command…

MPIO Parameters:

Max sessions per volume slice:: 2

Max sessions per entire volume:: 6

Minimum adapter speed:: 1000

Default load balancing policy configuration: Round Robin (RR)

IOs Per Path: 16

Use MPIO for snapshots: Yes

Internet Protocol: IPv4

The mpio-parameters command succeeded.

 

15.  Restrict MPIO to just the SAN interfaces

Exclude LAN traffic

            rswcli -E -network 192.168.0.0 -mask 255.255.255.0

 

VERIFICATION:  List status of includes/excludes to verify changes

            rswcli -L

 

VERIFICATION:  Verify the Host Connection Manager is managing just the two SAN interfaces

      ehcmcli -d

 

16.  Discover targets

iscsiadm -m discovery -t st -p 10.1.0.10

(Make sure no unexpected volumes appear.  Note the IQN presented; you’ll need it later.)

 

VERIFICATION:  Show the ifaces created by the HIT

[root@testvm ~]# iscsiadm -m iface | sort

default tcp,<empty>,<empty>,<empty>,<empty>

eql.eth1_0 tcp,00:50:56:8B:1F:71,<empty>,<empty>,<empty>

eql.eth1_1 tcp,00:50:56:8B:1F:71,<empty>,<empty>,<empty>

eql.eth2_0 tcp,00:50:56:8B:57:97,<empty>,<empty>,<empty>

eql.eth2_1 tcp,00:50:56:8B:57:97,<empty>,<empty>,<empty>

iser iser,<empty>,<empty>,<empty>,<empty>

 

VERIFICATION:  Check connection sessions via iscsiadm -m session to show that no connections exist

[root@testvm ~]# iscsiadm -m session

iscsiadm: No active sessions.

 

VERIFICATION:  Check connection sessions via /dev/mapper to show that no connections exist

[root@testvm ~]# ls -la /dev/mapper

total 0

drwxr-xr-x  2 root root     60 Aug 26 09:59 .

drwxr-xr-x 10 root root   3740 Aug 26 10:01 ..

crw-------  1 root root 10, 63 Aug 26 09:59 control

 

VERIFICATION:  Check connection sessions via ehcmcli -d to show that no connections exist

[root@testvm ~]# ehcmcli -d

 

17.  Log in to just one of the iface paths of your liking (eql.eth1_0 in the example below), replacing the IQN shown with yours.  The HIT will take care of the rest.

iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -l

 

This returned:

[root@testvm ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -l

Logging in to [iface: eql.eth1_0, target: iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001, portal: 10.1.0.10,3260]

Login to [iface: eql.eth1_0, target: iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001, portal: 10.1.0.10,3260] successful.

 

VERIFICATION:  Check connection sessions via iscsiadm -m session

[root@testvm ~]# iscsiadm -m session

tcp: [1] 10.1.0.10:3260,1 iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001

tcp: [2] 10.1.0.10:3260,1 iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001

 

VERIFICATION:  Check connection sessions via /dev/mapper.  This will give you the string you will need when making and mounting the filesystem.

[root@testvm ~]# ls -la /dev/mapper

 

 

VERIFICATION:  Check connection sessions via ehcmcli -d

[root@testvm ~]# ehcmcli -d

 

18.  Make a new file system on the device-mapper device (the name seen under /dev/mapper).  Replace the IQN here with yours.  If this is an existing volume that has been used before (from a snapshot, or on another machine), there is no need to perform this step.  The documentation shows this step without the “-j” switch, which would format it as a non-journaled ext2 file system.  The -j switch formats it as an ext3 file system.

mke2fs -j -v /dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001

 

19.  Mount the device to a directory

[root@testvm mnt]# mount /dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001 /mnt/eql2tbnfs001

 

20.  Establish iSCSI connection automatically

[root@testvm ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -o update -n node.startup -v automatic
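VERIFICATION:  Confirm the node record was updated (my own check; the IQN is the same one used above)

iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 | grep node.startup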

 

21.  Mount volume automatically

Change /etc/fstab, adding the following:

/dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001 /mnt/eql2tbnfs001 ext3 _netdev  0 0

Restart system to verify automatic connection and mounting.
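One thing the steps above don’t cover is the NFS export itself (the deployment assumptions mention serving the volume out via /data1).  Here is a minimal sketch of what that could look like, assuming your LAN clients live on 192.168.0.0/24; the subnet and export options are placeholders, so pick ones that match your own network and security needs:

# /etc/exports -- export the mounted EqualLogic volume to the LAN subnet
# (the /data1 symlink points here; exporting the real mount point avoids symlink-export quirks)
/mnt/eql2tbnfs001   192.168.0.0/24(rw,sync,no_root_squash)

Then re-export and confirm:

exportfs -ra

showmount -e localhost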

 

Working with guest attached volumes
After you have things configured and operational, you’ll see how flexible guest iSCSI volumes are to work with.

  • Do you want to temporarily mount a snapshot to this same VM or another VM? Just turn the snapshot online, and make a connection inside the VM.
  • Do you need to archive your data volume to tape, but do not want to interfere with your production system? Mount a recent snapshot of the volume to another system, and perform the backup there.
  • Do you want to do a major update to that front end server presenting the data? Just build up a new VM, connect the new VM to that existing data volume, and change your DNS aliasing, (which you really should be using) and you’re done.
  • Do you need to analyze the I/O of the guest attached volumes? Just use SANHQ. You can easily see if that data should be living on some super fast pool of SAS drives, or a pool of PS4000e arrays.  You’ll be able to make better purchasing decisions because of this.

So, how did it measure up?

The good…
Right out of the gate, I noticed a few really great things about the HIT for Linux.

  • The prerequisites and installation.  No compiling or other unnecessary steps.  The installation package installed clean with no fuss.  That doesn’t happen every day.
  • Eqltune.  This little utility is magic.  Talk about reducing the overhead of preparing a system for MPIO and all things related to guest-based iSCSI volumes.  It gave me a complete set of adjustments to make, divided into 3 simple categories.  After I made the adjustments, I re-ran the utility, and everything checked out okay.  Actually, all of the command line tools were extremely helpful.  Bravo!
  • One really impressive trait of the HIT/LE is how it handles the iSCSI sessions for you. Session build up and teardown is all taken care of by the HIT for Linux.

The not so good…
Almost as fast as the good shows up, you’ll notice a few limitations

  • Version 1.0 is only officially supported on RedHat Enterprise Linux (RHEL) 5.5 and 6.0 (no 6.1 as of this writing).  This might be news to Dell, but Debian-based systems like Ubuntu are running in enterprises everywhere for their cost, solid behavior, and minimalist approach.  RedHat clones dominate much of the market; some commercial, and some free.  Personally, I find upstream distributions such as Fedora sketchy, and prone to breakage with each release (note to Dell: I don’t blame you for not supporting these.  I wouldn’t either).  Other distributions are quirky for their own reasons of “improvement,” and I can understand why these weren’t initially supported either.  A safer approach for Dell (and the more flexible approach for the customer) would be to 1.) get a version out for Ubuntu as fast as possible, and 2.) extend support of this version to RedHat’s downstream, 100% binary compatible, very conservative distribution, CentOS.  For you Linux newbies, think of CentOS as the RedHat installation with the proprietary components stripped out, and nothing else added.  While my first production Linux server running the HIT is RedHat 5.5, all of my testing and early deployment occurred on a CentOS 5.5 distribution, and it worked perfectly. 
  • No Auto-Snapshot Manager (ASM) or equivalent.  I rely on ASM/ME on my Windows VMs with guest attached volumes to provide a few key capabilities:  1.) a mechanism to protect the volumes via snapshots and replicas, and 2.) coordination of applications and I/O so that I/O is flushed properly.  Now, Linux does not have any built-in facility like Microsoft’s Volume Shadow Copy Services (VSS), so Dell can’t do much about that.  But perhaps some simple script templates might give users ideas on how to flush and pause I/O of the guest attached volumes for snapshots.  Just having a utility to create Smart Copies, or mount them, would be pretty nice. 

The forgotten…
A few things overlooked?  Yep.

  • I was initially encouraged by the looks of the documentation.  However, in order to come up with the above, I had to piece together information from a number of different resources.  Syntax and capitalization errors will kill you in a Linux shell environment, and some of those inconsistencies and omissions showed up.  With a little triangulation, I was able to get things running correctly, but it quickly became a frustrating, time-consuming exercise of a kind I felt like I’d been through before.  Hopefully the information provided here will help.
  • Somewhat related to the documentation issue is something that has come up with a few of the other EqualLogic tools:  customers often don’t understand WHY one might want to use the tool.  The same thing goes for the HIT for Linux.  Nobody even gets to the “how” if they don’t understand the “why.”  But I’m encouraged by the great work the Dell TechCenter has been doing with their white papers and videos.  It has become a great source for current information, and they are moving in the right direction on customer education.   

Summary
I’m generally encouraged by what I see, and am hoping that Dell EqualLogic takes design cues from the HIT/ME to employ features like Auto-Snapshot Manager, and an equivalent to eqlxcp (EqualLogic’s offloaded file copy command in Windows).  The HIT for Linux helped me achieve exactly what I was trying to accomplish.  The foundation for another easy-to-use tool in the EqualLogic lineup is certainly there, and I’m looking forward to seeing how it improves.

Helpful resources
Configuring and Deploying the Dell EqualLogic Host Integration Toolkit for Linux
http://en.community.dell.com/dell-groups/dtcmedia/m/mediagallery/19861419/download.aspx

Host Integration Tools for Linux – Installation and User Guide
https://www.equallogic.com/support/download_file.aspx?id=1046 (login required)

Getting more IOPS on workloads running RHEL and EQL HIT for Linux
http://en.community.dell.com/dell-blogs/enterprise/b/tech-center/archive/2011/08/17/getting-more-iops-on-your-oracle-workloads-running-on-red-hat-enterprise-linux-and-dell-equallogic-with-eql-hitkit.aspx 

RHEL5.x iSCSI configuration (Not originally authored by Dell, nor specific to EqualLogic)
http://www.equallogic.com/resourcecenter/assetview.aspx?id=8727 

User’s experience trying to use the HIT on RHEL 6.1, along with some other follies
http://www.linux.com/community/blogs/configuring-dell-equallogic-ps6500-array-to-work-with-redhat-linux-6-el.html 

Dell TechCenter website
http://DellTechCenter.com/ 

Dell TechCenter twitter handle
@DellTechCenter

Replication with an EqualLogic SAN; Part 5

 

Well, I’m happy to say that replication to my offsite facility is finally up and running now.  Let me share with you the final steps to get this project wrapped up. 

You might recall that in my previous offsite replication posts, I had a few extra challenges.  We were a single-site organization, so in order to get replication up and running, an infrastructure at a second site needed to be designed and put in place.  My topology still reflects what I described in the first installment, but simple pictures don’t describe the work of getting this set up.  It was certainly a good exercise in keeping my networking skills sharp.  My appreciation for the folks who specialize in complex network configurations and address management has been renewed.  They probably seldom hear words of thanks for, say, that well-designed subnetting strategy.  They are an underappreciated bunch for sure.

My replication has been running for some time now, but this was all within the same internal SAN network.  While other projects prevented me from completing this sooner, it gave me a good opportunity to observe how replication works.

Here is the way my topology looks fully deployed.

image

Most colocation facilities or datacenters give you about 2 square feet to move around in (only a slight exaggeration of the truth), so it’s not the place you want to be contemplating reasons why something isn’t working.  It’s also no fun realizing you don’t have the remote access you need to make the necessary modifications, and you don’t, or can’t, drive to the CoLo.  My plan for getting this second site running was simple:  build up everything locally (switchgear, firewalls, SAN, etc.) and change the topology at my primary site to emulate the 2nd site.

Here is the way it was running while I worked out the kinks.

image

All replication traffic occurs over TCP port 3260.  Both locations had to have accommodations for this.  I also had to ensure I could manage the array living offsite.  Testing this out with the modified infrastructure at my primary site allowed me to verify traffic was flowing correctly.

The steps taken to get two SAN replication partners transitioned from a single network to two networks (onsite) were:

  1. Verify that all replication is running correctly when the two replication partners are in the same SAN Network
  2. You will need a way to split the feed from your ISP, so if you don’t have one already, place a temporary switch at the primary site on the outside of your existing firewall.  This will allow you to emulate the physical topology of the real site, while having the convenience of all of the equipment located at the primary site. 
  3. After the 2nd firewall (destined for the CoLo) is built and configured, place it on that temporary switch at the primary site.
  4. Place something (a spare computer perhaps) on the SAN segment of the 2nd firewall so you can test basic connectivity (to ensure routing is functioning, etc) between the two SAN networks. 
  5. Pause replication on both ends, and take the target array and its switchgear offline. 
  6. Plug the target array’s Ethernet ports into the SAN switchgear for the second site, then change the IP addressing of the array/group so that it’s running under the correct net block.
  7. Re-enable replication and run test replicas, starting with the Group Manager, then ASM/VE, then ASM/ME.

It would be crazy not to take one step at a time on this, as you learn a little at each step, and can identify issues more easily.  Step 3 introduced the most problems, because traffic has to traverse routers that are also secure gateways.  Not only do you have to consider a couple of firewalls, you now run into other considerations that may be undocumented.  For instance:

  • ASM/VE replication occurs courtesy of vCenter, but ASM/ME replication is configured inside the VM.  Sure, it’s obvious, but so obvious it’s easy to overlook.  That means any topology changes will require adjustments in each VM that utilizes guest attached volumes.  You will need to re-run the “Remote Setup Wizard” to adjust the IP address of the target group that you will be replicating to.
  • ASM/ME also uses a VSS control channel to talk with the array.  If you changed the target array’s group and interface IP addresses, you will probably need to adjust what IP range will be allowed for VSS control.
  • Not so fast, though.  VMs that use guest iSCSI initiated volumes typically have those iSCSI-dedicated virtual network cards set with no default gateway, and you never want to enter more than one default gateway in this sort of situation.  The proper way to handle it is to add a persistent static route.  This needs to be done before you run the Remote Setup Wizard mentioned above.  Fortunately the method for doing this hasn’t changed in at least a decade.  Just type in:

route -p add [destination network] mask [subnet mask] [gateway] metric [metric]
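To make that concrete, here is a purely hypothetical example: a VM whose SAN NIC sits on 10.1.0.0/24 (SAN gateway 10.1.0.1) and needs to reach a target SAN network of 10.2.0.0/24 would get:

route -p add 10.2.0.0 mask 255.255.255.0 10.1.0.1 metric 1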

  • Certain kinds of traffic that pass almost without a trace across a layer 2 segment show up right away when pushed through very sophisticated firewalls whose default stance is to deny everything unless explicitly allowed.  Fortunately, Dell puts out a nice document on this for their EqualLogic arrays.
  • If possible, it will be easiest to configure your firewalls with route relationships between the source SAN and the target SAN.  It may complicate your rulesets (NAT relationships are a little more intelligent when it comes to rulesets in TMG), but it simplifies how each node is seeing each other.  This is not to say that NAT won’t work, but it might introduce some issues that wouldn’t be documented.

Step 7 exposed an unexpected issue; terribly slow replicas.  Slow even though it wasn’t even going across a WAN link.  We’re talking VERY slow, as in 1/300th the speed I was expecting.  The good news is that this problem had nothing to do with the EqualLogic arrays.  It was an upstream switch that I was using to split my feed from my ISP.  The temporary switch was not negotiating correctly, and causing packet fragmentation.  Once that switch was replaced, all was good.

The other strange issue was that even though replication was running great in this test environment, I was getting errors with VSS.  At startup, ASM/ME would indicate “No control volume detected.”  Even though replicas were running, the replicas couldn’t be accessed, used, or managed in any way.  After a significant amount of experimentation, I eventually opened a case with Dell Support.  Running out of time to troubleshoot, I decided to move the equipment offsite so that I could meet my deadline.  Well, when I came back to the office, VSS control magically worked.  I suspect that the array simply needed to be restarted after I had changed the IP addressing assigned to it. 

My CoLo facility is an impressive site.  Located in the Westin Building in Seattle, it is also where the Seattle Internet Exchange (SIX) is located.  Some might think of it as another insignificant building in Seattle’s skyline, but it plays an important part in efficient peering for major Service Providers.  Much of the building has been converted from a hotel to a top tier, highly secure datacenter and a location in which ISP’s get to bridge over to other ISP’s without hitting the backbone.  Dedicated water and power supplies, full facility fail-over, and elevator shafts that have been remodeled to provide nothing but risers for all of the cabling.  Having a CoLo facility that is also an Internet Exchange Point for your ISP is a nice combination.

Since I emulated the offsite topology internally, I was able to simply plug in the equipment, and turn it on, with the confidence that it will work.  It did. 

My early measurements on my feed to the CoLo are quite good.  Since the replication times include buildup and teardown of the sessions, one might get a more accurate measurement of sustained throughput on larger replicas.  The early numbers show that my 30 Mbps circuit is translating into replication rates in the neighborhood of 10 to 12 GB per hour (about 205 MB per minute, or 3.4 MB per second).  If multiple jobs are running at the same time, each one’s rate will be affected by the other replication jobs, but the overall throughput appears to be about the same.  Also affecting speeds will be other traffic coming to and from our site.

There is still a bit of work to do.  I will monitor the resources, and tweak the scheduling to minimize overlap of the replication jobs.  In past posts, I’ve mentioned that I’ve been considering the idea of separating the guest OS swap files from the VMs, in an effort to reduce the replication size.  Apparently I’m not the only one thinking about this, as I stumbled upon this article.  It’s interesting, but a fair amount of work.  Not sure if I want to go down that road yet.

I hope this series helped someone with their plans to deploy replication.  Not only was it fun, but it is a relief to know that my data, and the VM’s that serve up that data, are being automatically replicated to an offsite location.

Replication with an EqualLogic SAN; Part 3

 

In parts one and two of my journey in deploying replication between two EqualLogic PS arrays, I described some of the factors that came into play on how my topology would be designed, and the preparation that needed to occur to get to the point of testing the replication functions. 

Since my primary objective of this project was to provide offsite protection of my VMs and data in the event of a disaster at my primary facility,  I’ve limited my tests to validating that the data is recoverable from or at the remote site.   The logistics of failing over to a remote site (via tools like Site Recovery Manager) is way outside the scope of what I’m attempting to accomplish right now.  That will certainly be a fun project to work on some day, but for now, I’ll be content with knowing my data is replicating offsite successfully.

With that out of the way, let the testing begin…

 

Replication using Group Manager 

Just like snapshots, replication using the EqualLogic Group Manager is pretty straightforward.  However, in my case, using this mechanism would not produce snapshots or replicas that are file-system consistent for VM datastores, and it would only be reliable for data that was not being accessed, or for VMs that were turned off.  So for the sake of brevity, I’m going to skip these tests.

 

ASM/ME Replica creation.

My ASM/ME replication tests will simulate how I plan on replicating the guest attached volumes within VMs.  Remember, these are replicas of the guest attached volumes  only – not of the VM. 

On each VM where I have guest attached volumes and the HITKit installed (Exchange, SQL, file servers, etc.) I launched ASM/ME to configure and create the new replicas.  I’ve scheduled them to occur at a time separate from the daily snapshots.

image

As you can see, there are two different icons used; one represents snapshots, and the other replicas.  Each snapshot and replica will show that the guest attached volumes (in this case, “E:\” and “F:\”) have been protected using the Exchange VSS writer.  The two drives are being captured because I created the job from a “Collection,” which makes the most sense for Exchange and SQL systems that have DB files and transaction log data you’d want to capture at the exact same time.  For the time being, I’m just letting them run once a day to collect some data on replication sizes.  ASM/ME is where recovery tasks would be performed on the guest attached volumes.

A tip for those who are running ASM/ME for SmartCopy snapshots or replication:  define in your schedules a “keep count” number of snapshots or replicas that falls within the amount of snapshot reserve you have for that volume.  Otherwise, ASM/ME may take a very long time to start the console and reconcile the existing smart copies, and you will also find those old snapshots in the “broken” container of ASM/ME.  The startup delay can be so long it almost looks as if the application has hung, but it has not, so be patient.  (By the way, ASM/VE version 2.0, which should be used to protect your VMs, does not have any sort of “keep count” mechanism.  Let’s keep our fingers crossed for that feature in version 3.0.)

 

ASM/ME Replica restores

Working with replicas in ASM/ME is about as easy as it gets.  Just highlight the replica, and click on “Mount as read-only.”  Unlike a snapshot, you do not have the option to “restore” over the existing volume when it’s a replica.

image

ASM/ME will ask for a drive letter to assign that cloned replica to.  Once it’s mounted, you may do with the data as you wish.  Note that it will be in a read only state.  This can be changed later if needed.

When you are finished with the replica, you can click on the “Unmount and Resume Replication…”

image

ASM/ME will ask you if you want to keep the replica around after you unmount it.  To keep it, uncheck the box next to “Delete snapshot from the PS Series group…”

 

ASM/VE replica creation

ASM/VE replication, which will be the tool I use to protect my VMs, took a bit more time to set up correctly due to the way that ASM/VE likes to work.  I somehow missed the fact that one needs a second ASM/VE server running at the target/offsite location for the ASM/VE server at the primary site to communicate with.  ASM/VE also seems to be hyper-sensitive to the version of Java installed on the ASM/VE servers.  Don’t get too anxious about updating to the latest version of Java.  Stick with a version recommended by EqualLogic.  I’m not sure what that officially would be, but I have been told by Tech Support that version 1.6 Update 18 is safe.

Unlike creating SmartCopy snapshots in ASM/VE, you cannot use the “Virtual Machines” view in ASM/VE to create SmartCopy replicas.  Only Datastores, Datacenters, and Clusters support replicas.  In my case, I will use the “Datastores” view to create replicas.  Since I made the adjustments to where my VMs were placed in the datastores (see part 2, under “Preparing VMs for Replication”), it will still be clear which VMs will be replicated. 

image

After creating a SmartCopy replica of one of the datastores, I went to see how it looked.  In ASM/VE it appeared to complete successfully, and SANHQ also seemed to indicate a successful replica.  ASM/VE then gave a message of “contacting ASM peer” in the “replica status” column.  I’ve seen this occur right after kicking off a replication job, but on successful jobs it will disappear shortly.  If it doesn’t disappear, it can be a configuration issue (user accounts used to establish the connection, due to known issues with ASM/VE 2.0), or caused by Java.

 

ASM/VE replica restores

At first, ASM/VE SmartCopy replicas didn’t make much sense to me, especially when it came to restores.  Perhaps I was attempting to think of them as a long-distance snapshot, or expecting them to behave in the same way as ASM/ME replicas.  They work a bit differently than that.  It’s not complicated, just different.

To work with a SmartCopy replica, you must first log into the ASM/VE server at the remote site.  From there, click on “Replication” > “Inbound Replicas,” highlighting the replica from the datastore you are interested in.  It will then present you with the options of “Failover from replica” and “Clone from replica.”  If you attempt to do this from the ASM/VE server at the primary site, these options never present themselves.  It made sense to me after the fact, but it took me a few tries to figure that out.  For my testing purposes, I’m focusing exclusively on “Clone from replica.”  The EqualLogic documentation has good information on when each option can be used.

When choosing “Clone from Replica” it will have a checkbox for “Register new virtual machines.”  In my case, I uncheck this box, as my remote site will have just a few hosts running ESXi, and will not have a vCenter server to contact.

image

 

Once it is complete, access will need to be granted for the remote host in which you will want to try to mount the volume.  This can be accomplished by logging into the Group Manager of the target/offsite SAN group, selecting the cloned volume, and entering CHAP credentials, the IP address of the remote host, or the iSCSI initiator name. 

image

 

Jump right on over to the vSphere client for the remote host, and under “Configuration” > “Storage Adapters,” right-click on your iSCSI software adapter and select “Rescan.”  When complete, go to “Configuration” > “Storage” and you will notice that the volume does NOT show up.  Click “Add Storage” > “Disk/LUN.”

image

 

When a datastore is recognized as a snapshot, it will present you with the following options.  See http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf for more information on which option to choose.

image

 

Once completed, the datastore that was replicated to the remote site and cloned so that it can be made available to the remote ESX/i host, should now be visible in “Datastores.” 

image

From there just browse the Datastore, drilling down to the folder of the VM you wish to turn up, highlight and right click the .vmx file, and select “Add to inventory.”  Your replicated VM should now be ready for you to power up.

If you are going to be cloning a VM replica living on the target array to a datastore, you will need to do one additional step if any of the VM’s have guest attached volumes using the guest iSCSI initiator.  At the target location, open up Group Manager, and drill down to “Replication Partners” > “[partnername]” and highlight the “Inbound” tab.  Expand the volume(s) that are associated with that VM.  Highlight the replica that you want, then click on “Clone replica”

image

This will allow you to reattach a guest attached volume to that VM.  Remember that I’m using the cloning feature simply to verify that my VM’s and data are replicating as they should.  Turning up systems for offsite use is a completely different ballgame, and not my goal – for right now anyway.

Depending on how you have your security and topology set up, and how connected your ESX host is offsite, your test VM you just turned up at the remote site may have the ability to contact Active Directory at your primary site, or guest attached volumes at your primary site.  This can cause problems for obvious reasons, so be careful to not let either one of those happen.  

 

Summary

While demonstrating some of these capabilities recently to the company, the audience (Developers, Managers, etc.) was very impressed with the demonstration, but their questions reminded me of just how little they understood the new model of virtualization, and shared storage.  This can be especially frustrating for Software Developers, who generally consider that there isn’t anything in IT that they don’t understand or know about.  They walked away impressed, and confused.  Mission accomplished.

Now that I’ve confirmed that my data and VMs are replicating correctly, I’ll be building up some of my physical topology so that the offsite equipment has something to hook up to.  That will give me a chance to collect some statistics on replication, which I will share in the next post.

Replication with an EqualLogic SAN; Part 2

 

In part 1 of this series, I outlined the decisions made in order to build a replicated environment.  On to the next step.  Racking up the equipment, migrating my data, and laying some groundwork for testing replication.

While waiting for the new equipment to arrive, I wanted to take care of a few things first:

1.  Update my existing PS5000E array up to the latest firmware.  This has never been a problem, other than the times that I’ve forgotten to log in as the default  ‘grpadmin’ account (the only account allowed to do firmware updates).  The process is slick, with no perceived interruption.

2.  Map out how my connections should be hooked up on the switches.  Redundant switches can only be redundant if you plug everything in the correct way.

3.  IP addressing.  It’s all too easy to just randomly assign IP addresses to a SAN.  It may be its own isolated network, but in the spirit of “design as if you know it’s going to change,” it might just be worth observing good addressing practices.  My SAN is on a /24 net block, but I configure my IP addresses to respect potential address boundaries within that range.  This is so that I can subnet or VLAN them down (e.g. /28) later on, as well as to help simplify rule sets on my ISA server that are based on address boundaries rather than a scattering of addresses.

Preparing the new array

Once the equipment arrived, it made the most sense to get the latest firmware on the new array.  The quickest way is to set it up temporarily using the “Initialize PS Series array” feature in the “Remote Setup Wizard” of the EqualLogic HITKit on a machine that can access the array.  Make it its own group, update the firmware, then reset the array to the factory defaults.  After completing the update and typing “reset,” up comes the most interesting confirmation prompt you’ll ever see.  Instead of “Reset this array to factory defaults? [Y/N]” where a “Y” or “N” is required, the prompt is “Reset this array to factory defaults? [n/DeleteAllMyDataNow]”  You can’t say that isn’t clear.  I applaud EqualLogic for making this very explicit.  Wiping a SAN array clean is serious stuff, and definitely should be harder than typing a “Y” after the word “reset.” 

After the unit was reset, I was ready to join it to the existing group temporarily so that I could evacuate all of the data from the old array and have it placed on the new array. I plugged all of the array ports into the SAN switches and turned it on. Using the Remote Setup Wizard, I initialized the array, joined it to the group, then assigned and activated the rest of the NICs. To migrate all of the data from one array to another, highlight the member with the data on it, then click on "Delete Member." Perhaps EqualLogic will revisit this term; "delete" implies too many things that don't relate to this task.

The process of migrating data chugs along nicely. VMs and end users are none the wiser. Once it is complete, the old array removes itself from the group and resets itself to the factory defaults. It's really impressive. In fact, the speed and simplicity of the process gave me confidence for when we need to add additional storage.

When the old array was back to its factory defaults, I went back, initialized the array, and set it up as a new member in a new group. This would be the group used for some preliminary replication testing, and it will eventually live at the offsite location.

As for how this process compares with competing products, I'm the wrong guy to ask. I've had zero experience with Fibre Channel SANs or iSCSI SANs from other vendors. But what I can say is that it was easy, and fast.

After configuring replication between the two groups (which consisted of setting a few shared passwords between the groups and enabling replication on each volume), I was ready to try it out.  …Almost.


Snapshots and replication

It's worth taking a step back to review a few things about snapshots and how the EqualLogic handles them. Replicas appear to work in a similar (but not identical) manner to snapshots, so many of the same principles apply. Remember that snapshots can be made in several ways.

1.  The most basic are snapshots created in the EqualLogic Group Manager. These do exactly as they say, making a snapshot of the volume. The problem is that they are not file-system consistent for VM datastores, and would only be suitable for datastores in which all of the VMs were turned off at the time the snapshot was made.

2.  To protect VMs, Auto-Snapshot Manager VMware Edition (ASM/VE) provides the ability to create a point-in-time snapshot, leveraging vCenter through VMware's API, then does some nice tricks to make this an independent snapshot (well, of the datastore anyway) that you see in the EqualLogic Group Manager under each respective volume.

3.  For VMs with guest iSCSI attached drives, there is Auto-Snapshot Manager Microsoft Edition (ASM/ME). This great tool is installed with the Host Integration Tool Kit (HIT Kit). It makes application-aware snapshots by taking advantage of the Microsoft Volume Shadow Copy Service (VSS). This is key for protecting SQL databases, Exchange databases, and even flat-file storage residing on guest attached drives, because it ensures that all I/O is flushed when the snapshot is created. I've grown quite partial to this type of snapshot: it's nearly instant, there is no interruption to end users or services, and it provides easy recoverability. The downside is that it can only protect data on iSCSI attached drives within the VM's guest iSCSI initiator, and the application must have its own VSS writer (e.g. Exchange, SQL) in order for it to work correctly. You cannot protect the VM itself with this type of snapshot. Also, vCenter is generally unaware of these guest attached drives, so VCB backups and other apps that rely on vCenter won't include these volumes. (A quick way to verify the writers are in place is sketched below.)
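Here is a minimal sketch of that check, which is my own and not part of the HIT Kit. It runs inside the Windows guest and calls the standard vssadmin list writers command, then looks for the application writers you expect. The writer names shown are common examples; confirm the exact names on your own systems.

import subprocess

# Writers ASM/ME would rely on for application-consistent Smart Copies.
# These names are common examples; verify them on your own guests.
EXPECTED_WRITERS = ["SqlServerWriter", "Microsoft Exchange Writer"]

# 'vssadmin list writers' is a standard Windows command; run this from
# an elevated prompt inside the guest VM.
output = subprocess.run(
    ["vssadmin", "list", "writers"],
    capture_output=True, text=True, check=True
).stdout

for writer in EXPECTED_WRITERS:
    print(writer, "-> found" if writer in output else "-> MISSING")

If a writer is missing or stuck in a failed state, fix that before trusting any application-aware Smart Copy of that guest.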

So just as I use ASM/ME for Smart Copy snapshots of my guest attached drives, and ASM/VE for my VM snapshots, I will use these tools in a similar way to create VM and application-aware replicas of the VMs and the data.

ASM/VE tip:  Smartcopy snapshots using ASM/VE give the option to “Include PS series volumes accessed by guest iSCSI initiators.”  I do not use this option for a few very good reasons, and rely completely on ASM/ME for properly capturing guest attached volumes. 

Default replication settings in EqualLogic Group Manager

When you first configure a volume for replication, some of the EqualLogic defaults are very generous. The two settings to look out for are the "Total replica reserve" and the "Local replication reserve." These conservative defaults can chew up a lot of the free space on your SAN. Assuming you have a decent amount of free space in your storage pool, and you stagger your replication to occur at various times of the day, you can reduce the "Local replication reserve" down to its minimum, then click the checkbox for "allow temporary use of free pool space." This will minimize the impact of enabling replication on your array. The rough numbers below show why this matters.
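As a back-of-the-envelope illustration (my own, with hypothetical numbers; as I recall the local replication reserve defaults to roughly 100% of the volume size and can be lowered to a 5% minimum, but check the documentation for your firmware), here is what the change means for a single 500 GB volume:

# Hypothetical example: local replication reserve on one 500 GB volume.
# The default and minimum percentages reflect my recollection of the
# Group Manager defaults; confirm them against your firmware's docs.
volume_gb = 500

default_local_reserve = 1.00   # ~100% of the volume size, reserved up front
minimum_local_reserve = 0.05   # ~5% minimum, borrowing free pool space as needed

print(f"Reserved at the default:   {volume_gb * default_local_reserve:.0f} GB")
print(f"Reserved at the minimum:   {volume_gb * minimum_local_reserve:.0f} GB")
print(f"Returned to the free pool: {volume_gb * (default_local_reserve - minimum_local_reserve):.0f} GB")

Multiply that across every replicated volume and you can see how quickly the defaults add up.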


Preparing VMs for replication

There were a few things I needed to do to prepare my VMs to be replicated. I wasn't going to tackle every optimization technique at this time, but thought it best to get some of the easy things out of the way first.

1.  Reconfigure VMs so that the swap file is NOT in the same directory as the other VM files. (This is the swap file for the VM at the hypervisor level, not to be confused with the guest OS swap file.) First I created a volume in the EqualLogic Group Manager that would be dedicated to VM swap files, then made sure it was visible to each ESX host. Then, simply configure the swap file location at the cluster level in vCenter, followed by changing the setting on each ESX host. The final step is to power off and power on each VM (a restart/reboot will not work for this step). Once this is completed, you've eliminated a sizeable amount of data that doesn't need to be replicated.

2.  Revamp datastores to reflect good practices with ASM/VE. (I'd say "best practices," but I'm not sure if they exist, or if these qualify as such.) This step takes into consideration how ASM/VE works, and how I use it. I've chosen to make my datastores reflect how my VMs are arranged in vCenter. Below is a screenshot in vCenter of the folders that contain all of my VMs.

[Screenshot: the vCenter folders containing all of the VMs]

Each folder contains VMs that reside in just one particular datastore. For instance, the "Prodsystems-Dev" folder has a half dozen VMs exclusively for our development team, and they all reside in one datastore called VMFS05DS. When a scheduled ASM/VE snapshot runs against a vCenter folder (e.g. "Prodsystems-Dev"), it will only hit the VMs in that folder, and the single datastore they reside on. If it is not done this way, an ASM/VE snapshot of a folder containing VMs that reside in different datastores will generate snapshots in each of those datastores. This becomes terribly confusing to administer, especially when trying to recover a VM. The little check below shows the rule I'm following.
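Here is a sketch of that rule in code form, using a plain dictionary in place of an actual vCenter inventory query. The VM names are hypothetical; "Prodsystems-Dev" and "VMFS05DS" come from the example above. Every folder's VMs should map to exactly one datastore, and anything that doesn't gets flagged.

from collections import defaultdict

# vm_name: (vCenter folder, datastore) -- a stand-in for a real inventory query.
vm_inventory = {
    "dev-web01":   ("Prodsystems-Dev", "VMFS05DS"),
    "dev-sql01":   ("Prodsystems-Dev", "VMFS05DS"),
    "prod-exch01": ("Prodsystems",     "VMFS02DS"),
    "prod-file01": ("Prodsystems",     "VMFS03DS"),  # breaks the one-datastore rule
}

datastores_per_folder = defaultdict(set)
for folder, datastore in vm_inventory.values():
    datastores_per_folder[folder].add(datastore)

for folder, datastores in sorted(datastores_per_folder.items()):
    if len(datastores) > 1:
        print(f"'{folder}' spans {sorted(datastores)} -- an ASM/VE Smart Copy of "
              f"this folder will generate snapshots in every one of those datastores.")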

Since I recreated many of my volumes and datastores, I also jumped on the opportunity to create the new datastores with a 4MB block size instead of the default 1MB block size (a larger block size raises the maximum size of an individual file the datastore can hold; see the quick reference below). Not really necessary in my situation, but based on the link here, it seemed like a good idea.
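For reference, here are the VMFS-3 limits as I recall them; double-check against VMware's documentation for your ESX/vSphere version.

# VMFS-3 block size vs. maximum file (VMDK) size, from memory.
# Verify against VMware's documentation for your version.
vmfs3_max_file_size = {
    "1MB": "256GB",
    "2MB": "512GB",
    "4MB": "1TB",
    "8MB": "2TB",
}

for block_size, max_file in vmfs3_max_file_size.items():
    print(f"{block_size} block size -> {max_file} maximum file size")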

Once the volumes and the datastores were created and sized the way I wanted, I used the Storage vMotion function in vCenter to move each VM into the appropriate datastore, mimicking my arrangement of folders in vCenter. Because I'm sizing my datastores for a functional purpose, I have a mix of large and small datastores; I probably would have made them all the same size if it weren't for how ASM/VE works.

The datastores are in place, and now mimic the arrangement of VM folders in vCenter. Now I'm ready to do a little test replication. I'll save that for the next post.

Suggested reading

Michael Ellerbeck has some great posts on his experiences with EqualLogic, replication, Dell switches, and optimization. There are a lot of good links within the posts.
http://michaelellerbeck.com/

The Dell/EqualLogic Document Center has some good overview documents on how these components work together.  Lots of pretty pictures. 
http://www.equallogic.com/resourcecenter/documentcenter.aspx