VDI for me. Part 4

 

Picking up where we left off in VDI for me. Part 3, we are at the point where the components of View can be installed and configured.  As much as I’d like to walk you through each step and offer explanations along the way, sticking to abbreviated steps is a better way to help you understand how the pieces of the puzzle fit together.  Besides, others have great posts on installing and configuring the View Connection Servers, not to mention the VMware documentation, which is quite good.  The links at the end of the post will give you a good start.  My focus will be on the main areas to configure to get View up and running.

Here is the relationship between the Connection Servers, the clients, and the systems running the agents in my environment.  The overall topology for my View environment can be found in VDI for me. Part 2.

For clients accessing View from the Internal LAN

[Diagram: connection path for clients on the internal LAN]

For clients accessing View from offsite locations

[Diagram: connection path for clients connecting from offsite]

Overview of steps
This is the order I used for deploying the View Components.  To simplify, you may wish to skip steps 3 and 4 until you get everything working on the inside of your network. 

  1. Install Composer on your vCenter Server.
  2. Build a VM and install View Connection Server intended for local LAN access only.
  3. Build a VM and install View Connection Server intended to talk with Security Server.
  4. Build a VM and install View Security Server in your DMZ.
  5. Install View Agent on a VM.
  6. Create a Pool for the VM, and entitle the pool to a user or group in AD.
  7. Connect to the VM using the View Client.

Configuring your first Connection Server (For Internal Access)
Once your first Connection Server is installed, you may begin the configuration.

  1. Browse out to the VMware View Administrator portal on your Connection Server (https://[yourconnectionserver]/admin) and enter the appropriate credentials.
  2. Drill down into View Configuration > Product Licensing and Usage > Edit License to add your license information.
  3. Register your vCenter Server by going to View Configuration > Servers > Add.  Fill out all of the details, but do not click “Enable View Composer” quite yet.  Click OK to exit.
  4. Go back in to edit the vCenter Server configuration, click “Enable View Composer”, and click OK to exit.
  5. In the area where the View Connection Servers are listed, select the only one on the list and click “Edit”.  Make sure both checkboxes are unchecked, and use the internal FQDN and IP addresses only.

[Screenshot: internal Connection Server settings with both checkboxes unchecked]

Configuring your Second Connection Server (to be paired with Security Server)
During the installation of View on the second server, it will ask what type of Connection Server it will be.  Choose “Replica” from the list, and type in the name of your first Connection Server.

  1. Browse out to the View Administrator Portal, and you will now see a second connection server listed.  Highlight it, and click on Edit.
  2. Unlike the first connection server, this connection server needs to have both checkboxes checked.
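To summarize how the toggles should land across the pair (the checkbox labels are from my memory of the View 5 Administrator UI, so lean on the screenshots for the exact wording):

   Internal Connection Server:               “Use secure tunnel connection” unchecked, “Use PCoIP Secure Gateway” unchecked
   Replica paired with the Security Server:  both checked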

[Screenshot: replica Connection Server settings with both checkboxes checked]

Configuring your Security Server (to be paired with your second Connection Server)
Just a few easy steps will take care of your Security Server.

  1. Browse out to the View Administrator portal, highlight the Connection Server you want to pair with the security server, and click More Commands > Specify Security Server Pairing Password.
  2. Install the Connection Server bits onto your new Security Server.  Choose “Security Server” for the type of Connection Server it will be.  It will then prompt you to enter the internal Connection Server to pair it to.  This is the internal FQDN of the second Connection Server.
  3. Enter the View pairing password established in step 1.  This will make the Security Server show up in the View Administrator Portal.
  4. Go back to the View Administrator portal, highlight the server listed under Security Servers, and click Edit.  This is where you will enter the desired external FQDN.  The PCoIP external address should be the publicly registered IP address.  In my case, it is the address bound to the external interface of my firewall, but your topology might dictate otherwise.
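To make step 4 concrete, here is a hedged sketch of those fields, using the external name from my dual namespace and a placeholder public IP (the field labels are as I recall them from View 5; substitute your own values):

   External URL:        https://view.mycompany.com:443
   PCoIP External URL:  203.0.113.10:4172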

[Screenshot: Security Server edit dialog with the external URL settings]

 

After it is all done, in the View Administrator portal, you should see one entry for a vCenter server, two entries for the View Connection servers, and one entry for a Security Server.

[Screenshot: View Administrator showing the vCenter Server, two Connection Servers, and the Security Server]

From this point, it is just a matter of installing the View Agent on the VMs (or physical workstations with PCoIP host cards) you’d like to expose, creating a pool, and entitling a user or group.  Then you should be ready to connect.

Tuning
After you add the VMware View ADM templates to Active Directory, a number of tunable settings will be available to you.  The good news in the tuning department is that while PCoIP is highly tunable, I don’t feel it has to be the first thing you address after the initial deployment.  With View 5, it works quite well out of the box.  I will defer to this post http://myvirtualcloud.net/?p=2061 on some common, View specific GPO settings you might want to adjust, especially in a WAN environment.  The two settings that will probably make the biggest impact are the “Maximum Frame Rate” setting, and the “Build to Lossless” toggle.  I applied these and a few others to help our Development Team on another continent deal with their 280+ ms latency.
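For a sense of what those GPO settings actually change: the PCoIP session variables from the ADM template land in the registry of the system running the View Agent.  Here is a minimal sketch of the two settings mentioned above, assuming the value names from the Teradici template as I recall them (verify against your template version before relying on it):

HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin
   pcoip.maximum_frame_rate (REG_DWORD) = 15        caps the frame rate for WAN users
   pcoip.enable_build_to_lossless (REG_DWORD) = 0   turns off the build-to-lossless refinement

A lower maximum frame rate trades smoothness for bandwidth, which is usually the right trade on a high latency WAN link.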

The tools available to monitor, test, and debug PCoIP are improving almost by the day, and will be an interesting area to watch.  Take a look at the links for the PCoIP Log Viewer and the PCoIP Configuration utilities at the end of this post.

Tips and Observations
When running View, there is a noticeable increase in your dependence on vCenter, and on the databases that support it and View Composer.  This is especially the case in smaller environments, where the server running vCenter might also be housing the vCenter database and the database for View Composer.  Chris Wahl’s recent post Protecting the vCenter Database with SQL Log Shipping addresses this, and provides a good way to protect the vCenter databases through log shipping.  If you are a Dell EqualLogic user, it may be helpful to move your SQL DB and log volumes off to guest attached volumes, and use their ASM/ME application to easily make snaps and replicas of the database.  Regardless of the adjustments you choose to make, factor this into your design criteria, especially if the desktops served up by View become critical to your business.

If your connection to a View VM terminates prematurely, don’t worry.  It seems to be a common occurrence during initial deployment, and it can happen for a number of reasons.  There are a lot of KB articles on how to diagnose them.  One issue I ran across that wasn’t documented very well was a VM not having enough video RAM assigned.  The result can be that it works fine using RDP, but disconnects when using PCoIP.  I’ve had some VMs mysteriously reduce themselves back down to a default value that won’t support large or multiple screen resolutions.  Take a quick look at the settings of your VM.  Once those initial issues were resolved, I’ve found everything to work as expected.

In my mad rush to build out the second View environment at our CoLo, everything worked perfectly, except when it came to the View client validating the secured connection. All indicators pointed to SSL, and perhaps how the exported SSL certificate was applied to the VM running the Security Server. I checked, and rechecked everything, burning up a fair amount of time. It turned out it was a silly mistake (aren’t they all?). In C:\Program Files\VMware\VMware View\Server\sslgateway\conf there needs to be a file called locked.properties. This contains information on the exported certificate. Well, when I created the locked.properties file, Windows was nice enough to append the .txt to it (e.g. locked.properties.txt). The default settings in Windows left that extension hidden, so it didn’t show. By the way, I’ve always hated that default setting for hiding file extensions. It is controlled via GPO at my primary site, but didn’t have that set at the CoLo site.
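For anyone hitting the same wall, here is roughly what that file contains.  This sketch assumes a certificate keystore exported as keys.p12 sitting in the same conf folder; your file name and password will differ:

# C:\Program Files\VMware\VMware View\Server\sslgateway\conf\locked.properties
keyfile=keys.p12
keypass=yourKeystorePassword

A quick dir from a command prompt in that folder will also reveal a stray .txt extension that Explorer may be hiding.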

Next up, I’ll be wrapping up this series with the final impressions of the project.  What worked well, what didn’t.  Perceptions from the users, and from those writing the checks.  Stay tuned.

Helpful Links
VMware View Documentation Portal.  A lot of good information here.
http://www.vmware.com/support/pubs/view_pubs.html

A couple of nice YouTube videos showing a step by step installation of View Composer
http://www.youtube.com/watch?v=JSFnkLW1ve4&feature=youtube_gdata_player
http://www.youtube.com/watch?v=SDc21h0uTkA&feature=youtube_gdata_player

How to apply View specific settings for systems via GPO (written for 4.5, but also applies to 5.0)
http://blog.vhowto.info/2010/11/25/vmware-view-4-5-active-directory-group-policies/

PCoIP disconnect codes
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2012101

PCoIP Log Viewer
http://mindfluxinc.net/?p=195

PCoIP Configuration utility (beta)
http://mindfluxinc.net/?p=338

More PCoIP tuning suggestions
http://mindfluxinc.net/?p=338

VDI for me. Part 3

 

In VDI for me. Part 2, I left off with how VMware View was going to be constructed in my environment.  We are almost at the point of installing and configuring the VMware View components, but before that is addressed, the most prudent step is to ensure that the right types of traffic can communicate across the different isolated network segments.  This post is simply going to focus on the security rules to do such a thing.  For me, access to these segments is managed by a Celestix MSA 5200i, a 6 port firewall appliance running Microsoft ForeFront Threat Management Gateway (TMG) 2010.  While the screen captures are directly from TMG, much of the information here would apply to other security solutions.

Since all of the supporting components of VMware View will need to communicate across network segments anyway, I suggest making accommodations in your firewall before you start building the View components.  This is not always practical, but in this case, I found that I only had to make a few adjustments before things were working perfectly with all of the components.

My network design was a fairly straightforward, 4-legged topology (a pretty picture of this can be seen in Part 2).

Network Leg    Contains
External       All users connecting to our View environment.
LAN            View Connection Server dedicated to access from the inside.
               View Connection Server dedicated to communication with the Security Server.
               Systems running the View Agent software.
DMZ1           Externally facing View “Security Server”
DMZ4           vSphere Management Network.  vCenter, and SQL databases providing services for vCenter and View Composer.

For those who have their vSphere Management network on a separate network by way of a simple VLAN, your rules will be simpler than mine.  For clarity, I will just show the rules that are used for getting VMware View to work.

Before you get started, make sure you have planned out all of the system names and IP addresses of the various Connection Servers and the VMs running the View Agent.  It will make the work later on easier.

Creating Custom Protocols for VMware View in TMG 2010
In order to build the rules properly, you will first need to define some “User-Defined” protocols.  To keep track of all of the user defined protocols, I always included the name “View” (to remember its purpose), the direction, the type, and the port number.  Here is the list (as I named them) that was used as part of my rule sets.

VMware View Inbound TCP&UDP (4172)
VMware View Outbound (32111)
VMware View Outbound (4001)
VMware View Outbound (8009)
VMware View Outbound (9427)
VMware ViewComposer Inbound (18443)
VMware ViewComposer Outbound (18443)
VMware ViewPCoIP Outbound (4172)
VMware ViewPCoIP SendReceiveUDP (4172)

Page 19 of the VMware View Security Reference will detail the ports and access needed.  I appreciate the detail, and it is all technically correct, but it can be a little confusing.  Hopefully, what I provide will help bridge the gap on anything confusing in the manual.  My implementation at this time does not include a View Transfer Server, so if your deployment includes this, please refer to the installation guide.
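Before getting to the rule screenshots, here is how those ports mapped out in my environment, condensed into one view.  Treat this as my cheat sheet rather than an authoritative source, and double check it against the security reference:

View Client (external) -> Security Server:        TCP 443 (HTTPS), TCP + UDP 4172 (PCoIP)
Security Server -> paired Connection Server:      TCP 8009 (AJP13), TCP 4001 (JMS)
Security Server -> systems running the Agent:     TCP + UDP 4172 (PCoIP), TCP 3389 (RDP), TCP 32111 (USB), TCP 9427 (MMR)
View Client (internal) -> Connection Server:      TCP 443 (HTTPS)
View Client (internal) -> systems running Agent:  TCP + UDP 4172, TCP 3389, TCP 32111, TCP 9427
View Connection Server -> vCenter Server:         TCP 443 (HTTPS)
vCenter / Connection Server -> View Composer:     TCP 18443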

Creating Access Rules for VMware View in TMG 2010
The next step will be to build some access rules.  Access rules typically define access in a From/To arrangement.  Here is what my rules looked like for a successful implementation of VMware View in TMG 2010.

[Screenshot: TMG access rules used for VMware View]

Creating Publishing rules for VMware View in TMG 2010
In the screen above, near the bottom, you see two Publishing rules.  These are for the purposes of securely exposing a server that you want visible to the outside world.  In this case, that would be the View Security Server.  The server will still have its private address as it resides in the DMZ, but would take on one of the assigned public IP addresses bound to the external interface of the TMG appliance.  To make View work, you will need two publishing rules.  One for HTTPS, and the other for PCoIP.  A View session with the display setting of RDP will use only the HTTPS publisher.  A View session with the display setting of PCoIP will use both of the publishing rules.  Page 65 of the View 5 Architecture Planning Guide illustrates this pretty well.

In the PCoIP publishing rule, notice how you need both TCP and UDP, and of course, the correct direction.
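Distilled down, the two publishers look something like this (the rule names are my own; why the HTTPS one is a “Non-Web Server Protocol Publishing Rule” rather than a Web Site Publishing Rule is explained a little further down):

View HTTPS Publisher:  Non-Web Server Protocol Publishing Rule; protocol HTTPS (TCP 443 inbound); to the Security Server’s DMZ address; listening on the assigned public IP
View PCoIP Publisher:  Non-Web Server Protocol Publishing Rule; protocols VMware View Inbound TCP&UDP (4172) and VMware ViewPCoIP SendReceiveUDP (4172); to the Security Server’s DMZ address; listening on the same public IP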

[Screenshot: the PCoIP publishing rule, showing both TCP and UDP protocols and their directions]

My friend Richard Hicks had some great information on his ForeFront TMG blog that was pertinent to this project. ForeFront TMG 2010 Protocol Direction Explained is a good reminder of what you will need to know when defining custom protocols, and the rule sets that use them.  The other was the nuances of using RDP with the “Web Site Publishing Rule” generator.  Let me explain.

TMG has a “Web Site Publishing Rule” generator that allows for a convenient way of exposing HTTP and HTTPS related traffic to the intended target. This publisher’s role is to protect by inspection. It terminates the session, decrypts, inspects, then repackages for delivery onto its destination. This is great for many protocols inside of SSL such as HTTP, but protocols like RDP inside SSL do not like it. This is what I was running into during deployment. View connections using PCoIP worked fine. View connections using RDP did not. Rich was able to help me better understand what the problem was, and how to work around it. The fix was simply to create a “Non-Web Server Protocol Publishing Rule” instead, choosing HTTPS as the protocol type.  For all of you TMG users out there, this is the reason why I haven’t described how to create a “Web Listener” to be used with a traditional “Web Site Publishing Rule.”  There is no need for one.

A few tips on implementing your new firewall rules.  Again, most of these apply to any firewall you choose to use.

1.  Even if you have the intent of granular lockdown (as you should), it may be easiest to initially define the rule sets a little broader.  Use things like entire network segments instead of individually assigned machine objects.  You can tighten the screws down later (remember to do so), and it makes issues easier to diagnose.

2.  Watch those firewall logs.  It’s easy to mess something up along the way, and your real-time firewall logs will be your best friend.  But be careful not to get too fancy with the filtering.  You may be missing some denied traffic that doesn’t necessarily match up with your filter.

3.  You will probably need to create custom protocols.  Name them in such a way that it is clear whether they are incoming or outgoing, and perhaps whether they are TCP or UDP.  Otherwise, it can get a little confusing when it comes to direction of traffic.  Rule sets have a direction, as do the protocols contained in them.

4.  Stay disciplined with your rule set taxonomy.  You will need to understand what each rule is trying to do, and consistency is key.  You may find it more helpful to name the computer objects after the role they play, rather than their actual server names.  It helps with understanding the flow of the rules.

5.  Add some identifier to your rules defined for View.  That way, when you are troubleshooting, you can enter “View” in the search function, and it quickly shows you only the rule sets you need to deal with.

6.  Stick to the best practices when it comes to placement of the View rules within your overall rule sets.  TMG processes rules in order, so there are methods to make the processing most efficient.  These remain unchanged from its predecessor, ISA 2006.  Here is a good article on better understanding rule order.

7.  TMG 2010 has a nice feature for grouping rules.  This allows a set of contiguous rules to be seen as one logical unit.  You might find this helpful in most of your View based rule creation.  I would recommend keeping your access rules for View in a different group than your publishing rules, so that you can maintain best practices on placement/priority of rule types.

8.  When you get to the point of diagnosing what appear to be connection problems between clients, agents, and connection servers, give VMware a call.  They have a few tools that will help in your efforts.  Unfortunately, I can’t provide any more information about the tools at this time, but I can say that for the purposes of diagnosing connectivity issues, they are really nice.

I also stumbled upon an interesting (and apparently little known) issue with systems that run the View Agent and have multiple NICs.  For me, this issue arose on the high powered physical workstation with a PCoIP host card, using View as the connection broker.  This system had two additional NICs that connected to the iSCSI SAN.  The PCoIP based connections worked, but RDP sessions through View failed, even when standard RDP worked without issue.  Shut off the other NICs, and everything worked fine.  VMware KB article 1026498 addresses this.  The fix is simply adding a registry entry.

On the host with the PCoIP card, open regedit and navigate to the following key:
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VDM\Node Manager

Add a REG_SZ value specifying the subnet your View connections should use (in my case):
172.16.128.0/24

If you experience issues connecting to one system running the View Agent but not another, a common practice is to remove and reinstall the View Agent.  Any time the VMware Tools on the VM are updated, you will also need to reinstall the agent.

More experimentation and feedback
As promised in part 1 of this series, I wanted to keep you posted of feedback that I was getting from end users, and observations I had along the way. 

The group of users allowed to connect continued to be impressed, to the point that using it became part of their workday.  I found myself not being able to experiment quite the way I had planned, because users were depending on the service almost immediately.  So much for that idea of it being a pilot project.

The experimentation with serving up a physical system with PCoIP, using VMware View as a connection broker, has continued to be an interesting one.  There are pretty significant market segments that demand high powered GPU processing.  CAD/CAE, visualization, animation, graphic design, etc. have all historically relied on client side GPUs.  So it is a provocative thought to be able to serve up high powered graphics workstations without them sitting under a desk somewhere.  The elegance of this arrangement is that once a physical system has a PCoIP host card in it, and the View Agent installed, it is really no different than the lower powered VMs served up by the cluster.  Access is the same, and so is the experience.  Just a heck of a lot more power.  Since it is all host based rendering, you can make the remote system as powerful as your budget allows.  Get ready for anyone who accesses a high powered workstation like this to be spoiled easily.  Before you know it, they will ask if they can have 48GB of RAM on their VM as well.

Running View from any client (Windows workstation, Mac, tablet, Ubuntu, and a Wyse P20 zero client) proved to give basically the same experience.  It was easy for the end users to connect.  Since I have a dual namespace (“view.mycompany.com” from the outside, and “view.local.lan” from the inside), the biggest confusion has been for laptop users remembering which address to use.  That, and reminding them not to use the VPN to connect.  A few firewall rules blocking access will help guide them.

One of my late experiments came after I met all of my other acceptance criteria for the project.  I wanted to see how VMware View worked with linked clones.  Setting up linked clones was pretty easy.  However, I didn’t realize until late in the project that a linked clone arrangement of View really requires you to run a Microsoft KMS licensing server.  Otherwise, your trusty MAK license keys might be fully depleted in no time.  There is a VMware KB Article describing a possible workaround, but it also warns you of the risks.  Accommodating for KMS licensing is not a difficult matter to address (except for extremely small organizations who don’t qualify for KMS licensing), but it was something I didn’t anticipate.
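If you need to stand up KMS for this, the client side is simple enough to sketch.  These slmgr.vbs commands would be run inside the gold image before composing; the KMS host name here is a hypothetical placeholder:

cscript slmgr.vbs /skms kms01.corp.lan:1688
cscript slmgr.vbs /ato
cscript slmgr.vbs /dlv

The first points the image at your KMS host (TCP 1688 is the default port), the second attempts activation, and the third verifies license state.  Keep in mind that a KMS host won’t begin activating client operating systems until it has seen 25 unique clients, which may be the qualification hurdle for the very small organizations mentioned above.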

I had the chance to do this entire design and implementation not once, but twice.  No, it wasn’t because everything blew up and I had no backups.  My intention was to build a pilot out at my Primary facility first, then build the same arrangement (as much as possible) at the CoLocation facility.  What made this so fast and easy?  As I did my deployment at my Primary facility, I did all of my step by step design documentation in Microsoft OneNote; my favorite non-technical application.  Step by step deployment of the systems, issues, and other oddball events were all documented the first time around.  It made the second go really quick and easy.  Whether it be my Firewall configuration, or the configurations of the various Connection Servers, the time spent documenting paid off quickly.

Next up, I’ll be going over some basic configuration settings of your connection servers, and maybe a little tuning.

VDI for me. Part 2

 

In VDI for me.  Part 1, I described a little bit about our particular interests and use cases for VDI, and how we wanted to deploy a pilot project to learn more about it.  So now what?  Where does a person start?  Well, deployment guides and white papers for VMware View will tell you all of the possibilities.  They are thorough, but perhaps a bit tough to decipher when trying to understand what a simple VDI arrangement might look like.  So I’m just going to focus on how I built out a simple arrangement of VMware View, and how the components of VDI listed in my first post fit together.

Topology of VMware View
The topology of View should fit in very nicely with what you already have for a virtualized infrastructure.  If you are already running a vSphere cluster in your environment, it’s really just a matter of building your supporting components, which in a very simple or small deployment might be an instance of Composer, and a View Connection Server or two.

[Diagram: VMware View topology]

VMware deployment guides will have all of the details of these components, so I will try to keep it brief. 

Connection Server.  This is the VM acting as the broker that your View Client software connects to, and it is the server users inside your network will connect to (e.g. “view.corp.lan”).  This is a domain joined system, and should probably reside on your primary LAN network.

Security Server.  This is another VM running the Connection Server component, but in a slightly different mode.  Its purpose is to act as the secure proxy for external clients connecting to your internal systems presented by View.  It should only be used by clients outside of your network (e.g. “view.yourcompany.com”).  This server should not be joined to the domain, and should reside in a secure DMZ segment of your network in which you have full, granular control of ingress and egress traffic.

Composer.  This is the server that is behind the magic of using a minimal storage footprint to issue out many Virtual Desktops – by way of linked clones.  For my simple deployment, I chose to install it on my VM running vCenter.  I also chose to stick to the basics when it came to configuring Composer, as I really wanted to focus on the behavior of the View Client itself, not experiment how best to deploy hundreds of desktops with linked clones.  Composer needs a SQL Server, so you can use the database server that serves up vCenter, or something else if you wish.

Transfer Server.  This is a dedicated server used for “offline mode” laptop users or “Bring Your Own PC” (BYoPC) arrangements.  It’s optional, but it allows a VDI VM that traditionally lives on the infrastructure to be “checked out” to a laptop.  The VM then uses all of the local resources of the laptop, and when/if it has the ability to phone home via an internet connection, it can send an update of the VM back.  It is a compelling feature for those who might ask, “but what if I need my Virtual Desktop, and I have no internet connection?” or if you are considering a BYoPC arrangement.  It does not have to be joined to the domain, and its placement depends on the use case.

View Agent.  This is the tiny bit of software installed on a system that you want VMware View to present to a user.  In its simplest form, it can just be loaded onto an existing VM, or something served up from Composer.  If you need really high end horsepower with perhaps intensive graphics, it can be installed on a physical workstation with a PCoIP host card, so that VMware View can serve it up just like any other resource (more on this in a later post).  Right now, the agent software can only be installed on a Windows based system. 

View Client.  This is the application used by the end user to connect to their Virtual Desktop.  Currently the official client software is limited to Windows based machines, and iPads.  However, there is a Technology Preview edition for Mac OSX, and some distributions of Linux.  Note that if you are using a PCoIP Zero client, such as a Wyse P20, there is no software that you have to install on it.  Just connect it to the network, and connect to the internal View Connection Server, and you are good to go.

Interestingly enough, if you have a need to access your VDI systems from outside of your LAN (thus requiring you to run the Security Server in a DMZ), you will need a separate, additional VM (domain joined) on the inside of your LAN that is running the Connection Server software and is used exclusively by the Security Server.  The reason it cannot use the Connection Server above that was designated for internal use is that there are toggles that must be set based on the type of connection that is going to occur, and these toggles are mutually exclusive (I’ll address this more in later posts).  You can try to use just one Connection Server coupled with a Security Server for both internal and external access, but traffic will not flow efficiently, and your performance will be affected by this.

I think VMware might need to review their architecture of the Connection Servers.  Having three VMs (which doesn’t even include the View Transfer Server) serve up internal and external access probably doesn’t seem like much when you have hundreds of clients, but for a pilot or a smaller deployment, it is overkill, and could be reduced to two with some tweaks in the Connection Server software.  Perhaps it wouldn’t be that big of a deal if they were just VM appliances.  (hint hint…)

The Software
I chose the “View Premier Starter Pack,” which included all of the things needed to get started: View Composer, ThinApp, View Persona Management, vShield Endpoint, and “View Client for Local Mode.”  Also included is licensing for vSphere Enterprise Plus for Desktops, and vCenter (special licensing considerations apply).  The result is that you get to try out VDI on a single vSphere host, with up to 10 VDI clients, and all of the tools necessary to experiment with the features of View.  From there, you would just purchase additional client licenses or vSphere host licenses as necessary.  It is really a great bundle, and an easy way to begin a pilot project with everything you need.

Placement of systems
More than likely, you will be presenting these systems to the outside world for some of your staff.  This is going to involve some non-VMware related decisions and configurations to get things running correctly.  Modifications will need to be made to your Firewall, DNS, SQL, vSphere, and AD.  If you are a smaller organization, the administrator of all of these may be the same person (you).  Daunting, but at least you cut out the middle man.  That’s the way I like to think of it.  Nevertheless, you will want to exercise some care in the placement of systems, to minimize performance issues and make sure traffic runs efficiently and securely.

There is going to be quite a bit of communication between the Connection Servers, vCenter, the systems running the agent, and the clients running the View Client.  VMware publishes a nice document on all of the ports needed for each type of connection.  It’s all technically there, but it does take a little time to decipher.  I stared at this document endlessly, while keeping a close eye on my real-time firewall logs as I attempted to make various connections.  I really think I needed both in order to get it working.

As mentioned earlier, you should really have the Security Server VM sitting off in a protected network segment off of your firewall.  For me, my primary edge security device is Microsoft ForeFront Threat Management Gateway (TMG) Firewall software running on a nicely packaged appliance by Celestix Networks.  I’ve said it before, and I’ll say it again.  It is such a shame that Microsoft does not get its due credit on this.  My security friends in high places have consistently stated that there is simply nothing better when it comes to robust, flexible protection across the protocol and application stacks than TMG.  I feel fortunate to be working with the good product that it is.  The information that I share throughout this series will be based on use of TMG, but the overall principles will be similar for other edge security solutions.

How much you need to modify your firewall depends on how you place some of these systems.  Since these View related components will be interacting with vCenter, you will have to plan accordingly.  Some vSphere environments have their vSphere Management Network (aka Service Console network) simply behind a VLAN and routed directly to the LAN, while others have it on a separate network leg only accessible by a physical router or firewall.  I fall into the latter category, as my old switches never had the capability to do inter-VLAN routing.  Happily, those old switches have since been replaced, but my vSphere Management network still resides in an isolated network leg, and my firewall settings for VMware View accommodate for that.

Early impressions
While I wanted to deploy the systems correctly, and minimize floundering, I also wanted to see some early results.  After I built up the systems to manage the environment, and worked out the connectivity issues, I thought I’d go ahead and install the View Agent on a few of the existing VMs used by our Software Developers located on another continent.  These Developers have played a very important role in the company’s development efforts, so anything I could do to improve their user experience was a plus.  Previously, their work environment consisted of their own laptop connected to our network via a PPTP based VPN, then using RDP to connect up to a VM provisioned for them.  Their VM has all of their Development applications and tools, which allows the data to be kept in the datacenter.  Oh, and their typical latency?  Around 280ms on average.  Yeah, their connection is that terrible.

So my first tests really involved nothing more than installing the VMware View agent on their VM, then giving them the web address to download the VMware View Client, and a few steps on how to connect.  I also reminded them not to log into the VPN, as that was no longer necessary.  I asked them to try both RDP and PCoIP over their connection, along with some anecdotal feedback whenever they had a chance. 

  • Within 20 minutes they responded, saying things like, “At first glance it works awesome” and “Much much faster than VPN + RDP.”
  • That same day they told the rest of their team members about it, and during a Development review, stated, “please don’t take it away!”
  • Further feedback included comments like “This feels at least 10x faster than the older method” along with “I wouldn’t dream of playing a video on the old RDP, and rotating a 3D view in our software would’ve made you want to bang your head against a wall.  This [new] way is great.”
  • Their impression over their highly latent connection was that the View client with PCoIP was much more responsive than the View client with RDP.

What is really interesting is that I didn’t make a single modification to tune PCoIP, nor did I adjust their VMs in any way.  That is the power of a new approach to remote display rendering.  I do not know the exact thresholds that Teradici (the makers of the PCoIP protocol) were shooting for when it came to latency, but I’m willing to bet that nearly 300ms of latency was outside of their typical use case.  It is quite the compliment to the power of PCoIP.  Needless to say, once it was tested, I couldn’t bear to take this away from the remote Developers, so I did my best to work my deployment around their work days.  This was really exciting because these were just normal VMs that I was serving up.  I didn’t have a chance at this point to serve up the high end workstation with the VMware View Agent installed on it – something I was really looking forward to.

Some more lessons and observations
I’ve witnessed what others have experienced.  It is far too easy to let a pilot project turn into a production environment.  FAST.  With reactions like I shared above, it happens right before your eyes.  The neat features that were demonstrated were must-haves almost overnight.  No wonder so many small VDI deployments ultimately suffer in some form or another as they grow.  So if you are contemplating even a pilot deployment, keep this in mind.  I’m doing my best to contain expectations until I have one of the key ingredients in place; fast storage targeted for IOPS hungry VDI deployments.  For me this means a Dell EqualLogic PS6100XV hybrid array.  Until that time, I am not going to even consider expanding this beyond the 10 instances that comes with the View Premier Bundle.

It was nice to see that during my early deployment, VMware released a “Technology Preview” edition of the View 5.0 Client for Mac OSX, as well as a version for Ubuntu.  It is a nice step forward, but it doesn’t go far enough.  A fully supported View Client needs to be provided for Red Hat clones and Debian based Linux distributions.  Likewise, the View Agent (the bit of software installed on the VM or the physical workstation that makes the system available via View) is limited to Windows operating systems only.  It is my position that having an agent for Linux distributions would serve VMware quite well – much better than they even know.  I’m hoping this is in their feature backlog.

Next up, I’ll be going over some typical View Administrator configuration settings, Firewall settings to get View and PCoIP flowing correctly, as well as my experimentation with a physical workstation with a PCoIP host card and a VMware View Agent installed so that it can be brokered by VMware View.

 

Helpful links
A recently released post by VMware describing exactly the benefit of hardware based systems that we are trying to exploit.
http://blogs.vmware.com/euc/2012/01/enhancing-graphics-processing-with-teradici-pcoip-host-cards-and-vmware-view.html

VMware View 5.0 Documentation Portal
http://www.vmware.com/support/pubs/view_pubs.html

A comprehensive guide to the ports that need to be opened with View 5 and its related services
http://pubs.vmware.com/view-50/topic/com.vmware.ICbase/PDF/view-50-security.pdf

The readme for the View Client Tech Preview for Mac OSX
http://communities.vmware.com/docs/DOC-17916

EqualLogic hybrid arrays in VDI testing
http://en.community.dell.com/techcenter/b/techcenter/archive/2011/02/23/equallogic-ps6000xvs-hybrid-arrays-shine-in-vdi-testing.aspx

VDI for me. Part 1

 

It doesn’t take long for those in IT to see the letters “VDI” show up on the radar.  Maybe a morning digest of RSS or twitter feeds is all it takes.  But the subject of Desktop Virtualization can be a difficult puzzle to piece together if you haven’t already invested in it.  There is a lot to sort through, even if you are contemplating just a small pilot project.  VDI deployments can be a highly visible IT project (in user expectations, experiences, and financial commitment), so if you are contemplating a pilot project, you’ll want to stack the odds of success in your favor, and make that first impression a good one.  As I embarked on my own VDI project, I noticed that while there was much to be found on the net regarding VDI in general, one thing seemed remarkably absent: information on how pilot projects are approached, and actually deployed.  My posts won’t be dissecting any CAPEX or OPEX numbers, nor will they attempt to prove that VDI is the perfect solution for every scenario.  Rather, I want to give you a glimpse of what a VDI pilot project actually looks like.

As I mentioned in previous posts, my recent effort to upgrade my hypervisor to vSphere 5 was for one real purpose.  I wanted to deploy a Proof of Concept (PoC) for VDI in our environment using VMware View 5.  The company I work for makes industry leading data visualization and simulation analytics software.  Our customers, who are often scientists and engineers, need to understand simulation results to make smart design decisions.  The software is typically installed on a user’s local workstation (Windows, Linux, Mac, or Unix) and interacts with data that is either on the network, or if performance dictates, their local system.  The nature of the problems being solved, along with our software, can demand high computational horsepower with high-end graphics.  Many in the CAE/CAD industries are familiar with the demands placed on the local workstation performing the work.  But there are some VDI scenarios, not brought up very often, that make it quite compelling.  We wanted to understand those benefits better.

Much of the interest internally grew from a technology lunch-and-learn session I gave to our staff.  My motives for doing such a thing were two-fold: 1.) I wanted everyone interested (not just key stakeholders, but software developers, technical support, etc.) to be aware of some things I knew they weren’t aware of, and 2.) I thought it would be fun.  I covered everything from how virtualization has changed the Datacenter, to demonstrating how common assumptions about our own internal systems, as well as our customers’, may be highly misguided based on new technologies.  While the focus was the Datacenter, I translated much of that into how these changes might affect end users.  It was a tremendous success, and even required an encore presentation for those who missed it.  This piqued the interest of the key stakeholders, as they understood it could have a significant impact on our product roadmap.  Combine this with the impressive “technology previews” of things like “AppBlast” and “Octopus” at the 2011 VMworld, as well as recent partnerships announced by VMware and NVIDIA for providing hardware based GPU acceleration to hypervisors, and we had something we needed to look into.

Objectives 
Due to the type of software we make, we had some unique use cases that we not only wanted to understand better ourselves, but perhaps provide our customers with solutions on a better way to do things.  I had several objectives I wanted to achieve.

  • Understand how we could improve on our own internal efficiencies and the toolset offered to our staff.  We have taken a fairly aggressive approach already to provide ubiquitous access to everyone in our office (tablets for everyone, and their use encouraged in meetings, etc.).  We also have a great team of software developers on another continent, and are always looking for new ways to integrate them with our systems.  We needed to look at tools that not only help with that, but help mitigate the operational expenses and security challenges of providing traditional workstations to end users.  These seem to be the most common reasons cited when organizations are looking into VDI.
  • Demonstrate to our customers possibly “a better way” to work.  Believe it or not, many consumers of applications that have large data still either download the data locally, then work on it, or rely on local high speed networks to transmit the data.  What happens if the data, and the system used to present the data lived in the Datacenter?  These days, under a virtualized infrastructure, network communications may never even see an Ethernet cable or a physical switch, and the results can be impressive.  This is the benefit that most of us have enjoyed while virtualizing our server infrastructure.  So if the new type of client server arrangement only had to deal with transmitting screen data, then a world of possibilities opens up.
  • Demonstrate our software.  We rely heavily on feedback from new and existing customers on our products, and we need them to try our software.  If your customer base runs on extremely sensitive or secure networks, you’ll find that a pattern emerges.  They won’t let you install trial or pre-release software on any of their machines.  We’ve resorted to some pretty crude methods to help them test our software (shipping some old laptops), but obviously, this doesn’t scale well, and frankly, doesn’t give a good impression for a company who strives for innovation.  
  • Understand new display protocols, and how this could impact how datacenters are constructed.  Remote displays are a big concern in the visualization, CAE, and CAD worlds.  Years of dealing with various display protocols have resulted in similar experiences.  Most share the trait of using connection oriented protocols to transmit rendering instructions.  They tend to suffer on connections beyond the LAN that may have high latency and packet loss.  There are new approaches to help change this.

GPUs, Remote Displays, and Client Server
Most heavy hitting graphics software packages have historically relied HEAVILY on GPU power provided by a real video card.  This has been under the assumption that GPUs would always be there, and that they are incredibly powerful.  Well, as you can imagine, that doesn’t play to the strengths of traditional virtualization, where video has barely been an afterthought (at least so far).  Not a big deal when you were virtualizing your server infrastructure, but a pretty big deal when you are trying to virtualize desktops.  Task workers may never notice the difference, but users of high end graphics will.

Remote display protocols come in a variety of forms.  Some are definitely better than others, but most suffer from fundamental issues of being based on TCP.  The rules of TCP guarantee reliable and orderly delivery of packets to be reassembled at the target.  Network communication wouldn’t have gotten very far if it didn’t have a way to guarantee delivery, and while this is great for most things, it doesn’t shine when it comes to video, animations, or high speed screen refreshes.

So why would a software company be so curious about this new set of technologies? Technologies that for the most part, we have no control of?  The answer is that it can change how one architects software. For example:

  • Sometimes there are calls for a true two-piece, client server software architecture, where the client application and the server application work together to perform their own dedicated tasks, but in different locations.  The goal with this type of client server arrangement is to use resources as efficiently as possible.  It is not as common as it once was, and history has demonstrated that this approach can be complex, fraught with problems, and very expensive to develop and maintain, especially when dealing with heterogeneous environments.  It may work in some environments, but not in others.  It may or may not use chatty protocols that simply don’t work well over poor network conditions.  The further the two are separated, the more issues you introduce.  Firewalls, network traversal, etc. all complicate matters.
  • The other form of client server that people are most familiar with is where the application and computing power is in one location (e.g. The desktop) while the files being accessed are on a file server.  While it is very clear what is doing the work, it begins to falter with large data, or traversing geographic boundaries.  Look at the traffic generated by opening up a modest sized spreadsheet over a VPN connection, and you’ll see what I’m talking about.

With the trend of virtualizing infrastructures, datacenters have collapsed to become more centrally located.  If the client system (a workstation, or in this case, a VM) is now in the same virtual space as where the data lives (the server, or in this case, a VM), it can have an impact on application design.  Instead of using “client rendering,” the model uses “on-host” rendering, where all of the heavy lifting is performed inside the Datacenter.  Meanwhile, the endpoint devices, such as laptops, tablets, and zero or thin clients, which do not need to have much horsepower, are really just acting in a presentation role.  It’s a bit like wondering how much CPU power your TV has.  Answer… It doesn’t matter, as the score of the baseball game is being rendered on-host.

The Plan
I wanted to make sure that I wasn’t claiming that VDI will change everything, and that every single person will be working off of a zero client in 12 months.  Nor was I suggesting that running a virtual desktop on a tablet is the ideal interface for interacting with a desktop.  That wasn’t my point.  It is ultimately about how applications are delivered to the end user.  I wanted to help everyone understand that the majority of our customers simply work on the systems provided to them by their IT Department.  Rarely does an end user get line-item-veto power over what they use for a computer, and since we know IT infrastructures are changing, the evidence suggests that what the customers will be using is going to change as well.  In fact, there is a pretty good likelihood that we will have a customer who shows up one day to their office with nothing more than a zero client in front of them.

The idea behind the PoC was to invest a minimum amount to provide as much information as possible to make smart decisions in the future.  In that spirit, I also plan on revealing results in these posts along the way.  I knew my initial project wasn’t going to include the time to look into every feature of a fully deployed VDI.  I was going to stick to the basics, and see how they work.

  • Test access to VM’s using VMware View as the connection broker (via PCoIP, and RDP)
  • Test access to high end physical workstations with VMware View as the connection broker. (via PCoIP, and RDP)
  • Test these systems from a variety of endpoints.  From existing PCs, to zero clients, to Linux desktops and Mac clients, to wireless tablet devices and smart phones.
  • Test these systems from a variety of connection scenarios.  A connection to something on the LAN is very different than something on another continent. 

Bullet number 2 may have caught your eye.  Creating a unified remote display experience isn’t limited to just virtual machines.  With the power of PCoIP, and using VMware View as a connection broker, one can house high end workstations IN the Datacenter.  They would be a 1:1 assignment (1 active user to 1 workstation) like a traditional workstation, but they’d be close to the storage (perhaps even directly connected to the SAN if desired), and offer the full GPU capabilities of the workstation.  Access to them would be no different than if they were VMs.  In fact, the end user may never know if it is physical, or virtual.  It’s a provocative thought for users of CAD, solid modeling, or simulation analytics software.

Components of my Pilot Project

  • VDI software.  VMware View 5 Premier (their 10 client “starter pack”) running on vSphere 5.0 infrastructure.  This would offer the abilities of the connection broker, the variety of connection protocols (PCoIP, and RDP) among other features.
  • Back-end storage.  For our small pilot project, this was limited to our existing Dell EqualLogic SAN arrays.  They would be fine for this small pilot.  However, I have no illusions about the storage I/O requirements of VDI at a larger scale, and hope that if things go well, a super-fast EqualLogic hybrid SSD/SAS array is in our future.  More on the subject matter of storage and it’s importance to the success of VDI in future posts.
  • Firewalls.  Not to be overlooked, this plays a significant role in how you can present, and secure content.  I have the good fortune to be working with what I believe to be the best of breed.  Microsoft Forefront Threat Management Gateway (TMG) 2010 running on a Celestix MSA appliance.  (more on this in future posts)
  • HTML5 presenter.  While View 5 doesn’t natively support HTML5, I wanted to see what this was like.  Not only would it aid in the ability to evaluate our software, but it would give our developers some insight on how HTML5 may play a part in virtualized application delivery (e.g. AppBlast.  Gee, can you tell I’m antsy to get my hands on this?).  For this experiment, I’ll be using Ericom’s Access Now for VMware View.
  • Zero Client.  For this, I will be using a Wyse P20.
  • PCoIP host card.  This is an eVGA HD02 PCoIP host card installed in a higher end workstation, which I will be using to test the experience of working against a high end workstation sitting remotely in the Datacenter, brokered by VMware View.
  • My Primary Site, and my Colocation site.  The CoLo site is not currently used for anything other than housing my offsite SAN replicas.  My plan is to change that.  The long term intention is to house services that are more appropriate for that location, while in the near term, I will be housing part of my VDI pilot there.

Early lessons learned
If you are reading this post, more than likely you are well entrenched in the world of virtualization.  Never underestimate the lack of knowledge in this arena among your coworkers or stakeholders.  About 3 years ago, I started giving a monthly IT review to everyone in our company on what is going on in IT.  This helps dispel the myths behind the giant IT curtain, and possibly gives some insight into the complexities of modern environments.  But no matter how much information I provide, I’m constantly challenged on these technologies, with the occasional question of, “now who is VMware, and what do they do?” or “What’s a SAN?”  Be prepared to repeat your message several times over.

I would also emphasize VDI as not being an either/or scenario.  It is another form of a computing environment that provides unique capabilities to deliver applications and content.  We know that there are many vehicles for this already, and it continues to evolve.  So in other words, no need to make bold claims about VDI.  This also keeps you out of the business of predicting the future – not a favorable occupation in my book. 

As you have the opportunity to have users try out various use cases, you may have to throttle any over exuberance on the user’s behalf.  Large deployments are different than pilots.  The end user may see the brilliance of the solution, while the budget line owners see nothing but the large capital investment. 

Coming up
In upcoming posts, I’ll share how I chose to design a pilot VDI arrangement for our testing internally, externally, and how we plan to use it for our own internal needs, as well as our customers.  I have no idea how many parts this series is going to be, so bear with me.  What I hope to do is to give you a better understanding of a VDI pilot project in the real world, providing enough detail to be helpful with your own planning.

Resources
The VMware, NVIDIA Partnership announcement
http://www.vmware.com/company/news/releases/vmw-vmworld-emea-nvidia-joint-10-19-11.html

Planning for VDI has little to do with the Desktop
http://whiteboardninja.wordpress.com/2011/01/24/planning-for-vdi-has-little-to-do-with-the-desktop/ 

VMware vSphere 4.1 Networking Performance
http://www.vmware.com/files/pdf/techpaper/Performance-Networking-vSphere4-1-WP.pdf

PCoIP FAQ’s
http://www.teradici.com/pcoip/pcoip-technology/pcoip-faqs.php

Tips for using Dell’s updated EqualLogic Host Integration Tools – VMware Edition (HIT/VE)

Ever since my series of posts on replication with a Dell EqualLogic SAN, I’ve had a lot of interest from other users wondering how I actually use the built-in tools provided by Dell EqualLogic to protect my environment.  This is one of the reasons why I’ve written so much about ASM/ME, ASM/LE, and SANHQ.  Well, it’s been a while since I’ve touched on any information about ASM/VE, and since I’ve updated my infrastructure to vSphere 5.0 and the HIT/VE 3.1, I thought I’d share a few pointers that have helped me work with this tool in my environment.

The first generation of HIT/VE was really nothing more than a single tool referred to as “Auto-Snapshot Manager / VMware Edition,” or ASM/VE.  A lot has changed, as it is now part of a larger suite of VMware-centric tools from EqualLogic called the Host Integration Tools / VMware Edition, or HIT/VE.  This consists of the following: EqualLogic Auto-Snapshot Manager, EqualLogic Datastore Manager, and the EqualLogic Virtual Desktop Deployment Utility.  HIT/VE is one of three Host Integration toolsets; the others are HIT/ME and HIT/LE, for Microsoft and Linux respectively.

Ever since HIT/VE 3.0, Dell EqualLogic has thankfully transitioned toward an appliance/plug-in model.  This reduced overhead and complexity, and removed some of the quirks of the older implementations.  Because I had been lagging behind in updating vSphere, I was still using 2.x up until recently, and skipped right over 3.0 to 3.1.  Surprisingly, many of the same practices that served me well with the older version adapt quite well to the new version.

Let me preface this by saying that these are just my suggestions, based on personal use of all versions of the HIT over the past 3 years.  Just as with any solution, there are a number of different ways to achieve the same result.  The information provided may or may not align with best practices from Dell, or your own practices.  But the tips I provide have stood up to the rigors of a production environment, and have actually worked in real recovery scenarios.  Whatever decisions you make should complement your larger protection strategies, as this is just one piece of the puzzle.

Tips for configuring and working with the HIT/VE appliance

1.  The initial configuration will ask for registration in vCenter (configuration item #8 on the appliance).  You may only register one HIT/VE appliance in vCenter.

2.  The HIT/VE appliance was designed to integrate with vCenter, but it also offers flexibility of access.  After the initial configuration, you can verify and modify settings in the respective ASM appliances by browsing directly to their IP address, FQDN, or DNS alias name.  You may type in http://[applianceFQDN] or, for the Auto-Snapshot Manager, http://[applianceFQDN]/vmsnaptool.html

3.  Configuration of the storage management network on the appliance is optional, and depending on your topology, may not be needed.

4.  When setting up replication partners, ASM will ask for a “Server URL.”  This implies you should enter an “http://” or “https://” prefix.  Just enter the IP address or FQDN without the http:// prefix; a true URL, as the field implies, will not work.

5.  After you have configured your HIT/VE appliances, run back through and double check the settings.  I had two of them mysteriously reset some DNS configuration during the initial deployment.  It’s been fine since that time.  It might have been my mistake (twice), but it might not.

6. For just regular (local) SmartCopies, create one HIT/VE appliance.  Have the appliance sit in its own small datastore.  Make sure you do not protect this volume via ASM. Dell warns you about this.  For environments where replication needs to occur, set up a second HIT/VE appliance at the remote site.  The same rules apply there.

7.  Log files on the appliance are accessible via Samba.  I didn’t discover this until I was working through the configuration and some issues I was running into.  What a pleasant way to pull the log data off of the appliance.  Nice work!

Tips for ASM/VE

8.  Just as I learned and recommended with 2.x, the most important suggestion I have for successfully utilizing ASM/VE in your environment is to arrange your vCenter folders to represent the contents of your datastores.  Include in the folder name some indication of the referencing volume/datastore (seen in the screen capture below, where “103” refers to a datastore called VMFS103).  The reason for this is so that you can keep your SmartCopy snapshots straight during creation.  If you don’t do this, when you make a SmartCopy of a folder containing VMs that reside in multiple datastores, you will see SAN snapshots in each one of those volumes, but they won’t necessarily have captured all of the data correctly.  You will get really confused, and confusion is not what you need when working out the what and how of recovering systems or data.

image

9.  Schedule or manually create SmartCopy snapshots by folder.  Schedule or manually create SmartCopy replicas by datastore.  Replicas cannot be created by vCenter folder.  This strategy has been the most effective for me, but if you don’t feel like re-arranging your folders in vCenter, you could schedule or manually create SmartCopy snapshots by datastore as well.

10.  Do not schedule or create SmartCopies of individual machines.  This will get confusing (see above), and may interfere with your planning of snapshot retention periods.  If you want to protect a system against some short-term step (e.g. installing a service pack), just use a hypervisor snapshot, and remove it when complete.

11.  ASM/VE 2.x was limited to making SmartCopies of VMs that had all of their vmdk files in the same location.  3.x does not have this limitation.  This offers up quite a bit of flexibility if you have VMs with independent vmdks in other datastores.

12.  Test, and document.  Create a couple of small volumes, each large enough to hold two test VMs.  Make a SmartCopy of the vCenter folder where those VMs reside.  Do a few more SmartCopies, then attempt a restore.  Test.  Add a vmdk in another datastore to one of the VMs, then test again.  This is the best way to not only understand what is going on, but to have no surprises or trepidation when you have to do it for real.  It is especially important to understand how the other VMs in the same datastore will behave, how VMs with multiple vmdks in different datastores will act, and what a “restore by rollback” is.  And while you’re at it, make a OneNote or Word document outlining the exact steps for recovery, and what to expect.  Create one for local SmartCopies, and another for remote replicas.  This saves you from having to think clearly in the heat of the moment.  Your goal is to make things better with a restore, not worse.  Oh, and if you can’t find the time to document the process, don’t worry, I’m sure the next guy who replaces you will find the time.

13.  Set snapshot and replication retention numbers in ASM/VE.  This much-needed feature was added in version 3.0.  Tweak each snapshot reserve to a level that you feel comfortable with, and that also matches up against your overall protection policies.  There will be some tuning for each volume so that you can offer the protection times needed without allocating too much space to snapshot reserves.  ASM will only be able to manage the snapshots that it creates, so if you have some older snaps of a particular datastore, you may need to do a little cleanup work.

14.  Watch the frequency!  The only thing worse than not having a backup of a system or its data is having several bad copies of it, and realizing that the last good one just purged itself out.  A great example of this is something going wrong on a Friday night that you don’t notice until mid-day on Monday, when your high-frequency SmartCopies only had room for two days’ worth of changed data.  With ASM/VE, I tend to prefer very modest frequencies.  Once a day is fine with me on many of my systems.  Most of the others that I like to have more frequent SmartCopies of keep their actual data on guest attached volumes.  Early on in my use, I had a series of events that were almost disastrous, all because I was overzealous on the frequency, but not mindful enough of the retention.  Don’t be a victim of the ease of cranking up the frequency at the expense of retention.  This is something you’ll never find in a deployment or operations guide, and it applies to all aspects of data protection.

15.  If you are creating both SmartCopy snapshots and SmartCopy replicas, use your scheduling as an opportunity to shrink the window of vulnerability.  Instead of running the once-a-day replica right after the snapshot, split the difference so that the replica runs in between the last SmartCopy snapshot and the next one.

16.  Keep your SmartCopy and replica frequencies and scheduling as simple as possible.  If you can’t understand it, who will?  Perhaps start with a frequency of just once a day for all of your datastores, then go from there.  You might find that once a day works for 99% of your systems.  I’ve found that most of the data I need to protect at more frequent intervals lives on guest attached volumes anyway, and I schedule those via ASM/ME to meet my needs.

17.  For SmartCopy snapshots, I tend to schedule them so that there is only one job on one datastore at a time, with the next one scheduled, say, 5 minutes afterward.  For SmartCopy replicas, if you choose to use free pool space instead of replica reserve (as I do), you might want to offset those more, so that the replica has time to fully complete and the space held by the invisible local replica can be reclaimed before the next job.  Generally this isn’t too much of an issue, unless you are really tight on space.

18.  The SmartCopy defaults have changed a bit since ASM/VE 2.x.  There is no need to tick any of the checkboxes such as “Perform virtual machine memory dump” or “Set created PS Series snapshots online.”  In fact, I would untick “Include PS Series volumes accessed by guest iSCSI initiators.”  More info on why below.

19.  ASM/VE still gives you the option to snapshot volumes that are attached to a VM via guest iSCSI initiators.  In general, don’t do it.  Why?  If you choose this option for Microsoft-based VMs, it will indeed make a snapshot, giving you the impression that all is well, but the snapshot is not coordinated with the VSS writers inside the VM, so it is not a truly application-consistent snapshot of the guest volumes.  Sure, it might work, but it might not.  It may also interfere with your retention policies in ASM/ME.  Do you really want to take that chance with your Exchange or SQL databases, or flat file storage?  If you think flat file storage isn’t important to quiesce, remember that source code control systems like Subversion typically use the file system, not a database.  It is my position that the only time you should use this option is if you are protecting a Linux VM with guest attached volumes.  Linux has no equivalent to VSS, so you get a free pass on using this option.  However, because this option is a per-job setting, you’ll want to separate Windows-based VMs with guest volumes from Linux-based VMs with guest volumes.  If you wanted to avoid that, you could just rely on a crash-consistent copy of that Linux guest attached volume via a scheduled snapshot in the Group Manager GUI.  So the moral of the story is this: to protect your guest attached volumes in VMs running Windows, rely entirely on ASM/ME to create a SmartCopy SAN snapshot of your guest attached volumes.

20.  If you need to cherry-pick a file off of a snapshot, or look at an old registry setting, consider restoring or cloning to another volume, and make sure that the restored VM does not have any direct access to the network that the primary system is running on.  Having a special portgroup in vCenter just for this purpose works nicely.  Many times this method is the least disruptive to your environment.

21.  I still like to have my DCs in individual datastores, on their own, with SmartCopy schedules that do not run simultaneously.  I found that in practice, our very sensitive automated code-compiling system, which has dozens (if not hundreds) of active ssh sessions, ran into less interference this way than when I initially had the DCs in one datastore, or intertwined in datastores with other VMs.  Depending on the number of DCs you have, you might be able to group a couple together, perhaps splitting off the DC running the PDC emulator role into a separate datastore.  Beware that the SmartCopy of your DC should be considered a way to protect the system, not AD.  More info in my post about protecting Active Directory here.

Tips for Datastore Manager

22.  The Datastore Manager in vCenter is one of my favorite new ways to view my PS group.  Not only does it give you a quick check on how your datastores look (limiting the view to just VMFS volumes), it also shows which volumes have replicas in flight.  It has quickly become one of my most used items in vCenter.

23.  Use the ACL policies feature in Datastore Manager.  With the new integration between vCenter and the Group Manager, you can easily create volumes.  The ACL policies feature in the HIT/VE is a way for you to save a predetermined set of ACLs for your hosts (CHAP, IP, or IQN).  While I personally prefer using IQNs, any combination of the three will work.  Having an ACL policy is a great way to provision access to a volume quickly.  If you are using manually configured multi-pathing, it is important to note that datastores created this way will use a default path selection policy of “VMware Fixed.”  You will need to manually change that to “VMware Round Robin.”  I am told that if you are using the EqualLogic Multipathing Extension Module (MEM), this will be set properly for you.  I don’t know that for sure, because MEM hasn’t been released for vSphere 5.0 as of this writing.
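If you’d rather not click through each device in the vSphere Client, the path selection policy can also be checked and corrected from the ESXi shell.  A minimal sketch, assuming ESXi 5.0’s esxcli syntax, with a placeholder device ID you would replace with your own naa identifier:

esxcli storage nmp device list    (review the current policy on each device)

esxcli storage nmp device set --device naa.<your-device-id> --psp VMW_PSP_RR    (switch one volume to Round Robin)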

24.  VMFS5 offers some really great features, but many of them are only available on volumes natively created as VMFS5 (not upgraded from VMFS3).  If you choose to recreate your volumes by doing a little juggling with Storage vMotion (as I am), remember that this might wreak havoc on your replication, as you will need to re-seed the volumes.  But if you can manage it, you get the full benefit of VMFS5.  You might also use this as an opportunity to revisit your datastores and re-arrange them if necessary.

25.  If you are going to redo some of your replicated volumes from scratch (to take full advantage of VMFS5), redo the volumes with the highest change rate first.  They’re already pushing a lot of data through your pipe, so you might as well get them taken care of first.  And who knows, your replicas might be improved by the new volume.

Hopefully this gives you a few good ideas for your own environment.  Thanks for reading.

Upgrading to vSphere 5.0 by starting from scratch. …well, sort of.

It is never any fun getting left behind in IT.  Major upgrades every year or two might not be a big deal if you only had to deal with one piece of software, but take a look at most software inventories, and you’ll see possibly dozens of enterprise-level applications and supporting services that all contribute to the chaos.  It can be overwhelming for just one person to handle.  While you may be perfectly justified in holding off on specific upgrades, there still seems to be a bit of guilt around doing so.  You might have ample business and technical factors to support such decisions, and a well-crafted message providing clear reasons to stakeholders, but business and political pressures ultimately win out, and you find yourself addressing the more customer- and user-facing application upgrades before the behind-the-scenes tools that power it all.

That is pretty much where I stood with my virtualized infrastructure.  My last major upgrade was to vSphere 4.0.  Sure, I had visions of keeping up with every update and patch, but a little time passed, and several hundred distractions later, I found myself left behind.  When vSphere 4.1 came out, I also had every intention of upgrading.  However, I was one of the legions of users who had a vCenter server running on a 32-bit OS, and that complicated matters a little bit.  I looked at the various publications and posts on the upgrade paths and experiences.  Nothing seemed quite as easy as I was hoping for, so I did what came easiest to my already packed schedule: nothing.  I wondered just how many administrators found themselves in the same predicament, not touching an aging, albeit perfectly fine running system.

My ESX 4.0 cluster served my organization well, but times change, and so do needs.  A few things came up to kick-start the desire to upgrade.

  • I needed to deploy a pilot VDI project, fast.  (more about this in later posts)
  • We were a victim of our own success with virtualization, and I needed to squeeze even more power and efficiency out of our investment in our infrastructure.

Both are pretty good reasons to upgrade, and while I would have loved to do my typical due diligence on every possible option, I needed a fast track.  My move to vSphere 5.0 was really just a prerequisite of sorts to my work with VDI. 

But how should I go about an upgrade?

Do I update my 4.0 hosts to the latest update that would be eligible for an upgrade path to 5.0, and if so, how much work would that be?  Should I transition to a new vCenter server, migrating the database, then run a mixed environment of ESX hosts running with different versions?  What sort of problems would that introduce?  After conferring with a trusted colleague of mine who always seems to have pragmatic sensibilities when it comes to virtualization, I decided which option was going to be the best for me.  I opted not to do any upgrade, and simply transition to a pristine new cluster.  It looked something like this:

  • Take a host (either new, or by removing an existing one from the cluster), and build it up with ESXi 5.0.
  • Build up a new 64-bit VM for running a brand new vCenter, and configure as needed.
  • Remove one VM at a time from the old cluster by powering it down, removing it from inventory, and adding it to the new cluster.
  • Once enough VMs have been removed, take another host, remove it from the old cluster, rebuild it as ESXi 5.0, and add it to the new cluster.
  • Repeat until finished.

For me, the decision to start from scratch won out.  Why?

  • I could build up a pristine vCenter server, with a database that wasn’t going to carry over any unwanted artifacts of my previous installation.
  • I could easily set up the new vCenter to emulate my old settings.  Folders, EVC settings, resource pools, etc.
  • I could transition or build up my supporting VM’s or appliances to my new infrastructure to make sure they worked before committing to the transition.
  • I could afford a simple restart of each VM as I transitioned it to the new cluster.  I used this as an opportunity to update the VMware Tools as each VM was added to the new inventory.
  • I was willing to give up historical data in my old vSphere 4.0 cluster for the sake of simplicity of the plan and cleanliness of the configuration.
  • Predictability.  I didn’t have to read a single white paper or discussion thread on database migrations or troubles with DSNs.
  • I have a well documented ESX host configuration that is not terribly complex, and easy to recreate across 6 hosts.
  • I just happened to have purchased an additional blade and license of ESX, so it was an ideal time to introduce it to my environment.
  • I could get my entire setup working, then get my licensing figured out after it’s all complete.

You’ll notice that one option similar to this approach would have been to simply remove a host full of running VMs from the existing cluster and add it to the new cluster.  This may have been just as good a plan, as it would have avoided the need to manually shut down and remove each VM one at a time during the transition.  However, I would have needed to run a mix of ESX 4.0 and 5.0 hosts in the new cluster, and I didn’t want to carry anything over from the old setup.  I would have needed to upgrade or rebuild the host anyway, and I had to restart each VM to make sure it was running the latest tools.  If for nothing other than clarity of mind, my approach seemed best for me.

Prior to beginning the transition, I needed to update my Dell EqualLogic firmware to 5.1.2.  A collection of very nice improvements made this a worthwhile upgrade, but it was also a requirement for what I wanted to do.  While the upgrade itself went smoothly, it did re-introduce an issue or two.  The folks at Dell EqualLogic are aware of this, and are working to address it, hopefully in their next release.  The combination of the firmware upgrade and vSphere 5 allowed me to use the latest and greatest tools from EqualLogic, primarily the Host Integration Tools VMware Edition (HIT/VE) and the storage integration in vSphere thanks to VASA.  As of this writing, though, EqualLogic does not have a full production release of their Multipathing Extension Module (MEM) for vSphere 5.0.  The EPA version was just released, but I’ll probably wait for the full release of MEM before I apply it to the hosts in the cluster.

While I was eager to finish the transition, I didn’t want to prematurely create any problems.  I took a page from my own lessons learned during my upgrade to ESX 4.0, and exercised some restraint when it came to updating the Virtual Hardware for each VM to version 8.  My last round of Virtual Hardware updates caused some unexpected results, as I shared in “Side effects of upgrading VM’s to Virtual Hardware 7 in vSphere.”  Apparently, I wasn’t the only one who ran into issues, because that post has statistically been my all-time most popular post.  The abilities of Virtual Hardware 8 powered VMs are pretty neat, but I’m in no rush to make any virtual hardware changes to some of my key production systems, especially those noted.

So, how did it work out?  The actual process completed without a single major hang-up, and I am thrilled with the result.  The irony here is that even though vSphere provides most of the intelligence behind my entire infrastructure, and does things that are mind-bogglingly cool, it was so much easier to upgrade than, say, SharePoint, AD, Exchange, or some other enterprise software.  Great technologies are great because they work like you think they should.  No exception here.  If you are considering a move to vSphere 5.0, and are a little behind on your old infrastructure, this upgrade approach might be worth considering.

Now, onto that little VDI project…

Resources

A great resource on setting up SQL 2008 R2 for vCenter
How to Install Microsoft SQL Server 2008 R2 for VMware vCenter 5

Installing vCenter 5 Best Practices
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2003790

A little VMFS 5.0 info
http://www.yellow-bricks.com/2011/07/13/vsphere-5-0-what-has-changed-for-vmfs/

Information on the EqualLogic Multipathing Extension Module (MEM), and if you are an EqualLogic customer, why you should care.
https://whiteboardninja.wordpress.com/2011/02/01/equallogic-mem-and-vstorage-apis/

Using the Dell EqualLogic HIT for Linux

 

I’ve been a big fan of Dell EqualLogic Host Integration Tools for Microsoft (HIT/ME), so I was looking forward to seeing how the newly released HIT for Linux (HIT/LE) was going to pan out.  The HIT/ME and HIT/LE offer unique features when using guest attached volumes in your VM’s.  What’s the big deal about guest attached volumes?  Well, here is why I like them.

  • It keeps the footprint of the VM really small.  The VM can easily fit in your standard VMFS volumes.
  • Portable/replaceable.  Oftentimes, systems serving up large volumes of unstructured data are hard to update.  Having the data as a guest attached volume means that you can easily prepare a new VM to present the data (via NFS, Samba, etc.), and cut it over without anyone knowing – especially when you are using DNS aliasing.
  • Easy and fast data recovery.  My “in the trenches” experience with guest attached volumes in VMs running Microsoft OSs (and EqualLogic’s HIT/ME) has proven that recovering data off of guest attached volumes is just easier – whether you recover it from a snapshot or a replica, clone it for analysis, etc.
  • Better visibility of performance.  Thanks to the independent volume(s), one can easily see with SANHQ what the requirements of that data volume are.
  • More flexible protection.  With guest attached volumes, it’s easy to crank up the frequency of snapshot and replica protection on just the data, without interfering with the VM that is serving up the data.
  • Efficient, tunable MPIO. 
  • Better utilization of space.  If you wanted to serve up a 2TB volume of storage using a VMDK, more than likely you’d have a 2TB VMFS volume, and something like a 1.6TB VMDK file to accommodate hypervisor snapshots.  With a native volume, you would be able to use the entire 2TB of space. 

The one “gotcha” about guest attached volumes is that they aren’t visible via the vCenter API, so commercial backup applications that rely on the visibility of these volumes through vCenter won’t be able to back them up.  If you use these commercial applications for protection, you may want to determine whether guest attached volumes are a good fit, and if so, find alternate ways of protecting them.  Others might contend that because the volumes aren’t seen by vCenter, one is making things more complex, not less.  I understand the reason for thinking this way, but my experience with them has proven quite the contrary.

Motive
I wasn’t trying out the HIT/LE because I ran out of things to do.  I needed it to solve a problem.  I had to serve up a large amount (several terabytes) of flat file storage for our Software Development Team.  In fact, this was just the first of several large pools of storage that I needed to serve up.  It would have been simple enough to deploy a typical VM with a second large VMDK, but managing such an arrangement would be more difficult.  If you are ever contemplating deployment decisions, remember that simplicity and flexibility of management should trump simplicity of deployment if it’s a close call.  Guest attached volumes align well with the “design as if you know it’s going to change” concept.  I knew from my experience with guest attached volumes for Windows VMs that they were very agile, and offered a tremendous amount of flexibility.

But wait… you might be asking, “If I’m doing nothing but presenting large amounts of raw storage, why not skip all of this and use Dell’s new EqualLogic FS7500 Multi-Protocol NAS solution?”  Great question!  I had the opportunity to see the FS7500 NAS head unit at this year’s Dell Storage Forum.  The FS7500 turns the EqualLogic block based storage accessible only on your SAN network into CIFS/NFS storage presentable to your LAN.  It is impressive.  It is also expensive.  Right now, using VM’s to present storage data is the solution that fits within my budget.  There are some downfalls (Samba not supporting SMB2), but for the most part, it falls in the “good enough” category.

I had visions of this post focusing on the performance tweaks and the unique abilities of the HIT/LE.  After implementing it, I was reminded that it is indeed a 1.0 product.  There were enough gaps in deployment information that I felt it necessary to describe exactly how I made the HIT for Linux work.  The IT generalists who I suspect make up a significant portion of the Dell EqualLogic customer base have learned to appreciate the company’s philosophy of “if you can’t make it easy, don’t add the feature.”  Not everything can be made intuitive the first time around, however.

Deployment Assumptions 
The scenario and instructions are for a single VM that will be used to serve up a single large volume for storage. It could serve up many guest attached volumes, but for the sake of simplicity, we’ll just be connecting to a single volume.

  • VM with 3 total vNICs.  One used for LAN traffic, and the other two, used exclusively for SAN traffic.  The vNIC’s for the SAN will be assigned to the proper vswitch and portgroup, and will have static IP addresses.  The VM name in this example is “testvm”
  • A single data volume in your EqualLogic PS group, with an ACL that allows for the guest VM to connect to the volume using CHAP, IQN, or IP addresses.  (It may be easiest to first restrict it by IP address, as you won’t be able to determine your IQN until the HIT is installed).  The native volume name in this example is “nfs001” and the group IP address is 10.1.0.10
  • Guest attached volume will be automatically connected at boot, and will be accessible via NFS export.  In this example I will be configuring the system so that the volume is available via the “/data1” directory.
  • OS used will be RedHat Enterprise Linux (RHEL) 5.5. 
  • EqualLogic’s HIT 1.0

Each step below that starts with the word “VERIFICATION” is not a necessary step, but it helps you understand the process, and will validate your findings.  For brevity, I’ve omitted some of the output of these commands.

Deploying and configuring the HIT for Linux
Here we go…

Prepping for Installation

1.     Verify installation of the EqualLogic prerequisites (via rpm -q [pkgname]).  If a package is not installed, run yum install [pkgname]

openssl                    (0.9.8e for RHEL 5.5)

libpcap                    (0.9.4 for RHEL 5.5)

iscsi-initiator-utils      (6.2.0.871 for RHEL 5.5)

device-mapper-multipath    (0.4.7 for RHEL 5.5)

python                     (2.4 for RHEL 5.5)

dkms                       (1.9.5 for RHEL 5.5)

(dkms is not part of the RedHat repo.  You will need to download it from http://linux.dell.com/dkms/ or from the “Extra Packages for Enterprise Linux” (EPEL) repository.  I chose the Dell website because it hosted a newer version.  Simply download and execute the RPM.)
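If you’d like to check and fill in the repo-provided prerequisites in one pass, a quick loop like this does the job (my own shorthand, not from the HIT documentation; run as root, and handle dkms separately as noted above):

for pkg in openssl libpcap iscsi-initiator-utils device-mapper-multipath python; do
    rpm -q $pkg || yum -y install $pkg    # install anything that is missing
done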

 

2.     Snapshot the Linux machine so that if things go terribly wrong, the changes can be reversed

 

3.     Shut down the VM, and add the NICs for guest access

Make sure to choose the iSCSI network when adding them to the VM configuration

After startup, manually specify static IP addresses and subnet masks for both.  (No default gateway!)

Activate the NICs, and reboot

 

4.     Power up, then add the following lines to /etc/sysctl.conf  (for RHEL 5.5)

net.ipv4.conf.all.arp_ignore = 1

net.ipv4.conf.all.arp_announce = 2
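
To apply these kernel settings immediately without a reboot (standard RHEL behavior, nothing HIT-specific):

sysctl -p    # re-reads /etc/sysctl.conf and applies the values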

 

5.     Set NFS and related daemons to start automatically at boot

chkconfig portmap on

chkconfig nfs on

chkconfig nfslock on

 

6.     Establish the directory which will ultimately be exported for mounting.  In this example, the iSCSI device will mount to a directory called “eql2tbnfs001” in the /mnt directory.

mkdir /mnt/eql2tbnfs001

 

7.     Make a symbolic link called “data1” in the root of the file system.

ln -s /mnt/eql2tbnfs001 /data1 

 

Installation and configuration of the HIT

8.     Verify that the latest HIT Kit for Linux is being used for installation.  (V1.0.0 as of 9/2011)

 

9.     Import public key

      Download the public key from the EqualLogic support site under HIT for Linux, and place it in /tmp/

Add key:

rpm --import RPM-GPG-KEY-DELLEQL  (the docs show the file name in lower case, but the actual file is upper case)

 

10.  Run installation

yum localinstall equallogic-host-tools-1.0.0-1.el5.x86_64.rpm

 

Note:  After the HIT is installed, you may get the IQN, for use in restricting volume access in the EqualLogic Group Manager, by typing the following:

cat /etc/iscsi/initiatorname.iscsi

 

11.  Run eqltune in verbose mode.  (Tip: you may want to capture the results to a file for future reference and analysis.)

            eqltune -v
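
One way to capture those results while still watching them scroll by (ordinary shell usage on my part, not an eqltune feature):

eqltune -v | tee /root/eqltune-$(date +%Y%m%d).txt    # run and keep a dated copy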

 

12.  Make adjustments based on the eqltune results.  (The items listed below were mine; yours may be different.)

 

            NIC Settings

   Flow Control. 

ethtool -A eth1 autoneg off rx on tx on

ethtool -A eth2 autoneg off rx on tx on

 

(add the above lines to /etc/rc.d/rc.local to make them persistent across reboots)

 

There may be a suggestion to use jumbo frames by increasing the MTU size from 1500 to 9000.  This has been omitted from the instructions, as it requires proper configuration of jumbos from end to end.  If you are uncertain, keep standard frames for the initial deployment.

 

   iSCSI Settings

   (make a backup of /etc/iscsi/iscsid.conf before making changes)

 

      Change node.startup to manual.

   node.startup = manual

 

      Change FastAbort to the following:

   node.session.iscsi.FastAbort = No

 

      Change initial_login_retry to the following:

   node.session.initial_login_retry_max = 12

 

      Change number of queued iSCSI commands per session

   node.session.cmds_max = 1024

 

      Change device queue depth

   node.session.queue_depth = 128
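
After making the edits, a quick way to confirm they all landed in the file (again, just my own shell convenience):

grep -E "node.startup|FastAbort|initial_login_retry_max|cmds_max|queue_depth" /etc/iscsi/iscsid.conf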

 

13.  Re-run eqltune -v to see if the changes took effect

All of my changes took effect, except for the NIC settings added to the rc.local file.  That appeared to be a syntax error in the EqualLogic documentation provided; it has been corrected in the steps above.

 

14.  Run the following command to view and modify MPIO settings

rswcli --mpio-parameters

 

This returned the following results (which seem to be good for now):

Processing mpio-parameters command…

MPIO Parameters:

Max sessions per volume slice:: 2

Max sessions per entire volume:: 6

Minimum adapter speed:: 1000

Default load balancing policy configuration: Round Robin (RR)

IOs Per Path: 16

Use MPIO for snapshots: Yes

Internet Protocol: IPv4

The mpio-parameters command succeeded.

 

15.  Restrict MPIO to just the SAN interfaces

Exclude LAN traffic

            rswcli -E -network 192.168.0.0 -mask 255.255.255.0

 

VERIFICATION:  List status of includes/excludes to verify changes

            rswcli -L

 

VERIFICATION:  Verify the Host Connection Manager is managing just two interfaces

      ehcmcli -d

 

16.  Discover targets

iscsiadm -m discovery -t st -p 10.1.0.10

(Make sure no unexpected volumes connect.  Note the IQN presented; you’ll need it later.)

 

VERIFICATION:  Show the ifaces created by the HIT

[root@testvm ~]# iscsiadm -m iface | sort

default tcp,<empty>,<empty>,<empty>,<empty>

eql.eth1_0 tcp,00:50:56:8B:1F:71,<empty>,<empty>,<empty>

eql.eth1_1 tcp,00:50:56:8B:1F:71,<empty>,<empty>,<empty>

eql.eth2_0 tcp,00:50:56:8B:57:97,<empty>,<empty>,<empty>

eql.eth2_1 tcp,00:50:56:8B:57:97,<empty>,<empty>,<empty>

iser iser,<empty>,<empty>,<empty>,<empty>

 

VERIFICATION:  Check connection sessions via iscsiadm -m session to show that no connections exist

[root@testvm ~]# iscsiadm -m session

iscsiadm: No active sessions.

 

VERIFICATION:  Check connection sessions via /dev/mapper to show that no connections exist

[root@testvm ~]# ls -la /dev/mapper

total 0

drwxr-xr-x  2 root root     60 Aug 26 09:59 .

drwxr-xr-x 10 root root   3740 Aug 26 10:01 ..

crw-------  1 root root 10, 63 Aug 26 09:59 control

 

VERIFICATION:  Check connection sessions via ehcmcli -d to show that no connections exist

[root@testvm ~]# ehcmcli -d

 

17.  Log in to just one of the iface paths of your liking (eql.eth1_0 in the example below), replacing the example IQN with your own.  The HIT will take care of the rest.

iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -l

 

This returned:

[root@testvm ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -l

Logging in to [iface: eql.eth1_0, target: iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001, portal: 10.1.0.10,3260]

Login to [iface: eql.eth1_0, target: iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001, portal: 10.1.0.10,3260] successful.

 

VERIFICATION:  Check connection sessions via iscsiadm -m session

[root@testvm ~]# iscsiadm -m session

tcp: [1] 10.1.0.10:3260,1 iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001

tcp: [2] 10.1.0.10:3260,1 iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001

 

VERIFICATION:  Check connection sessions via /dev/mapper.  This is going to give you the string you will need when making and mounting the file system.

[root@testvm ~]# ls -la /dev/mapper

 

 

VERIFICATION:  Check connection sessions via ehcmcli -d

[root@testvm ~]# ehcmcli -d

 

18.  Make a new file system using the device-mapper name found above.  Replace the example IQN with yours.  If this is an existing volume that has been used before (from a snapshot, or on another machine), there is no need to perform this step.  Documentation will show this step without the “-j” switch, which would format it as a non-journaled ext2 file system; the -j switch formats it as ext3.

mke2fs -j -v /dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001

 

19.  Mount the device to a directory

[root@testvm mnt]# mount /dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001 /mnt/eql2tbnfs001

 

20.  Establish iSCSI connection automatically

[root@testvm ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-451da1609-2660013c7c34e45d-nfs001 -I eql.eth1_0 -o update -n node.startup -v automatic

 

21.  Mount volume automatically

Change /etc/fstab, adding the following:

/dev/mapper/eql-0-8a0906-451da1609-2660013c7c34e45d-nfs001 /mnt/eql2tbnfs001 ext3 _netdev  0 0
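
Before rebooting, you can sanity-check the new fstab entry (ordinary shell practice, not from the HIT docs):

mount -a                     # attempts everything in /etc/fstab; syntax errors show up here
df -h /mnt/eql2tbnfs001      # confirm the volume is mounted with the expected size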

Restart system to verify automatic connection and mounting.
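
One step the documentation leaves entirely to you is the NFS export itself.  Since the deployment assumptions call for the volume to be presented via NFS, an /etc/exports entry along these lines would finish the job.  The client range and options here are my own illustrative choices (and I export the real mount point rather than the /data1 symlink); tighten them to suit your environment:

/mnt/eql2tbnfs001   192.168.0.0/255.255.255.0(rw,sync)

exportfs -ra                 # re-read /etc/exports
showmount -e localhost       # verify the export is visible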

 

Working with guest attached volumes
After you have things configured and operational, you’ll see how flexible guest iSCSI volumes are to work with.

  • Do you want to temporarily mount a snapshot to this same VM or another VM? Just turn the snapshot online, and make a connection inside the VM.
  • Do you need to archive your data volume to tape, but do not want to interfere with your production system? Mount a recent snapshot of the volume to another system, and perform the backup there.
  • Do you want to do a major update to the front end server presenting the data?  Just build up a new VM, connect it to the existing data volume, change your DNS aliasing (which you really should be using), and you’re done.
  • Do you need to analyze the I/O of the guest attached volumes? Just use SANHQ. You can easily see if that data should be living on some super fast pool of SAS drives, or a pool of PS4000e arrays.  You’ll be able to make better purchasing decisions because of this.

So, how did it measure up?

The good…
Right out of the gate, I noticed a few really great things about the HIT for Linux.

  • The prerequisites and installation.  No compiling or other unnecessary steps.  The installation package installed clean with no fuss.  That doesn’t happen every day.
  • Eqltune.  This little utility is magic.  Talk about reducing the overhead of preparing a system for MPIO and all things related to guest-based iSCSI volumes.  It gave me a complete set of adjustments to make, divided into three simple categories.  After I made the adjustments, I re-ran the utility, and everything checked out okay.  Actually, all of the command line tools were extremely helpful.  Bravo!
  • One really impressive trait of the HIT/LE is how it handles the iSCSI sessions for you. Session build up and teardown is all taken care of by the HIT for Linux.

The not so good…
Almost as fast as the good shows up, you’ll notice a few limitations

  • Version 1.0 is only officially supported on RedHat Enterprise Linux (RHEL) 5.5 and 6.0 (no 6.1 as of this writing).  This might be news to Dell, but Debian-based systems like Ubuntu are running in enterprises everywhere for their cost, solid behavior, and minimalist approach.  RedHat clones dominate much of the market; some commercial, and some free.  Personally, I find upstream distributions such as Fedora sketchy, and prone to breakage with each release (note to Dell: I don’t blame you for not supporting these; I wouldn’t either).  Other distributions are quirky for their own reasons of “improvement,” and I can understand why these weren’t initially supported either.  A safer approach for Dell (and a more flexible approach for the customer) would be to 1.) get out a version for Ubuntu as fast as possible, and 2.) extend the support of this version to RedHat’s downstream, 100% binary compatible, very conservative distribution, CentOS.  For you Linux newbies, think of CentOS as the RedHat installation with the proprietary components stripped out, and nothing else added.  While my first production Linux server running the HIT is RedHat 5.5, all of my testing and early deployment occurred on a CentOS 5.5 distribution, and it worked perfectly.
  • No Auto-Snapshot Manager (ASM) or equivalent.  I rely on ASM/ME on my Windows VMs with guest attached volumes to provide me with a few key capabilities: 1.) a mechanism to protect the volumes via snapshots and replicas, and 2.) coordination of applications and I/O so that I/O is flushed properly.  Now, Linux does not have any built-in facility like Microsoft’s Volume Shadow Copy Services (VSS), so Dell can’t do much about that.  But perhaps some simple script templates might give users ideas on how to flush and pause I/O of the guest attached volumes for snapshots (see the sketch after this list).  Just having a utility to create SmartCopies or mount them would be pretty nice.
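
To illustrate the kind of script template I have in mind, here is a rough sketch of my own (nothing like this ships with the HIT/LE).  It assumes a file system that supports freezing; fsfreeze needs a newer util-linux than RHEL 5.5 ships with, so on older systems xfs_freeze on an XFS volume would be the stand-in:

sync                              # flush dirty buffers first
fsfreeze -f /mnt/eql2tbnfs001     # pause new writes to the guest attached volume

# ...trigger the PS Series snapshot here (a scheduled snapshot,
# or a Group Manager CLI call over ssh)...

fsfreeze -u /mnt/eql2tbnfs001     # resume normal I/O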

The forgotten…
A few things overlooked?  Yep.

  • I was initially encouraged by the looks of the documentation.  However, in order to come up with the above, I had to piece together information from a number of different resources.  Syntax and capitalization errors will kill you in a Linux shell environment, and some of those inconsistencies and omissions showed up.  With a little triangulation, I was able to get things running correctly, but it quickly became the sort of frustrating, time-consuming exercise that I felt like I’d been through before.  Hopefully the information provided here will help.
  • Somewhat related to the documentation issue is something that has come up with a few of the other EqualLogic tools: customers often don’t understand WHY one might want to use the tool.  The same thing goes for the HIT for Linux.  Nobody even gets to the “how” if they don’t understand the “why.”  But I’m encouraged by the great work the Dell TechCenter has been doing with their white papers and videos.  It has become a great source of current information, and they are moving in the right direction on customer education.

Summary
I’m generally encouraged by what I see, and am hoping that Dell EqualLogic takes design cues from the HIT/ME to deliver features like Auto-Snapshot Manager, and an equivalent to eqlxcp (EqualLogic’s offloaded file copy command in Windows).  The HIT for Linux helped me achieve exactly what I was trying to accomplish.  The foundation for another easy-to-use tool in the EqualLogic lineup is certainly there, and I’m looking forward to seeing how it improves.

Helpful resources
Configuring and Deploying the Dell EqualLogic Host Integration Toolkit for Linux
http://en.community.dell.com/dell-groups/dtcmedia/m/mediagallery/19861419/download.aspx

Host Integration Tools for Linux – Installation and User Guide
https://www.equallogic.com/support/download_file.aspx?id=1046 (login required)

Getting more IOPS on workloads running RHEL and EQL HIT for Linux
http://en.community.dell.com/dell-blogs/enterprise/b/tech-center/archive/2011/08/17/getting-more-iops-on-your-oracle-workloads-running-on-red-hat-enterprise-linux-and-dell-equallogic-with-eql-hitkit.aspx 

RHEL5.x iSCSI configuration (Not originally authored by Dell, nor specific to EqualLogic)
http://www.equallogic.com/resourcecenter/assetview.aspx?id=8727 

User’s experience trying to use the HIT on RHEL 6.1, along with some other follies
http://www.linux.com/community/blogs/configuring-dell-equallogic-ps6500-array-to-work-with-redhat-linux-6-el.html 

Dell TechCenter website
http://DellTechCenter.com/ 

Dell TechCenter twitter handle
@DellTechCenter