Diagnosing a failed iSCSI switch interconnect in a vSphere environment

The beauty of a well-constructed, highly redundant environment is that if a single point fails, systems should continue to operate without issue.  Sometimes, though, knowing exactly what failed is more challenging than it first appears.  This is what I ran into recently, and I wanted to share what happened, how it was diagnosed, and how it was ultimately corrected.

A group of two EqualLogic arrays were running happily against a pair of stacked Dell PowerConnect 6224 switches, serving up a 7 node vSphere cluster.  The switches were rebuilt over a year ago, and since that time they have been rock solid.  Suddenly, the arrays started spitting out all kinds of different errors.  Many of the messages looked similar to these:

iSCSI login to target '10.10.0.65:3260, iqn.2001-05.com.equallogic:0-8a0906-b6cc21609-d200014832f4ecfb-vmfs001' from initiator '10.10.0.10:52155, iqn.1998-01.com.vmware:esx1-70a98577' failed for the following reason:
Initiator disconnected from target during login.

Some of the earliest errors on the array looked like this:

10/1/2012 1:01:11 AM to 10/1/2012 1:01:11 AM
Warning: Member PS6000e network port cannot be reached. Unable to obtain network performance data for the member.
Warning: Member PS6100e network port cannot be reached. Unable to obtain network performance data for the member.
10/1/2012 1:01:11 AM to 10/1/2012 1:01:11 AM
Caution: Some SNMP requests to member PS6100e for disk drive information timed out.
Caution: Some SNMP requests for information about member PS6100e disk drives timed out.

VMs that had guest attached volumes were generating errors similar to this:

Subject: ASMME smartcopy from SVR001: MPIO Reconfiguration Request IPC Error – iqn.2001-05.com.equallogic:0-8a0906-bd5d27503-7ef000ed5d54a8c1-ntfs001 on host SVR001

[01:01:11] MPIO failure during reconfiguration request for target iqn.2001-05.com.equallogic:0-8a0906-476f6bd06-0c500008a0c4c41f-ntfs002 with error status 0x16000000.

[01:01:11] MPIO failure during reconfiguration request for target iqn.2001-05.com.equallogic:0-8a0906-dc0da1609-2fe0014145f4e931-ntfs001 with error status 0x80070006.

Before I had a chance to look at anything, I suspected something was wrong with the SAN switch stack, but was uncertain beyond that.  I jumped into vCenter to see if anything obvious showed up.  But vSphere and all of the VMs were motoring along just like normal.  No failed uplink errors, or anything else noticeable.  I didn’t do much vSphere log fishing at this point because all things were pointing to something on the storage side, and I had a number of tools that could narrow down the problem.  With all things related to storage traffic, I wanted to be extra cautious and prevent making matters worse with reckless attempts to resolve.

First, some background on how EqualLogic arrays work.  All arrays have two controllers, working in an active/passive arrangement.  Depending on the model of array, each controller will have between two and four Ethernet ports, with each port having an IP address assigned to it.  Additionally, there will be a single IP address to define the “group” the member array is a part of.  (The Group IP is a single IP used by systems looking for an iSCSI target, which lets the intelligence of the arrays figure out how to distribute traffic across interfaces.)  If some of the interfaces can’t be contacted (e.g. disconnected cable, switch failure, etc.), the EqualLogic arrays will be smart enough to distribute traffic across the active links.

The ports of each EqualLogic array are connected to the stacked SAN switches in a meshed arrangement for redundancy.  If there were a switch failure, then one wouldn’t be able to contact the IP addresses of the Ethernet ports connected to the failed switch.  But using a VM with guest attached volumes (which have direct access to the SAN), I could successfully ping all four interfaces (eth0 through eth3) on each array.  Hmm…

So then I decided to SSH into the array and see if I could perform the same test.  The idea would be to test from one IP on one of the arrays to see if a ping would be successful on eth0 through eth3 on the other array.  The key to doing this is to use an IP of one of the individual interfaces as the source, and not the Group IP.  Controlling the source and the target during this test will tell you a lot.  After connecting to the array via SSH, the syntax for testing the interfaces on the target array would be this:

ping -I "[sourceIP] [destinationIP]"  (quotes are needed!)

From one of the arrays, pinging all four interfaces on the second array revealed that only two of the four ports succeeded.  But the earlier test from the VM proved that I could ping all interfaces, so I changed the source IP to one of the interfaces living on the other switch.  I performed the same test, and the opposite results occurred.  The ports that failed on the last test passed on this test, and the ports that passed the last test failed this time.  This seemed to indicate that both switches were up, but the communication between the switches was down.
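The reasoning behind that conclusion can be written down as a small check.  This is only a sketch of the logic, not something I ran against the arrays: the port names and the port-to-switch wiring below are hypothetical, and the ping results are simply shaped like the two tests described above.

```python
# Hypothetical wiring: which SAN switch each array port is cabled to.
PORT_SWITCH = {
    "A:eth0": "sw1", "A:eth1": "sw2",
    "B:eth0": "sw1", "B:eth1": "sw2", "B:eth2": "sw1", "B:eth3": "sw2",
}


def diagnose(ping_results, port_switch):
    """Classify ping results keyed by (source_port, destination_port)."""
    same_switch, cross_switch = [], []
    for (src, dst), ok in ping_results.items():
        if port_switch[src] == port_switch[dst]:
            same_switch.append(ok)
        else:
            cross_switch.append(ok)
    if not same_switch or not cross_switch:
        return "inconclusive"          # both kinds of path must be tested
    if all(same_switch) and all(cross_switch):
        return "healthy"
    if all(same_switch) and not any(cross_switch):
        return "interconnect down"     # each switch works, the link between them does not
    return "inconclusive"


# Source on switch 1 reaches only switch 1 ports; source on switch 2 the opposite.
OBSERVED = {
    ("A:eth0", "B:eth0"): True,  ("A:eth0", "B:eth1"): False,
    ("A:eth0", "B:eth2"): True,  ("A:eth0", "B:eth3"): False,
    ("A:eth1", "B:eth0"): False, ("A:eth1", "B:eth1"): True,
    ("A:eth1", "B:eth2"): False, ("A:eth1", "B:eth3"): True,
}

print(diagnose(OBSERVED, PORT_SWITCH))
```

The point of controlling the source interface is visible here: without results from a source on each switch, the function has nothing to compare and can only report "inconclusive".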

While I’ve never seen these errors on switches using stacking modules, I have seen the MPIO errors above on a trunked arrangement.  One might run into these issues more with trunking, as it tends to leave more opportunity for issues caused by configuration errors.  I knew that in this case, the switch configurations had not been touched for quite some time.  The status of the switches via the serial console stated the following:

SANSTACK>show switch
    Management  Standby  Preconfig  Plugged-in  Switch       Code
SW  Status      Status   Model ID   Model ID    Status       Version
1   Mgmt Sw              PCT6224    PCT6224     OK           3.2.1.3
2   Unassigned           PCT6224                Not Present  0.0.0.0

The result above wasn’t totally surprising, in that if the stacking module was down, the master switch wouldn’t be able to gather the information from the other switch.

Dell also has an interesting little tool called “Lasso.”  The Dell Lasso Tool will help grab general diagnostics data from a variety of sources (servers, switches, storage arrays).  But in this case, I found it convenient to test connectivity from the array group itself.  The screen capture below seems to confirm what I learned through the testing above.

image

So the next step was trying to figure out what to do about it.  I wanted to reboot/reload the slave switch, but knowing both switches were potentially passing live data, I didn’t want to do anything to compromise the traffic.  So I employed an often overlooked, but convenient way of manipulating traffic to the arrays: turning off the interfaces on the array that are connected to the SAN switch that needs to be restarted.  If one turns off the interfaces on each array connected to the switch that needs the maintenance, then there will not be any live data passing through that switch.  Be warned that you had better have a nice, accurate wiring schematic of your infrastructure so that you know which interfaces can be disabled.  You want to make things better, not worse.
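A wiring schematic kept as data makes that decision easy to double-check.  The sketch below is purely illustrative: the member names come from the log messages earlier in this post, but the port-to-switch assignments are made up, and the function only produces a list for a human to act on.  It refuses to produce a plan if any array would be left with no port on the surviving switch.

```python
# Hypothetical wiring schematic: array member -> {port: switch}.
WIRING = {
    "PS6000e": {"eth0": "sw1", "eth1": "sw2", "eth2": "sw1", "eth3": "sw2"},
    "PS6100e": {"eth0": "sw1", "eth1": "sw2", "eth2": "sw1", "eth3": "sw2"},
}


def ports_to_disable(wiring, switch):
    """Return the array ports to disable before restarting `switch`."""
    plan = {}
    for array, ports in wiring.items():
        on_switch = sorted(p for p, s in ports.items() if s == switch)
        remaining = [p for p, s in ports.items() if s != switch]
        if not remaining:
            # Disabling these ports would take the array off the SAN entirely.
            raise ValueError(f"{array} has no ports on another switch")
        plan[array] = on_switch
    return plan


for array, ports in ports_to_disable(WIRING, "sw2").items():
    print(array, "disable:", ", ".join(ports))
```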

After a restart of the second switch, the interconnect reestablished itself.  The interfaces on the arrays were re-enabled, with all errors disappearing.  I’m not entirely sure why the interconnect went down, but the primary objective was diagnosing and correcting in a safe, deliberate, yet speedy way.  No VMs were down, and the only side effect of the issue was the errors generated, and some degraded performance.  Hopefully this will help you in case you see similar symptoms in your environment.

Helpful Links

Dell Lasso Tool

http://www.dell.com/support/drivers/us/en/555/DriverDetails?driverId=4T3Y6&c=us&l=en&s=biz

Reworking my PowerConnect 6200 switches for my iSCSI SAN

http://vmpete.com/2011/06/26/reworking-my-powerconnect-6200-switches-for-my-iscsi-san/

Dell TechCenter.  A great resource for all things related to Dell in the Enterprise.

http://en.community.dell.com/techcenter/b/techcenter/default.aspx

Multipathing in vSphere with the Dell EqualLogic Multipathing Extension Module (MEM)

There is a lot to be said about the Dell EqualLogic Multipathing Extension Module (MEM) for vSphere.  One is that it is an impressive component that without a doubt will improve the performance of your vSphere environment.  The other is that it is often not installed by organizations large and small.  This probably stems from a few reasons.

  • The typical user is uncertain of the value it brings.
  • vSphere Administrators might be under the assumption that VMware’s Round Robin will perform the same thing.
  • The assumption that if they don’t have vSphere Enterprise Plus licensing, they can’t use MEM.
  • It’s command line only

What a shame, because the MEM gives you optimal performance of vSphere against your EqualLogic array, and is frankly easier to configure.  Let me clarify: easier to configure correctly.  iSCSI will work seemingly okay with not much effort.  But that lack of effort initially can catch up with you later, resulting in no multipathing, poor performance, and a setup that is prone to error.

There are a number of good articles that outline the advantages of using the MEM.  There is no need for me to repeat them, so I’ll just stand on the shoulders of their posts, and provide the links at the end of my rambling.

The tool can be configured in a number of different ways to accommodate all types of scenarios, all of which are well documented in the Deployment Guide.  The flexibility in deployment options might be why it seems a little intimidating to some users.

I’m going to show you how to set up the MEM in a very simple, typical fashion.  Let’s get started.  We’ll be working under the following assumptions:

  • vSphere 5 and the EqualLogic MEM  *
  • An ESXi host Management IP of 192.168.199.11
  • Host account of root with a password of mypassword
  • A standard vSwitch for iSCSI traffic with two physical uplinks (vmnic4 & vmnic5).  A standard vSwitch is used here, but it could just as easily be a Distributed vSwitch.
  • Three IP addresses for each host; two for iSCSI vmkernels (192.168.198.11 & 192.168.198.21), and one for Storage Heartbeat. (192.168.198.31)
  • Jumbo frames (9000 bytes) have been configured on your SAN switchgear, and will be used on your ESXi hosts.
  • A desire to accommodate VMs that use guest attached volumes.
  • EqualLogic Group IP address of:  192.168.198.65
  • Storage network range of 192.168.198.0 /24

When applying to your environment, just tailor the settings to reflect your environment.

Download and preparation

1.  Download the MEM from the Dell EqualLogic customer web portal to a workstation that has the vSphere CLI installed

2.  Extract the MEM so that it resides in a C:\MEM directory. You should see a setup.pl file in the root of C:\MEM, along with a dell-eql-mem-esx5-[version].zip file.  Keep this zip file, as it will be needed during the installation process.

Update ESXi host

1.  Put ESXi host in Maintenance Mode

2.  Delete any previous vSwitch that uses the pNICs for iSCSI. You will also need to remove any previous port bindings.

3.  Initiate script against first host:

setup.pl --configure --server=192.168.199.11 --username=root --password=mypassword

This will walk you through a series of variables you need to enter.  It’s all pretty straightforward, but I’ve found the more practical way is to include it all as one string.  This minimizes mistakes, improves documentation, and allows you to just cut and paste into the vSphere CLI.  The complete string would look like this.

setup.pl --configure --server=192.168.199.11 --vswitch=vSwitchISCSI --mtu=9000 --nics=vmnic4,vmnic5 --ips=192.168.198.11,192.168.198.21 --heartbeat=192.168.198.31 --netmask=255.255.255.0 --vmkernel=iSCSI --nohwiscsi --groupip=192.168.198.65

It will prompt for a user name and password before it runs through the installation.  Near the end of the installation, it will return:

No Dell EqualLogic Multipathing Extension Module found. Continue your setup by installing the module with the ‘esxcli software vib install’ command or through vCenter Update Manager

This is because the MEM VIB has not been installed yet.  MEM will work but only using the default pathing policies.  The MEM VIB can be installed by typing in the following:

setup.pl --install --server=192.168.199.11 --username=root --password=mypassword

If you look in vCenter, you’ll now see the vSwitch and vmkernel ports created and configured properly, with the port bindings configured correctly as well.  You can verify it with the following:

setup.pl --query --server=192.168.199.11 --username=root --password=mypassword
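Since the one-string form of the configure command gets repeated for every host, it can be handy to generate it from a small table of per-host settings rather than editing it by hand.  The following is a minimal sketch under a few assumptions: it only uses the options shown in this post, written with plain double hyphens, the dictionary keys are names I made up for illustration, and the heartbeat option is treated as optional for hosts where it isn’t needed.

```python
def mem_configure_command(host):
    """Build the one-line setup.pl configure string for a single host."""
    args = [
        "setup.pl", "--configure",
        f"--server={host['server']}",
        f"--vswitch={host['vswitch']}",
        f"--mtu={host['mtu']}",
        "--nics=" + ",".join(host["nics"]),
        "--ips=" + ",".join(host["ips"]),
    ]
    if host.get("heartbeat"):               # leave out where no heartbeat is used
        args.append(f"--heartbeat={host['heartbeat']}")
    args += [
        f"--netmask={host['netmask']}",
        f"--vmkernel={host['vmkernel']}",
        "--nohwiscsi",
        f"--groupip={host['groupip']}",
    ]
    return " ".join(args)


# Values taken from the assumptions listed at the top of this post.
HOST1 = {
    "server": "192.168.199.11",
    "vswitch": "vSwitchISCSI",
    "mtu": 9000,
    "nics": ["vmnic4", "vmnic5"],
    "ips": ["192.168.198.11", "192.168.198.21"],
    "heartbeat": "192.168.198.31",
    "netmask": "255.255.255.0",
    "vmkernel": "iSCSI",
    "groupip": "192.168.198.65",
}

print(mem_configure_command(HOST1))
```

The printed string can be pasted into the vSphere CLI as-is, and the table of hosts doubles as documentation of which IPs went where.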

But you aren’t quite done yet.  If you are using guest attached volumes, you will need to create Port Groups on that same vSwitch so that the guest volumes can connect to the array.  For the two vNICs inside the guest OS to multipath to the volume properly, you will need to create two Port Groups.  When complete, your vSwitch may look something like this:

image

Take a look at the VMkernel ports created by MEM, and you will see that the NIC Teaming failover order has been set so that one vmnic is “Active” while the other is “Unused.”  The other VMkernel port has the same settings, but with the vmnics reversed in their “Active” and “Unused” states.

The Port Groups you create for VMs using guest attached volumes will take a similar approach.  Each Port Group will have one “Active” and one “Standby” adapter (“Standby,” not “Unused” like the VMkernel ports), and the second Port Group has the vmnics reversed.  When configuring a VM’s NICs for guest attached volume access, you will want to assign one vNIC to one Port Group, while the other vNIC is assigned to the other Port Group.  Confused?  Great.  Take a look at Will Urban’s post on how to configure Port Groups for guest attached volumes correctly.
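If the arrangement is easier to follow as data, here is a small sketch of it.  The port and Port Group names are hypothetical, and this only models the failover order described above; it does not talk to vSphere.

```python
# VMkernel ports as MEM leaves them: one Active uplink, the other Unused.
VMKERNEL_PORTS = {
    "iSCSI0": {"active": "vmnic4", "unused": "vmnic5"},
    "iSCSI1": {"active": "vmnic5", "unused": "vmnic4"},
}

# Port Groups for guest attached volumes: one Active, the other Standby.
GUEST_PORT_GROUPS = {
    "Guest-iSCSI-A": {"active": "vmnic4", "standby": "vmnic5"},
    "Guest-iSCSI-B": {"active": "vmnic5", "standby": "vmnic4"},
}


def is_mirrored(first, second, idle_state):
    """True when the two entries use opposite uplinks as Active."""
    return (
        first["active"] != second["active"]
        and first["active"] == second[idle_state]
        and second["active"] == first[idle_state]
    )


vmk_a, vmk_b = VMKERNEL_PORTS.values()
pg_a, pg_b = GUEST_PORT_GROUPS.values()
print("VMkernel ports mirrored:", is_mirrored(vmk_a, vmk_b, "unused"))
print("Guest Port Groups mirrored:", is_mirrored(pg_a, pg_b, "standby"))
```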

Adjusting your existing environment.

If you need to rework your existing setup, simply put each host into Maintenance Mode one at a time and perform the steps above with your appropriate information.

Next, take a look at your existing Datastores, and if they are using one of the built-in Path Selection Policy methods (“Fixed,” “Round Robin,” etc.), change them over to “DELL_PSP_EQL_ROUTED.”

If you have VMs that leverage guest attached volumes off of a single teamed Port Group, you may wish to temporarily create this Port Group under the exact same name so the existing VMs don’t get confused.  Remove this temporary Port Group once you’ve had the opportunity to change the VMs’ properties.

So there you have it.  A simple example of how to install and deploy Dell’s MEM for vSphere 5.  Don’t leave performance and ease of management on the shelf.  Get MEM installed and running in your vSphere environment.

UPDATE (9/25/2012)
The instructions provided were under the assumption that vSphere 5 was being used.  Under vSphere 5.1 and the latest version of MEM, the storage heartbeat is no longer needed.  I have modified the post to accommodate, including the link below that references the latest Dell EqualLogic MEM documentation.  I’d like to thank the Dell EqualLogic Engineering team for pointing out this important distinction.

Helpful Links

A great summary on the EqualLogic MEM and vStorage APIs

http://whiteboardninja.wordpress.com/2011/02/01/equallogic-mem-and-vstorage-apis/

Cormac Hogan’s great post on how the MEM works its magic.

http://blogs.vmware.com/vsphere/2011/11/dells-multipath-extension-module-for-equallogic-now-supports-vsphere-50.html

Some performance testing of the MEM

http://www.spoonapedia.com/2010/07/dell-equallogic-multipathing-extension.html

Official Documentation for the EqualLogic MEM (Rev 1.2, which covers vSphere 5.1)

http://www.equallogic.com/WorkArea/DownloadAsset.aspx?id=11000

Three Labs for three reasons

Chalk up 2012 as the year I started paying attention to what everyone in the Virtualization world was doing for their vSphere home/portable labs.  Well, it was 2011 to be more precise, but I just didn’t act on it until this year.  Since I decided to dive into a lab head first, I thought I’d share with you what I have, and how it’s been working.

I was reminded of the power of a lab environment last year while I was building out a CoLo site for DR, offsite hosting and VDI for my company.  It was more than once that I thought, “gee, this is nice that I’m not playing around with the production site.”  I knew that it was time to start thinking about what I wanted for a lab.

If you start digging into the whole home/portable lab movement, you find out that the ideal home lab really depends on what your needs are.  Some are perfectly satisfied with a vSphere lab nested inside of VMware Workstation, while others have physical labs that will make the lights dim.  Both are viable options, but I’ll tell you what I settled on for mine, and how everything has been working.  Regardless of what you end up choosing, some great hardware and software can produce some pretty fantastic lab environments.

But first, time for a change…

So why the need for all of this? Well, in June of 2012, I decided to take my career in a slightly different direction. After 13 years as the Senior Systems Administrator for a software company in Bellevue, WA, I took a position with Mosaic Technology, a Solution Provider and Channel Partner for VMware, Dell, and others.  As a Senior Systems Engineer, I now get the opportunity to design and implement virtualization solutions, while applying my practical experience in the trenches with technologies and solutions in every corner of IT. This is extremely exciting for me, as I get to focus more on the very software that altered the course of my career just 5 years ago. I get to work with a great team at Mosaic, many of whom I’ve known for quite some time, including my good friend Tim Antonowicz.  Over the years I’ve also had the opportunity to establish relationships with many at Dell, including Dell EqualLogic, and Dell TechCenter.  I get to continue to work with these folks, but in a different capacity.

 

The Home Lab

For my home lab, I wanted to simulate some sense of a real world environment.  For that you need real hardware; real NICs, and real resources.  I also had the desire to eventually run a nested environment in each physical host, so I didn’t want to skimp on physical resources too much. 

As much as I wanted real hardware, I also didn’t want my house to sound or feel like a datacenter, so for me, being mindful of power consumption was as much about heat generation as it was about noise and cost.  Another goal of mine was to avoid buying something twice: once to hit a certain price point, and a second time to get what I really needed in the first place.  For most people, that usually means RAM.  Buy it once and be done with it.

Compute:
My physical hosts (qty. 2) most closely reflect the setup that Chris Wahl has described in his home lab.  The main differences are: 1.) I dumped 32GB of RAM in each host, and 2.) I threw in two dual port NICs, and a single port NIC.  I wanted a minimum of 6 functioning NICs for vSphere.  The SuperMicro motherboard comes with two onboard NICs (not including the IPMI port).  One is an Intel 82574L, which is supported by vSphere 5, but the other is an Intel 82579LM and isn’t easily recognized.  You can get it to work, but it was worth $20 for the additional NIC, especially during host rebuilds.  A few notes about this setup:

  • If you go with this particular SuperMicro motherboard, keep an eye out for ECC Unbuffered DDR3 DIMMs, as this motherboard requires it.  They are down to about $90 a stick as of the time of this writing, so about $360 to populate the host with 32GB of RAM.
  • IPMI has proven to be extremely valuable.  Not only will it allow for some nice remote rebuilds of the hosts, but it is a perfect match for DPM in vSphere.

Storage:
For storage, I settled on a Synology DS1512+ NAS unit.  I populated the enclosure with two 128GB Crucial M4 SSDs, and three 2TB 5400 RPM SATA drives.  The interesting feature of the DS1512+ is that it has two NICs on the back.  This offers up a little flexibility for multipathing iSCSI, or splitting off NFS to a different interface/network.

Synology has been getting a lot of press with the Home Lab crowd, and if you use it, you’ll understand why.  The DSM (their OS) provides an easy, flexible way to serve up NFS, CIFS, or iSCSI, including VAAI support.  Plenty of other features will keep you entertained as well.  The enclosure should fit my needs even as the drives themselves may change.

Switching:
With my desire for so many NIC ports, I knew that switch ports would get used up pretty fast, so a 20 port switch was the minimum.  My other requirement was that it had to be fanless.  1U anything with a fan seems to be nothing but a noise maker, and I didn’t want that.  I settled on a Cisco SG300-20 switch.  This is a full layer 3 managed switch that is a real gem.  While it doesn’t run IOS, it does have a CLI, and supports just about everything you’d want in a home lab: inter-VLAN routing, CDP, LLDP, jumbo frames, etc.  It’s been fantastic.  I uplink this switch to a Cisco/Linksys WRT-160NL flashed with DD-WRT so that my lab has internet access.

So, how much power does all this draw?  All together, around 160 watts or about 170VA.  Yeah, that’s right, under light load, the entire thing is drawing minimal power, with minimal heat, and just a whisper of noise.  Considering that a Dell Precision T5500 workstation alone pulls about the same amount of power, I’m extremely happy with the result.  Here is how the running load works out.

Device   Running Watts   Running VA
Switch   10              17
NAS      38              34
Host1    57              60
Host2    57              60
Total    162             171
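For anyone who wants to adapt the numbers to their own gear, the totals are easy to recompute.  The readings below are the ones from the table; the monthly figure is only a rough estimate that assumes the light-load draw holds around the clock for a 30-day month.

```python
# (running watts, running VA) per device, as measured above.
READINGS = {
    "Switch": (10, 17),
    "NAS": (38, 34),
    "Host1": (57, 60),
    "Host2": (57, 60),
}

total_watts = sum(watts for watts, _ in READINGS.values())
total_va = sum(va for _, va in READINGS.values())

# Rough estimate only: constant light load, 24 hours a day, 30 days.
kwh_per_month = round(total_watts * 24 * 30 / 1000, 2)

print("Total watts:", total_watts)
print("Total VA:", total_va)
print("Estimated kWh per 30-day month:", kwh_per_month)
```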

My only complaint is the goof-ball form factors of the NAS, the Lian Li chassis, and the SG300-20 switch.  No amount of reorganizing them has resulted in an orderly arrangement of systems.  I’ll have to build something to accommodate.

The Portable Lab

Due to the job change, I also had the need to have a portable lab (running nested vSphere inside of VMware Workstation); something I could guarantee to spin up if needed.  Wifi access wouldn’t necessarily be available where I needed a lab, so a portable lab on a laptop solved this problem.  But a portable lab does require some real horsepower, so that is where a Dell Precision M6600 comes into play.  This 17” laptop is a beast in every sense.  Yeah, it’s a tank to lug around, but it has a 17” screen, quad core i7 processor, and 16GB of RAM (expandable to 32).  I have a 256GB Crucial M4 SSD to run the OS and some of the VMs, while a second internal SATA drive carries the bulk of VMs, user data, etc.

I set up the lab in VMware Workstation a few different ways: a few times on my own, from scratch, then a few times using the “AutoLab.”  Both ways will end up with similar results: a functioning nested vSphere environment.  The AutoLab definitely saves time when it comes to the rebuild process.  I’ve found that the perfect mix so far has been to install the AutoLab, then tweak where I see fit (typically networking changes) based on personal preferences, or requirements.

With the assortment of powered-up VMs required to run the nested lab, I typically use around 10GB of RAM on my system.  That doesn’t leave much left over to run additional VMs, but with a little creativity, it’s workable.

 

The Verdict

So which one do I like better?  Honestly, they both are extremely valuable in their own ways.  But here are some generalizations.

  • If you are testing anything related to networking, nothing seems to beat the physical lab, as it is going to mimic the real deal. 
  • If you are pushing any sizable workload with VMs (in quantity, or allocation size of VMs), the physical lab shines.  With a portable lab, even with 16GB of RAM on a workstation, you really have to trim up the VMs.  But then again, it’s a lab, not production.
  • For professional development, study, documentation, experimenting, and customer demonstrations, the portable lab is second to none when it comes to convenience and accessibility.
  • The speed at which you can test a setting on your ESXi hosts or vCenter, or play with some scripting, is just fantastic.  The availability of the lab makes it incredibly valuable.

The comparison is similar to the SLR versus point-n-shoot camera debate.  One might be technically superior, but if you don’t have it with you because it’s too bulky, etc., then what good is it?  This analogy is where the portable nested lab on my laptop proves to be incredibly valuable.  You will find those who have run a portable lab and gone physical because they were tired of nesting ESXi, and others who have run physical and moved to a portable setup.  It really depends on what your needs are.  I continue to use both as the needs dictate.

The third Lab

The third lab might be my most important.  This little guy is the most reliable lab I’ve ever had.  Here he is on high alert guarding my other lab. 

image

Resources:

AutoLab by Alastair Cooke and Nick Marshall

http://www.labguides.com/

Hersey Cartwright’s Lab setup

http://www.vhersey.com/2011/12/my-home-vmware-lab/

Chris Wahl’s lab posts

http://wahlnetwork.com/tag/lab/
 

Tim’s portable lab

http://whiteboardninja.wordpress.com/2011/10/13/building-a-portable-vsphere-lab/

Synology’s DSM 4.0 support of VAAI in vSphere 5

http://www.kendrickcoleman.com/index.php/Tech-Blog/synology-dsm-40-supports-vaai-in-vsphere-5-for-home-labs.html

A detailed build out of a home lab

http://boerlowie.wordpress.com/2011/11/30/building-the-ultimate-vsphere-lab-part-1-the-story/

Review of the 10 port version of my Cisco SG300-20 switch

http://www.vladan.fr/home-lab-gear-cisco-sg300-10-layer-3-switch-gets-new-firmware/

Dell EqualLogic’s newest Host Integration Tools for Linux (v1.1)

 

It was just last September that I wrote about Using the Dell EqualLogic HIT for Linux (HIT/LE) Version 1.0.  At the time, the HIT/LE was beginning to play an important role in how we housed large volumes of data, and I wanted to share with others what I learned in the process.  While it has been running well in our environment, it was definitely a 1.0 product when it came to features and configuration, so I was anxious to see what was in store for the next version.  Version 1.1 was released in April of 2012, and it addressed some of the observations I had about HIT/LE 1.0.  Here are a few highlights.

  • Better distribution support.  CentOS, the binary-compatible clone of RHEL, is now supported in versions 5.7 through 6.2.  According to the documentation, RHEL 5.5 is no longer supported, which is a change from the previous edition.  SUSE Linux Enterprise Server is also supported.
  • Auto Snapshot Manager, Linux Edition (ASM/LE).  A new feature that will allow you to create, manage, and schedule volume snapshots (Smart Copies), clones, and replicas from inside of the guest.  This is a huge improvement.
  • A new installer and configuration process. 
  • Better documentation.  This wasn’t listed in the release notes, but was immediately a noticeable improvement.

Version 1.0 did a good job applying the benefits of guest volumes to Linux based Operating Systems.  The problem was that it left out key abilities that prevented an automated way to manage those snapshots for specific purposes.  The biggest challenge I had was finding an automated way to take snapshots of these Linux guest attached volumes, and mount them to a Linux media server so that the data could be archived onto tape.  No amount of glue or duct tape helped in bridging the functions needed with snapshot manipulation inside the guests.

Configuring and Connecting
The configuration and connection of volumes seems to be greatly simplified.  The steps below demonstrate a simplified method of connecting an existing volume to a VM running the new HIT/LE 1.1.  Compare this to the instructions I provided on my post about HIT/LE 1.0, and you’ll see quite a difference.

  1. Add access to a PS Series Group called “MYEQLGRP”
    rswcli -add-group-access -gn MYEQLGRP -gip 10.10.10.100

    VERIFICATION: List the group added above
    rswcli -l

  2. Discover iSCSI targets
    iscsiadm -m discoverydb

    VERIFICATION: Confirm by viewing current list of discovered targets
    iscsiadm -m node | sort -u

    (returns the iqn needed in the next step)

  3. Log into a volume name and automatically connect at boot:
    ehcmcli login --target iqn.2001-05.com.equallogic:0-8a0906-3a7da1609-e720013e5c54e679-nfs100 --login-at-boot

    (returns the new device bound to a subdirectory below /dev/eql)

    VERIFICATION: Confirm device connection:
    ehcmcli status

  4. Mount it (and add to fstab for automatic mounting if desired)
    mount /dev/eql/nfs100 /mnt/myexport
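For the optional fstab entry in step 4, the one thing worth calling out is the _netdev mount option, which tells the system the device depends on the network being up before it can be mounted.  The sketch below just formats such a line; the filesystem type of ext4 is an assumption on my part, so substitute whatever the volume was actually formatted with.

```python
def fstab_entry(device, mount_point, fs_type="ext4",
                options=("defaults", "_netdev")):
    """Format one /etc/fstab line for a network-backed block device."""
    fields = [
        device,
        mount_point,
        fs_type,                 # assumed; use the volume's real filesystem
        ",".join(options),       # _netdev delays mounting until networking is up
        "0",                     # dump: not backed up by dump(8)
        "0",                     # pass: skip fsck at boot
    ]
    return "\t".join(fields)


print(fstab_entry("/dev/eql/nfs100", "/mnt/myexport"))
```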

In-Guest Volume Snapshots (Smart Copies)
The old version of HIT/LE didn’t offer any way of creating a snapshot inside the guest.  One could create volume snapshots from the Group Manager GUI, and even schedule them.  However, when it came to manipulating that snapshot from a guest, such as turning it online, or connecting to it, there was no way to do so.  Since the snapshots generate their own unique IQN, one needed a way to query for, and pass these variables as parameters. 

The new version offers a complete command set that fills the void.  At the root of the new found intelligence is the “asmcli” command.  The asmcli help command will provide you with a complete listing of options.  I’m not going to dive into each option, but rather, provide a simple example of how one can create a smart copy, and mount it if needed.

Before you get started, you may wish to choose or create a dedicated account on your PS Group that has volume administrator privileges.  Each system that has ASM/LE installed needs an account to interact with the volumes, and this offers the least privilege necessary to interact with the Smart Copies. The example below uses an account named “asmleadmin”

  1. Create PS group access (one time configuration step)
    asmcli create group-access --name MYEQLGRP --ip-address 10.10.10.100 --user-name asmleadmin

    VERIFICATION:  Confirm group access is set the way you want it.
    asmcli list group-access

  2. Create Smart Copy of the guest attached volume mounted to /mnt/myexport
    asmcli create smart-copy --source /mnt/myexport

    VERIFICATION:  List all available Smart Copies
    asmcli list smart-copy --verbose 2

    (this will provide the object ID used in the next step)

  3. Mount a Smart Copy to a temporary location of /mnt/smartcopy
    asmcli mount smart-copy --source /mnt/myexport --object f-f6a7e0-234b7ce30-d9c3f81bedbb96ba --destination /mnt/smartcopy

  4. Unmount a Smart Copy mounted in the previous step
    asmcli unmount smart-copy --object f-f6a7e0-234b7ce30-d9c3f81bedbb96ba --source /mnt/smartcopy
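The steps above are what make an archive workflow possible, so here is a rough sketch of how they might be strung together.  It only assembles the command lines (a dry run) using the options shown in this post, written with plain double hyphens.  Two things are assumptions: the object ID is passed in by hand, since in practice it has to be read from the list output first, and the tar command is a stand-in for whatever your backup software does with the mounted Smart Copy.

```python
import shlex


def smart_copy_archive_plan(source, object_id, destination, archive_cmd):
    """Return the ordered commands for a create/mount/archive/unmount cycle."""
    return [
        ["asmcli", "create", "smart-copy", "--source", source],
        ["asmcli", "list", "smart-copy", "--verbose", "2"],
        ["asmcli", "mount", "smart-copy", "--source", source,
         "--object", object_id, "--destination", destination],
        archive_cmd,  # hypothetical placeholder for the real backup job
        ["asmcli", "unmount", "smart-copy", "--object", object_id,
         "--source", destination],
    ]


PLAN = smart_copy_archive_plan(
    source="/mnt/myexport",
    object_id="f-f6a7e0-234b7ce30-d9c3f81bedbb96ba",
    destination="/mnt/smartcopy",
    archive_cmd=["tar", "-cf", "/backup/myexport.tar", "-C", "/mnt/smartcopy", "."],
)

for command in PLAN:
    print(shlex.join(command))   # print only; nothing is executed
```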

When documentation becomes a feature
The combination of a refined product, and improved documentation allowed for complete configuration and operation by just reading the manual.  It contained real examples of commands and actions, and even a few best practices.  No fumbling around due to an absence of detail or accuracy.  No need to search the net or call Technical Support this time.  Installation and configuration procedures reflected exactly what I experienced when testing out the new version.  What a nice surprise.  I wish this was more common.

More Tips for using the HIT/LE
Since my initial deployment of the HIT/LE, I had to do a fair amount of testing with these Linux systems running guest attached volumes to make sure they were satisfying performance needs; in particular, file I/O.  From that testing, and observations of the systems in production, here are a few things worth noting.

  • Getting data that lives on guest attached volumes onto traditional backup media does require extra thought and consideration, as traditional backup solutions that use the vCenter API can’t see these volumes. Take this into consideration when deciding use cases for guest attached volumes.
  • Don’t skimp on Linux VM memory.  Linux file I/O can be really impressive, but only if it has enough RAM.  If you have a lot of file I/O, Linux will need the RAM.  I found going with anything less than 2GB of RAM had a pretty big impact on performance.
  • Review the role of the Linux VM so that it can be right sized.  I ran into a case where I was replacing a very important physical server with a Linux VM for our Development group, but unbeknownst to me, it was performing duties I was not aware of.
  • Make sure there aren’t traditional routines that unnecessarily manipulate the data on the guest volume.  This is reflected as changed block data, and could dramatically reduce the number of snapshots or replicas you can retain at any given time.
  • Take a quick look at your vSwitch and port group configuration in vSphere for your guest attached volumes to make sure you are getting the most out of MPIO.  Will Urban has written a great post Data Drives in VMware which addresses this topic. 

In summary, the newest edition of the HIT/LE is definitely new. In fact, it feels like a complete re-write, and leaves me baffled as to why it didn’t warrant a 2.0 version designator. Nevertheless, the specific features added allow for real protection workflows to be achieved. I need to spend some more time with it to incorporate many of the new features into our environment.  If you were interested in guest attached volumes in Linux, but were intimidated by the complexity of the old version, give HIT/LE 1.1 a try.

VDI for me. Part 5 – The wrap up

 

Now that VMware View is up and running, you might be curious to know how it is working.  Well, you’re in luck, because this post is about how View worked, and what was learned from this pilot project.  But first, here is a quick recap of what has been covered so far.

VDI for me. Part 1 – Why the interest 
VDI for me. Part 2 – The Plan
VDI for me. Part 3 – Firewall considerations
VDI for me. Part 4 – Connection Servers and tuning

I was given the opportunity to try VMware View for a few different reasons (found here).  I wasn’t entirely sure what to expect, but was determined to get a good feel for what VDI in 2012 could do.  Hopefully this series has helped you gain an understanding as well. 

The user experience
Once things were operational, the ease and ubiquity of access to the systems was impressive.  One of our most frequent users often stated that he simply forgot where the work was actually being performed.  Comments like that are a good indicator of success.  From a remote interaction standpoint, the improvements most often showed up where they were really needed: remote display over highly latent connections, with convenience of access.  Being able to access a remote system from behind one corporate network to another was as productive as it was cool.

It was interesting to observe how some interpreted the technology.  Some embraced it for what it was (an appliance to be more productive), while others chose to be more suspicious.  You may have users who complain about their existing computers, but are apprehensive at the notion of it being taken away for something that isn’t tangible.  Don’t underestimate this emotional connection between user and computer.  It’s a weird, but very real aspect of a deployment like this. 

Virtualization Administrators know that good performance is often a result of a collection of components (storage, network, CPU, hypervisor) working well together through a good design.  Those of us who have virtualized our infrastructures are accustomed to this.  Users are not.  As VMs become more exposed to the end users (whether they be for VDI, or other user-facing needs), your technical users may become overly curious about what’s “under the hood” with their VM.  This can be a problem.  Comparisons between their physical machine and the VM are inevitable, and they may interpret a VM with half the processors and RAM of their physical machine as providing only half of the experience.  You might even be able to demonstrate that the VM is indeed better performing in many ways, yet the response might be that they still don’t have enough RAM, CPU, etc.  The end user knows nothing about hypervisors or IOPS, but they will pay attention to some of the common specifications general consumers of technology have been taught to care about: RAM and CPUs.

So in other words, there will be aspects of a project like this that have everything to do with virtualization, yet nothing to do with virtualization.  It can be as much of a people issue as it is a technical issue.

Other Impressions
The PCoIP protocol is very nice, and really shines in certain situations. I love the fact that it is a tunable, non-connection oriented protocol that leaves all of the rendering up to the host. It just makes so much sense for remote displays. But it can have characteristics that make it feel different to the end user. The old “window shake” test might redraw itself slightly differently than in a native display, or when using something like RDP. This is something that the user may or may not notice.

The pilot program included the trial of a PCoIP based Zero Client. The Wyse P20 didn’t disappoint. Whether it was connecting to a VM brokered by View, or a physical workstation with a PCoIP host card brokered by View, the experience was clean and easy. Hook up a couple of monitors to it, and turn it on. It finds the connection server, so all you need to do is enter your credentials, and you are in. The zero client was limited to just PCoIP, so if you need flexibility in that department, perhaps a thin client might be more appropriate for you. I wanted to see what no hassle was really like.

As far as feedback, the top three questions I usually received from users went something like this:

“Does it run on Linux?”

“How come it doesn’t run on Linux?”

“When is it going to run on Linux?”

And they weren’t just talking about the View Client (which as of this date will run on Ubuntu 11.04), but more importantly, the View Agent.  There are entire infrastructures out there that use frameworks and solutions that run on nothing but Linux.  This is true especially in arenas like Software Development, CAE and Scientific communities.  Even many of VMware’s offerings are built off of frameworks that have little to do with Windows.  The impression that the supported platforms of View gave to our end users was that VMware’s family of solutions was just Windows based.  Most of you reading this know that simply isn’t true.  I hope VMware takes a look at getting View agents and clients out for Linux.

Serving up physical systems using View as the connection broker is an interesting tangent to the whole VDI experience.  But of course, this is a one user to one workstation arrangement – it’s just that the workstation isn’t stuffed under a desk somewhere.  I suspect that VMware and its competitors are going to have to tackle the problem of how to harness GPU power through the hypervisor so that all but the most demanding of high end systems can be virtualized.  Will it happen with specialized video cards likely to come from the VMware/NVIDIA partnership announced in October of 2011?  Will it happen with some sort of SR-IOV?  The need for GPU performance is there.  How it will be achieved, I don’t know.  In the short term, if you need big time GPU power, a physical workstation with a PCoIP host card will work fine.

The performance and wow factor of running a View VM on a tablet is high as well.  If you want to impress anyone, just show this setup on a tablet.  Two or three taps on the tablet and you are in.  But we all know that User Experience (UX) designs for desktop applications were meant for a large screen, mouse, and a keyboard.  It will be interesting to see how the evolution of these technologies continues, so that UX can hit mobile devices in a more appropriate way.  Variations of application virtualization are perhaps the next step.  Again, another exciting unknown.

Also worth noting is the competition, not only in classically defined VDI solutions, but in access to systems.  A compelling aspect of using View is that it pairs a solution for remote display and brokering secure remote access into one package.  But other competing solutions do not necessarily have to take that approach.  Microsoft’s “Direct Access” allows for secure RDP sessions to occur without a traditional VPN.  I have not had an opportunity yet to try their Unified Access Gateway (UAG) solution, but it gets rave reviews from those who implement it, and use it.  Remote Desktop Session Host (RDSH) in Windows Server 8 promises big things (if you only use Windows of course).

Among the other challenges is how to implement such technologies in a way that is cost effective.  Up front costs associated with going beyond a pilot phase might be a bit tough to swallow, as technical challenges such as storage I/O deserve attention.  I suspect with the new wave of SSD and SSD hybrid SAN arrays out there, that it might make the technical and financial challenges more palatable.  I wish that I had the opportunity to demonstrate how well these systems would work on an SSD or hybrid array, but the word “pilot” typically means “keep the costs down.”  So no SSD array until we move forward with a larger deployment.

There seems to be a rush by many to take a position on whether VDI is the wave of the future, or a bust that will never happen.  I don’t think it’s necessary to think that way.  It is what it is; a technology that might benefit you or the business you work for, or it might not.  What I do know is that it is rewarding and fun to plan and deploy innovative solutions that help end users, while addressing classic challenges within IT.  This was one of those projects.

Those who have done these types of implementations will tell you that successful VDI implementations always pay acute attention to the infrastructure, especially storage.  (Reading about failed implementations seems to confirm this.)  I believe it.  I was almost happy that my licensing forced me to keep this deployment small, as I could focus on the product rather than some of the implications with storage I/O that would inevitably come up with a larger deployment.  Economies of scale make VDI intriguing in deployment and operation.  However, it appears that scaling is the tricky part.

What might also need a timely update is Windows licensing.  There is usually little time left in the day to understand the nuances of EULAs in general – especially Windows licensing.  VDI adds an extra twist to this.  A few links at the end of this post will help explain why.

None of these items above discount the power and potential of VDI.  While my deployments were very small, I did get a taste of its ability to consolidate corporate assets back to the data center.  The idea of provisioning, maintaining, and protecting end user systems seems possible again, and in certain environments could be a profound improvement.  It is easy to envision smaller branch offices greatly reducing, or eliminating, servers at their location.  AD designs simplify.  Assets simplify, as does access control – all while providing a more flexible work environment.  Not a bad combination.

Thanks for reading.

Helpful Links:
Two links on Windows 7 SPLA and VDI

http://www.brianmadden.com/blogs/brianmadden/archive/2011/03/02/why-microsoft-hates-vdi.aspx


http://www.brianmadden.com/blogs/gabeknuth/archive/2012/03/09/gasp-turns-out-onlive-really-isn-t-in-compliance-with-microsoft-licensing.aspx

RDSH in Windows Server 8

http://searchvirtualdesktop.techtarget.com/tip/RDSH-and-RemoteFX-in-Windows-Server-8-to-improve-VDI-user-experience

VDI has little to do with the Desktop

http://whiteboardninja.wordpress.com/2011/01/24/planning-for-vdi-has-little-to-do-with-the-desktop/

Scott Lowe’s interesting post on SR-IOV

http://blog.scottlowe.org/2012/03/19/why-sr-iov-on-vsphere/

Improving density of VMs per host with Teradici’s PCoIP Offload card for VMware View

http://www.teradici.com/pcoip/pcoip-products/teradici-apex-2800.php

VDI for me. Part 4

 

Picking up where we left off in VDI for me. Part 3, we are at a point in which the components of View can be installed and configured.  As much as I’d like to walk you through each step, and offer explanations at each point, sticking to abbreviated steps is a better way to help you understand how the pieces of the puzzle fit together.  Besides, others have great posts on installing and configuring the View Connection servers, not to mention the VMware documentation, which is quite good.  The links at the end of the post will give you a good start.  My focus will be to hit on the main areas to configure to get View up and running.

Here is the relationship between the Connection Servers, the clients, and the systems running the agents in my environment.  The overall topology for my View environment can be found in VDI for me. Part 2.

For clients accessing View from the Internal LAN

image

For clients accessing View from offsite locations

image

Overview of steps
This is the order I used for deploying the View Components.  To simplify, you may wish to skip steps 3 and 4 until you get everything working on the inside of your network. 

  1. Install Composer on your vCenter Server.
  2. Build a VM and install View Connection Server intended for local LAN access only.
  3. Build a VM and install View Connection Server intended to talk with Security Server.
  4. Build a VM and install View Security Server in your DMZ.
  5. Install View Agent on a VM.
  6. Create a Pool for the VM, and entitle the pool to a user or group in AD.
  7. Connect to the VM using the View Client.

Configuring your first Connection Server (For Internal Access)
From the point that your first connection manager is installed, you may begin the configuration.

  1. Browse out to the VMware View Administrator portal on your connection server (https://[yourconnectionserver]/admin) and enter the appropriate credentials.
  2. Drill down into View Configuration > Product Licensing and Usage > Edit License to add your license information.
  3. Register your vCenter Server by going to View Configuration > Servers > Add.  Fill out all of the details, but do not click “Enable View Composer” quite yet.  Click OK to exit.
  4. Go back and edit the vCenter Server configuration, click “Enable View Composer”, and click OK to exit.
  5. In the area where the View Connection Servers are listed, select the only View Connection Server on the list, and click “Edit”.  You will want to make sure both check boxes are unchecked, and use internal FQDN and IP addresses only.

image

Configuring your Second Connection Server (to be paired with Security Server)
During the installation of View on the second server, it will ask what type of Connection Server it will be.  Choose “Replica” from the list, and type in the name of your first Connection Server.

  1. Browse out to the View Administrator Portal, and you will now see a second connection server listed.  Highlight it, and click on Edit.
  2. Unlike the first connection server, this connection server needs to have both checkboxes checked.

image

Configuring your Security Server (to be paired with your second Connection Server)
Just a few easy steps will take care of your Security Server.

  1. Browse out to the View Administrator portal, highlight the Connection Server you want to pair with the security server, and click More Commands > Specify Security Server Pairing Password.
  2. Install the Connection Server install bits onto your new Security Server.  Choose “Security Server” for the type of Connection Server it will be.  It will then prompt you to enter the internal Connection Server to pair it to.  This is the internal FQDN of the Connection Server.
  3. Enter the View pairing password established in step 1.  This will make the Security Server show up in the View Administrator Portal.
  4. Go back to the View Administrator portal, highlight the server that is listed under the Security Server, and click Edit.  This is where you will enter in the FQDN desired.  The PCoIP address should be the publicly registered IP address.  In my case, it is the address bound to the external interface of my firewall, but your topology might dictate otherwise.

image

 

After it is all done, in the View Administrator portal, you should see one entry for a vCenter server, two entries for the View Connection servers, and one entry for a Security Server.

image

From this point, it is just a matter of installing the View Agent on the VMs (or physical workstation with a PCoIP host card) you’d like to expose, create a pool, entitle a user or group, and you should be ready to connect.

Tuning
After you add the VMware View adm templates to Active Directory, a number of tunable settings will be available to you.  The good news in the tuning department is that while PCoIP is highly tunable, I don’t feel it has to be the first thing you need to address after the initial deployment.  With View 5, it works quite well out of the box.  I will defer to this post (http://myvirtualcloud.net/?p=2061) on some common, View specific GPO settings you might want to adjust, especially in a WAN environment.  The two settings that will probably make the biggest impact are the “Maximum Frame Rate” settings, and the “Build to Lossless” toggle.  I applied these and a few others to help our Development Team, working on another continent, deal with their 280+ms latency.

The tools available to monitor, test, and debug PCoIP are improving almost by the day, and will be an interesting area to watch.  Take a look at the links for the PCoIP Log Viewer and the PCoIP Configuration utilities at the end of this post.

Tips and Observations
When running View, there is a noticeable increase in the dependence on vCenter, and on the databases that support it and View Composer.  This is especially the case in smaller environments where the server running vCenter might be housing the vCenter database, and the database for View Composer.  Chris Wahl’s recent post Protecting the vCenter Database with SQL Log Shipping addresses this, and provides a good way to protect the vCenter databases through log shipping.  If you are a Dell EqualLogic user, it may be helpful to move your SQL DB and Log volumes off to guest attached volumes, and use their ASM/ME application to easily make snaps and replicas of the database.  Regardless of the adjustments that you choose to make, factor this into your design criteria, especially if the desktops served up by View become critical to your business.

If your connection to a View VM terminates prematurely, don’t worry.  It seems to be a common occurrence during initial deployment that can happen for a number of reasons.  There are a lot of KB articles on how to diagnose them.  One that I ran across that wasn’t documented very much was that the VM may not have enough memory assigned to the video RAM.  The result can be that it works fine using RDP, but disconnects when using PCoIP.  I’ve had some VMs mysteriously reduce themselves back down to a default number that won’t support large or multiple screen resolutions.  Take a quick look at the settings of your VM.  Once those initial issues have been resolved, I’ve found everything to work as expected.

In my mad rush to build out the second View environment at our CoLo, everything worked perfectly, except when it came to the View client validating the secured connection. All indicators pointed to SSL, and perhaps how the exported SSL certificate was applied to the VM running the Security Server. I checked, and rechecked everything, burning up a fair amount of time. It turned out it was a silly mistake (aren’t they all?). In C:\Program Files\VMware\VMware View\Server\sslgateway\conf there needs to be a file called locked.properties. This contains information on the exported certificate. Well, when I created the locked.properties file, Windows was nice enough to append .txt to it (e.g. locked.properties.txt). The default settings in Windows left that extension hidden, so it didn’t show. By the way, I’ve always hated that default setting for hiding file extensions. It is controlled via GPO at my primary site, but I didn’t have that set at the CoLo site.
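A directory listing that ignores Explorer’s display settings would have exposed the problem right away.  Here is a small Python sketch of that kind of check.  The function names and the list of suspect suffixes are made up for illustration; only the file name locked.properties and the conf folder come from my environment.

```python
from pathlib import Path

# Extensions that text editors on Windows tend to append silently.
# This set is an assumption for illustration; extend it as needed.
SUSPECT_SUFFIXES = {".txt"}


def find_double_extensions(folder, expected_name="locked.properties"):
    """Return files named like expected_name plus an unwanted extra suffix.

    Example: 'locked.properties.txt' is reported when 'locked.properties'
    is the name the application actually looks for.
    """
    hits = []
    for entry in sorted(Path(folder).iterdir()):
        if not entry.is_file():
            continue
        name = entry.name.lower()
        for suffix in SUSPECT_SUFFIXES:
            if name == expected_name.lower() + suffix:
                hits.append(entry.name)
    return hits


def has_expected_file(folder, expected_name="locked.properties"):
    """True only if a file with exactly the expected name exists."""
    return (Path(folder) / expected_name).is_file()
```

On the Security Server, pointing both functions at the sslgateway\conf folder mentioned above should show in seconds whether the file has the name it is supposed to have.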

Next up, I’ll be wrapping up this series with the final impressions of the project.  What worked well, what didn’t.  Perceptions from the users, and from those writing the checks.  Stay tuned.

Helpful Links
VMware View Documentation Portal.  A lot of good information here.

http://www.vmware.com/support/pubs/view_pubs.html

A couple of nice YouTube videos showing a step by step installation of View Composer





How to apply View specific settings for systems via GPO (written for 4.5, but also applies to 5.0)

http://blog.vhowto.info/2010/11/25/vmware-view-4-5-active-directory-group-policies/

PCoIP disconnect codes

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2012101

PCoIP Log Viewer

http://mindfluxinc.net/?p=195

PCoIP Configuration utility (beta)

http://mindfluxinc.net/?p=338

More PCoIP tuning suggestions

http://mindfluxinc.net/?p=338

VDI for me. Part 3

 

In VDI for me. Part 2, I left off with how VMware View was going to be constructed in my environment.  We are almost at the point of installing and configuring the VMware View components, but before that is addressed, the most prudent step is to ensure that the right type of traffic can communicate across the different isolated network segments.  This post is simply going to focus on the security rules to do such a thing.  For me, access to these segments is managed by a Celestix MSA 5200i, 6 port Firewall running Microsoft ForeFront Threat Management Gateway (TMG) 2010.  While the screen captures are directly from TMG, much of the information here would apply to other security solutions.

Since all of the supporting components of VMware View will need to communicate across network segments anyway, I suggest making accommodations in your firewall before you start building the View components.  This is not always practical, but in this case, I found that I only had to make a few adjustments before things were working perfectly with all of the components.

My network design was a fairly straightforward, 4 legged topology (a pretty picture of this can be seen in Part 2).

Network Leg    Contains
External       All users connecting to our View environment.
LAN            View connection server dedicated for access from the inside.
               View connection server dedicated for communication with the Security Server.
               Systems running the View agent software.
DMZ1           Externally facing View “Security Server”
DMZ4           vSphere Management Network.  vCenter, and SQL databases providing services for vCenter, and View Composer.

For those who have their vSphere Management network on a separate network by way of a simple VLAN, your rules will be simpler than mine.  For clarity, I will just show the rules that are used for getting VMware View to work.

Before you get started, make sure you have planned out all of the system names and IP addresses of the various Connection Servers and the VMs running the View Agent.  It will make the work later on easier.

Creating Custom Protocols for VMware View in TMG 2010
In order to build the rules properly, you will first need to define some “User-Defined” protocols.  For the sake of keeping track of all of the user defined protocols, I always included the name “View” (to remember its purpose), the direction, type, and the port number.  Here was the list (as I named them) that was used as a part of my rule sets.

VMware View Inbound TCP&UDP (4172)
VMware View Outbound (32111)
VMware View Outbound (4001)
VMware View Outbound (8009)
VMware View Outbound (9427)
VMware ViewComposer Inbound (18443)
VMware ViewComposer Outbound (18443)
VMware ViewPCoIP Outbound (4172)
VMware ViewPCoIP SendReceiveUDP (4172)

Page 19 of the VMware View Security Reference will detail the ports and access needed.  I appreciate the detail, and it is all technically correct, but it can be a little confusing.  Hopefully, what I provide will help bridge the gap on anything confusing in the manual.  My implementation at this time does not include a View Transfer Server, so if your deployment includes this, please refer to the installation guide.
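Once the rules are in place, it can help to confirm from a machine on each leg that the TCP side of these ports is actually reachable.  The following is a minimal Python sketch of that idea, not something I used in the original deployment.  The port list simply mirrors the protocol names above, the function names are made up, and the host name in the usage note is hypothetical.  It only tests TCP connects, so the UDP half of PCoIP (4172) still has to be confirmed in the firewall logs.

```python
import socket

# TCP ports taken from the custom protocol list above.
# 4172 also needs UDP, which a simple connect test cannot confirm.
VIEW_TCP_PORTS = [4001, 4172, 8009, 9427, 18443, 32111]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=VIEW_TCP_PORTS, timeout=2.0):
    """Map each port to True (reachable) or False (blocked, closed, or timed out)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


def format_report(host, results):
    """Build one line per port, sorted by port number, for deployment notes."""
    lines = []
    for port in sorted(results):
        state = "open" if results[port] else "unreachable"
        lines.append(f"{host}:{port} {state}")
    return "\n".join(lines)
```

For example, print(format_report("viewagent01.local.lan", check_ports("viewagent01.local.lan"))) run from the Connection Server would list which ports answer on a (hypothetical) desktop running the View Agent.  An “unreachable” result only says the connect failed; the firewall log is still where you find out why.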

Creating Access Rules for VMware View in TMG 2010
The next step will be to build some access rules.  Access rules typically define access in a From/To arrangement.  Here is what my rules looked like for a successful implementation of VMware View in TMG 2010.

image

Creating Publishing rules for VMware View in TMG 2010
In the screen above, near the bottom, you see two Publishing rules.  These are for the purposes of securely exposing a server that you want visible to the outside world.  In this case, that would be the View Security Server.  The server will still have its private address as it resides in the DMZ, but would take on one of the assigned public IP addresses bound to the external interface of the TMG appliance.  To make View work, you will need two publishing rules.  One for HTTPS, and the other for PCoIP.  A View session with the display setting of RDP will use only the HTTPS publisher.  A View session with the display setting of PCoIP will use both of the publishing rules.  Page 65 of the View 5 Architecture Planning Guide illustrates this pretty well.

In the PCoIP publishing rule, notice how you need both TCP and UDP, and of course, the correct direction.

image

My friend Richard Hicks had some great information on his ForeFront TMG blog that was pertinent to this project. ForeFront TMG 2010 Protocol Direction Explained is a good reminder of what you will need to know when defining custom protocols, and the rule sets that use them.  The other was the nuances of using RDP with the “Web Site Publishing Rule” generator.  Let me explain.

TMG has a “Web Site Publishing Rule” generator that allows for a convenient way of exposing HTTP and HTTPS related traffic to the intended target. This publisher’s role is to protect by inspection. It terminates the session, decrypts, inspects, then repackages for delivery onto its destination. This is great for many protocols inside of SSL such as HTTP, but protocols like RDP inside SSL do not like it. This is what I was running into during deployment. View connections using PCoIP worked fine. View connections using RDP did not. Rich was able to help me better understand what the problem was, and how to work around it. The fix was simply to create a “Non-Web Server Protocol Publishing Rule” instead, choosing HTTPS as the protocol type.  For all of you TMG users out there, this is the reason why I haven’t described how to create a “Web Listener” to be used with a traditional “Web Site Publishing Rule.”  There is no need for one.

A few tips for implementing your new firewall rules.  Again, most of these apply to any Firewall you choose to use.

1.  Even if you have the intent of granular lockdown (as you should), it may be easiest to initially define the rule sets a little broader.  Use things like entire network segments instead of individually assigned machine objects.  You can tighten the screws down later (remember to do so), and it is easier to diagnose issues.

2.  Watch those firewall logs.  It’s easy to mess something up along the way, and your real time firewall logs will be your best friend.  But be careful not to get too fancy with the filtering.  You may be missing some denied traffic that doesn’t necessarily match up with your filter.

3.  You will probably need to create custom protocols.  Name them in such a way that they are clear that they are an incoming or outgoing protocol, and perhaps whether they are TCP, or UDP.  Otherwise, it can get a little confusing when it comes to direction of traffic.  Rule sets have a direction, as do the protocols that are contained in them.

4.  Stay disciplined to rule set taxonomy.  You will need to understand what the rule is trying to do.  Consistency is key.  You may find it more helpful to name the computer objects after the role that they are playing, rather than their actual server name.  It helps with understanding the flow of the rules.

5.  Add some identifier to your rules defined for View.  That way, when you are troubleshooting, you can enter “View” in the search function, and it quickly shows you only the rule sets you need to deal with.

6.  Stick to the best practices when it comes to placement of the View rules into your overall rule sets.  TMG processes the rules in order, so there are some methods to make the processing most efficient.  They remain unchanged from its predecessor, ISA 2006.  Here is a good article on better understanding the order.

7.  TMG 2010 has a nice feature of grouping rules.  This gives the ability of a set of contiguous rules to be seen as one logical unit.  You might find this helpful in most of your View based rule creation.  I would probably recommend having your access rules for View in a different group than your publishing rules.  This is so that you can maintain best practices on placement/priority of rule types.

8.  When you get to the point of diagnosing what appear to be connection problems between clients, agents, and connection servers, give VMware a call.  They have a few tools that will help in your efforts.  Unfortunately, I can’t provide any more information about the tools at this time, but I can say that for the purposes of diagnosing connectivity issues, they are really nice.

I also stumbled upon an interesting (and apparently little known) issue when you have a system with multiple NICs that is also running the View agent.  For me, this issue arose on the high powered physical workstation with PCoIP host card, using View as the connection broker.  This system had two additional NICs that connected to the iSCSI SAN.  The PCoIP based connections worked, but RDP sessions through View failed, even when standard RDP worked without issue.  Shut off the other NICs, and everything worked fine.  VMware KB article 1026498 addresses this.  The fix is simply adding a registry entry.

On the host with the PCoIP card, open regedit and add the following entry:
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VDM\Node Manager

Add the REG_SZ value:
172.16.128.0/24
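The value is a subnet in CIDR notation.  As I understand the KB article, it tells the agent which network to use, so it should cover the LAN-facing NIC and not the iSCSI NICs.  Below is a small Python sketch for sanity-checking that before touching the registry.  The NIC names and addresses are made up for illustration; only the 172.16.128.0/24 subnet comes from my environment.

```python
import ipaddress


def nics_in_subnet(nic_addresses, subnet):
    """Return the NIC names whose IPv4 address falls inside subnet (CIDR)."""
    # strict=True rejects values with host bits set, such as 172.16.128.1/24.
    network = ipaddress.ip_network(subnet, strict=True)
    return [
        name
        for name, address in nic_addresses.items()
        if ipaddress.ip_address(address) in network
    ]


# Hypothetical workstation: one LAN NIC plus two iSCSI NICs.
example_nics = {
    "LAN": "172.16.128.50",
    "iSCSI-A": "10.10.0.21",
    "iSCSI-B": "10.10.0.22",
}

matching = nics_in_subnet(example_nics, "172.16.128.0/24")
print(matching)  # prints ['LAN']
```

If more than one NIC shows up in the result, or none at all, the subnet is probably not the one you meant to enter.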

If you experience issues connecting to one system running the View Agent, but not the other, a common practice is to remove and reinstall the View Agent.  Any time the VMware Tools on the VM are updated, you will also need to reinstall the agent. 

More experimentation and feedback
As promised in part 1 of this series, I wanted to keep you posted of feedback that I was getting from end users, and observations I had along the way. 

The group of users allowed to connect continued to be impressed to the point that using it was a part of their workday.  I found myself not being able to experiment quite the way I had planned, because users were depending on the service almost immediately.  So much for that idea of it being a pilot project.

The experimentation with serving up a physical system with PCoIP using View as a connection broker has continued to be an interesting one. There are pretty significant market segments that demand high powered GPU processing. CAD/CAE, visualization, animation, graphic design, etc. have all historically relied on client side GPUs. So it is a provocative thought to be able to serve up high powered graphics workstations without it sitting under a desk somewhere.  The elegance of this arrangement is that once a physical system has a PCoIP host card in it, and the View Agent installed, it is really no different than the lower powered VM’s served up by the cluster.  Access is the same, and so is the experience.  Just a heck of a lot more power.  Since it is all host based rendering, you can make the remote system as powerful as your budget allows.  Get ready for anyone who accesses a high powered workstation like this to be spoiled easily.  Before you know it, they will ask if they can have 48GB of RAM on their VM as well.

Running View from any client (Windows Workstation, Mac, Tablets, Ubuntu, and a Wyse P20 zero client) proved to give basically the same experience.  It was easy for the end users to connect.  Since I have a dual name space (“view.mycompany.com” from the outside, and “view.local.lan” from the inside), the biggest confusion has been for laptop users remembering which address to use.  That, and reminding them to not use the VPN to connect.  A few firewall rules blocking access will help guide them. 

One of my late experiments came after I met all of my other acceptance criteria for the project.  I wanted to see how VMware View worked with linked clones.  Setting up linked clones was pretty easy.  However, I didn’t realize until late in the project that a linked clone arrangement of View really requires you to run a Microsoft KMS licensing server.  Otherwise, your trusty MAK license keys might be fully depleted in no time.  There is a VMware KB article describing a possible workaround, but it also warns you of the risks.  Accommodating KMS licensing is not a difficult matter to address (except for extremely small organizations that don’t qualify for KMS licensing), but it was something I didn’t anticipate.

I had the chance to do this entire design and implementation not once, but twice.  No, it wasn’t because everything blew up and I had no backups.  My intention was to build a pilot out at my Primary facility first, then build the same arrangement (as much as possible) at the CoLocation facility.  What made this so fast and easy?  As I did my deployment at my Primary facility, I did all of my step by step design documentation in Microsoft OneNote, my favorite non-technical application.  Step by step deployment of the systems, issues, and other oddball events were all documented the first time around.  It made the second go really quick and easy.  Whether it was my Firewall configuration, or the configurations of the various Connection Servers, the time spent documenting paid off quickly.

Next up, I’ll be going over some basic configuration settings of your connection servers, and maybe a little tuning.
