Configuring a VM for SNMP monitoring using Cacti

There are a number of things that I don't miss from old physical infrastructures.  One near the top of the list is a general lack of visibility into each and every system.  Horribly underutilized hardware ran happily alongside overtaxed or misconfigured systems, and it all looked the same.  Fortunately, virtualization has changed much of that nonsense, and performance trending data for VMs and hosts is a given.

Partners in the VMware ecosystem are able to take advantage of vSphere's extensibility by offering useful tools to improve management and monitoring of other components throughout the stack.  The Dell Management Plug-in for VMware vCenter is a great example of that. It does a good job of integrating side-band management and event driven alerting inside of vCenter.  However, in many cases you still need to look at performance trending data of devices that may not inherently have that ability on their own.  Switchgear is a great example of a resource that can be left in the dark.  SNMP can be used to monitor switchgear and other types of devices, but its use is almost always absent in smaller environments.  But there are simple options to help provide better visibility even for the smallest of shops.  This post will provide what you need to know to get started.

In this example, I will be setting up a general purpose SNMP management system running Cacti to monitor the performance of some Dell PowerConnect switchgear.  Cacti leverages RRDTool’s framework to deliver time based performance monitoring and graphing.  It can monitor a number of different types of systems supporting SNMP, but switchgear provides the best example that most everyone can relate to.  At a very affordable price (free), Cacti will work just fine in helping with these visibility gaps.  

Monitoring VM
The first thing to do is to build a simple Linux VM for the purpose of SNMP management.  One would think there would be a free Virtual Appliance out on the VMware Virtual Appliance Marketplace for this purpose, but if there is, I couldn't find it.  Any distribution will work, but my instructions will cater toward the Debian distributions – particularly Ubuntu, or an Ubuntu clone like Linux Mint (my personal favorite).  Set it for 1 vCPU and 512 MB of RAM.  Assign it a static address on your network management VLAN (if you have one).  Otherwise, your production LAN will be fine.  While it is a single purpose built VM, you still have to live with it, so no need to punish yourself by leaving it bare bones.  Go ahead and install the typical packages (e.g. vim, ssh, ntp, etc.) for convenience or functionality.
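On an Ubuntu-based VM, that initial convenience setup might look something like this (the package selection is just a suggestion):

sudo apt-get update

sudo apt-get install vim openssh-server ntp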

Templates extend the functionality of Cacti.  In the case of the PowerConnect switches, the template will assist in providing information on CPU, memory, and temperature.  A template for the PowerConnect 6200 line of switches can be found here.  The instructions below will include how to install this.

Prepping SNMP on the switchgear

In the simplest of configurations (which I will show here), there really isn't much to SNMP.  For this scenario, you will be providing read-only SNMP access via a shared community name. The monitoring VM will poll these devices and update the database accordingly.

If your switchgear is isolated, as your SAN switchgear might be, then there are a few options to make the switches visible in the right way. Regardless of what option you use, the key is to make sure that your iSCSI storage traffic lives on a different VLAN from the management interface of the device.  I outline a good way to do this in "Reworking my PowerConnect 6200 switches for my iSCSI SAN."

There are a couple of options for connecting the isolated storage switches so you can gather SNMP data:

Option 1:  Connect a dedicated management port on your SAN switch stack back to your LAN switch stack.

Option 2:  Expose the SAN switch management VLAN using a port group on your iSCSI vSwitch. 

I prefer option 1, but regardless, if it is iSCSI switches you are dealing with, make sure that management traffic is on a different VLAN than your iSCSI traffic to maintain proper isolation.

Once the communication is in place, just make a few changes to your PowerConnect switchgear.  Note that community names are case sensitive, so decide on a name, and stick with it.

enable

configure

snmp-server location "Headquarters"

snmp-server contact "IT"

snmp-server community mycompany ro ipaddress 192.168.10.12
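Once the monitoring VM has the SNMP client utilities installed (apt-get install snmp), you can confirm the switch is answering queries.  Note that the test needs to come from the monitoring VM itself, since the community definition above only permits its IP address.  The switch IP below is just an example:

snmpwalk -v 2c -c mycompany 192.168.10.1 system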

Monitoring VM – Pre Cacti configuration
Perform the following steps on the VM you will be using to install Cacti.

1.  Install and configure SNMPD

apt-get update

apt-get install snmpd

mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.old

2.  Create a new /etc/snmp/snmpd.conf with the following contents:

rocommunity mycompany

syslocation Headquarters

syscontact IT

3.  Edit /etc/default/snmpd to allow snmpd to listen on all interfaces and use the config file.  Comment out the first line below and replace it with the second line:

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1'

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid -c /etc/snmp/snmpd.conf'

4.  Restart the snmpd daemon.

sudo /etc/init.d/snmpd restart
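A quick snmpwalk against the loopback will verify the daemon is responding with the new community string:

snmpwalk -v 2c -c mycompany localhost system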

5.  Install additional perl packages:

apt-get install libsnmp-perl

apt-get install libnet-snmp-perl

Monitoring VM – Cacti Installation
6.  Install Cacti from the standard repositories:

apt-get update

apt-get install cacti

During the installation process, MySQL will be installed, and the installation will ask what you would like the MySQL root password to be. Then the installer will ask what you would like cacti’s MySQL password to be.  Choose passwords as desired.

Now, the Cacti installation is available via http://[cactiservername]/cacti with a username and password of "admin".  Cacti will then ask you to change the admin password.  Choose whatever you wish.

7.  Download the PowerConnect add-on from http://docs.cacti.net/usertemplate:host:dell:powerconnect:62xx and unpack both zip files.

8.  Import the host template via the GUI interface.  Log into Cacti, and go to Console > Import Templates, select the desired file (in this case, cacti_host_template_dell_powerconnect_62xx_switch.xml), and click Import.

9.  Copy the 62xx_cpu.pl script into the Cacti script directory on the server (/usr/share/cacti/site/scripts).  It may need executable permissions, as shown below.  If you downloaded it to a Windows machine, but need to copy it to the Linux VM, WinSCP works nicely for this.
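If the execute bit didn't survive the copy, set it (the path assumes the default Debian/Ubuntu Cacti package location noted above):

chmod +x /usr/share/cacti/site/scripts/62xx_cpu.pl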

10.  Depending on how things were copied, there might be some Windows-style (CRLF) line endings in the .pl file.  You can clean up the 62xx_cpu.pl file by running the following:

dos2unix 62xx_cpu.pl
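If the dos2unix utility isn't already on the VM, it is available as a standard package:

apt-get install dos2unix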

Using Cacti
You are now ready to run Cacti so that you can connect and monitor your devices. This example shows how to add the device to Cacti, then monitor CPU and a specific data port on the switch.

1.  Launch Cacti from your workstation by browsing out to http://[cactiservername]/cacti  and enter your credentials.

2.  Create a new Graph Tree via Console > Graph Trees > Add.  You can call it something like “Switches” then click Create.

3.  Create a new device via Console > Devices > Add.  Give it a friendly description, and the host name of the device.  Enter the SNMP Community name you decided upon earlier.  In my example above, I show the community name as being “mycompany” but choose whatever fits.  Remember that community names are case sensitive.

4.  To create a graph for monitoring CPU of the switch, click Console > Create New Graphs.  In the host box, select the device you just added.   In the “Create” box, select “Dell Powerconnect 62xx – CPU” and click Create to complete.

5.  To create a graph for monitoring a specific Ethernet port, click Console > Create New Graphs.  In the Host box, select the device you just added.  Put a check mark next to the port number desired, and select In/Out bits with total bandwidth.  Click Create > Create to complete. 

6.  To add the chart to the proper graph tree, click Console > Graph Management.  Put a check mark next to the Graphs desired, and change the "Choose an action" box to "Place on a Tree [Tree name]".

Now when you click on Graphs, you will see your two items being monitored.

image

By clicking on the magnifying glass icon, or by using the "Graph Filters" near the top of the screen, one can easily zoom in or out to various sampling periods to suit your needs.

Conclusion
Using SNMP and a tool like Cacti can provide historical performance data for non virtualized devices and systems in ways you’ve grown accustomed to in vSphere environments.  How hard are your switches running?  How much internet bandwidth does your organization use?  This will tell you.  Give it a try.  You might be surprised at what you find.

Vroom! Scaling up Virtual Machines in vSphere to meet performance requirements–Part 2

In my original post, Scaling up Virtual Machines in vSphere to meet performance requirements, I described a unique need for the Software Development Team to have a lot of horsepower to improve the speed of their already virtualized code compiling systems.  My plan of attack was simple.  Address the CPU bound systems with more powerful blades, and scale up the VMs accordingly.  Budget constraints axed the storage array included in my proposal, and also kept this effort limited to keeping the same number of vSphere hosts for the task. 

The four new Dell M620 blades arrived and were quickly built up with vSphere 5.0 U2 (Enterprise Plus licensing) with the EqualLogic MEM installed.  A separate cluster was created to ensure all build systems were kept separate, and so that I didn't have to mask any CPU features to make them work with previous generation blades.  Next up was to make sure each build VM was running VM hardware level 8.  Prior to vSphere 5, the guest VM was unaware of the NUMA architecture behind it.  Without the guest OS understanding memory locality, one could introduce problems into otherwise efficient processes.  While I could find no evidence that the compilers for either OS are NUMA aware, I knew the Operating Systems understood NUMA.

Each build VM has a separate vmdk for its compiling activities.  Their D:\ drive (or /home for Linux) is where the local sandboxes live.  I typically have this second drive on a “Virtual Device Node” changed to something other than 0:x.  This has proven beneficial in previous performance optimization efforts.

I figured the testing would be somewhat trivial, and would be wrapped up in a few days.  After all, the blades were purchased to quickly deliver CPU power for a production environment, and I didn’t want to hold that up.  But the data the tests returned had some interesting surprises.  It is not every day that you get to test 16vCPU VMs for a production environment that can actually use the power.  My home lab certainly doesn’t allow me to do this, so I wanted to make this count. 

Testing
The baseline tests would be to run code compiling on two of the production build systems (one Linux, and the other Windows) on an old blade, then run the same compile of the very same source code on the new blades.  This would help in better understanding if there were speed improvements from the newer generation chips.  Most of the existing build VMs are similar in their configuration.  The two test VMs started out with 4vCPUs and 4GB of RAM.  Once the baselines were established, the virtual resources of each VM would be dialed up to see how they responded.

For the tests, I isolated each blade so they were not serving up other needs.  The test VMs resided in an isolated datastore, but lived on a group of EqualLogic arrays that were part of the production environment.  Tests were run at all times during the day and night to simulate real world scenarios, as well as demonstrate any variability in SAN performance.

Build times would be officially recorded in the Developers Build Dashboard.  All resources would be observed in vSphere in real time, with screen captures made of things like CPU, disk and memory, and dumped into my favorite brain-dump application: Microsoft OneNote.  I decided to do this on a whim when I began testing, but it immediately proved incredibly valuable later on as I found myself looking at dozens of screen captures constantly.

The one thing I didn’t have  time to test was the nearly limitless possible scenarios in which multiple monster VMs were contending for CPUs at the same time.  But the primary interest for now was to see how the build systems scaled.  I would then make my sizing judgments off of the results, and off of previous experience with smaller build VMs on smaller hosts. 

The [n/n] title of each test result column indicates the number of vCPUs followed by the amount of vRAM associated.  Stacked bar graphs show a lighter color at the top of each bar.  This indicates the difference in time between the best result and the worst result.  The biggest factor of course would be the SAN.

Bottleneck cat and mouse
Performance testing is a great exercise for anyone, because it helps challenge your own assumptions on where the bottleneck really is.  No resource lives as an island, and this project showcased that perfectly.  Improving the performance of these CPU bound systems may very well shift the contention elsewhere.  However, it may expose other bottlenecks that you were not aware of, as resources are just one element of bottleneck chasing.  Applications and the Operating Systems they run on are not perfect, nor are the scripts that kick them off.  Keep this in mind when looking at the results.

Test Results – Windows
The following test results are with Windows 7, running the Visual Studio compiler, shown across three generations of blades: the Dell M600 (Harpertown), M610 (Nehalem), and M620 (Sandy Bridge).

Comparing a Windows code compile across blades without any virtual resource modifications.

image

Yes, that is right.  The old M600 blades were that terrible when it came to running VMs that were compiling.  This would explain the inconsistent build time results we had seen in the past.  While there was improvement in the M620 over the M610s, the real power of the M620s is that they have double the number of physical cores (16) of the previous generations.  Also noteworthy is how significantly the SAN (by up to 50%) was affecting the end result.

Comparing a Windows code compile on new blade, but scaling up virtual resources

image

Several interesting observations about this image (above). 

  • When the SAN can’t keep up, it can easily give back the improvements made in raw compute power.
  • Performance degraded when compiling with more than 8vCPUs.  It was so bad that I quit running tests when it became clear they weren't compiling efficiently (which is why you do not see SAN variability when I started getting negative returns).
  • Doubling the vCPUs from 4 to 8, and the vRAM from 4 to 8 only improved the build time by about 30%, even though the compile showed nearly perfect multithreading (shown below) and 100% CPU usage.  Why the degradation?  Keep reading!

image

On a different note, it was becoming quite clear already that I needed to take a little corrective action in my testing.  The SAN was being overworked at all times of the day, and it was impacting my ability to get accurate test results in raw compute power.  The more samples I ran the more consistent the inconsistency was.  Each of the M620s had a 100GB SSD, so I decided to run the D:\ drive (where the build sandbox lives) on there to see whether a lack of storage contention impacted times.  The purple line indicates the build times of the given configuration, but with the D:\ drive of the VM living on the local SSD drive.

image

The difference between a slow run on the SAN and a run with faster storage was spreading.

Test Results – Linux
The following test results are with Linux, running the GCC compiler, shown across the same three generations of blades: the Dell M600 (Harpertown), M610 (Nehalem), and M620 (Sandy Bridge).

Comparing a Linux code compile across blades without any virtual resource modifications.

image

The Linux compiler showed a much more linear improvement, along with being faster than its Windows counterpart.  There were noticeable improvements across the newer generations of blades, with no modifications in virtual resources.  However, the margin of variability from the SAN is a concern.

Comparing a Linux code compile on new blade, but scaling up virtual resources

image

At first glance it looks as if the Linux GCC compiler scales up well, but not in a linear way.  But take a look at the next graph, where similar to the experiment with the Windows VM, I changed the location of the vmdk file used for the /home drive (where the build sandbox lives) over to the local SSD drive.

image

This shows very linear scalability with Linux and the GCC compiler.  Compared to the 4vCPU/4GB RAM configuration, the build compiled 2.2x faster with 8vCPUs and 8GB of RAM, for a total build time of just 12 minutes.  Triple the virtual resources to 12/12, and it is an almost linear 2.9x faster than the original configuration.  Bump it up to 16vCPUs, and diminishing returns begin to show up, where it is 3.4x faster than the original configuration.  I suspect crossing NUMA nodes and the architecture of the code itself were impacting this a bit.  Still, don't lose sight of the fact that a build that could take up to 45 minutes on the old configuration took only 7 minutes with 16vCPUs.

The big takeaways from these results are the differences in scalability between the compilers, and how overtaxed the storage is.  Let's take a look at each of these.

The compilers
Internally it had long been known that Linux compiled the same code faster than Windows.  Way faster.  But for various reasons it had been difficult to pinpoint why.  The data returned made it obvious.  It was the compiler.

image

While it was clear that the real separation in multithreaded compiling occurred after 8vCPUs, the real problem with the Windows Visual Studio compiler begins after 4vCPUs.  This surprised me a bit because when monitoring the vCPU usage (in stacked graph format) in vCenter, it was using every CPU cycle given to it, and multithreading quite evenly.  The testing used Visual Studio 2008, but I also tested newer versions of Visual Studio, with nearly the same results. 

Storage
The original proposal included storage to support the additional compute horsepower.  The existing set of arrays had served our needs very well, but were really targeted at general purpose I/O needs with a focus on capacity.  During the budget review process, I had received many questions as to why we needed a storage array.  Boiling it down to even the simplest of terms didn't allow for that line item to survive the last round of cuts.  Sure, there was a price to pay for the array, but the results show there is also a price to pay for not buying the array.

I knew storage was going to be an issue, but when contention occurs, it's hard to determine how much of an impact it will have.  Think of a busy freeway, where throughput is pretty easy to predict up to a certain threshold.  Hit critical mass, and predicting commute times becomes very difficult.  Same thing with storage.  But how did I know storage was going to be an issue?  The free tool provided to all Dell EqualLogic customers: SAN HQ.  This tool has been a trusted resource for me in the past, and removes ALL speculation when it comes to historical usage of the arrays, and other valuable statistics.  IOPS, read/write ratios, latency, etc.  You name it.

Historical data of Estimated Workload over the period of 1 month

image

Historical data of Estimated Workload over the period of 12 months

image

Both images show that with the exception of weekends, the SAN arrays are maxed out to 100% of their estimated workload.  The overtaxing shows up in the lower part of each screen capture, where the reads and writes surpass the brown line indicating the estimated maximum IOPS of the array.  The 12 month history showed that our storage performance needs were trending upward.

Storage contention and how it relates to used CPU cycles is also worth noting.  Look at how inadequate storage I/O influences compute.  The image below shows the CPU utilization for one of the Linux builds using 8vCPUs and 8GB RAM when the /home drive was using fast storage (the local SSD on the vSphere host).

image

Now look at the same build when running  against a busy SAN array.  It completely changes the CPU usage profile, and thus took 46% longer to complete.

image

General Observations and lessons

  • If you are running any hosts using pre-Nehalem architectures, now is a good time to question why. They may not be worth wasting vSphere licensing on. The core count and architectural improvements on the newer chips put the nails in the coffin on these older chips.
  • Storage Storage Storage. If you have CPU intensive operations, deal with the CPU, but don’t neglect storage. The test results above demonstrate how one can easily give back the entire amount of performance gains in CPU by not having storage performance to support it.
  • Giving a Windows code compiling VM a lot of CPU, but not increasing the RAM, seemed to make the compiler trip on its own toes.  This makes sense, as more CPUs need more memory addresses to work with. 
  • The testing showcased another element of virtualization that I love. It often helps you understand problems that you might otherwise be blind to. After establishing baseline testing, I noticed some of the Linux build systems were not multithreading the way they should. Turns out it was some scripting errors by our Developers. Easily corrected.

Conclusion
The new Dell M620 blades provided an immediate performance return.  All of the build VMs have been scaled up to 8vCPUs and 8GB of RAM to get the best return while providing good scalability of the cluster.  Even with that modest doubling of virtual resources, we now have nearly 30 build VMs that, when storage performance is no longer an issue, will run between 4 and 4.3 times faster than the same VMs on the old M600 blades.  The primary objective moving forward is to target storage that will adequately support these build VMs, as well as looking into ways to improve multithreaded code compiling in Windows.

Helpful Links
Kitware blog post on multithreaded code compiling options

http://www.kitware.com/blog/home/post/434

Using a Synology NAS as an emergency backup DNS server for vSphere

Powering up a highly virtualized infrastructure can sometimes be an interesting experience.  Interesting in that “crossing-the-fingers” sort of way.  Maybe it’s an outdated run book, or an automated power-on of particular VMs that didn’t occur as planned.  Sometimes it is nothing more than a lack of patience between each power-on/initialization step.  Whatever the case, if it is a production environment, there is at least a modest amount of anxiety that goes along with this task.  How often does this even happen?  For those who have extended power outages, far too often.

One element that can affect power-up scenarios is availability of DNS.  A funny thing happens though when everything is virtualized.  Equipment that powers the infrastructure may need DNS, but DNS is inside of the infrastructure that needs to be powered up.  A simple way around this circular referencing problem is to have another backup DNS server that supplements your normal DNS infrastructure.  This backup DNS server acts as a slave to the server having authoritative control for that DNS zone, and would handle, at minimum, recursive DNS queries for critical infrastructure equipment and vSphere hosts.  While all production systems would use your normal primary and secondary DNS, this backup DNS server could be used as the secondary name server for a few key components:

  • vSphere hosts
  • Server and enclosure Management for IPMI or similar side-band needs
  • Monitoring nodes
  • SAN components (optional)
  • Switchgear (optional)

vSphere certainly isn’t as picky as it once was when it comes to DNS.  Thank goodness.  But guaranteeing immediate availability of name resolution will help your environment during these planned, or unplanned power-up scenarios.  Those that do not have to deal with this often have at least one physical Domain Controller with integrated DNS in place.  That option is fine for many organizations, and certainly accomplishes more than just availability of name resolution.  AD design is a pretty big subject all by itself, and way beyond the scope of this post.  But running a spare physical AD server isn’t my favorite option for a number of different reasons, especially for smaller organizations.  Some folks way smarter than me might disagree with my position.  Here are a few reasons why it isn’t my preferred option.

  • One may be limited in Windows licensing
  • There might be limited availability of physical enterprise grade servers.
  • One may have no clue as to if, or how a physical AD server might fit into their DR strategy.

As time marches on, I also have a feeling that this approach will be falling out of favor anyway.  During a breakout session on optimizing virtualized AD infrastructures at VMworld 2012, it was interesting to hear that the VMware Mothership still has some physical AD servers running the PDCe role.  However, they were actively in the process of eliminating this final, physical element, and building recommendations around doing so.  And let's face it, a physical DC doesn't align with the vision of elastic, virtualized datacenters anyway.

To make DNS immediately available during these power-up scenarios, the prevailing method in the "Keep It Simple, Stupid" category has been running a separate physical DNS server.  Either a Windows member server with a DNS role, or a Linux server with BIND.  But it is a physical server, and we virtualization nuts hate that sort of thing.  But wait!  …There is one more option.  Use your Synology NAS as an emergency backup DNS server.  The intention of this is not to supplant your normal DNS infrastructure; it's simply to help a few critical pieces of equipment start up.

The latest version of Synology's DSM (4.1) comes with a beta version of a DNS package.  It is pretty straightforward, but I will walk you through the steps of setting it up anyway.

1.  Verify that your Windows DNS servers allow zone transfers to the IP address of the NAS.  Jump into the Windows Server DNS MMC snap-in, highlight the zone you want to set up a zone transfer for, and click Properties.  Add or verify that the settings allow a zone transfer to the new slave server.

2.  In the Synology DSM, open the Package Center, and install DNS package.

3.  Enable the Synology DSM firewall to allow DNS traffic.  In the Synology DSM, open the Control Panel > Firewall.  Highlight the interface desired, and click Create.  Choose "Select from a built in list of applications" and choose "DNS Server."  Save the rule, and exit out of the Firewall application.

4.  Open up “DNS Server” from the Synology launch menu.

image

5.  Click on "Zones" and click Create > Slave Zone.  Choose a "Forward Zone" type, and select the desired domain name and Master DNS server.

image

6.  Verify the population of resource records by selecting the new zone, and clicking Edit > Resource Records.

image

7.  If you want, or need to have this forward DNS requests, enable the forwarders checkbox. (In my Home Lab, I enable this.  In my production environment, I do not)

image

8.  Complete the configuration, and test with a client using this IP address only for DNS, simply to verify that it is operating correctly.  Then, go back and tighten up some of the security mechanisms as you see fit.  Once that is completed, jump back into your ESXi hosts (and any other equipment) and configure your secondary DNS to use this server.

image
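A simple spot check from any client will confirm the slave zone is answering.  The host name and NAS IP below are hypothetical; substitute your own:

nslookup esx1.mycompany.lan 192.168.10.50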

In my case, I had a Synology NAS to try this out on in my home lab, as well as a newly deployed unit at work (serving the primary purpose of a Veeam backup target).  In both cases, it has worked exactly as expected, and allowed me to junk an old server at work running BIND.

If the NAS lived on an isolated storage network that wasn't routable, then this option wouldn't work, but if you have one living on a routable network somewhere, then it's a great option.  The arrangement simplifies the number of components in the infrastructure while ensuring service availability.

Even if you have multiple internal zones, you may want to have this slave server only handling your primary zone.  No need to make it more complicated than it needs to be.  You also may choose to set up the respective reverse lookup zone as a slave.  Possible, but not necessary for this purpose.

There you have it.  Nothing ground breaking, but a simple way to make a more resilient environment during power-up scenarios.

Helpful Links:

VMWorld 2012.  Virtualizing Active Directory Best Practices (APP-BCA1373).  (Accessible by VMWorld attendees only)
http://www.vmworld.com/community/sessions/2012/

SSDs in the workplace. Trust, but verify

Most of my interests, duties, and responsibilities surround infrastructure.  Virtualization, storage, networking, and the architecture of it all.  Ask me about the latest video card out there, and not only will I probably not know about it, I might not care.  Eventually problems crop up on the desktop though, and the hands have to get dirty.  This happened to me after some issues arose over recently purchased SSD drives.

The Development Team wanted SSDs to see if they could make their high end workstations compile code faster.  Sounded reasonable to me, but the capacity and price point just hadn’t been there until recently.  When it was decided to move forward on the experiment, my shortlist of recommended drives was very short.  I specifically recommended the Crucial M4 line of SSDs.  There are a number of great SSD drives out there, but the M4 has a good reputation, and also sits in my workstation, my laptop, and my NAS in my home lab.  I was quite familiar with the performance numbers they were capable of.

It didn't take long to learn that through a series of gaffes, what was ultimately ordered, delivered, and installed on those Developer workstations were not the Crucial M4 SSD drives that have such a good reputation, but the Crucial V4 drives.  The complaints were quite startling.  Code compile times increased significantly.  In fact, they more than doubled over their spinning disk counterparts.  When you have cries to bring back the 7200RPM SATA drives, there must be a problem.  It was time to jump into the fray to see what was up.  The first step was to simply verify that the disks were returning expected results. 

The testing
I used the venerable Iometer for testing, and set it to the following conditions.

  • Single Worker, running for 5 minutes
  • 80000000 Sectors
  • 64 Outstanding I/Os
  • 8KB block size
  • 35% write / 65% read
  • 60% random / 40% sequential
Each test was run three times, then averaged.  Just like putting a car on a dyno to measure horsepower, the absolute numbers generated were not of tremendous interest to me.  Environmental conditions can affect this too much.  I was looking at how these performance numbers related to each other.  (A rough command-line approximation of this test profile is sketched below.)
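For those who prefer testing from Linux, a newer version of fio can approximate the Iometer profile above.  This is only a sketch under assumptions; fio was not part of the original testing, and the target directory and file size are hypothetical:

fio --name=ssdtest --directory=/mnt/testdrive --size=4g --ioengine=libaio --iodepth=64 --bs=8k --rw=randrw --rwmixread=65 --percentage_random=60 --runtime=300 --time_based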

For the sake of clarity, I’ve simplified the list of test systems to the following:

  • PC1 = New Dell Precision M6600 laptop.  Includes Crucial M4 SSD and 7200 RPM SATA drive
  • PC2 = Older Dell Precision T3400 workstation.  Includes Crucial V4 SSD and 7200  RPM SATA drive
  • PC3 = New Dell Precision T5500 workstation.  Includes Crucial V4 SSD and 7200 RPM SATA drive
  • VM = A VM in a vSphere 5.0 cluster against an EqualLogic PS6100 array with 7200 RPM SATA drives
I also tested under different settings (block sizes, etc.), but the results were pretty consistent.  Something was terribly wrong with the Crucial V4 SSDs.  Or, they were just something terrible.

The results
Here are the results.

For the first two charts, the higher the number, the better.

image

image

For the next two charts, the lower the number, the better

image

image

You might be thinking this is an unfair test because they are comparing different systems.  This was done to show it wasn’t one system demonstrating the miserable performance results of the V4.  So, just to pacify curiosity, here are some results of the same tests on a system that had the V4, then was swapped out with the M4.

For the blue numbers, the higher the better.  For the red numbers, the lower the better.

image

If one looks at the specs of the M4 and V4, there is nothing too unexpected.  Sure, one is SATA II while the other is SATA III.  But the results speak for themselves.  This was not an issue of bus speed.  The performance of the M4 drives was very good; exactly as expected.  The performance of the V4 drives was terrible – far worse than anything I've seen out of an SSD.  This isn't a one off "bad drive" situation either, as there is a bag full of them that perform the same way.  They've been tested in brand new workstations, and workstations a few years old.  Again, the same result across the board for all of them.  Surely the V4 is not the only questionable SSD out there.  I'm sure there are some pretty hideous ones lining store shelves everywhere.  I'm sharing this experience to show the disparity between SSDs so that others can make some smart purchasing decisions.

As for the comparison of code compile times, I’ll be saving that for another post.

Conclusion
It was interesting to see how SSDs were so quickly dismissed internally before any real testing had been performed to verify they were actually working okay. Speculation from the group even included them not being a serious option in the workplace.  This false impression was concerning to me, as I knew how much flash is changing the enterprise storage industry.  Sure, SSDs can be weird at times, but the jury is in; SSDs change the game for the better.  Perhaps the disinterest in testing was simply due to this subject not being their area of focus, or they had other things to think about.  Whatever the case, it was certainly a lesson for me in how quickly results can be misinterpreted. 

So if anything, this says to me a few things about SSDs.

  • Check the specific model number, and make sure that it matches the model you desire. 
  • Stick with makes and models that you know.
  • If you’ve never used a particular brand and model of SSD, test it first.  I’m tempted to say, test it no matter what.
  • Stay far, far away from the "value" SSDs out there.  It almost appears like solid state thievery.  I can only imagine the number of folks who have underperforming drives like the V4, and wonder what all the fuss about SSDs is.  At least with a bad spinning disk, you could tear it apart and make the world's strongest refrigerator magnet.  Bad SSDs aren't even good as a paper weight.

– Pete

Diagnosing a failed iSCSI switch interconnect in a vSphere environment

The beauty of a well constructed, highly redundant environment is that if a single point fails, systems should continue to operate without issue.  Sometimes knowing what exactly failed is more challenging than it first appears.  This was what I ran into recently, and wanted to share what happened, how it was diagnosed, and ultimately corrected.

A group of two EqualLogic arrays were running happily against a pair of stacked Dell PowerConnect 6224 switches, serving up a 7 node vSphere cluster.  The switches were rebuilt over a year ago, and since that time they have been rock solid.  Suddenly, the arrays started spitting out all kinds of different errors.  Many of the messages looked similar to these:

iSCSI login to target '10.10.0.65:3260, iqn.2001-05.com.equallogic:0-8a0906-b6cc21609-d200014832f4ecfb-vmfs001' from initiator '10.10.0.10:52155, iqn.1998-01.com.vmware:esx1-70a98577' failed for the following reason:
Initiator disconnected from target during login.

Some of the earliest errors on the array looked like this:

10/1/2012 1:01:11 AM to 10/1/2012 1:01:11 AM
Warning: Member PS6000e network port cannot be reached. Unable to obtain network performance data for the member.
Warning: Member PS6100e network port cannot be reached. Unable to obtain network performance data for the member.
10/1/2012 1:01:11 AM to 10/1/2012 1:01:11 AM
Caution: Some SNMP requests to member PS6100e for disk drive information timed out.
Caution: Some SNMP requests for information about member PS6100e disk drives timed out.

VMs that had guest attached volumes were generating errors similar to this:

Subject: ASMME smartcopy from SVR001: MPIO Reconfiguration Request IPC Error – iqn.2001-05.com.equallogic:0-8a0906-bd5d27503-7ef000ed5d54a8c1-ntfs001 on host SVR001

[01:01:11] MPIO failure during reconfiguration request for target iqn.2001-05.com.equallogic:0-8a0906-476f6bd06-0c500008a0c4c41f-ntfs002 with error status 0x16000000.

[01:01:11] MPIO failure during reconfiguration request for target iqn.2001-05.com.equallogic:0-8a0906-dc0da1609-2fe0014145f4e931-ntfs001 with error status 0x80070006.

Before I had a chance to look at anything, I suspected something was wrong with the SAN switch stack, but was uncertain beyond that.  I jumped into vCenter to see if anything obvious showed up.  But vSphere and all of the VMs were motoring along just like normal.  No failed uplink errors, or anything else noticeable.  I didn’t do much vSphere log fishing at this point because all things were pointing to something on the storage side, and I had a number of tools that could narrow down the problem.  With all things related to storage traffic, I wanted to be extra cautious and prevent making matters worse with reckless attempts to resolve.

First, some background on how EqualLogic arrays work.  All arrays have two controllers, working in an active/passive arrangement.  Depending on the model of array, each controller will have between two and four ethernet ports, with each port having an IP address assigned to it.  Additionally, there will be a single IP address to define the "group" the member array is a part of.  (The Group IP is a single IP used by systems looking for an iSCSI target, letting the intelligence of the arrays figure out how to distribute traffic across interfaces.)  If some of the interfaces can't be contacted (e.g. disconnected cable, switch failure, etc.), the EqualLogic arrays will be smart enough to distribute across the active links.

The ports of each EqualLogic array are connected to the stacked SAN switches in a meshed arrangement for redundancy.  If there were a switch failure, then one wouldn't be able to contact the IP addresses of the ethernet ports connected to that switch.  But using a VM with guest attached volumes (which have direct access to the SAN), I could successfully ping all four interfaces (eth0 through eth3) on each array.  Hmm…

So then I decided to SSH into the array and see if I could perform the same test.  The idea would be to test from one IP on one of the arrays to see if a ping would be successful on eth0 through eth3 on the other array.  The key to doing this is to use an IP of one of the individual interfaces as the source, and not the Group IP.  Controlling the source and the target during this test will tell you a lot.  After connecting to the array via SSH, the syntax for testing the interfaces on the target array would be this:

ping -I "[sourceIP] [destinationIP]"  (quotes are needed!)
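For example, testing from one local interface on the first array to an interface on the second array might look like this (both addresses are hypothetical interface IPs, not Group IPs):

ping -I "10.10.0.66 10.10.0.70"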

From one of the arrays, pinging all four interfaces on the second array revealed that only two of the four ports succeeded.  But the earlier test from the VM proved that I could ping all interfaces, so I changed the source IP to one of the interfaces living on the other switch.  I performed the same test, and the opposite results occurred.  The ports that failed on the last test passed on this test, and the ports that passed the last test failed this time.  This seemed to indicate that both switches were up, but the communication between switches was down. 

While I’ve never seen these errors on switches using stacking modules, I have seen the MPIO errors above on a trunked arrangement.  One might run into these issues more with trunking, as it tends to leave more opportunity for issues caused by configuration errors.  I knew that in this case, the switch configurations had not been touched for quite some time.  The status of the switches via the serial console stated the following:

SANSTACK>show switch

SW  Management  Standby  Preconfig  Plugged-in   Switch       Code
    Status      Status   Model ID   Model ID     Status       Version
--  ----------  -------  ---------  -----------  -----------  -------
1   Mgmt Sw              PCT6224    PCT6224      OK           3.2.1.3
2   Unassigned           PCT6224                 Not Present  0.0.0.0

The result above wasn't totally surprising, in that if the stacking module was down, the master switch wouldn't be able to gather the information from the other switch.

Dell also has an interesting little tool called "Lasso."  The Dell Lasso Tool will help grab general diagnostics data from a variety of sources (servers, switches, storage arrays).  But in this case, I found it convenient for testing connectivity from the array group itself.  The screen capture below seems to confirm what I learned through the testing above.

image

So the next step was trying to figure out what to do about it.  I wanted to reboot/reload the slave switch, but knowing both switches were potentially passing live data, I didn't want to do anything to compromise the traffic.  So I employed an often overlooked, but convenient way of manipulating traffic to the arrays: turning off the interfaces on the arrays that are connected to the SAN switch that needs to be restarted.  If one turns off the interfaces on each array connected to the switch that needs the maintenance, then there will not be any live data passing through that switch.  Be warned that you had better have a nice, accurate wiring schematic of your infrastructure so that you know which interfaces can be disabled.  You want to make things better, not worse.

After a restart of the second switch, the interconnect reestablished itself.  The interfaces on the arrays were re-enabled, with all errors disappearing.  I’m not entirely sure why the interconnect went down, but the primary objective was diagnosing and correcting in a safe, deliberate, yet speedy way.  No VMs were down, and the only side effect of the issue was the errors generated, and some degraded performance.  Hopefully this will help you in case you see similar symptoms in your environment.

Helpful Links

Dell Lasso Tool
http://www.dell.com/support/drivers/us/en/555/DriverDetails?driverId=4T3Y6&c=us&l=en&s=biz

Reworking my PowerConnect 6200 switches for my iSCSI SAN
https://vmpete.com/2011/06/26/reworking-my-powerconnect-6200-switches-for-my-iscsi-san/

Dell TechCenter.  A great resource all things related to Dell in the Enterprise.
http://en.community.dell.com/techcenter/b/techcenter/default.aspx

Multipathing in vSphere with the Dell EqualLogic Multipathing Extension Module (MEM)

There is a lot to be said about the Dell EqualLogic Multipathing Extension Module (MEM) for vSphere.  One is that it is an impressive component that without a doubt will improve the performance of your vSphere environment.  The other is that it is often not installed by organizations large and small.  This probably stems from a few reasons.

  • The typical user is uncertain of the value it brings.
  • vSphere Administrators might be under the assumption that VMware's Round Robin will accomplish the same thing.
  • The assumption that if they don’t have vSphere Enterprise Plus licensing, they can’t use MEM.
  • It’s command line only

What a shame, because the MEM gives you optimal performance of vSphere against your EqualLogic array, and is frankly easier to configure.  Let me clarify: easier to configure correctly.  iSCSI will work seemingly okay with not much effort.  But that initial lack of effort can catch up with you later, resulting in no multipathing, poor performance, and error-prone configurations.

There are a number of good articles that outline the advantages of using the MEM.  There is no need for me to repeat them, so I'll just stand on the shoulders of their posts, and provide the links at the end of my rambling.

The tool can be configured in a number of different ways to accommodate all types of scenarios; all of which is well documented in the Deployment Guide.  The flexibility  in deployment options might be why it seems a little intimidating to some users. 

I’m going to show you how to set up the MEM in a very simple, typical fashion.  Let’s get started.  We’ll be working under the following assumptions:

  • vSphere 5 and the EqualLogic MEM (see the update at the end of this post)
  • An ESXi host Management IP of 192.168.199.11
  • Host account of root with a password of mypassword
  • A standard vSwitch for iSCSI traffic with two physical uplinks (vmnic4 & vmnic5).  The steps below create a standard vSwitch, but a Distributed vSwitch can easily be used as well.
  • Three IP addresses for each host; two for iSCSI vmkernels (192.168.198.11 & 192.168.198.21), and one for Storage Heartbeat. (192.168.198.31)
  • Jumbo frames (9000 bytes) have been configured on your SAN switchgear, and will be used on your ESXi hosts.
  • A desire to accommodate VMs that use guest attached volumes.
  • EqualLogic Group IP address of:  192.168.198.65
  • Storage network range of 192.168.198.0 /24

When applying to your environment, just tailor the settings to reflect your environment.

Download and preparation

1.  Download the MEM from the Dell EqualLogic customer web portal to a workstation that has the vSphere CLI installed.

2.  Extract the MEM so that it resides in a C:\MEM directory. You should see a setup.pl file in the root of C:\MEM, along with a dell-eql-mem-esx5-[version].zip file.  Keep this zip file, as it will be needed during the installation process.

Update ESXi host

1.  Put ESXi host in Maintenance Mode

2.  Delete any previous vSwitch that goes to the pNICs used for iSCSI.  You will also need to remove any previous port bindings.

3.  Initiate script against first host:

setup.pl --configure --server=192.168.199.11 --username=root --password=mypassword

This will walk you through a series of variables you need to enter.  It’s all pretty straightforward, but I’ve found the more practical way is to include it all as one string.  This minimizes mistakes, improves documentation, and allows you to just cut and paste into the vSphere CLI.  The complete string would look like this.

setup.pl --configure --server=192.168.199.11 --vswitch=vSwitchISCSI --mtu=9000 --nics=vmnic4,vmnic5 --ips=192.168.198.11,192.168.198.21 --heartbeat=192.168.198.31 --netmask=255.255.255.0 --vmkernel=iSCSI --nohwiscsi --groupip=192.168.198.65

It will prompt for a user name and password before it runs through the installation.  Near the end of the installation, it will return:

No Dell EqualLogic Multipathing Extension Module found. Continue your setup by installing the module with the 'esxcli software vib install' command or through vCenter Update Manager

This is because the MEM VIB has not been installed yet.  MEM will work but only using the default pathing policies.  The MEM VIB can be installed by typing in the following:

setup.pl --install --server=192.168.199.11 --username=root --password=mypassword

If you look in vCenter, you'll now see the vSwitch and vmkernel ports created and configured properly, with the port bindings configured correctly as well.  You can verify it with the following:

setup.pl --query --server=192.168.199.11 --username=root --password=mypassword

But you aren’t quite done yet.  If you are using guest attached volumes, you will need to create Port Groups on that same vSwitch so that the guest volumes can connect to the array.  To do it properly in which the two vNICs inside the guest OS can multipath to the volume properly, you will need to create two Port Groups.  When complete, your vSwitch may look something like this:

image

Take a look at the VMkernel ports created by MEM, and you will see the NIC Teaming switch failover order has been set so that one vmnic is set to "Active" while the other is set to "Unused."  The other VMkernel port has the same settings, but with the vmnics reversed in their "Active" and "Unused" states.

The Port Groups you create for VMs using guest attached volumes take a similar approach.  Each Port Group will have one "Active" and one "Standby" adapter ("Standby," not "Unused" like the VMkernel ports), with the vmnics reversed between the two Port Groups.  When configuring a VM's NICs for guest attached volume access, assign one vNIC to one Port Group, and the other vNIC to the other Port Group.  Confused?  Great.  Take a look at Will Urban's post on how to configure Port Groups for guest attached volumes correctly. 

Adjusting your existing environment.
If you need to rework your existing setup, simply put each host into Maintenance Mode one at a time and perform the steps above with your appropriate information.

Next, take a look at your existing Datastores, and if they are using one of the built-in Path Selection Policy methods ("Fixed," "Round Robin," etc.), change them over to "DELL_PSP_EQL_ROUTED."

If you have VMs that leverage guest attached volumes off of a single teamed Port Group, you may wish to temporarily recreate this Port Group under the exact same name so the existing VMs don't get confused.  Remove this temporary Port Group once you've had the opportunity to change the VM's properties.

So there you have it.  A simple example of how to install and deploy Dell's MEM for vSphere 5.  Don't leave performance and ease of management on the shelf.  Get MEM installed and running in your vSphere environment.
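One last tip: to spot check the active policy from an ESXi 5 host after the change, the NMP device list shows the Path Selection Policy in effect for each device.  This is just a quick sketch; the exact output varies by build:

esxcli storage nmp device list | grep "Path Selection Policy"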

UPDATE (9/25/2012)
The instructions provided were under the assumption that vSphere 5.0 was being used.  Under vSphere 5.1 and the latest version of MEM, the storage heartbeat is no longer needed.  I have modified the post to accommodate this, including the link below that references the latest Dell EqualLogic MEM documentation.  I'd like to thank the Dell EqualLogic Engineering team for pointing out this important distinction.

Helpful Links

A great summary on the EqualLogic MEM and vStorage APIs
http://whiteboardninja.wordpress.com/2011/02/01/equallogic-mem-and-vstorage-apis/

Cormac Hogan's great post on how the MEM works its magic.
http://blogs.vmware.com/vsphere/2011/11/dells-multipath-extension-module-for-equallogic-now-supports-vsphere-50.html

Some performance testing of the MEM
http://www.spoonapedia.com/2010/07/dell-equallogic-multipathing-extension.html

Official Documentation for the EqualLogic MEM (Rev 1.2, which covers vSphere 5.1)
http://www.equallogic.com/WorkArea/DownloadAsset.aspx?id=11000

VDI for me. Part 5 – The wrap up


Now that VMware View is up and running, you might be curious to know how it is working.  Well, you’re in luck, because this post is about how View worked, and what was learned from this pilot project.  But first, here is a quick recap of what has been covered so far.

VDI for me. Part 1 – Why the interest 
VDI for me. Part 2 – The Plan
VDI for me. Part 3 – Firewall considerations
VDI for me. Part 4 – Connection Servers and tuning

I was given the opportunity to try VMware View for a few different reasons (found here).  I wasn’t entirely sure what to expect, but was determined to get a good feel for what VDI in 2012 could do.  Hopefully this series has helped you gain an understanding as well. 

The user experience
Once things were operational, the ease and ubiquity of access to the systems was impressive.  One of our most frequent users often stated that he simply forgot where the work was actually being performed.  Comments like that are a good indicator of success.  From a remote interaction standpoint, the improvements most often showed up where it was really needed; remote display over highly latent connections, with convenience of access.  Being able to access a remote system from behind one corporate network to another was as productive as it was cool. 

It was interesting to observe how some interpreted the technology.  Some embraced it for what it was (an appliance to be more productive), while others chose to be more suspicious.  You may have users who complain about their existing computers, but are apprehensive at the notion of it being taken away for something that isn’t tangible.  Don’t underestimate this emotional connection between user and computer.  It’s a weird, but very real aspect of a deployment like this. 

Virtualization Administrators know that good performance is often a result of a collection of components (storage, network, CPU, hypervisor) working well together through a good design.  Those of us who have virtualized our infrastructures are accustomed to this.  Users are not.  As VMs become more exposed to the end users (whether they be for VDI, or other user-facing needs), your technical users may become overly curious about what's "under the hood" with their VM.  This can be a problem.  Comparisons between their physical machine and the VM are inevitable, and they may interpret a VM with half the processors and RAM of their physical machine as providing only half of the experience.  You might even be able to demonstrate that the VM is indeed better performing in many ways, yet the response might be that they still don't have enough RAM, CPU, etc.  The end user knows nothing about hypervisors or IOPS, but they will pay attention to some of the common specifications general consumers of technology have been taught to care about: RAM and CPUs.

So in other words, there will be aspects of a project like this that have everything to do with virtualization, yet nothing to do with virtualization.  It can be as much of a people issue as it is a technical issue.

Other Impressions
The PCoIP protocol is very nice, and really shines in certain situations. I love the fact that it is a tunable, non-connection oriented protocol that leaves all of the rendering up to the host. It just makes so much sense for remote displays. But it can have characteristics that make it feel different to the end user. The old "window shake" test might redraw itself slightly differently than on a native display, or using something like RDP. This is something that the user may or may not notice.

The pilot program included the trial of a PCoIP based Zero Client. The Wyse P20 didn’t disappoint. Whether it was connecting to a VM brokered by View, or a physical workstation with a PCoIP host card brokered by View, the experience was clean and easy. Hook up a couple of monitors to it, and turn it on. It finds the connection server, so all you need to do is enter your credentials, and you are in. The zero client was limited to just PCoIP, so if you need flexibility in that department, perhaps a thin client might be more appropriate for you. I wanted to see what no hassle was really like.

As far as feedback, the top three questions I usually received from users went something like this:

“Does it run on Linux?”

“How come it doesn’t run on Linux?”

“When is it going to run on Linux?”

And they weren’t just talking about the View Client (which as of this date will run on Ubuntu 11.04), but more importantly, the View Agent.  There are entire infrastructures out there that use frameworks and solutions that run on nothing but Linux.  This is true especially in arenas like Software Development, CAE and Scientific communities.  Even many of VMware’s offerings are built off of frameworks that have little to do with Windows.  The impression that the supported platforms of View gave to our end users was that VMware’s family of solutions were just Windows based.  Most of you reading this know that simply isn’t true.  I hope VMware takes a look at getting View agents and clients out for Linux.

Serving up physical systems using View as the connection broker is an interesting tangent to the whole VDI experience.  But of course, this is a one user to one workstation arrangement; it's just that the workstation isn't stuffed under a desk somewhere.  I suspect that VMware and its competitors are going to have to tackle the problem of how to harness GPU power through the hypervisor so that all but the most demanding of high end systems can be virtualized.  Will it happen with specialized video cards likely to come from the VMware/NVIDIA partnership announced in October of 2011?  Will it happen with some sort of SR-IOV?  The need for GPU performance is there.  How it will be achieved, I don't know.  In the short term, if you need big time GPU power, a physical workstation with a PCoIP host card will work fine.

The performance and wow factor of running a View VM on a tablet is high as well.  If you want to impress anyone, just show this setup on a tablet.  Two or three taps on the tablet and you are in.  But we all know that User Experience (UX) designs for desktop applications were meant for a large screen, mouse, and keyboard.  It will be interesting to see how these technologies evolve so that UX can hit mobile devices in a more appropriate way.  Variations of application virtualization are perhaps the next step.  Again, another exciting unknown.

Also worth noting is the competition, not only in classically defined VDI solutions, but in access to systems generally.  A compelling aspect of using View is that it pairs a solution for remote display and brokering of secure remote access into one package.  But competing solutions do not necessarily have to take that approach.  Microsoft's DirectAccess allows secure RDP sessions to occur without a traditional VPN.  I have not had an opportunity yet to try their Unified Access Gateway (UAG) solution, but it gets rave reviews from those who implement and use it.  Remote Desktop Session Host (RDSH) in Windows Server 8 promises big things (if you only use Windows, of course).

Among the other challenges is how to implement such technologies in a way that is cost effective.  Up-front costs associated with going beyond a pilot phase might be a bit tough to swallow, as technical challenges such as storage I/O deserve attention.  I suspect the new wave of SSD and SSD-hybrid SAN arrays out there might make the technical and financial challenges more palatable.  I wish that I had the opportunity to demonstrate how well these systems would work on an SSD or hybrid array, but the word "pilot" typically means "keep the costs down."  So no SSD array until we move forward with a larger deployment.

There seems to be a rush by many to take a position on whether VDI is the wave of the future, or a bust that will never happen.  I don't think it's necessary to think that way.  It is what it is: a technology that might benefit you or the business you work for, or it might not.  What I do know is that it is rewarding and fun to plan and deploy innovative solutions that help end users, while addressing classic challenges within IT.  This was one of those projects.

Those who have done these types of implementations will tell you that successful VDI implementations always pay acute attention to the infrastructure, especially storage.  (Reading about failed implementations seems to confirm this.)  I believe it.  I was almost happy that my licensing forced me to keep this deployment small, as I could focus on the product rather than the implications for storage I/O that would inevitably come up with a larger deployment.  Economies of scale make VDI intriguing in deployment and operation.  However, it appears that scaling is the tricky part.

What might also need a timely update is Windows licensing.  There is usually little time left in the day to understand the nuances of EULAs in general – especially Windows licensing.  VDI adds an extra twist to this.  A few links at the end of this post will help explain why.

None of these items above discount the power and potential of VDI.  While my deployments were very small, I did get a taste of its ability to consolidate corporate assets back to the data center.  The idea of provisioning, maintaining, and protecting end user systems seems possible again, and in certain environments could be a profound improvement.  It is easy to envision smaller branch offices greatly reducing, or even eliminating, servers at their locations.  AD designs simplify.  Assets simplify, as does access control – all while providing a more flexible work environment.  Not a bad combination.

Thanks for reading.

Helpful Links:
Two links on Windows 7 SPLA and VDI
http://www.brianmadden.com/blogs/brianmadden/archive/2011/03/02/why-microsoft-hates-vdi.aspx
http://www.brianmadden.com/blogs/gabeknuth/archive/2012/03/09/gasp-turns-out-onlive-really-isn-t-in-compliance-with-microsoft-licensing.aspx

RDSH in Windows Server 8
http://searchvirtualdesktop.techtarget.com/tip/RDSH-and-RemoteFX-in-Windows-Server-8-to-improve-VDI-user-experience

VDI has little to do with the Desktop
http://whiteboardninja.wordpress.com/2011/01/24/planning-for-vdi-has-little-to-do-with-the-desktop/

Scott Lowe’s interesting post on SR-IOV
http://blog.scottlowe.org/2012/03/19/why-sr-iov-on-vsphere/

Improving density of VMs per host with Teradici’s PCoIP Offload card for VMware View
http://www.teradici.com/pcoip/pcoip-products/teradici-apex-2800.php

VDI for me. Part 4

 

Picking up where we left off in VDI for me. Part 3, we are at a point in which the components of View can be installed and configured.  As much as I’d like to walk you through each step, and offer explanations at each point, sticking to abbreviated steps is a better way to help you understand how the pieces of the puzzle fit together.  Besides, others have great posts on installing and configuring the View Connection servers, not to mention the VMware documentation, which is quite good.  The links at the end of the post will give you a good start.  My focus will be to hit on the main areas to configure to get View up and running.

Here is the relationship between the Connection Servers, the clients, and the systems running the agents in my environment.  The overall topology for my View environment can be found in VDI for me. Part 2.

For clients accessing View from the Internal LAN

[Image: connection flow for clients on the internal LAN]

For clients accessing View from offsite locations

[Image: connection flow for clients connecting from offsite]

Overview of steps
This is the order I used for deploying the View Components.  To simplify, you may wish to skip steps 3 and 4 until you get everything working on the inside of your network. 

  1. Install Composer on your vCenter Server.
  2. Build a VM and install View Connection Server intended for local LAN access only.
  3. Build a VM and install View Connection Server intended to talk with Security Server.
  4. Build a VM and install View Security Server in your DMZ.
  5. Install View Agent on a VM.
  6. Create a Pool for the VM, and entitle the pool to a user or group in AD.
  7. Connect to the VM using the View Client.

Configuring your first Connection Server (For Internal Access)
Once your first Connection Server is installed, you may begin the configuration.

  1. Browse to the VMware View Administrator portal on your connection server (https://[yourconnectionserver]/admin) and enter the appropriate credentials.
  2. Drill down into View Configuration > Product Licensing and Usage > Edit License to add your license information.
  3. Register your vCenter Server by going to View Configuration > Servers > Add.  Fill out all of the details, but do not click "Enable View Composer" quite yet.  Click OK to exit.
  4. Go back into Edit on the vCenter Server configuration, click "Enable View Composer," and click OK to exit.
  5. In the area where the View Connection Servers are listed, select the only View Connection Server on the list, and click "Edit".  You will want to make sure both check boxes are unchecked, and use internal FQDNs and IP addresses only.

[Image: Edit View Connection Server settings dialog]
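
Before moving on, a quick sanity check that the Administrator portal answers can rule out basic connectivity problems.  This little Python sketch is nothing more than that; the hostname is a placeholder for your own Connection Server, and certificate verification is skipped only because View ships with a self-signed certificate by default.

    import ssl
    import urllib.request

    # Placeholder name -- substitute your own Connection Server FQDN.
    url = "https://view.corp.lan/admin"

    # View installs a self-signed certificate by default, so skip
    # verification for this quick reachability test only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        with urllib.request.urlopen(url, context=ctx, timeout=5) as response:
            print("Portal responded with HTTP", response.getcode())
    except Exception as error:
        print("Portal unreachable:", error)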

Configuring your Second Connection Server (to be paired with Security Server)
During the installation of View on the second server, it will ask what type of Connection Server it will be.  Choose “Replica” from the list, and type in the name of your first Connection Server.

  1. Browse out to the View Administrator Portal, and you will now see a second connection server listed.  Highlight it, and click on Edit.
  2. Unlike the first connection server, this connection server needs to have both checkboxes checked.

[Image: settings for the second Connection Server, with both checkboxes checked]

Configuring your Security Server (to be paired with your second Connection Server)
Just a few easy steps will take care of your Security Server.

  1. Browse out to the View Administrator portal, highlight the Connection Server you want to pair with the security server, and click More Commands > Specify Security Server Pairing Password.
  2. Install the Connection Server bits onto your new Security Server.  Choose "Security Server" for the type of Connection Server it will be.  It will then prompt you to enter the Connection Server to pair it to.  This is the internal FQDN of the second Connection Server.
  3. Enter the View pairing password established in step 1.  This will make the Security Server show up in the View Administrator Portal.
  4. Go back to the View Administrator portal, highlight the server that is listed under the Security Server, and click Edit.  This is where you will enter the desired external FQDN.  The PCoIP address should be the publicly registered IP address.  In my case, it is the address bound to the external interface of my firewall, but your topology might dictate otherwise.

[Image: Security Server FQDN and PCoIP address settings]

 

After it is all done, in the View Administrator portal, you should see one entry for a vCenter server, two entries for the View Connection servers, and one entry for a Security Server.

[Image: View Administrator portal showing one vCenter Server, two Connection Servers, and one Security Server]

From this point, it is just a matter of installing the View Agent on the VMs (or physical workstations with PCoIP host cards) you'd like to expose, creating a pool, entitling a user or group, and you should be ready to connect.

Tuning
After you add the VMware View ADM templates to Active Directory, a number of tunable settings will be available to you.  The good news in the tuning department is that while PCoIP is highly tunable, I don't feel it has to be the first thing you address after the initial deployment.  With View 5, it works quite well out of the box.  I will defer to this post http://myvirtualcloud.net/?p=2061 on some common, View-specific GPO settings you might want to adjust, especially in a WAN environment.  The two settings that will probably make the biggest impact are the "Maximum Frame Rate" setting and the "Build to Lossless" toggle.  I applied these and a few others to help our Development Team on another continent deal with their 280+ms latency.
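
For reference, those GPO settings land on the agent system as PCoIP session variables in the registry.  A hedged sketch of what the two I mentioned look like is below; the value names here are from memory, so verify them against the pcoip.adm template before relying on this.

    HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin
      pcoip.maximum_frame_rate        REG_DWORD   (e.g. 15 for a high latency WAN link)
      pcoip.enable_build_to_lossless  REG_DWORD   (0 = disabled, 1 = enabled)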

The tools available to monitor, test, and debug PCoIP are improving almost by the day, and will be an interesting area to watch.  Take a look at the links for the PCoIP Log Viewer and the PCoIP Configuration utilities at the end of this post.

Tips and Observations
When running View, there is a noticeable increase in dependence on vCenter, and on the databases that support it and View Composer.  This is especially the case in smaller environments, where the server running vCenter might also be housing the vCenter database and the database for View Composer.  Chris Wahl's recent post Protecting the vCenter Database with SQL Log Shipping addresses this, and provides a good way to protect the vCenter databases through log shipping.  If you are a Dell EqualLogic user, it may be helpful to move your SQL DB and log volumes off to guest attached volumes, and use their ASM/ME application to easily make snaps and replicas of the database.  Regardless of the adjustments that you choose to make, factor this into your design criteria, especially if the desktops served up by View become critical to your business.

If your connection to a View VM terminates prematurely, don't worry.  It seems to be a common occurrence during initial deployment that can happen for a number of reasons, and there are a lot of KB articles on how to diagnose them.  One cause I ran across that wasn't documented very well was the VM not having enough memory assigned to video RAM.  The result can be that the session works fine using RDP, but disconnects when using PCoIP.  I've had some VMs mysteriously reduce themselves back down to a default number that won't support large or multiple screen resolutions.  Take a quick look at the settings of your VM.  Once those initial issues were resolved, I found everything to work as expected.
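
If you suspect this is the culprit, the assigned video memory is visible in the VM's settings, and in the .vmx file itself.  A minimal sketch of the relevant lines is below; as I recall, svga.vramSize is specified in bytes, and the value shown is only an illustration for a multi-monitor setup, so size yours per VMware's documentation.

    svga.autodetect = "FALSE"
    svga.vramSize = "16384000"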

In my mad rush to build out the second View environment at our CoLo, everything worked perfectly, except when it came to the View client validating the secured connection. All indicators pointed to SSL, and perhaps how the exported SSL certificate was applied to the VM running the Security Server. I checked, and rechecked everything, burning up a fair amount of time. It turned out it was a silly mistake (aren’t they all?). In C:\Program Files\VMware\VMware View\Server\sslgateway\conf there needs to be a file called locked.properties. This contains information on the exported certificate. Well, when I created the locked.properties file, Windows was nice enough to append the .txt to it (e.g. locked.properties.txt). The default settings in Windows left that extension hidden, so it didn’t show. By the way, I’ve always hated that default setting for hiding file extensions. It is controlled via GPO at my primary site, but didn’t have that set at the CoLo site.
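
For anyone hitting the same wall, the contents of the file are minimal.  Here is a hedged example of what mine amounted to (the file name and password are placeholders; the keyfile is the exported certificate sitting in that same conf folder):

    keyfile=security-server.p12
    keypass=YourExportPassword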

Next up, I’ll be wrapping up this series with the final impressions of the project.  What worked well, what didn’t.  Perceptions from the users, and from those writing the checks.  Stay tuned.

Helpful Links
VMware View Documentation Portal.  A lot of good information here.
http://www.vmware.com/support/pubs/view_pubs.html

A couple of nice YouTube videos showing a step by step installation of View Composer
http://www.youtube.com/watch?v=JSFnkLW1ve4&feature=youtube_gdata_player
http://www.youtube.com/watch?v=SDc21h0uTkA&feature=youtube_gdata_player

How to apply View specific settings for systems via GPO (written for 4.5, but also applies to 5.0)
http://blog.vhowto.info/2010/11/25/vmware-view-4-5-active-directory-group-policies/

PCoIP disconnect codes
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2012101

PCoIP Log Viewer
http://mindfluxinc.net/?p=195

PCoIP Configuration utility (beta)
http://mindfluxinc.net/?p=338

More PCoIP tuning suggestions
http://mindfluxinc.net/?p=338

VDI for me. Part 3

 

In VDI for me. Part 2, I left off with how VMware View was going to be constructed in my environment.  We are almost at the point of installing and configuring the VMware View components, but before that is addressed, the most prudent step is to ensure that the right types of traffic can communicate across the different isolated network segments.  This post is simply going to focus on the security rules to do such a thing.  For me, access to these segments is managed by a Celestix MSA 5200i, 6 port firewall appliance running Microsoft ForeFront Threat Management Gateway (TMG) 2010.  While the screen captures are directly from TMG, much of the information here would apply to other security solutions.

Since all of the supporting components of VMware View will need to communicate across network segments anyway, I suggest making accommodations in your firewall before you start building the View components.  Sometimes this is not always practical, but in this case, I found that I only had to make a few adjustments before things were working perfectly with all of the components.

My network design was a fairly straightforward, 4-legged topology (a pretty picture of this can be seen in Part 2).

Network Leg   Contains
External      All users connecting to our View environment.
LAN           View Connection Server dedicated for access from the inside.
              View Connection Server dedicated for communication with the Security Server.
              Systems running the View Agent software.
DMZ1          Externally facing View "Security Server"
DMZ4          vSphere Management Network: vCenter, and the SQL databases providing services for vCenter and View Composer.

For those who have their vSphere Management network on a separate network by way of a simple VLAN, your rules will be simpler than mine.  For clarity, I will just show the rules that are used for getting VMware View to work.

Before you get started, make sure you have planned out all of the system names and IP addresses of the various Connection Servers and the VMs running the View Agent.  It will make the work later on easier.

Creating Custom Protocols for VMware View in TMG 2010
In order to build the rules properly, you will first need to define some "User-Defined" protocols.  To keep track of all of the user-defined protocols, I always included the name "View" (to remember its purpose), the direction, the type, and the port number.  Here is the list (as I named them) that was used as a part of my rule sets.

VMware View Inbound TCP&UDP (4172)
VMware View Outbound (32111)
VMware View Outbound (4001)
VMware View Outbound (8009)
VMware View Outbound (9427)
VMware ViewComposer Inbound (18443)
VMware ViewComposer Outbound (18443)
VMware ViewPCoIP Outbound (4172)
VMware ViewPCoIP SendReceiveUDP (4172)

Page 19 of the VMware View Security Reference will detail the ports and access needed.  I appreciate the detail, and it is all technically correct, but it can be a little confusing.  Hopefully, what I provide will help bridge the gap on anything confusing in the manual.  My implementation at this time does not include a View Transfer Server, so if your deployment includes this, please refer to the installation guide.
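
Once the protocols and rules are in place, a quick check that the TCP ports actually answer from the right segment can save a test cycle.  Below is a minimal Python sketch for that purpose; the hostnames are placeholders, the port labels follow the security reference, and it only exercises TCP, so the UDP side of PCoIP (4172) still has to be confirmed through the firewall logs.  A closed result can also just mean the service isn't installed yet.

    import socket

    # Placeholder targets -- substitute your own servers.
    checks = [
        ("view.corp.lan", 4172),     # PCoIP (TCP side only; UDP not tested here)
        ("view.corp.lan", 32111),    # USB redirection
        ("view.corp.lan", 4001),     # JMS
        ("view.corp.lan", 8009),     # AJP13
        ("view.corp.lan", 9427),     # multimedia redirection
        ("vcenter.corp.lan", 18443), # View Composer
    ]

    for host, port in checks:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(3)
        result = s.connect_ex((host, port))
        s.close()
        print(host, port, "open" if result == 0 else "closed/filtered")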

Creating Access Rules for VMware View in TMG 2010
The next step will be to build some access rules.  Access rules are typically defining access in a From/To arrangement.  Here are what my rules looked like for a successful implementation of VMware View in TMG 2010.

[Image: VMware View access rules in TMG 2010]

Creating Publishing rules for VMware View in TMG 2010
In the screen above, near the bottom, you see two Publishing rules.  These are for the purposes of securely exposing a server that you want visible to the outside world.  In this case, that would be the View Security Server.  The server will still have its private address as it resides in the DMZ, but would take on one of the assigned public IP addresses bound to the external interface of the TMG appliance.  To make View work, you will need two publishing rules.  One for HTTPS, and the other for PCoIP.  A View session with the display setting of RDP will use only the HTTPS publisher.  A View session with the display setting of PCoIP will use both of the publishing rules.  Page 65 of the View 5 Architecture Planning Guide illustrates this pretty well.

In the PCoIP publishing rule, notice how you need both TCP and UDP, and of course, the correct direction.

[Image: PCoIP publishing rule showing both TCP and UDP]

My friend Richard Hicks had some great information on his ForeFront TMG blog that was pertinent to this project. ForeFront TMG 2010 Protocol Direction Explained is a good reminder of what you will need to know when defining custom protocols, and the rule sets that use them.  The other was the nuances of using RDP with the “Web Site Publishing Rule” generator.  Let me explain.

TMG has a “Web Site Publishing Rule” generator that allows for a convenient way of exposing HTTP and HTTPS related traffic to the intended target. This publisher’s role is to protect by inspection. It terminates the session, decrypts, inspects, then repackages for delivery onto its destination. This is great for many protocols inside of SSL such as HTTP, but protocols like RDP inside SSL do not like it. This is what I was running into during deployment. View connections using PCoIP worked fine. View connections using RDP did not. Rich was able to help me better understand what the problem was, and how to work around it. The fix was simply to create a “Non-Web Server Protocol Publishing Rule” instead, choosing HTTPS as the protocol type.  For all of you TMG users out there, this is the reason why I haven’t described how to create a “Web Listener” to be used with a traditional “Web Site Publishing Rule.”  There is no need for one.

A few tips on implementing your new firewall rules.  Again, most of these apply to any firewall you choose to use.

1.  Even if you have the intent of granular lockdown (as you should), it may be easiest to initially define the rule sets a little broader.  Use things like entire network segments instead of individually assigned machine objects.  You can tighten the screws down later (remember to do so), and it is easier to diagnose issues.

2.  Watch those firewall logs.  It's easy to mess something up along the way, and your real-time firewall logs will be your best friend.  But be careful not to get too fancy with the filtering.  You may miss some denied traffic that doesn't necessarily match up with your filter.

3.  You will probably need to create custom protocols.  Name them in such a way that it is clear whether each is an incoming or outgoing protocol, and perhaps whether it is TCP or UDP.  Otherwise, it can get a little confusing when it comes to direction of traffic.  Rule sets have a direction, as do the protocols contained in them.

4.  Stay disciplined with your rule set taxonomy.  You will need to understand what each rule is trying to do, and consistency is key.  You may find it more helpful to name the computer objects after the role they are playing, rather than their actual server names.  It helps with understanding the flow of the rules.

5.  Add some identifier to your rules defined for View.  That way, when you are troubleshooting, you can enter “View” in the search function, and it quickly shows you only the rule sets you need to deal with.

6.  Stick to the best practices when it comes to placement of the View rules within your overall rule sets.  TMG processes the rules in order, so there are methods to make the processing most efficient.  They remain unchanged from its predecessor, ISA 2006.  Here is a good article on better understanding the order.

7.  TMG 2010 has a nice feature for grouping rules.  This allows a set of contiguous rules to be seen as one logical unit.  You might find this helpful in most of your View based rule creation.  I would recommend having your access rules for View in a different group than your publishing rules, so that you can maintain best practices on placement/priority of rule types.

8.  When you get to the point of diagnosing what appear to be connection problems between clients, agents, and connection servers, give VMware a call.  They have a few tools that will help in your efforts.  Unfortunately, I can't provide any more information about the tools at this time, but I can say that for the purposes of diagnosing connectivity issues, they are really nice.

I also stumbled upon an interesting (and apparently little known) issue when you have a system with multiple NICs that is also running the View Agent.  For me, this issue arose on the high powered physical workstation with a PCoIP host card, using View as the connection broker.  This system had two additional NICs that connected to the iSCSI SAN.  The PCoIP based connections worked, but RDP sessions through View failed, even when standard RDP worked without issue.  Shut off the other NICs, and everything worked fine.  VMware KB article 1026498 addresses this.  The fix is simply adding a registry entry.

On the host with the PCoIP card, open regedit and add the following entry:
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VDM\Node Manager

Add the REG_SZ value:
172.16.128.0/24

If you experience issues connecting to one system running the View Agent, but not the other, a common practice is to remove and reinstall the View Agent.  Any time the VMware Tools on the VM are updated, you will also need to reinstall the agent. 

More experimentation and feedback
As promised in part 1 of this series, I wanted to keep you posted of feedback that I was getting from end users, and observations I had along the way. 

The group of users allowed to connect continued to be impressed to the point that using it was a part of their workday.  I found myself not being able to experiment quite the way I had planned, because users were depending on the service almost immediately.  So much for that idea of it being a pilot project.

The experimentation with serving up a physical system with PCoIP, using VMware View as a connection broker, has continued to be an interesting one.  There are pretty significant market segments that demand high powered GPU processing.  CAD/CAE, visualization, animation, graphic design, etc. have all historically relied on client side GPUs.  So it is a provocative thought to be able to serve up high powered graphics workstations without them sitting under a desk somewhere.  The elegance of this arrangement is that once a physical system has a PCoIP host card in it, and the View Agent installed, it is really no different than the lower powered VMs served up by the cluster.  Access is the same, and so is the experience.  Just a heck of a lot more power.  Since it is all host based rendering, you can make the remote system as powerful as your budget allows.  Get ready for anyone who accesses a high powered workstation like this to be spoiled easily.  Before you know it, they will ask if they can have 48GB of RAM on their VM as well.

Running View from any client (Windows workstation, Mac, tablet, Ubuntu, and a Wyse P20 zero client) proved to give basically the same experience.  It was easy for the end users to connect.  Since I have a dual namespace ("view.mycompany.com" from the outside, and "view.local.lan" from the inside), the biggest confusion has been laptop users remembering which address to use.  That, and reminding them not to use the VPN to connect.  A few firewall rules blocking access will help guide them.
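
Since the dual namespace was the main point of confusion, a trivial resolution check from the client side shows which address each name returns from a given vantage point.  A minimal sketch, using the example names above:

    import socket

    # Example names from this series -- substitute your own namespaces.
    for name in ("view.mycompany.com", "view.local.lan"):
        try:
            print(name, "->", socket.gethostbyname(name))
        except socket.gaierror as error:
            print(name, "does not resolve from here:", error)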

One of my late experiments came after I met all of my other acceptance criteria for the project.  I wanted to see how VMware View worked with linked clones.  Setting up linked clones was pretty easy.  However, I didn’t realize until late in the project that a linked clone arrangement of View really requires you to run a Microsoft KMS licensing server.  Otherwise, your trusty MAK license keys might be fully depleted in no time.  There is a VMware KB Article describing a possible workaround, but it also warns you of the risks.  Accommodating for KMS licensing is not a difficult matter to address (except for extremely small organizations who don’t qualify for KMS licensing), but it was something I didn’t anticipate.

I had the chance to do this entire design and implementation not once, but twice.  No, it wasn’t because everything blew up and I had no backups.  My intention was to build a pilot out at my Primary facility first, then build the same arrangement (as much as possible) at the CoLocation facility.  What made this so fast and easy?  As I did my deployment at my Primary facility, I did all of my step by step design documentation in Microsoft OneNote; my favorite non-technical application.  Step by step deployment of the systems, issues, and other oddball events were all documented the first time around.  It made the second go really quick and easy.  Whether it be my Firewall configuration, or the configurations of the various Connection Servers, the time spent documenting paid off quickly.

Next up, I’ll be going over some basic configuration settings of your connection servers, and maybe a little tuning.

VDI for me. Part 2

 

In VDI for me.  Part 1, I described a little bit about our particular interests and use cases for VDI, and how we wanted to deploy a pilot project to learn more about it.  So now what?  Where does a person start?  Well, deployment guides and white papers for VMware View will tell you all of the possibilities.  They are thorough, but perhaps a bit tough to decipher when trying to understand what a simple VDI arrangement might look like.  So I’m just going to focus on how I built out a simple arrangement of VMware View, and how the components of VDI listed in my first post fit together.

Topology of VMware View
The topology of View should fit in very nicely with what you already have for a virtualized infrastructure.  If you are already running a vSphere cluster in your environment, it's really just a matter of building your supporting components, which in a very simple or small deployment might be an instance of Composer, and a View Connection Server or two.

[Image: topology of VMware View components]

VMware deployment guides will have all of the details of these components, so I will try to keep it brief. 

Connection Server.  This is the VM acting as the broker that your View Client software connects to, and it is the address users inside your network will use (e.g. "view.corp.lan").  This is a domain joined system, and should probably reside on your primary LAN network.

Security Server.  This is another VM running the Connection Server component, but in a slightly different mode.  Its purpose is to act as the secure proxy for external clients connecting to your internal systems presented by View.  It should only be used by clients outside of your network (e.g. "view.yourcompany.com").  This server should not be joined to the domain, and should reside in a secure DMZ segment of your network over which you have full, granular control of ingress and egress traffic.

Composer.  This is the server that is behind the magic of using a minimal storage footprint to issue out many Virtual Desktops – by way of linked clones.  For my simple deployment, I chose to install it on my VM running vCenter.  I also chose to stick to the basics when it came to configuring Composer, as I really wanted to focus on the behavior of the View Client itself, not experiment how best to deploy hundreds of desktops with linked clones.  Composer needs a SQL Server, so you can use the database server that serves up vCenter, or something else if you wish.

Transfer Server.  This is a dedicated server used for "offline mode" laptop users, or "Bring Your Own PC" (BYOPC) arrangements.  It gives a laptop user the ability to "check out" a VDI VM that traditionally lives on the infrastructure, so that it runs on the local resources of the laptop; when/if it can phone home via an internet connection, it sends an update of the VM back.  It is an optional feature, but a compelling one for those who might ask, "but what if I need my Virtual Desktop and have no internet connection?" or if you are considering a BYOPC arrangement.  It does not have to be joined to the domain, and its placement depends on the use case.

View Agent.  This is the tiny bit of software installed on a system that you want VMware View to present to a user.  In its simplest form, it can just be loaded onto an existing VM, or something served up from Composer.  If you need really high end horsepower with perhaps intensive graphics, it can be installed on a physical workstation with a PCoIP host card, so that VMware View can serve it up just like any other resource (more on this in a later post).  Right now, the agent software can only be installed on a Windows based system. 

View Client.  This is the application used by the end user to connect to their Virtual Desktop.  Currently the official client software is limited to Windows based machines, and iPads.  However, there is a Technology Preview edition for Mac OSX, and some distributions of Linux.  Note that if you are using a PCoIP Zero client, such as a Wyse P20, there is no software that you have to install on it.  Just connect it to the network, and connect to the internal View Connection Server, and you are good to go.

Interestingly enough, if you have a need to access your VDI systems from outside of your LAN (thus requiring you to run the Security Server in a DMZ), you will need a separate, additional VM (domain joined) on the inside of your LAN that runs the Connection Server software and is used exclusively by the Security Server.  The reason it cannot use the Connection Server above that was designated for internal use is that there are toggles that must be set based on the type of connection that is going to occur, and these toggles are mutually exclusive (I'll address this more in later posts).  You can try to use just one Connection Server coupled with a Security Server for both internal and external access, but traffic will not flow efficiently, and your performance will be affected.

I think VMware might need to review their architecture of the Connection Servers.  Having three VMs (which doesn't even include the View Transfer Server) serve up internal and external access probably doesn't seem like much when you have hundreds of clients, but for a pilot or a smaller deployment it is overkill, and could be reduced to two with some tweaks in the Connection Server software.  Perhaps it wouldn't be that big of a deal if they were just VM appliances.  (hint hint…)

The Software
I chose the "View Premier Starter Pack," which included all of the things needed to get started: View Composer, ThinApp, View Persona Management, vShield Endpoint, and "View Client for Local Mode."  Also included is licensing for vSphere Enterprise Plus for Desktop (standalone licenses), and vCenter (special licensing considerations apply).  The result is that you get to try out VDI on a single vSphere host, with up to 10 VDI clients, and all of the tools necessary to experiment with the features of View.  From there, you would just purchase additional client licenses or vSphere host licenses as necessary.  It is really a great bundle, and an easy way to begin a pilot project with everything you need.

Placement of systems
More than likely, you will be presenting these systems to the outside world for some of your staff.  This is going to involve some non-VMware related decisions and configurations to get things running correctly.  Modifications will need to be made to your firewall, DNS, SQL, vSphere, and AD.  If you are a smaller organization, the administrator of all of these may be the same person (you).  Daunting, but at least you cut out the middle man.  That's the way I like to think of it.  Nevertheless, you will want to exercise some care in the placement of systems, to minimize performance issues and make sure traffic runs efficiently and securely.

There is going to be quite a bit of communication between the Connection Servers, vCenter, the systems running the agent, and the clients running the View Client.  VMware publishes a nice document on all of the ports needed for each type of connection.  It's all technically there, but it does take a little time to decipher.  I stared at this document endlessly, while keeping a close eye on my real-time firewall logs as I attempted to make various connections.  I really think I needed the two in order to get it working.

As mentioned earlier, you should really have the Security Server VM sitting off in a protected network segment off of your firewall.  For me, my primary edge security device is Microsoft ForeFront Threat Management Gateway (TMG) firewall software running on a nicely packaged appliance by Celestix Networks.  I've said it before, and I'll say it again: it is such a shame that Microsoft does not get its due credit on this.  My security friends in high places have consistently stated that there is simply nothing better when it comes to robust, flexible protection across the protocol and application stacks than TMG.  I feel fortunate to be working with the good product that it is.  The information that I share throughout this series will be based on use of TMG, but the overall principles will be similar for other edge security solutions.

How much you need to modify your firewall depends on how you place some of these systems.  Since these View related components will be interacting with vCenter, you will have to plan accordingly.  Some vSphere environments have their vSphere Management Network (aka Service Console network) simply behind a VLAN and routed directly to the LAN, while others have it on a separate network leg only accessible by a physical router or firewall.  I fall into the latter category, as my old switches never had the capability to do inter-VLAN routing.  Happily, those old switches have since been replaced, but my vSphere Management network still resides in an isolated network leg, and my firewall settings for VMware View accommodate that.

Early impressions
While I wanted to deploy the systems correctly, and minimize floundering, I also wanted to see some early results.  After I built up the systems to manage the environment, and worked out the connectivity issues, I thought I'd go ahead and install the View Agent on a few of the existing VMs used by our Software Developers located on another continent.  These Developers have played a very important role in the company's development efforts, so anything I could do to improve their user experience was a plus.  Previously, their work environment consisted of their own laptop connected to our network via a PPTP based VPN, then using RDP to connect up to a VM provisioned for them.  Their VM has all of their development applications and tools, which allows the data to be kept in the datacenter.  Oh, and their typical latency?  Around 280ms on average.  Yeah, their connection is that terrible.

So my first tests really involved nothing more than installing the VMware View agent on their VM, then giving them the web address to download the VMware View Client, and a few steps on how to connect.  I also reminded them not to log into the VPN, as that was no longer necessary.  I asked them to try both RDP and PCoIP over their connection, along with some anecdotal feedback whenever they had a chance. 

  • Within 20 minutes they responded saying things like, "At first glance it works awesome" and "Much much faster than VPN + RDP."
  • That same day they told the rest of their team members about it, and during a Development review, stated, "please don't take it away!"
  • Further feedback included comments like "This feels at least 10x faster than the older method" along with "I wouldn't dream of playing a video on the old RDP, and rotating a 3D view in our software would've made you want to bang your head against a wall.  This [new] way is great."
  • Their impression over their highly latent connection was that using the View client with PCoIP was much more responsive than the View client with RDP.

What is really interesting is that I didn't make a single modification to tune PCoIP, nor did I adjust their VMs in any way.  That is the power of a new approach to remote display rendering.  I do not know the exact thresholds that Teradici (the makers of the PCoIP protocol) were shooting for when it came to latency, but I'm willing to bet that nearly 300ms of latency was outside of their typical use case.  It is quite the compliment to the power of PCoIP.  Needless to say, once it was tested, I couldn't bear to take this away from the remote Developers, so I did my best to work my deployment around their work days.  This was really exciting, because these were just normal VMs that I was serving up.  I didn't have a chance at this point to serve up the high end workstation with the VMware View Agent installed on it – something I was really looking forward to.

Some more lessons and observations
I've witnessed what others have experienced.  It is far too easy to let a pilot project turn into a production environment.  FAST.  With reactions like those I shared above, it happens right before your eyes.  The neat features that were demonstrated became must-haves almost overnight.  No wonder so many small VDI deployments ultimately suffer in some form or another as they grow.  So if you are contemplating even a pilot deployment, keep this in mind.  I'm doing my best to contain expectations until I have one of the key ingredients in place: fast storage targeted for IOPS hungry VDI deployments.  For me this means a Dell EqualLogic PS6100XV hybrid array.  Until that time, I am not going to even consider expanding this beyond the 10 instances that come with the View Premier Bundle.

It was nice to see that during my early deployment, VMware released a "Technology Preview" edition of the View 5.0 Client for Mac OSX, as well as a version for Ubuntu.  It is a nice step forward, but it doesn't go far enough.  A fully supported View Client needs to be provided for Red Hat clones and Debian based Linux distributions.  Likewise, the View Agent (the bit of software installed on the VM or the physical workstation that makes the system available via View) is limited to Windows operating systems only.  It is my position that having an agent for Linux distributions would serve VMware quite well – much better than they even know.  I'm hoping this is in their feature backlog.

Next up, I’ll be going over some typical View Administrator configuration settings, Firewall settings to get View and PCoIP flowing correctly, as well as my experimentation with a physical workstation with a PCoIP host card and a VMware View Agent installed so that it can be brokered by VMware View.

 

Helpful links
A recently released post by VMware describing exactly the benefit of hardware based systems that we are trying to exploit.
http://blogs.vmware.com/euc/2012/01/enhancing-graphics-processing-with-teradici-pcoip-host-cards-and-vmware-view.html

VMware View 5.0 Documentation Portal
http://www.vmware.com/support/pubs/view_pubs.html

A comprehensive guide to the ports that need to be opened with View 5 and its related services
http://pubs.vmware.com/view-50/topic/com.vmware.ICbase/PDF/view-50-security.pdf

The readme for the View Client Tech Preview for Mac OSX
http://communities.vmware.com/docs/DOC-17916

EqualLogic hybrid arrays in VDI testing
http://en.community.dell.com/techcenter/b/techcenter/archive/2011/02/23/equallogic-ps6000xvs-hybrid-arrays-shine-in-vdi-testing.aspx