Getting the big IT purchase approved

IT organizations are faced with a tantalizing array of options when it comes to hardware and software solutions. But long before anything can ever be deployed, it has to be purchased, which means at some point it had to be approved. Sometimes deploying a solution is easy compared to getting it approved. But how does one go about getting the big ticket item through? Well, here is my attempt at demystifying the process.

First, let’s just say that "big purchase" is without a doubt a relative term. For an SMB, $10,000 might be a show stopper, while seven figures for a large enterprise may be part of the routine. Both offer unique challenges, but share similar tactics. Getting a big IT purchase approved typically takes a particular set of skills and experience. A mix of preparation, clarity, delivery, timing, and attitude makes up the chaotic formula that, when done well, will improve the odds of success. It is a skill that can be as important as anything in your technical arsenal.

Preparation
You will serve yourself well if you think and deliver like a consultant. Life in Ops can get muddied down by internal strife, whack-a-mole fire fighting, and the occasional "look at this new feature" deployment even though nobody asked for it. Take notice of how a good consultant does things. Step back to understand the desired result, then build out your own statement defining the typical design inputs like requirements, constraints, assumptions and risks.

At some point, you will need to prioritize your own wants, and pick your battles. You typically can’t have everything, so start from the ground up of what IT’s mission statement is, and work from there. Start with bet-the-business elements like high availability, and data/system protection that won’t be spoken up for by anyone but IT. Then, if there are other needs, they may in fact be a departmental need that impacts productivity and revenue. While IT may be the enabler of the request, make sure the identity of the requester is clear.

It’s not uncommon for an SMB to have very little money allocated to IT, but this isn’t an excuse for lack of diligence in preparation. Large organizations have more money, but proportionally much more complex problems to solve, SLAs to adhere to, and regulations to comply with. If you have no idea how your organization’s IT spending compares to peers in your industry, it is time to learn, and communicate that as a part of your presentation if your funds are abnormally low.

This is also an opportunity for you to project yourself as the "solution provider" in your organization. Embrace this. Help them understand why technology costs have increased over the past 10 years. If someone says, "Why don’t we just use the cloud for this?", rather than let smoke pour out of your ears, respond with "That is a great question, Joe. IT is constantly looking for the best ways to deliver services that meet the requirements of the organization." And then go into an appropriate level of detail on why it may or may not be a good fit. (If it is a good fit, then say so!) The point here is to embrace the solution provider role for the organization.

Your biggest competitor to your proposal will be, you guessed it, doing nothing. But there is a cost of doing nothing. The key stakeholders might look at this proposed expenditure and compare it to $0. In most cases, this is completely wrong, and it is up to you to help them understand what the real cost comparison is.

One opportunity sometimes overlooked is the power of a cost deferral. Does the unbudgeted solution you are proposing delay a much larger budgeted purchase until perhaps next year? Showcase this. Good proposals typically show a TCO of 3 to 5 years. But do not underestimate the allure an immediate cost deferral has to your friendly CFO.
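To make the "compare it to something other than $0" point concrete, here is a minimal sketch of a multi-year cost comparison. Every figure and name in it is made up purely for illustration, and it ignores discounting and financing; plug in your own numbers.

```python
def total_cost(upfront, annual, years):
    """Simple total cost of ownership: upfront spend plus recurring yearly cost."""
    return upfront + annual * years

# All figures are hypothetical, for illustration only.
YEARS = 3
proposal = total_cost(upfront=40_000, annual=5_000, years=YEARS)
# "Doing nothing" still costs money: support renewals, downtime, staff time, etc.
do_nothing = total_cost(upfront=0, annual=22_000, years=YEARS)

print(f"Proposal over {YEARS} years:   ${proposal:,}")
print(f"Do nothing over {YEARS} years: ${do_nothing:,}")
print(f"Difference:                ${do_nothing - proposal:,}")
```

The same function can be run at both 3 and 5 years to show how the picture changes over the life of the purchase.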

Get input on defining the "what" of a problem, and its impacts. The "how" is usually reserved for the Subject Matter Expert (e.g. you). This will minimize silly ideas from others suggesting your storage capacity issues can be solved by the Friday flier for Best Buy.

Learn to prime the pump. Do a little one-on-one campaigning. This is a common method suggested in many books on successful leadership. It is your chance to win over your constituents before any formal proposal. Try holding an internal "Lunch and Learn" about trends in technology. Share a little about how amazing virtualization is, and help them understand some basic challenges of IT. These techniques will engage key personnel, and help in establishing a trusting relationship with IT.

The presentation – IT Shark Tank
I’m a big fan of the show, ‘Shark Tank.’ If you aren’t familiar with it, four very successful investors hear pitches by would-be entrepreneurs who are looking for investment funds in exchange for a stake in equity. The investors bring their own wealth, smarts and competitive nature to the table, and can be quite tough on prospective entrepreneurs. A few things can be gleaned from this, and applied directly to your ability to deliver a successful proposal.

  • Come prepared. Nothing kills a proposal like lack of preparation, and not knowing your facts. Let’s say you are requesting more storage: You’d better believe some of the simplest questions will be asked, many of which you may overlook when entering the room. "How much storage do we have?" "How much do we have left?" "How much do we need?" "Why does it cost so much?" "What are the alternatives?"
  • Clearly state the problem, the impacts to the business, the options, and your recommendations.
  • Learn to answer the simplest of questions in the simplest of ways. "Does this proposal save us money?" "Is there a less expensive way to do this?"
  • Craft your message to your audience and appeal to their sensibilities. Flog yourself upside the head if you use any IT acronyms, or assume that technical gymnastics is going to impress them. It won’t. What will is being concise. Every word has a purpose.
  • Provide a little (but not too much) context to the problem that you are trying to solve. Leverage an analogy if you need to.
  • Know the counterpoints, and how to respond. Know how you are going to answer a question you don’t know the answer to.
  • Seek to understand their position. What might they dislike (e.g. unpredictable expenses, obligated debt, investments they don’t understand, etc.)?
  • Respect everyone’s time. Make it quick, make it concise, and if they would like more detail, you can certainly do that, but don’t make it a part of the pitch.

How to deal with everyone else in the food chain
Be honest with your vendors. They have a job to do, and are trying to help you. If you show interest in a solution that is 10x more than what you can afford, it isn’t going to do anyone good to bring them in for an onsite demonstration. They will appreciate your honesty so they can perhaps focus on more cost appropriate solutions. Believe it or not, most want the right solution for you in the first place, as repeat business is the most important value they can bring back to their own organization.

If you are someone who doesn’t have deep-dive knowledge on the solution you are proposing, take advantage of the SE for the VAR or channel partner as a resource. Many of my friends in the industry are SEs and are some of the best and the brightest folks I know, and they all came from the Ops side at some point. Use them as a resource to learn about the solutions they are proposing, and ask them challenging questions.

Be honest with your organization. This isn’t about what you want. Your value will increase when you can demonstrate repeatedly that you have their best interests in mind.

After the decision
If the proposal was approved, focus on delivering at least some results fast. Then showcase the win and how IT can help solve organizational challenges. This may sound like self promotion, but it is not if done right. The wins are for the organization, not you. This establishes trust, and lays the groundwork for the future. Use company newsletters, or establish a monthly IT Review to share updates.

If it was denied, don’t take it personally. It is great to show passion, but don’t confuse passion with what you are really trying to do: helping your organization make the best strategic and financial decision for them. Would it be gratifying to get a new Datacenter revamp through only to realize it was the financial tipping point of the organization just a few months later? Keep it all in perspective. Besides, some of the best purchasing decisions I’ve been involved with were the ones that were ultimately rejected, which gave solutions a chance to mature, and me an opportunity to find a different way to solve a problem.

Try doing your own proposal or presentation retrospective. What went well, and what didn’t? Ask for feedback on how it went. You might be surprised at the responses you get.

Conclusion
You have the unique opportunity to be the technology advocate for the organization rather than simply a burden to the budget. Do I get everything approved? Of course I don’t, but a well prepared proposal will allow you and your organization to make the smartest decisions possible, and help IT deliver great results.

Practical tips for a Veeam Backup & Replication deployment

I’ve been using Veeam Backup & Replication in my production environment for a while now, and in hindsight, it was one of the best investments we’ve ever made in our IT infrastructure. It has completely changed the operational overhead of protecting our VMs, and the data they serve up. Using a data protection solution that utilizes VMware’s APIs provides the simplicity and flexibility that was always desired. Moving away from array based features for protection has allowed the protection of VMs to be driven by desired RPO and RTO requirements – not by the limitations imposed by LUN sizes, array capacity, or functionality.

While Veeam is extremely simple in many respects, it is also a versatile, feature packed application that can be configured a variety of different ways. The versatility and the features can be a little confusing to the new user, so I wanted to share 25 tips that will help make for a quick and successful deployment of Veeam Backup & Replication in your environment.

First, let’s go over a few assumptions that will be the basis for my recommendations:

  • There are two sites that need protection.
  • VMs and data need to be protected at each site, locally.
  • VMs and data need to be protected at each site, remotely.
  • A NAS target exists at each site.
  • Quick deployment is important.
  • You’ve already read all of the documentation. ;)

    Architecture
    There are a number of different ways to set up the architecture for Veeam. I will show a few of the simplest arrangements:

    In this arrangement below there would be no physical servers – only a NAS device. This is a simplified arrangement of what I use. If one wanted a rebuilt server (Windows or Linux) acting purely as a storage target, that could be in place of where you see the NAS. The architecture would stay the same.

    image

    Optionally, a physical server not just acting as a storage target, but also as a physical proxy would look something like this below:

    image

    Below is a combination of both, where a physical server is acting as the Proxy, but like the virtual proxy, is using an SMB share to house the data. In this case, a NAS unit.

     

    image

    Implementation tips
    These tips focus not so much on what may ultimately suit your environment best (only you know that) or on leveraging all of the features inside the product, but rather on getting you up and running as quickly as possible so you can start returning great results.

    Job Manager Servers & Proxies

    1.  Have the Job Manager server, any proxies, and the backup targets living on their own VLAN for a dedicated backup network.

    2.  Set up SNMP monitoring on any physical ports used in the backup arrangement.  It will be helpful to understand how utilized the physical links get, and for how long.

    3.  Make sure to give the Job Manager VM enough resources to play with – especially if it will have any data mover/proxy responsibilities.  The deployment documentation has good information on this, but for starters, make it 4vCPU with 5GB of RAM.

    4.  If there is more than one cluster to protect, consider building a virtual proxy inside each cluster that it will be responsible for protecting, then assign it to jobs that protect VMs in that cluster.  In my case, I use PernixData FVP in two clusters.  The data stores that house those VMs are only accessible by their own cluster (a constraint of FVP).  Because of that, I have a virtual proxy living in each cluster, with backup jobs configured so that each job uses a specific virtual proxy.  These virtual proxies have a special setting in FVP that will instruct the VMs being backed up to flush their write cache to the backing storage.

    image

    Storage and Design

    5.  Keep the design simple, even if you know you will need to adjust at a later time.  Architectural adjustments are easy to do with Veeam, so  go ahead and get Veeam pointed to the target, and start running some jobs.  Use this time to get familiar with the product, and begin protecting the jewels as quickly as possible.

    6.  Let Veeam use the default SQL Server Express instance on the Veeam Job Manager VM.  This is a very reasonable, and simple configuration that should be adequate for a lot of environments.

    7.  Question whether a physical proxy is needed.  Typically physical proxies are used for one of three reasons.  1.)  They offload job processing CPU cycles from your cluster.  2.)  In simple arrangements a Windows based Physical proxy might also be the Repository (aka storage target).   3.) They allow for one to leverage a "direct-from-SAN" feature by plugging in the system to your SAN fabric.  The last one in my opinion introduces the most hesitation.  Here is why:

    • Some storage arrays do not have a "read-only" iSCSI connection type.  When this is the case, special care needs to be taken on the physical server directly attached to the SAN to ensure that it cannot initialize the data store.  The reality is that you are one mistake away from having a very long day in front of you.  I do not like this option when there is no secondary safety mechanism from the array on a "read-only" connection type.
    • Direct-from-SAN access can be a very good method for moving data to your target.  So good that it may stress your backing storage enough (via link saturation or physical disk limits) to perhaps interfere with your production I/O requirements.
    • Additional efforts must be taken when using write buffering mechanisms that do not live on the storage array (e.g. PernixData).

    8.  Veeam has the ability to back up to an SMB share, or an NFS mount.  If an NFS mount is chosen, make sure that it is a storage target running native Linux.  Most NAS units like a Synology are indeed just a tweaked version of Linux, and it would be easy to conclude that one should just use NFS.  However, in this case, you may run into two problems.

    • The SMB connection to a NAS unit will likely be faster (which most certainly is the first time in history that an SMB connection is faster than an NFS connection).
    • The Job Manager might not be able to manage the jobs on that NAS unit (connected via NFS) properly.  This is due to BusyBox and Perl on the Synology not really liking each other.  For me, this resulted in Veeam being unable to remove sunsetting backups.  Changing over to an SMB connection on the NAS improved the performance significantly, and allowed for job handling to work as desired.

    9.  Veeam has a great new feature in version 7.x called a "Backup Copy" job, which allows the backup made locally to be shipped to a remote site.  The "Backup Copy" job achieves one of the most basic requirements of data protection in the simplest of ways: two copies of the data at two different locations, with the benefit of only processing the backup job once.  Although it is a great feature, it behaves differently from a regular backup job, and warrants some time spent with it before putting it into production.  For a speedy deployment, it might be best simply to configure two jobs: one to a local target, and one to a remote target.  This will give you the time to experiment with the Backup Copy job feature.

    10.  There are compelling reasons for and against using a rebuilt server as a storage target, or using a NAS unit.  Both are attractive options.  I ended up using a dedicated NAS unit.  Its form factor, drive bay count, and overall cost of provisioning made it the only option that could match my requirements.

    Operations

    11.  In Veeam B&R, "Replication Jobs" are different than "Backup Jobs."  Instead of trying to figure out all of the nuances of both right away, use just the "Backup Job" function with both local and remote targets.  This will give you time to better understand the characteristics of the replication functionality. One also might find that the "Backup Job" suits the environment and need better than the replication option.

    12.  If there are daily backups going to both local and offsite targets (and you are not using the "Backup Copy" option), have them run 12 hours apart from one another to reduce RPOs.

    13.  Build up a test VM to do your testing of a backup and restore.  Restore it in the many ways that Veeam has to offer.  Best to understand this now rather than when you really need to.

    14.  I like the job chaining/dependency feature, which allows you to chain multiple jobs together.  But remember that if a job is manually started, it will run through the rest of the jobs too.  The easiest way to accommodate this is to temporarily remove it from the job chain.

    15.  Your "Backup Repository" is just that, a repository for data.  It can be a Windows Server, a Linux Server, or an SMB share.  If you don’t have a NAS unit, stuff an old server (Windows or Linux) with some drives in it and it will work quite well for you.

    16.  Devise a simple, clear job naming scheme.  Something like [BackupType]-[Descriptive Name]-[TargetLocation] will quickly tell you what it is and where it is going to.  If you use folders in vCenter to organize your VMs, and your backups reflect the same, you could also  choose to use the folder name.  An example would be "Backup-SharePointFarm-LOCAL" which quickly and accurately describes the job.
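A naming scheme like this is easy to keep consistent if the names are generated rather than typed. Below is a small sketch of that idea; the `job_name` helper is my own illustration and has nothing to do with any Veeam API.

```python
def job_name(backup_type, description, target):
    """Build a job name as [BackupType]-[Descriptive Name]-[TargetLocation].

    Hypothetical helper: spaces are stripped from the description and the
    target location is upper-cased so names stay uniform.
    """
    parts = [backup_type, description.replace(" ", ""), target.upper()]
    if not all(parts):
        raise ValueError("every part of the job name must be non-empty")
    return "-".join(parts)

print(job_name("Backup", "SharePoint Farm", "local"))
```

If your jobs mirror vCenter folders, the folder name can simply be passed in as the description.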

    17.  Start with a simple schedule.  Say, once per day, then watch the daily backup jobs and the synthetic fulls to see what sort of RPO/RTOs are realistic.

    18.  Repository naming.  Be descriptive, but come up with some naming scheme that remains clear even if you aren’t in the application for several weeks.  I like indicating the location of the repository, if it is intended for local jobs, or remote jobs, and what kind of repository it is (Windows, Linux, or SMB).  For example:  VeeamRepo-[LOCATION]-for-Local(SMB)

    19.  Repository organization.  Create a good tree structure for organization and scalability.  Veeam will do a very good job at handling the organization of the backups once you assign a specific location (share name) on a repository.  However, create a structure that provides the ability to continue with the same naming convention as your needs evolve.  For instance, a logical share name assigned to a repository might be \\nas01\backups\veeam\local\cluster1  This arrangement allows for different types of backups to live in different branches.
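The repository naming and share layout from the last two tips can be sketched the same way. These helpers are hypothetical (they only build strings, and the location value is a made-up example), but they show how one convention can keep working as more clusters or sites are added.

```python
def repo_name(location, scope, kind):
    """Repository name in the form VeeamRepo-[LOCATION]-for-[Scope]([Kind])."""
    return f"VeeamRepo-{location}-for-{scope}({kind})"

def share_path(nas_host, scope, cluster):
    """UNC path following the \\\\host\\backups\\veeam\\<scope>\\<cluster> layout."""
    return "\\\\" + "\\".join([nas_host, "backups", "veeam", scope, cluster])

# "SITE1" is a placeholder location name.
print(repo_name("SITE1", "Local", "SMB"))
print(share_path("nas01", "local", "cluster1"))
print(share_path("nas01", "remote", "cluster1"))
```

Adding a new branch (say, a second cluster or a remote scope) only changes the arguments, not the structure.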

    20.  Veeam might prevent the ability of creating more than one repository going to the same share name (it would see \\nas01\backups\veeam\local\cluster1 and \\nas01\backups\veeam\local\cluster2 as the same).  Create DNS aliases to fool it, then make those two targets something like \\nascluster1\backups\veeam\local\cluster1  and  \\nascluster2\backups\veeam\local\cluster2 

    21.  When in doubt, leave the defaults.  Veeam put in great effort to make sure that neither you nor the software trips over itself.  Uncertain of job number concurrency?  Stick to the default.  Wondering about which backup mode to use (Reverse Incremental versus Incrementals with synthetic fulls)?  Stay with the defaults, and save the experimentation for later.

    22.  Don’t overcomplicate the schedule (at least initially).  Veeam might give you flexibility that you never had with array based protection tools, but at the same time, there is no need to make it complicated.  Perhaps group the VMs by something that you can keep track of, such as the folders they are contained in within vCenter.

    23.  Each backup job has a storage optimization setting that can be matched to the type of target you are using: WAN target, LAN target, or local target.  This can easily be overlooked, but will make a difference in backup performance.

    24.  How many backups you can keep is a function of change rate, frequency, dedupe and compression, and the size of your target.  Yep, that is a lot of variables.  If nothing else, find some storage that can serve as the target for say, 2 weeks.  That should give a pretty good sampling of all of the above.
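Until you have those two weeks of real numbers, a back-of-the-envelope estimate can at least show how the variables interact. The sketch below is a deliberately crude model of my own (one full backup plus one incremental per day, with a single combined dedupe/compression ratio); it is not how Veeam sizes repositories, and all of the inputs are made-up examples.

```python
def restore_points_that_fit(target_gb, source_gb, daily_change_pct, reduction_ratio):
    """Rough count of daily restore points that fit on a target.

    Assumes one full backup plus daily incrementals, all shrunk by the same
    dedupe/compression ratio. Illustrative only.
    """
    if reduction_ratio <= 0 or daily_change_pct <= 0:
        raise ValueError("ratio and change rate must be positive")
    full_gb = source_gb / reduction_ratio
    incr_gb = source_gb * daily_change_pct / 100 / reduction_ratio
    if full_gb > target_gb:
        return 0  # not even the first full backup fits
    return 1 + int((target_gb - full_gb) // incr_gb)

# Hypothetical inputs: 2 TB of VMs, 5% daily change, 2:1 reduction, 4 TB target.
print(restore_points_that_fit(target_gb=4000, source_gb=2000,
                              daily_change_pct=5, reduction_ratio=2))
```

Replace the guesses with what the job statistics actually report after a couple of weeks, and the estimate becomes far more useful.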

    25.  Take one item/feature once a week, and spend an hour or two looking into it.  This will allow you to find out more about say, Changed block tracking, or what the application aware image processing feature does.  Your reputation (and perhaps, your job) may rely on your ability to recover systems and data.  Come up with a handful of scenarios and see if they work.

    Veeam is an extremely powerful tool that will simplify your layers of protection in your environment. Features like SureBackup, Virtual Labs, and their Replication offerings are all very good. But more than likely, they do not need to be a part of your initial deployment plan. Stay focused, and get that new backup software up and running as quickly as possible. You, and your organization, will be better off for it.

    - Pete

    Effects of introducing write-back caching with PernixData FVP

    Implementing new technology that solves real problems is great. It is exciting, and you get to stand on the shoulders of the smart folks who dreamed up the solution. But with all of that glory comes new design and operation elements that may have been introduced. This isn’t a bad thing. It is just different. The magic of virtualization didn’t excuse the requirement of needing to understand the design and operational considerations of the new paradigm. The same goes for implementing host based caching in a virtualized environment.

    Implementing FVP is simple and the results can be impressive. For many, that is about all the effort they may end up putting into it. But there are design considerations that will help maximize the investment, and minimize false impressions, or costly mistakes. I want to share what has been learned against my real world workloads, so that you can understand what to look for, and possibly how to get more out of your investment. While FVP accelerates both reads and writes, it is the latter that warrants the most consideration, so that will be the focus of this post.

    When accelerating storage using FVP, the factors that I’ve found to have the most influence on how much your storage I/O is accelerated are:

    • Interconnect speed between hosts of your pooled flash
    • Performance delta between your flash tier, and your storage tier.
    • Working set size of your data
    • Duty cycle write I/O profile of your VMs (including peak writes, and duration)
    • I/O size of your writes (which can vary within each workload)
    • Likelihood or frequency of DRS or manual vMotion activities
    • Native speed and consistency of your flash (the flash itself, and the bus speed)
    • Capacity of your flash (more of an influence on read caching, but can have some impact on writes too)

    Write-back caching & vMotion
    Most know by now that to guard against any potential data loss in the event of a host failure, FVP provides redundancy of write-back caching through the use of one or more peers. The interconnect used is the vMotion network. While FVP does a good job of decoupling the VM’s need to wait for the backing datastore, for a VM configured for write-back with redundancy, the write I/O must be acknowledged by its local flash AND by the one or more peers before the write ACK is returned to the VM.

    What does this mean to your environment? More traffic on your vMotion network. Take a look at the image below. In a cluster NOT accelerated by FVP, the host uplinks that serve a vMotion network might see relatively little traffic, with bursts of traffic only during vMotion activities. That would also be the case if you were running FVP in write-back mode with no peers (WB+0). The image below is what the activity on the vMotion network looks like as perceived by one of the hosts after the VMs were configured for write-back with redundancy of one peer. In this case the writes were averaging about 12MBps across the vMotion network. The spike is where a vMotion kicked off, and it reaches the peak output of a 1GbE interface: about 125MBps.

    image

    Is this bad that the traffic is running over your vMotion network? No, not necessarily. It has to run over something. But with this knowledge, it is easy to see that bandwidth for inter-server communication will be more important than ever before. Your infrastructure design may need to be tweaked to accommodate the new role that the vMotion network plays.

    Can one get away with a 1GbE link for cross server communication? Perhaps. It really depends on the factors above, which can sometimes be hard to determine. So with all of the variables to consider, it is sometimes easiest to circle back to what we do know:

    • Redundant write back caching with FVP will be using network connectivity (via vMotion network) for every single write that occurs for an accelerated VM.
    • Redundant write back caching writes are multiplied by the number of peers that are configured per accelerated VM.
    • The write accelerated I/O commit time (latency) will be as fast as the slowest connection.  Your vMotion network will likely be slower than the local bus.  A poor quality SSD or an older generation bus could be a bottleneck too.
    • vMotion activities enjoy using every bit of bandwidth they have available to them.
    • VMs that are committing a lot of writes might also be taxing CPU resources, which may kick in DRS rules to rebalance the load – thus creating more vMotion traffic.  Those busy VMs may be using more active memory pages as well, which may increase the amount of data to move during the vMotion process.

    The multiplier of redundancy
    Let’s run through a simple scenario to better understand the potential impact an undersized vMotion network can have on the performance of write-back caching with redundancy. The example is addressing writes only.

    • 4 hosts each have a group of 6 VMs that consistently write 5MBps per VM.  Traditionally, these 24 VMs would be sending a total of 120MBps to the backing physical storage.
    • When write back is enabled without any redundancy (WB+0), the backing storage will still see the same amount of writes committed, but in a slightly different way: sequential, and smoothed out as data is flushed to the backing physical storage.
    • When write back is enabled and a write redundancy of “local flash and 1 network flash device” (WB+1) is chosen, the backing storage will still see 120MBps go to it eventually, but there will be an additional 120MBps of data going to the host peers, traversing the vMotion network.
    • When write back is enabled and a write redundancy of “local flash and 2 network flash devices” (WB+2) is chosen, the backing storage will still see 120MBps go to it, but there will be an additional 240MBps of data going to the host peers, traversing the vMotion network.

    image

    The write-back redundancy configuration is a per-VM setting, so there is not necessarily a need to change them all to one setting. Your VMs will most likely not have the same write workload either. But as the example illustrates, it is not hard to saturate a 1GbE interface. Assuming an approximate 125MBps on a single 1GbE interface, under the described arrangement, saturation would occur with each VM configured for write-back with redundancy of one peer (WB+1). This leaves little headroom for other traffic that might be traversing that network, such as vMotions, or heartbeats.
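The arithmetic in the scenario above is simple enough to script, which makes it easy to rerun with your own host counts and write rates. This is only a sketch of the multiplication described in this post: it compares the aggregate peer traffic against a single 125MBps budget the same way the paragraph above does, and it does not model how that traffic divides across individual host uplinks.

```python
LINK_MBPS = 125  # approximate throughput of one 1GbE interface, per the post

def write_traffic(hosts, vms_per_host, mbps_per_vm, peers):
    """Return (MBps to backing storage, MBps of redundant writes sent to peers)."""
    to_storage = hosts * vms_per_host * mbps_per_vm
    to_peers = to_storage * peers  # every write is copied once per configured peer
    return to_storage, to_peers

for peers in (0, 1, 2):
    to_storage, to_peers = write_traffic(hosts=4, vms_per_host=6,
                                         mbps_per_vm=5, peers=peers)
    share = to_peers / LINK_MBPS
    print(f"WB+{peers}: {to_storage} MBps to storage, "
          f"{to_peers} MBps to peers ({share:.0%} of one 1GbE link)")
```

Since redundancy is a per-VM setting, a more realistic run would sum the traffic VM by VM instead of assuming every VM uses the same peer count.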

    Fortunately FVP has the smarts built in to ensure that vMotion activities and write-back caching get along. However, there is no denying the physics associated with the matter. If you have a lot of writes, and you really want to leverage the full beauty of FVP, you are best served by fast interconnects between hosts. It is a small price to pay for supreme performance. FVP might expose the fact that 1GbE may not be ideal in an accelerated environment, but consider what else has changed over the years. Standard memory sizes of deployed VMs have increased significantly (The vOpenData Public Dashboard confirms this). That 1GbE vMotion network might have been good for VMs with 512MB of RAM, but what about those with 4, 8, or 12GB of RAM? That 1GbE vMotion network has become outdated even for what it was originally designed for.

    Destaging
    One characteristic of any type of write-back caching is that eventually, the data needs to be destaged to the backing physical datastore. The server-side flash that is now decoupled from the backing storage has the potential to accommodate a lot of write I/Os with minimal latency. One may or may not have enough backing spindles, or a large enough conduit, to send that write I/O to the backing physical storage if the high write I/O lasts long enough. Destaging issues can occur on an arrangement like FVP, or with storage arrays and DAS arrangements that front performance I/O with flash that gets pushed to slower spindles.

    Knowing the impact of this depends on the workload and the environment it runs in.

    • If the duty cycle of the write workload that is above the physical storage I/O limit allows for enough “rest time” (defined as any moment that the max I/O to the backing physical storage is below 100%) to destage before the next over commitment begins, then you have effectively increased your ability to deliver more write I/Os with less latency.
    • If the duty cycle of the write workload that is above the physical storage I/O limit is sustained for too long, the destager of that given VM will fill to capacity, and will not be able to accelerate any faster than its ability to destage.

    Huh?  Okay, a picture might be a better way to describe this.  The callouts below point to the two scenarios described.

    image

     

    So when looking at this write I/O duty cycle, there becomes a concept of amplitude of the maximum write I/O, and frequency of those times in which it is overcommitting. When evaluating an environment, you might see this crude sine-wave show up. This write I/O duty cycle, coupled with your physical components, is the key to how much FVP can accelerate your environment.

    What happens when the writes to the destager surpass the ability of your backing storage to keep up with the writes? Once the destager for that given VM fills up, its acceleration will reduce to the rate that it can evacuate the data to the backing storage.  One may never see this in production, but it is possible.  It really depends on the factors listed at the beginning of the post.  The clearest way to see this is with a synthetic workload.  In the one shown here, it was able to push 5 times the write I/Os (blue line) before eventually filling up the destager to the point where it was throttled back to the rate of the datastore (purple line).

    image

    This will have an impact on the effective latency, shown below (blue line).  While the destager is full, it will not be able to fulfill the write at the low latency typically associated with flash, reflecting latency closer to the backing datastore (purple line).

    image

    Many workloads would never see this behavior, but those that are very write intensive (like mine), and that have a big delta between their acceleration tier and their backing storage may run into this.

    The good news is that workloads have a tendency to be bursty, which is a perfect match for an acceleration tier. In a clustered arrangement, this is much harder to predict, and bursty can change to steady-state quite quickly. What this demonstrates is that if there is enough of a performance delta between your acceleration tier and your storage tier, then under sustained writes there may be times where it doesn’t have the opportunity to flush enough writes to maintain its ability to accelerate.
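
    To make the two duty-cycle scenarios a little more concrete, here is a minimal sketch of a write-back buffer that fills at the offered write rate and drains at the rate of the backing storage. The function name, the one-step-per-interval model, and all of the numbers are my own assumptions for illustration; this is not how FVP is implemented.

```python
def simulate_destager(write_iops, backend_iops, buffer_capacity):
    """Simulate a write-back buffer, one time step per entry in write_iops.

    Returns a list of (accepted_writes, buffer_fill) tuples, one per step.
    All parameters are illustrative; they are not FVP settings.
    """
    fill = 0
    history = []
    for offered in write_iops:
        # Room this step = free buffer space plus what the backend drains.
        room = buffer_capacity - fill + backend_iops
        accepted = min(offered, room)
        # Whatever the backend cannot drain this step stays in the buffer.
        fill = max(0, fill + accepted - backend_iops)
        history.append((accepted, fill))
    return history


# Sustained writes at 5x the backend rate: full speed at first, then the
# buffer fills and accepted writes fall back to the backend rate.
sustained = simulate_destager([5000] * 6, backend_iops=1000, buffer_capacity=10000)
print([accepted for accepted, _ in sustained])
# -> [5000, 5000, 3000, 1000, 1000, 1000]

# Bursty writes with enough "rest time" are absorbed completely.
bursty_load = ([5000] * 2 + [500] * 16) * 2
bursty = simulate_destager(bursty_load, backend_iops=1000, buffer_capacity=10000)
print(all(accepted == offered for (accepted, _), offered in zip(bursty, bursty_load)))
# -> True
```

    The sustained case mirrors the synthetic test above: the acceleration lasts only until the buffer is full, after which the accepted write rate equals the drain rate.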

    Recommendations
    My recommendations (and let me clarify that these are my opinions only) on implementing FVP include the following.

    • Initially, run the VMs in write-through mode so that you can leverage the FVP analytics to better understand your workload (duty cycles, read/write ratios, maximum write throughput for a VM, IOPS, latency, etc.)
    • As you gain a better understanding of the behavior of these workloads, introduce write-back caching to see how it helps the systems you changed.
    • Keep an eye on your vMotion network (in particular, those with 1GbE environments and limited physical ports) and see if one ever comes close to saturation.  Other leading indicators will be increased latency on accelerated writes.
    • Run out and buy some 10GbE NICs for your vMotion network.  If you are in a situation with a total 1GbE legacy fabric for your SAN, and your vMotion network, and perhaps you have limits on form factors that may make upgrading difficult (think blades here), consider investing in 10GbE for your vMotion network, as opposed to your backing storage. Your read caching has probably already relieved quite a bit of I/O pressure on your storage, and addressing your cross server bandwidth is ultimately a more affordable, and simpler task.
    • If possible, allocate more than one link and configure for Multi-NIC vMotion. At this time, FVP will not be able to leverage this, but it will allow vMotion to use another link if the other link is busy. Another possible option would be to bond multiple 1GbE links for vMotion. This may or may not be suitable for your environment.
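
    As a rough way to judge how close a link might come to saturation, the arithmetic below estimates the share of a network link consumed by replicated writes. It assumes that each accelerated write is copied in full to one or more peer hosts over the vMotion network, and it ignores protocol overhead; the function name and the sample numbers are hypothetical.

```python
def replica_link_utilization(write_iops, io_size_kb, replicas, link_gbps):
    """Estimate the fraction of a network link consumed by write replicas.

    Assumes every write of io_size_kb is copied to `replicas` peers and
    ignores protocol overhead, so treat the result as a lower bound.
    """
    bits_per_second = write_iops * io_size_kb * 1024 * 8 * replicas
    return bits_per_second / (link_gbps * 1_000_000_000)


# Hypothetical workload: 3,000 write IOPS at 32KB with one replica.
for gbps in (1, 10):
    share = replica_link_utilization(3000, 32, replicas=1, link_gbps=gbps)
    print(f"{gbps}GbE link: {share:.0%} utilized")
```

    With those sample numbers, a single 1GbE link is already most of the way to saturation, while the same traffic is a small fraction of a 10GbE link.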

    So if you haven’t done so already, plan to incorporate 10GbE for cross-server communication for your vMotion Network. Not only will your vMotioning VMs thank you, so will the performance of FVP.

    - Pete

    Helpful links:

    Fault Tolerant Write acceleration
    http://frankdenneman.nl/2013/11/05/fault-tolerant-write-acceleration/

    Destaging Writes from Acceleration Tier to Primary Storage
    http://voiceforvirtual.com/2013/08/14/destaging-writes-i/

    Using a new tool to discover old problems

    It is interesting what can be discovered when storage is accelerated. Virtual machines that were previously restricted by the underperforming arrays now get to breathe freely.  They are given the ability to pass storage I/O as quickly as the processor needs. In other words, the applications that need the CPU cycles get to dictate your storage requirements, rather than your storage imposing artificial limits on your CPU.

    With that idea in mind, a few things revealed themselves during the process of implementing PernixData FVP.  Early on, it was all about implementing and understanding the solution.  However, once the real world workloads began accelerating, the analytics that FVP was providing became intriguing.  What was generating the I/O that was being accelerated?  What processes were associated with the other traffic not being accelerated, and why?  What applications were behind the changing I/O sizes?  And what was causing the peculiar I/O patterns that were showing up?  Some of these were questions raised at an earlier time (see: Hunting down unnecessary I/O before you buy that next storage solution).  The trouble was, the tools I had to discover the pattern of data I/O were limited.

    Why is this so important? In the spirit of reminding ourselves that no resource is an island, here is an example of a production code compile run, as seen from the perspective of the guest CPU. The first screen capture is the code compile with adequate storage I/O to support the application’s needs. A full build is running at nearly perfect CPU utilization across all 8 of its vCPUs.  (Screen shots taken from my earlier post: Vroom! Scaling up Virtual Machines in vSphere to meet performance requirements-Part 2)

    image

    Below is that very same code compile, under stressed backend storage. It took 46% longer to complete, and as you can see, changes the CPU utilization of the build run.

    image

    The primary goal for this environment was to accelerate the storage. However, it would have been a bit presumptuous to conclude that all existing storage traffic is good, useful I/O. There is a significant amount of traffic originating from outside of IT, and the I/O generated needed to be understood better.  With the traffic passing more freely thanks to FVP acceleration, patterns that previously could not expose themselves should be more visible. This was the basis for the first discovery.

    A little “CSI” work on the IOPS
    Many continuous build systems use some variation of a polling mechanism to understand when there is new checked in code that needs to be compiled. This should be a very lightweight process.  However, once storage performance was allowed to breathe better, the following patterns started showing up on all of the build VMs.

    The image below shows the IOPS for one build VM during a one hour period of no compiling for that particular VM.  The VMs were polling for new builds every 5 minutes.  Yep, that “build heartbeat” was as high as 450 IOPS on each VM.

    high-IOPS-heartbeat

    Why wasn’t this noticed before?  These spikes were being suppressed by my previously overtaxed storage, which made them more difficult to see. These were all writes, and were translating into 500 to 600 steady state IOPS just to sit idle (as seen below from the perspective of the backing storage).

    Array-VMFSvolumeIOPS

    So what was the cause? As it turned out, the polling mechanism was using some source code control (SVN) calls to help the build machines understand if they needed to execute a build. The Development Team had no way of knowing whether the script they developed was going to be efficient or not; they are separated from that layer of the infrastructure. (Sadly, I have a feeling this happens more often than not in general Application Development.) This resulted in a horribly inefficient method. After I helped them understand the matter, it was revamped, and now polling for each VM only takes 1 to 2 IOPS every 5 minutes.

    Idle-IOPS2

    The image below shows how the accelerated cluster of 30 build VMs looks when there are no builds running.

    Idle-IOPS
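
    I won’t go into the details of the revamped script here, but the general idea of a lightweight poll can be sketched as follows: ask the repository for its head revision, compare it with the last revision built, and do nothing else unless it has changed. The function and the two callables are hypothetical placeholders for illustration, not the actual build system code.

```python
def poll_for_build(get_head_revision, last_built_revision, run_build):
    """Trigger a build only when the repository head has moved.

    get_head_revision and run_build are hypothetical callables supplied by
    the caller. When the revision is unchanged, the poll does no further
    work, so an idle build VM generates almost no I/O.
    """
    head = get_head_revision()
    if head != last_built_revision:
        run_build(head)
    return head


# Example with stand-in callables: revision 1201 was already built,
# and the repository head is now at 1202.
builds = []
last = poll_for_build(lambda: 1202, 1201, builds.append)
print(last, builds)
# -> 1202 [1202]
```

    A scheduler would call this every 5 minutes and carry the returned revision forward to the next poll.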

    The inefficient polling mechanism wasn’t the only thing found. A few of the Linux build VMs had a rogue “Beagle” search daemon running on them. This crawler did just that, indexing data on these Linux machines, and creating unnecessary I/O.  With Windows, indexers and other CPU and I/O hogs are typically controlled quite easily by GPO, but the equivalent services can creep into Linux systems if you are not careful.  It was an easy fix at least.

    The cumulative benefit
    Prior to the efforts of accelerating the storage, and looking to make it more efficient, the utilization of the arrays looked like this (6 hour period, from the perspective of the arrays):

    Array-IOPS-before

    Now, with the combination of understanding my workload better, and acceleration through FVP, that same workload looks like this (6 hour period, from the perspective of the arrays):

    Array-IOPS-after

    Notice that the estimated workload is far under the 100% it was regularly pegged at for 24 hours a day, 6 days a week.  In fact, during the workday, the arrays might only peak at 50% to 60% utilization.  When no builds are running, the continuous build system may only be drawing 25 IOPS from the VMFS volumes that contain the build machines, which is much more reasonable than where it was at.

    With the combination of less pressure on the backing physical storage, and the magic of pooled flash on the hosts, the applications and CPU get to dictate how much storage I/O is needed.  Below is a screen capture of IOPS on a production build VM while compiling was being performed.  It was not known up until this point that a single build VM needed as much as 4,000 IOPS to compile code because the physical storage was never capable of satisfying that type of need.

    IOPS-single-VM

    Conclusion
    Could some of these discoveries have been made without FVP?  Yes, perhaps some of them. But good analysis comes from being able to interpret data in a consumable way. It’s why various methods of data visualization such as bar graphs, pie charts, and X-Y-Z plots exist. FVP certainly has been doing a good job of accelerating workloads, but it also helps the administrator understand the I/O better.  I look forward to seeing how the analytics might expand in future tools or releases from PernixData.

    A friend once said to me that the only thing better than a new tractor is a reason to use it. In many ways, the same thing goes for technology. Virtualization might not even be that fascinating unless you had real workloads to run on top of it. Ditto for PernixData FVP. When applied to real workloads, the magic begins to happen, and you learn a lot about your data in the process.

    Accelerating storage using PernixData’s FVP. A perspective from customer #0001

    Recently, I described in "Hunting down unnecessary I/O before you buy that next storage solution" the efforts around addressing "technical debt" that was contributing to unnecessary I/O. The goal was to get better performance out of my storage infrastructure. It’s been a worthwhile endeavor that I would recommend to anyone, but at the end of the day, one might still need faster storage. That usually means freeing up another 3U of rack space, and opening the checkbook.

    Or does it?  Do I have to go the traditional route of adding more spindles, or investing heavily in a faster storage fabric?  Well, the answer was an unequivocal "yes" not too long ago, but times are a changing, and here is my way to tackle the problem in a radically different way.

    I’ve chosen to delay any purchases of an additional storage array, or the infrastructure backing it, and opted to go PernixData FVP.  In fact, I was customer #0001 after PernixData announced GA of FVP 1.0.  So why did I go this route?

    1.  Clustered host based caching.  Leveraging server side flash brings compute and data closer together, but thanks to FVP, it does so in such a way that works in a highly available clustered fashion that aligns perfectly with the feature sets of the hypervisor.

    2.  Write-back caching. The ability to deliver writes to flash is really important. Write-through caching, which waits for the acknowledgement from the underlying storage, just wasn’t good enough for my environment. Rotational latencies, as well as physical transport latencies would still be there on over 80% of all of my traffic. I needed true write-back caching that would acknowledge the write immediately, while eventually de-staging it down to the underlying storage.

    3.  Cost. The gold plated dominos of upgrading storage are not fun for anyone on the paying side of the equation. Going with PernixData FVP was going to address my needs for a fraction of the cost of a traditional solution.

    4.  It allows for a significant decoupling of "storage for capacity" from "storage for performance" when addressing additional storage needs.

    5.  Another array would have been to a certain degree, more of the same. Incremental improvement, with less than enthusiastic results considering the amount invested.  I found myself not very excited to purchase another array. With so much volatility in the storage market, it almost seemed like an antiquated solution.

    6.  Quick to implement. FVP installation consists of installing a VIB via Update Manager or the command line, installing the Management services and vCenter plugin, and you are off to the races.

    7.  Hardware independent.  I didn’t have to wait for a special controller upgrade, firmware update, or wonder if my hardware would work with it (a common problem with storage array solutions). Nor did I have to make a decision to perhaps go with a different storage vendor if I wanted to try a new technology.  It is purely a software solution with the flexibility of working with multiple types of flash: SSDs, or PCIe based.

    A different way to solve a classic problem
    While my write intensive workload is pretty unique, my situation is not.  Our storage performance needs outgrew what the environment was designed for: capacity at a reasonable cost. This is an all too common problem.  The increased capacities of spinning disks have actually made this problem worse, not better.  Fewer and fewer spindles are serving up more and more data.

    My goal was to deliver the results our build VMs were capable of delivering with faster storage, but were unable to because of my existing infrastructure.  For me it was about reducing I/O contention to allow the build system CPU cycles to deliver the builds without waiting on storage.  For others it might be delivering lower latencies to their SQL backed ERP or CRM servers.

    The allure of utilizing flash has been an intriguing one.  I often found myself looking at my vSphere hosts and all of their processing goodness, but disappointed that those SSDs sitting in the hosts couldn’t help to augment my storage performance needs.  Being an active participant in the PernixData beta program allowed me to see how it would help me in my environment, and if it would deliver on the needs of the business.

    Lessons learned so far
    Don’t skimp on quality SSDs.  Would you buy an ESXi host with one physical core?  Of course you wouldn’t. Same thing goes with SSDs.  Quality flash is a must! I can tell you from first hand experience that it makes a huge difference.  I thought the Dell OEM SSDs that came with my M620 blades were fine, but by way of comparison, they were terrible. Don’t cripple a solution by going with cheap flash.  In this 4 node cluster, I went with 4 EMLC based, 400GB Intel S3700s. I also had the opportunity to test some Micron P400M EMLC SSDs, which also seemed to perform very well.

    While I went with 400GB SSDs in each host (giving approximately 1.5TB of cache space for a 4 node cluster), I did most of my testing using 100GB SSDs. They seemed adequate in that they were not showing a significant amount of cache eviction, but I wanted to leverage my purchasing opportunity to get larger drives. Knowing the best size can be a bit of a mystery until you get things in place, but having a larger cache size allows for a larger working set of data available for future reads, as well as giving head room for the per-VM write-back redundancy setting available.

    An unexpected surprise is how FVP has given me visibility into the one area of I/O monitoring that is traditionally very difficult to see: I/O patterns (see "Iometer. As good as you want to make it.").  Understanding this element of your I/O needs is critical, and the analytics in FVP have helped me discover some very interesting things about my I/O patterns that I will surely be investigating in the near future.

    In the read-caching world, the saying goes that the fastest storage I/O is the I/O the array never will see. Well, with write caching, it eventually needs to be de-staged to the array.  While FVP will improve delivery of storage to the array by absorbing the I/O spikes and turning random writes to sequential writes, the I/O will still eventually have to be delivered to the backend storage. In a more write intensive environment, if the delta between your fast flash and your slow storage is significant, and your duty cycle of your applications driving the I/O is also significant, there is a chance it might not be able to keep up.  It might be a corner case, but it is possible.

    What’s next
    I’ll be posting more specifics on how running PernixData FVP has helped our environment.  So, is it really "disruptive" technology?  Time will ultimately tell.  But I chose to not purchase an array along with new SAN switchgear because of it.  Using FVP has led to less traffic on my arrays, with higher throughput and lower read and write latencies for my VMs.  Yeah, I would qualify that as disruptive.

     

    Helpful Links

    Frank Denneman – Basic elements of the flash virtualization platform – Part 1
    http://frankdenneman.nl/2013/06/18/basic-elements-of-the-flash-virtualization-platform-part-1/

    Frank Denneman – Basic elements of the flash virtualization platform – Part 2
    http://frankdenneman.nl/2013/07/02/basic-elements-of-fvp-part-2-using-own-platform-versus-in-place-file-system/

    Frank Denneman – FVP Remote Flash Access
    http://frankdenneman.nl/2013/08/07/fvp-remote-flash-access/

    Frank Denneman – Design considerations for the host local FVP architecture
    http://frankdenneman.nl/2013/08/16/design-considerations-for-the-host-local-architecture/

    Satyam Vaghani introducing PernixData FVP at Storage Field Day 3
    http://www.pernixdata.com/SFD3/

    Write-back deepdive by Frank and Satyam
    http://www.pernixdata.com/files/wb-deepdive.html

    Iometer. As good as you want to make it.

    Most know Iometer as the go-to synthetic I/O measuring tool used to simulate real workload conditions. Well, somewhere, somehow, someone forgot the latter part of that sentence, which is why it ends up being so misused and abused.  How many of us have seen a storage solution delivering 6 figure IOPS using Iometer, only to find that they are running a 100% read, 512 byte, 100% sequential access workload simulation?  Perfect for the two people on the planet that those specifications might apply to.  For the rest of us, it doesn’t help much.  So why would they bother running that sort of unrealistic test?   Pure, unapologetic number chasing.

    The unfortunate part is that sometimes this leads many to simply dismiss Iometer results.  That is a shame really, as it can provide really good data if used in the correct way.  Observing real world data will tell you a lot of things, but the sporadic nature of real workloads make it difficult to use for empirical measurement – hence the need for simulation.

    So, what are the correct settings to use in Iometer?  The answer is completely dependent on what you are trying to accomplish.  The race for a million IOPS by your favorite storage vendor really means nothing if there is no correlation between their simulated workload and your real workload.  Maybe IOPS isn’t even an issue for you.  Perhaps your applications are struggling with poor latency.  The challenge is to emulate your environment with a synthetic workload that helps you understand how a potential upgrade, new array, or optimization might be of benefit.

    The mysteries of workloads
    Creating a synthetic workload representing your real workload assumes one thing: that you know what your real workload really is. This can be more challenging than one might think, as many storage monitoring tools do not help you understand the subtleties of patterns in the data that is being read or written.

    Most monitoring tools tend to treat all I/O equally. By that I mean, say 10 million I/Os occur over a given period of time, and your monitoring tells you that you average 60% reads and 40% writes. What is not clear is how many of those reads are multiple reads of the same data versus completely different, untouched data. It also doesn’t tell you if the writes are overwriting existing blocks (which might be read again shortly thereafter) or generating new data. As more and more tiered storage mechanisms come into play, understanding this aspect of your workload is becoming extremely important. You may be treating your I/Os equally, but tiered storage systems using sophisticated caching algorithms certainly do not.
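
    If you can get hold of a block-level trace, the distinction is easy to illustrate. The sketch below assumes a hypothetical trace format of (operation, block) tuples, which is my own simplification; real monitoring tools export something different. It reports the usual read ratio alongside the two numbers most tools hide: how many reads revisit data already touched, and how many writes overwrite existing blocks.

```python
def summarize_trace(trace):
    """Summarize a list of (op, block) tuples, where op is "R" or "W".

    Returns the read ratio, the share of reads that revisit a block already
    read or written earlier in the trace, and the share of writes that
    overwrite a block written earlier in the trace.
    """
    seen = set()      # blocks touched by any earlier I/O
    written = set()   # blocks written by an earlier write
    reads = rereads = writes = overwrites = 0
    for op, block in trace:
        if op == "R":
            reads += 1
            if block in seen:
                rereads += 1
        else:
            writes += 1
            if block in written:
                overwrites += 1
            written.add(block)
        seen.add(block)
    total = reads + writes
    return {
        "read_ratio": reads / total if total else 0.0,
        "reread_share": rereads / reads if reads else 0.0,
        "overwrite_share": overwrites / writes if writes else 0.0,
    }


# Tiny made-up trace: block 1 is read twice, block 2 is written,
# read back, and then overwritten.
sample = [("R", 1), ("R", 1), ("W", 2), ("R", 2), ("W", 2), ("R", 3)]
print(summarize_trace(sample))
```

    Two workloads with the same read ratio can have very different re-read and overwrite shares, and that difference is what a caching tier responds to.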

    How can you gain more insight?  Use every tool at your disposal.  Get to know your applications, and the duty cycles around them. What are your peak hours? Are they in the middle of the day, or in the middle of the night when backups are running?

    Suggestions on Iometer settings
    You may find that the settings you choose for Iometer yield results from your shared storage that aren’t nearly as good as you thought.  But does it matter?  If it is an accurate representation of your real workload, not really.  What matters is whether you are able to deliver the payload from point A to point B to meet your acceptance criteria (such as latency, throughput, etc.).  The goal would be to represent that in a synthetic workload for accurate measurement and comparison.

    With that in mind, here are some suggestions for the next time you set up those Iometer runs.

    1.  Read/write ratio.  Choose a realistic read/write ratio representing your workload. With writes, RAID penalties can hurt your effective performance by quite a bit, so if you don’t have an idea of what this ratio currently is, it’s time for you to find out.

    2.  Transfer request size. Is your payload the size of a ball bearing, or a bowling ball? Applications and operating systems vary on what size is used. Use your monitoring systems to best determine what your environment consists of.

    3.  Disk size.  The "maximum disk size" setting is specified in sectors, so with 512-byte sectors a value of 2097152 yields a 1GB test file (1048576 would be 512MB). Throwing a bunch of zeros in there might fill up your disk with Iometer’s test file. Depending on your needs, a setting of 2 to 20 GB might be a good range to work with.

    4.  Number of outstanding I/Os.  This needs to be high enough so that the test can keep sending I/O requests to the storage while it is fulfilling earlier ones. A setting of 32 is pretty common.

    5.  Alignment of I/O. Many of the standard Iometer ICF files you find were built for physical drives. They have the "Align I/Os on:" setting set to "Sector boundaries".  When running tests on a storage array, this can lead to inconsistent results, so it is best to align on 4K or 512 bytes.

    6.  Ramp up time. Offer at least a minute of ramp up time.

    7.  Run time. Some might suggest running simulations long enough to exhaust all caching, so that you can see "real" throughput.  While I understand the underlying reason for this statement, I believe this is missing the point.  Caching is there in the first place to take advantage of a working set of warm and cold data, bursts, etc. If you have a storage solution that satisfies the duty cycles that exist in your environment, that is the most important part.

    8.  Number of workers.  Let this spawn automatically to the number of logical processors in your VM. It might be overkill in many cases because of terrible multithreading abilities of most applications, but it’s a pretty conventional practice.

    9.  Multiple Iometer instances.  Not really a setting, but more of a practice.  I’ve found running multiple tests a way to better understand how a storage solution will react under load as opposed to on its own. It is shared storage after all.
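
    On the number of outstanding I/Os (item 4), Little’s Law gives a quick sanity check: the number of requests in flight equals throughput multiplied by response time, so a given queue depth and average latency put a ceiling on the IOPS a test can report. The helper below is a small sketch of that arithmetic; the function name and sample latencies are my own.

```python
def iops_ceiling(outstanding_ios, avg_latency_ms):
    """Upper bound on IOPS for a given queue depth and average latency.

    From Little's Law: in-flight requests = throughput x response time.
    """
    return outstanding_ios * 1000.0 / avg_latency_ms


print(iops_ceiling(32, 2.0))   # -> 16000.0
print(iops_ceiling(1, 2.0))    # -> 500.0
```

    In other words, a test running with a single outstanding I/O against storage that responds in 2ms cannot exceed 500 IOPS no matter how capable the array is, which is why the setting needs to be high enough to keep the storage busy.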

    Disclaimer
    If you were looking for this to be the definitive post on Iometer, that isn’t what I was shooting for.  There are many others who are much more qualified to speak to the nuances of Iometer than I am.  What I hope to do is to offer a little practical perspective on its use, and how it can help you.  So next time you run Iometer, think about what you are trying to accomplish, and let go of the number chasing.  Understand your workloads, and use the tool to help you improve your environment.

    Hunting down unnecessary I/O before you buy that next storage solution

    Are legacy processes and workflows sabotaging your storage performance? If you are on the verge of committing good money for more IOPS, or lower latency, it might be worth taking a look at what is sucking up all of those I/Os.

    In my previous posts about improving the performance of our virtualized code compiling systems, it was identified that storage performance was a key factor in our ability to leverage our scaled up compute resources. The classic response to this dilemma has been to purchase faster storage. While that might be a part of the ultimate solution, there is another factor worth looking into; legacy processes, and how they might be impacting your environment.

    Even though new technologies are helping deliver performance improvements, one constant is that traditional, enterprise class storage is expensive. Committing to faster storage usually means committing large chunks of dollars to the endeavor. This can be hard to swallow at budget time, or may not align well with immediate needs. And there can certainly be a domino effect when improving storage performance. If your fabric cannot support a fancy new array, the protocol type, or speed, get ready to spend even more money.

    Calculated Indecision
    In the optimization world, there is an approach called "delay until the last responsible moment" (LRM). Do not mistake this for a procrastinator’s creed of "kicking the can." It is a pretty effective, Agile-esque strategy for hedging against poor or premature purchasing decisions in what is, in this case, the rapidly changing world of enterprise infrastructures. Even within the last few years, some companies have challenged traditional thinking when it comes to how storage and compute are architected. LRM helps with this rapid change, and has the ability to save a lot of money in the process.

    Look before you buy
    Writes are what you design around and pay big money for, so wouldn’t it be logical to look at your infrastructure to see if legacy processes are undermining your ability to deliver I/O? That is the step I took in an effort to squeeze out every bit of performance that I could with my existing arrays before I commit to a new solution. My quick evaluation resulted in this:

    • Using array based snapshotting for short term protection was eating up way too much capacity; 25 to 30TB. That is almost half of my total capacity, and all for a retention time that wasn’t very good. How does capacity relate to performance? Well, if one doesn’t need all of that capacity for snapshot or replica reserves, one might be able to run at a better performing RAID level. Imagine being able to cut the write penalty by 2 to 3 times if you were currently running RAID levels focused on capacity. For a write-intensive environment like mine, that is a big deal.
    • Legacy I/O intensive applications and processes identified. What are they, can they be adjusted, and are they even needed anymore?

    Much of this I didn’t need to do a formal analysis of. I knew the environment well enough to know what needed to be done. Here is what the plan of action has consisted of.

    • Ditch the array based snapshots and remote replicas in favor of Veeam. This is something that I wanted to do for some time. Local and remote protection is now the responsibility of some large Synology NAS units as the backup target for Veeam. Everything about this combination has worked incredibly well. For those interested, I’ll be writing about this arrangement in the near future.
    • Convert existing Guest Attached Volumes to native VMDKs. My objective with this is to make Veeam see the data so that it can protect it. Integrated, compressed and deduped. What it does best.
    • Reclaim all of the capacity gained from no longer using snaps and replicas, and rebuild one of the arrays from RAID 50, to RAID 10. This will cut the write penalty from 4, to 2.
    • Adjust or eliminate legacy I/O intensive apps.
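
    The effect of the RAID change can be put into numbers with the usual back-end IOPS arithmetic: reads pass through once, while each front-end write costs the RAID write penalty in back-end I/Os. The function below is a simple sketch of that; the 200 read and 800 write IOPS are made-up figures for a write-heavy workload, and only the penalties of 4 and 2 come from the plan above.

```python
def backend_iops(read_iops, write_iops, write_penalty):
    """Back-end disk I/Os generated by a front-end read/write load."""
    return read_iops + write_iops * write_penalty


# Hypothetical write-heavy load: 200 reads and 800 writes per second.
print(backend_iops(200, 800, write_penalty=4))  # RAID 50 -> 3400
print(backend_iops(200, 800, write_penalty=2))  # RAID 10 -> 1800
```

    For this sample load, the same spindles would have to service roughly half as many back-end I/Os after the rebuild.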

    The Culprits
    Here were the biggest influencers of legacy I/O intensive applications (“legacy” after the incorporation of Veeam).  Total time per day is shown below, and may reflect different backup frequencies.

    Source:  Legacy SharePoint backup solution
    Cost:  300 write IOPS for 8 hours per day
    Action:  This can be eliminated because of Veeam

    Source:  Legacy Exchange backup solution
    Cost:  300 write IOPS for 1 hour per day
    Action:  This can be eliminated because of Veeam

    Source:  SourceCode (SVN) hotcopies and dumps
    Cost:  200-700 IOPS for 12 hours per day.
    Action:  Hot copies will be eliminated, but SVN dumps will be redirected to an external target.  An optional method of protection that in a sense is unnecessary, but source code is the lifeblood of a software company, so it is worth the overhead right now.

    Source:  Guest attached Volume Copies
    Cost:  Heavy read IOPS on mounted array snapshots when dumping to external disk or tape.
    Action:  Guest attached volumes will be converted to native VMDKs so that Veeam can see and protect the data.
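
    To put the first three culprits in perspective, the figures above can be converted into total I/Os per day. This is simple arithmetic on the listed IOPS and hours; it assumes the stated rate holds for the entire window, and it leaves out the guest attached volume copies because no IOPS figure is given for them.

```python
SECONDS_PER_HOUR = 3600

# (source, low IOPS, high IOPS, hours per day), taken from the list above.
CULPRITS = [
    ("Legacy SharePoint backup", 300, 300, 8),
    ("Legacy Exchange backup", 300, 300, 1),
    ("SVN hotcopies and dumps", 200, 700, 12),
]


def daily_ios(iops, hours):
    """Total I/Os generated by a steady IOPS rate over a number of hours."""
    return iops * hours * SECONDS_PER_HOUR


def daily_totals(culprits):
    """Return the (low, high) estimate of total I/Os per day."""
    low = sum(daily_ios(lo, hours) for _, lo, _, hours in culprits)
    high = sum(daily_ios(hi, hours) for _, _, hi, hours in culprits)
    return low, high


low, high = daily_totals(CULPRITS)
print(f"{low:,} to {high:,} I/Os per day")
# -> 18,360,000 to 39,960,000 I/Os per day
```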

    Notice the theme here? Many of the opportunities for improvement in reducing I/O had to do with dealing with legacy “in-guest” methods of protecting the data.  Moving to a hypervisor centric backup solution like Veeam has also reinforced a growing feeling I’ve had about storage array specific features that focus on data protection.  I’ve grown to be disinterested in them.  Here are a few reasons why.

    • It creates an indelible tie between your virtualization infrastructure, protection, and your storage. We all love the virtues of virtualizing compute. Why do I want to make my protection mechanisms dependent on a particular kind of storage? Abstract it out, and it becomes way easier.
    • Need more replica space? Buy more arrays. Need more local snapshot space? Buy more arrays. You end up keeping protection on pretty expensive storage.
    • Modern backup solutions protect the VMs, the applications, and the data better. Application awareness may have been lacking years ago, but not anymore.
    • Vendor lock-in. I’ve never bought into this argument much, mostly because you end up having to make a commitment at some point with just about every financial decision you make. However, adding more storage arrays can eat up an entire budget in an SMB/SME world. There has to be a better way.
    • Complexity. You end up having a mess of methods of how some things are protected, while other things are protected in a different way. Good protection often comes in layers, but choosing a software based solution simplifies the effort.

    I used to live by array specific tools for protecting data. It was all I had, and they served a very good purpose.  I leveraged them as much as I could, but in hindsight, they can make a protection strategy very complex, fragile, and completely dependent on sticking with that line of storage solutions. Use a solution that hooks into the hypervisor via the vCenter API, and let it do the rest.  Storage vendors should focus on what they do best, which is figuring out ways to deliver bigger, better, and faster storage.

    What else to look for.
    Other possible sources that are robbing your array of I/Os:

    • SQL maintenance routines (dumps, indexing, etc.). While necessary, you may choose to run these at non peak hours.
    • Defrags. Surely you have a GPO shutting off this feature on all your VMs, correct? (hint, hint)
    • In-guest legacy anything. Traditional backup agents are terrible. Virus scans aren’t much better.
    • User practices.  Don’t be surprised if you find some department doing all sorts of silly things that translate into heavy writes.  (e.g. “We copy all of our data to this other directory hourly to back it up.”)
    • Guest attached volumes. While they can be technically efficient, one would have to rely on protecting these in other ways because they are not visible from vCenter. Often this results in some variation of making an array based snapshot available to a backup application. While it is "off-host" to the production system, this method takes a very long time, whether the target is disk or tape.

    One might think that eventually, the data has to be committed to external disk or tape anyway, so what does it matter.  When it is file level backups, it matters a lot.  For instance, committing 9TB of guest attached volume data (millions of smaller files) directly to tape takes nearly 6 days to complete.  Committing 9TB of Veeam backups to tape takes just a little over 1 day.
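    To put those two figures in perspective, here is a minimal sketch of the arithmetic.  It assumes decimal units and rounds the durations to exactly 6 days and 1 day, so treat the output as ballpark numbers rather than measurements.

```python
SECONDS_PER_DAY = 86_400


def effective_mb_per_s(terabytes: float, days: float) -> float:
    """Average throughput in MB/s, using decimal units (1 TB = 1,000,000 MB)."""
    return terabytes * 1_000_000 / (days * SECONDS_PER_DAY)


# Durations are rounded assumptions based on the figures quoted in the text.
file_level = effective_mb_per_s(9, 6)   # guest attached volumes, file by file
image_level = effective_mb_per_s(9, 1)  # Veeam backup files

print(f"file-level backup to tape:  {file_level:.1f} MB/s")   # about 17.4 MB/s
print(f"image-level backup to tape: {image_level:.1f} MB/s")  # about 104.2 MB/s
```

    Roughly 17 MB/s versus 104 MB/s to the same tape target.  The data volume is the same; the difference is millions of small files versus a handful of large ones.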

    The Results

    So how much did these steps improve the average load on the arrays? This is a work in progress, so I don’t have the final numbers yet. But with each step, contention is decreased on my arrays, and my protection strategy has become several orders of magnitude simpler in the process.

    With all of that said, I will be addressing my storage performance needs with *something* new. What might that be?  Stay tuned.

    Configuring a VM for SNMP monitoring using Cacti

    There are a number of things that I don’t miss with old physical infrastructures.  One near the top of the list is a general lack of visibility for each and every system.  Horribly underutilized hardware ran happily alongside overtaxed or misconfigured systems, and it all looked the same.  Fortunately, virtualization has changed much of that nonsense, and performance trending data for VMs and hosts is a given.

    Partners in the VMware ecosystem are able to take advantage of the extensibility by offering useful tools to improve management and monitoring of other components throughout the stack.  The Dell Management Plug-in for VMware vCenter is a great example of that. It does a good job of integrating side-band management and event driven alerting inside of vCenter.  However, in many cases you still need to look at performance trending data of devices that may not inherently have that ability on their own.  Switchgear is a great example of a resource that can be left in the dark.  SNMP can be used to monitor switchgear and other types of devices, but its use is almost always absent in smaller environments.  But there are simple options to help provide better visibility even for the smallest of shops.  This post will provide what you need to know to get started.

    In this example, I will be setting up a general purpose SNMP management system running Cacti to monitor the performance of some Dell PowerConnect switchgear.  Cacti leverages RRDTool’s framework to deliver time based performance monitoring and graphing.  It can monitor a number of different types of systems supporting SNMP, but switchgear provides the best example that most everyone can relate to.  At a very affordable price (free), Cacti will work just fine in helping with these visibility gaps.  

    Monitoring VM
    The first thing to do is to build a simple Linux VM for the purpose of SNMP management.  One would think there would be a free Virtual Appliance out on the VMware Virtual Appliance Marketplace for this purpose, but if there is, I couldn’t find it.  Any distribution will work, but my instructions will cater toward the Debian-based distributions – particularly Ubuntu, or an Ubuntu clone like Linux Mint (my personal favorite).  Set it for 1vCPU and 512 MB of RAM.  Assign it a static address on your network management VLAN (if you have one).  Otherwise, your production LAN will be fine.  While it is a single purpose built VM, you still have to live with it, so no need to punish yourself by leaving it bare bones.  Go ahead and install the typical packages (e.g. vim, ssh, ntp, etc.) for convenience or functionality.

    Templates are an option that extends the functionality of Cacti.  In the case of the PowerConnect switches, the template will assist in providing information on CPU, memory, and temperature.  A template for the PowerConnect 6200 line of switches can be found at http://docs.cacti.net/usertemplate:host:dell:powerconnect:62xx.  The instructions below will include how to install this.

    Prepping SNMP on the switchgear

    In the simplest of configurations (which I will show here), there really isn’t much to SNMP.  For this scenario, one will be providing read-only access of SNMP via a shared community name. The monitoring VM will poll these devices and update the database accordingly.

    If your switchgear is isolated, as your SAN switchgear might be, then there are a few options to make the switches visible in the right way. Regardless of what option you use, the key is to make sure that your iSCSI storage traffic lives on a different VLAN from the management interface of the device.  I outline a good way to do this in “Reworking my PowerConnect 6200 switches for my iSCSI SAN.”

    There are a couple of options in connecting the isolated storage switches to gather SNMP data: 

    Option 1:  Connect a dedicated management port on your SAN switch stack back to your LAN switch stack.

    Option 2:  Expose the SAN switch management VLAN using a port group on your iSCSI vSwitch. 

    I prefer option 1, but regardless, if it is iSCSI switches you are dealing with, you will want to make sure that management traffic is on a different VLAN than your iSCSI traffic to maintain the proper isolation of iSCSI traffic. 

    Once the communication is in place, just make a few changes to your PowerConnect switchgear.  Note that community names are case sensitive, so decide on a name, and stick with it.

    enable

    configure

    snmp-server location "Headquarters"

    snmp-server contact "IT"

    snmp-server community mycompany ro ipaddress 192.168.10.12

    Monitoring VM – Pre Cacti configuration
    Perform the following steps on the VM you will be using to install Cacti.

    1.  Install and configure SNMPD

    apt-get update

    apt-get install snmpd

    mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.old

    2.  Create a new /etc/snmp/snmpd.conf with the following contents:

    rocommunity mycompany

    syslocation Headquarters

    syscontact IT

    3.  Edit /etc/default/snmpd to allow snmpd to listen on all interfaces and use the config file.  Comment out the first line below and replace it with the second line:

    SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1'

    SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid -c /etc/snmp/snmpd.conf'

    4.  Restart the snmpd daemon.

    sudo /etc/init.d/snmpd restart

    5.  Install additional perl packages:

    apt-get install libsnmp-perl

    apt-get install libnet-snmp-perl
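    Before moving on to Cacti, it can be worth confirming that both the local snmpd and the switch answer SNMP queries.  The sketch below is one way to do that.  It assumes the net-snmp command line tools are present (on Debian-based systems they come from the snmp package, which is not part of the steps above) and that the devices answer SNMP v2c.  The helper names are my own, and the community name is the one from the example configuration.

```python
import shutil
import subprocess

# sysDescr.0 from the standard MIB-II system group; almost every agent answers it.
SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"


def snmpget_command(host: str, community: str, oid: str = SYS_DESCR_OID) -> list:
    """Build the argument list for a read-only SNMP v2c query (assumes v2c is enabled)."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def snmp_answers(host: str, community: str, timeout_s: float = 10.0) -> bool:
    """Return True if the device answered the sysDescr query."""
    if shutil.which("snmpget") is None:
        raise RuntimeError("snmpget not found; install the net-snmp client tools first")
    result = subprocess.run(
        snmpget_command(host, community),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode == 0
```

    Calling snmp_answers("localhost", "mycompany") checks the monitoring VM itself, and snmp_answers("192.168.10.2", "mycompany") would check a switch, where 192.168.10.2 is a made-up management address to replace with your own.  A failure here usually points to a community name mismatch or a VLAN/routing problem, which is easier to sort out now than from inside Cacti.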

    Monitoring VM – Cacti Installation
    6.  Perform the following steps on the VM you will be using to install Cacti.

    apt-get update

    apt-get install cacti

    During the installation process, MySQL will be installed, and the installation will ask what you would like the MySQL root password to be. Then the installer will ask what you would like cacti’s MySQL password to be.  Choose passwords as desired.

    Now, the Cacti installation is available via http://[cactiservername]/cacti with a username and password of "admin".  Cacti will then ask you to change the admin password.  Choose whatever you wish.

    7.  Download PowerConnect add-on from http://docs.cacti.net/usertemplate:host:dell:powerconnect:62xx and unpack both zip files

    8.  Import the host template via the GUI.  Log into Cacti, and go to Console > Import Templates, select the desired file (in this case, cacti_host_template_dell_powerconnect_62xx_switch.xml), and click Import.

    9.  Copy the 62xx_cpu.pl script into the Cacti script directory on the server (/usr/share/cacti/site/scripts).  The script may need executable permissions.  If you downloaded it to a Windows machine, but need to copy it to the Linux VM, WinSCP works nicely for this.

    10.  Depending on how things were copied, there might be Windows-style (CRLF) line endings in the .pl file.  You can clean up that 62xx_cpu.pl file by running the following:

    dos2unix 62xx_cpu.pl
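    If dos2unix is not installed, the same cleanup is easy to reproduce.  The sketch below shows the core of what the conversion does, which is replacing each CRLF pair with a bare LF.  The function names are my own, and it handles only the common CRLF case rather than every option dos2unix supports.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True only if something changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = crlf_to_lf(original)
    if converted == original:
        return False
    target.write_bytes(converted)
    return True
```

    Running convert_file("62xx_cpu.pl") from the script directory would be the equivalent of the command above.  The reason this matters is that a stray carriage return at the end of the first line can keep a script from launching when it is executed directly.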

    Using Cacti
    You are now ready to run Cacti so that you can connect and monitor your devices. This example shows how to add the device to Cacti, then monitor CPU and a specific data port on the switch.

    1.  Launch Cacti from your workstation by browsing out to http://[cactiservername]/cacti  and enter your credentials.

    2.  Create a new Graph Tree via Console > Graph Trees > Add.  You can call it something like “Switches” then click Create.

    3.  Create a new device via Console > Devices > Add.  Give it a friendly description, and the host name of the device.  Enter the SNMP Community name you decided upon earlier.  In my example above, I show the community name as being “mycompany” but choose whatever fits.  Remember that community names are case sensitive.

    4.  To create a graph for monitoring CPU of the switch, click Console > Create New Graphs.  In the host box, select the device you just added.   In the “Create” box, select “Dell Powerconnect 62xx – CPU” and click Create to complete.

    5.  To create a graph for monitoring a specific Ethernet port, click Console > Create New Graphs.  In the Host box, select the device you just added.  Put a check mark next to the port number desired, and select In/Out bits with total bandwidth.  Click Create > Create to complete. 

    6.  To add the chart to the proper graph tree, click Console > Graph Management.  Put a check mark next to the graphs desired, and change the “Choose an action” box to “Place on a Tree [Tree name]”.

    Now when you click on Graphs, you will see your two items to be monitored.

    image

    By clicking on the magnifying glass icon, or by using the “Graph Filters” near the top of the screen, one can easily zoom in or out to various sampling periods to suit your needs.
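    For anyone curious where the In/Out bits numbers come from: a switch port does not report a rate over SNMP, only ever-increasing octet counters (ifInOctets and ifOutOctets in the standard interface MIB).  The poller samples those counters and the rate is derived from the difference between samples.  The sketch below illustrates that calculation with made-up counter values.  It assumes 32-bit counters and a 5 minute polling interval, and it is not Cacti's or RRDTool's actual code.

```python
COUNTER32_MAX = 2**32  # a 32-bit SNMP counter rolls over to zero at this value


def bits_per_second(prev_octets: int, curr_octets: int, interval_s: float,
                    counter_max: int = COUNTER32_MAX) -> float:
    """Average rate between two counter samples, allowing for one rollover."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # The counter wrapped between the two samples.
        delta += counter_max
    return delta * 8 / interval_s


# Hypothetical samples taken 300 seconds apart.
normal = bits_per_second(1_000_000, 38_500_000, 300)
wrapped = bits_per_second(COUNTER32_MAX - 1_000, 500, 300)
print(normal)   # 1000000.0
print(wrapped)  # 40.0
```

    On busy gigabit or 10 gigabit ports a 32-bit counter can wrap more than once between polls, which is one reason graphs of fast links sometimes look lower than expected.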

    Conclusion
    Using SNMP and a tool like Cacti can provide historical performance data for non-virtualized devices and systems in ways you’ve grown accustomed to in vSphere environments.  How hard are your switches running?  How much internet bandwidth does your organization use?  This will tell you.  Give it a try.  You might be surprised at what you find.

    Vroom! Scaling up Virtual Machines in vSphere to meet performance requirements–Part 2

    In my original post, Scaling up Virtual Machines in vSphere to meet performance requirements, I described a unique need for the Software Development Team to have a lot of horsepower to improve the speed of their already virtualized code compiling systems.  My plan of attack was simple.  Address the CPU bound systems with more powerful blades, and scale up the VMs accordingly.  Budget constraints axed the storage array included in my proposal, and also kept this effort limited to keeping the same number of vSphere hosts for the task. 

    The four new Dell M620 blades arrived and were quickly built up with vSphere 5.0 U2 (Enterprise Plus Licensing) with the EqualLogic MEM installed.  A separate cluster was created to ensure all build systems were kept separate, and so that I didn’t have to mask any CPU features to make them work with previous generation blades.  Next up was to make sure each build VM was running VM hardware level 8.  Prior to vSphere 5, the guest VM was unaware of the NUMA architecture behind it.  Without the guest OS understanding memory locality, one could introduce problems into otherwise efficient processes.  While I could find no evidence that the compilers for either OS are NUMA aware, I knew the Operating Systems understood NUMA.

    Each build VM has a separate vmdk for its compiling activities.  Their D:\ drive (or /home for Linux) is where the local sandboxes live.  I typically have this second drive on a “Virtual Device Node” changed to something other than 0:x.  This has proven beneficial in previous performance optimization efforts.

    I figured the testing would be somewhat trivial, and would be wrapped up in a few days.  After all, the blades were purchased to quickly deliver CPU power for a production environment, and I didn’t want to hold that up.  But the data the tests returned had some interesting surprises.  It is not every day that you get to test 16vCPU VMs for a production environment that can actually use the power.  My home lab certainly doesn’t allow me to do this, so I wanted to make this count. 

    Testing
    The baseline tests would run code compiling on two of the production build systems (one Linux, and the other Windows) on an old blade, then the same set with the same source code on the new blades.  This would help in better understanding whether there were speed improvements from the newer generation chips.  Most of the existing build VMs are similar in their configuration.  The two test VMs would start out with 4vCPUs and 4GB of RAM.  Once the baselines were established, the virtual resources of each VM would be dialed up to see how they responded.  Every run would compile the very same source code.

    For the tests, I isolated each blade so they were not serving up other needs.  The test VMs resided in an isolated datastore, but lived on a group of EqualLogic arrays that were part of the production environment.  Tests were run at all times during the day and night to simulate real world scenarios, as well as demonstrate any variability in SAN performance.

    Build times would be officially recorded in the Developers Build Dashboard.  All resources would be observed in vSphere in real time, with screen captures made of things like CPU, disk and memory, and dumped into my favorite brain-dump application: Microsoft OneNote.  I decided to do this on a whim when I began testing, but it proved incredibly valuable later on as I found myself constantly looking at dozens of screen captures.

    The one thing I didn’t have time to test was the nearly limitless set of scenarios in which multiple monster VMs contend for CPUs at the same time.  But the primary interest for now was to see how the build systems scaled.  I would then make my sizing judgments off of the results, and off of previous experience with smaller build VMs on smaller hosts.

    The [n/n] title of each test result column indicates the number of vCPUs followed by the amount of vRAM associated.  Stacked bar graphs show a lighter color at the top of each bar.  This indicates the difference in time between the best result and the worst result.  The biggest factor of course would be the SAN.

    Bottleneck cat and mouse
    Performance testing is a great exercise for anyone, because it helps challenge your own assumptions on where the bottleneck really is.  No resource lives as an island, and this project showcased that perfectly.  Improving the performance of these CPU bound systems may very well shift the contention elsewhere.  It may also expose other bottlenecks that you were not aware of, as resources are just one element of bottleneck chasing.  Applications and the Operating Systems they run on are not perfect, nor are the scripts that kick them off.  Keep this in mind when looking at the results.

    Test Results – Windows
    The following test results are with Windows 7, running the Visual Studio compiler, across three generations of blades: the Dell M600 (Harpertown), M610 (Nehalem), and M620 (Sandy Bridge).

    Comparing a Windows code compile across blades without any virtual resource modifications.

    image

    Yes, that is right.  The old M600 blades were that terrible when it came to running VMs that were compiling.  This would explain the inconsistent build time results we had seen in the past.  While there was improvement in the M620 over the M610s, the real power of the M620s is that they have double the number of physical cores (16) of the previous generations.  Also noteworthy is how significantly the SAN was affecting the end result (up to 50%).

    Comparing a Windows code compile on new blade, but scaling up virtual resources

    image

    Several interesting observations about this image (above). 

    • When the SAN can’t keep up, it can easily give back the improvements made in raw compute power.
    • Performance degraded when compiling with more than 8vCPUs.  It was so bad that I quit running tests when it became clear they weren’t compiling efficiently (which is why you do not see SAN variability when I started getting negative returns).
    • Doubling the vCPUs from 4 to 8, and the vRAM from 4GB to 8GB only improved the build time by about 30%, even though the compile showed nearly perfect multithreading (shown below) and 100% CPU usage.  Why the degradation?  Keep reading!

    image

    On a different note, it was already becoming quite clear that I needed to take a little corrective action in my testing.  The SAN was being overworked at all times of the day, and it was impacting my ability to get accurate test results in raw compute power.  The more samples I ran, the more consistent the inconsistency was.  Each of the M620s had a 100GB SSD, so I decided to run the D:\ drive (where the build sandbox lives) on there to see how a lack of storage contention impacted times.  The purple line indicates the build times of the given configuration, but with the D:\ drive of the VM living on the local SSD drive.

    image

    The difference between a slow run on the SAN and a run with faster storage was spreading.

    Test Results – Linux
    The following test results are with Linux, running the GCC compiler, across three generations of blades: the Dell M600 (Harpertown), M610 (Nehalem), and M620 (Sandy Bridge).

    Comparing a Linux code compile across blades without any virtual resource modifications.

    image

    The Linux compiler showed a much more linear improvement, along with being faster than its Windows counterpart.  There were noticeable improvements across the newer generations of blades, with no modifications in virtual resources.  However, the margin of variability from the SAN is a concern.

    Comparing a Linux code compile on new blade, but scaling up virtual resources

    image

    At first glance it looks as if the Linux GCC compiler scales up well, but not in a linear way.  But take a look at the next graph, where similar to the experiment with the Windows VM, I changed the location of the vmdk file used for the /home drive (where the build sandbox lives) over to the local SSD drive.

    image

    This shows very linear scalability with Linux and a GCC compiler.  A VM with 4vCPUs and 4GB of RAM compiled 2.2x faster once given 8vCPUs and 8GB of RAM.  Total build time was just 12 minutes.  Triple the virtual resources to 12/12, and it is an almost linear 2.9x faster than the original configuration.  Bump it up to 16vCPUs, and diminishing returns begin to show up, where it is 3.4x faster than the original configuration.  I suspect crossing NUMA nodes and the architecture of the code itself were impacting this a bit.  Still, don’t lose sight of the fact that a build that could take up to 45 minutes on the old configuration took only 7 minutes with 16vCPUs.
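    One way to read those speedup figures is as scaling efficiency, meaning the measured speedup divided by the factor by which the virtual resources were increased.  The sketch below runs that arithmetic on the numbers quoted above.  The efficiency metric and the function name are my own framing, not something the build dashboard reports.

```python
BASELINE_VCPUS = 4

# Speedups relative to the 4vCPU/4GB configuration, as quoted in the text.
measured_speedup = {8: 2.2, 12: 2.9, 16: 3.4}


def scaling_efficiency(vcpus: int, speedup: float,
                       baseline_vcpus: int = BASELINE_VCPUS) -> float:
    """Speedup achieved per unit of added resources (1.0 means perfectly linear)."""
    return speedup / (vcpus / baseline_vcpus)


for vcpus, speedup in measured_speedup.items():
    print(f"{vcpus:>2} vCPUs: {speedup}x faster, "
          f"efficiency {scaling_efficiency(vcpus, speedup):.2f}")
```

    That works out to roughly 1.10 at 8vCPUs, 0.97 at 12, and 0.85 at 16.  Anything close to 1.0 is linear scaling, and the drop at 16vCPUs is the diminishing return described above.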

    The big takeaways from these results are the differences in scalability in compilers, and how overtaxed the storage is.  Let’s take a look at each one of these.

    The compilers
    Internally it had long been known that Linux compiled the same code faster than Windows.  Way faster.  But for various reasons it had been difficult to pinpoint why.  The data returned made it obvious.  It was the compiler.

    image

    While it was clear that the real separation in multithreaded compiling occurred after 8vCPUs, the real problem with the Windows Visual Studio compiler begins after 4vCPUs.  This surprised me a bit because when monitoring the vCPU usage (in stacked graph format) in vCenter, it was using every CPU cycle given to it, and multithreading quite evenly.  The testing used Visual Studio 2008, but I also tested newer versions of Visual Studio, with nearly the same results. 

    Storage
    The original proposal included storage to support the additional compute horsepower.  The existing set of arrays had served our needs very well, but were really targeted at general purpose I/O needs with a focus on capacity.  During the budget review process, I had received many questions as to why we needed a storage array.  Boiling it down to even the simplest of terms didn’t allow for that line item to survive the last round of cuts.  Sure, there was a price to pay for the array, but the results show there is a price to pay for not buying the array.

    I knew storage was going to be an issue, but when contention occurs, it’s hard to determine how much of an impact it will have.  Think of a busy freeway, where throughput is pretty easy to predict up to a certain threshold.  Hit critical mass, and predicting commute times becomes very difficult.  Same thing with storage.  But how did I know storage was going to be an issue?  The free tool provided to all Dell EqualLogic customers: SAN HQ.  This tool has been a trusted resource for me in the past, and removes ALL speculation when it comes to historical usage of the arrays, and other valuable statistics.  IOPS, read/write ratios, latency etc.  You name it.
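    The freeway analogy has a textbook counterpart in basic queueing theory.  For a single queue with random arrivals (the M/M/1 model), average response time is the service time divided by one minus utilization.  The sketch below uses that formula with a made-up 5 ms service time.  It is a simplification meant to show the shape of the curve, not a model of how an EqualLogic array actually behaves.

```python
def mm1_response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Average response time for an M/M/1 queue: R = S / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1 - utilization)


SERVICE_TIME_MS = 5.0  # hypothetical time to service one I/O on an idle array

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    latency = mm1_response_time_ms(SERVICE_TIME_MS, utilization)
    print(f"{utilization:.0%} busy: about {latency:.0f} ms per I/O")
```

    In this model, going from 50% to 90% busy raises latency from about 10 ms to about 50 ms, and the last few percent before saturation are far worse.  Small swings in load near the top produce large swings in response time, which fits the inconsistent build times seen on the SAN.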

    Historical data of Estimated Workload over the period of 1 month

    image

    Historical data of Estimated Workload over the period of 12 months

    image

    Both images show that with the exception of weekends, the SAN arrays are maxed out to 100% of their estimated workload.  The overtaxing shows up on the lower part of each screen capture, where the reads and writes surpass the brown line indicating the estimated maximum IOPS of the array.  The 12 month history showed that our storage performance needs were trending upward.

    Storage contention and how it relates to used CPU cycles is also worth noting.  Look at how inadequate storage I/O influences compute. The image below shows the CPU utilization for one of the Linux builds using 8vCPUs and 8GB RAM when the /home drive was using fast storage (the local SSD on the vSphere host).

    image

    Now look at the same build when running against a busy SAN array.  The CPU usage profile changes completely, and the build took 46% longer to complete.

    image

    General Observations and lessons

    • If you are running any hosts using pre-Nehalem architectures, now is a good time to question why. They may not be worth wasting vSphere licensing on. The core count and architectural improvements on the newer chips put the nails in the coffin on these older chips.
    • Storage Storage Storage. If you have CPU intensive operations, deal with the CPU, but don’t neglect storage. The test results above demonstrate how one can easily give back the entire amount of performance gains in CPU by not having storage performance to support it.
    • Giving a Windows code compiling VM a lot of CPU, but not increasing the RAM seemed to make the compiler trip on its own toes.  This makes sense, as more CPUs need more memory addresses to work with.
    • The testing showcased another element of virtualization that I love. It often helps you understand problems that you might otherwise be blind to. After establishing baseline testing, I noticed some of the Linux build systems were not multithreading the way they should. Turns out it was some scripting errors by our Developers. Easily corrected.

    Conclusion
    The new Dell M620 blades provided an immediate performance return.  All of the build VMs have been scaled up to 8vCPUs and 8GB of RAM to get the best return while providing good scalability of the cluster.  Even with that modest doubling of virtual resources, we now have nearly 30 build VMs that, once storage performance is no longer an issue, will run between 4 and 4.3 times faster than the same VMs on the old M600 blades.  The primary objective moving forward is to target storage that will adequately support these build VMs, as well as looking into ways to improve multithreaded code compiling in Windows.

    Helpful Links
    Kitware blog post on multithreaded code compiling options

    http://www.kitware.com/blog/home/post/434

    Using a Synology NAS as an emergency backup DNS server for vSphere

    Powering up a highly virtualized infrastructure can sometimes be an interesting experience.  Interesting in that “crossing-the-fingers” sort of way.  Maybe it’s an outdated run book, or an automated power-on of particular VMs that didn’t occur as planned.  Sometimes it is nothing more than a lack of patience between each power-on/initialization step.  Whatever the case, if it is a production environment, there is at least a modest amount of anxiety that goes along with this task.  How often does this even happen?  For those who have extended power outages, far too often.

    One element that can affect power-up scenarios is availability of DNS.  A funny thing happens though when everything is virtualized.  Equipment that powers the infrastructure may need DNS, but DNS is inside of the infrastructure that needs to be powered up.  A simple way around this circular referencing problem is to have another backup DNS server that supplements your normal DNS infrastructure.  This backup DNS server acts as a slave to the server having authoritative control for that DNS zone, and would handle at minimum recursive DNS queries for critical infrastructure equipment, and vSphere hosts.  While all production systems would use your normal primary and secondary DNS, this backup DNS server could be used as the secondary name server for a few key components:

    • vSphere hosts
    • Server and enclosure Management for IPMI or similar side-band needs
    • Monitoring nodes
    • SAN components (optional)
    • Switchgear (optional)

    vSphere certainly isn’t as picky as it once was when it comes to DNS.  Thank goodness.  But guaranteeing immediate availability of name resolution will help your environment during these planned, or unplanned power-up scenarios.  Those that do not have to deal with this often have at least one physical Domain Controller with integrated DNS in place.  That option is fine for many organizations, and certainly accomplishes more than just availability of name resolution.  AD design is a pretty big subject all by itself, and way beyond the scope of this post.  But running a spare physical AD server isn’t my favorite option for a number of different reasons, especially for smaller organizations.  Some folks way smarter than me might disagree with my position.  Here are a few reasons why it isn’t my preferred option.

    • One may be limited in Windows licensing.
    • There might be limited availability of physical enterprise grade servers.
    • One may have no clue as to if, or how a physical AD server might fit into their DR strategy.

    As time marches on, I also have a feeling that this approach will be falling out of favor anyway.  During a breakout session for optimizing virtualized AD infrastructures at the 2012 VMWorld, it was interesting to hear that the VMware Mothership still has some physical AD servers running the PDCe role.  However, they were actively in the process of eliminating this final, physical element, and building recommendations around doing so.  And let’s face it, a physical DC doesn’t align with the vision of elastic, virtualized datacenters anyway.

    To make DNS immediately available during these power-up scenarios, the prevailing method in the “Keep it Simple Stupid” category has been running a separate physical DNS server.  Either a Windows member server with a DNS role, or a Linux server with BIND.  But it is a physical server, and us virtualization nuts hate that sort of thing.  But wait!  …There is one more option.  Use your Synology NAS as an emergency backup DNS server.  The intention of this is not to supplant your normal DNS infrastructure.  It’s simply to help a few critical pieces of equipment start up.

    The latest version of Synology’s DSM (4.1) comes with a beta version of a DNS package.  It is pretty straightforward, but I will walk you through the steps of setting it up anyway.

    1.  Verify that your Windows DNS servers allow zone transfers to the IP address of the NAS.  Jump into the Windows Server DNS MMC snap-in, highlight the zone you want to set up a zone transfer for, and click Properties.  Add or verify that the settings allow a zone transfer to the new slave server.

    2.  In the Synology DSM, open the Package Center, and install the DNS package.

    3.  Enable the Synology DSM Firewall to allow for DNS traffic.  In the Synology DSM, open the Control Panel > Firewall.  Highlight the interface desired, and click Create.  Choose “Select from a built in list of applications” and choose “DNS Server”.  Save the rule, and exit out of the Firewall application.

    4.  Open up “DNS Server” from the Synology launch menu.

    image

    5.  Click on “Zones” and click Create > Slave Zone.  Choose a “Forward Zone” type, and select the desired domain name, and Master DNS server.

    image

    6.  Verify the population of resource records by selecting the new zone, clicking Edit > Resource Records.

    image

    7.  If you want, or need, to have this server forward DNS requests, enable the forwarders checkbox. (In my Home Lab, I enable this.  In my production environment, I do not.)

    image

    8.  Complete the configuration, and test with a client using this IP address only for DNS, simply to verify that it is operating correctly.  Then, go back and tighten up some of the security mechanisms as you see fit.  Once that is completed, jump back into your ESXi hosts (and any other equipment) and configure your secondary DNS to use this server.

    image
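    For the test in step 8, it helps to query the NAS directly rather than rely on whatever resolver the client happens to pick.  nslookup with the server address as a second argument does this.  For the curious, the sketch below does the same thing by hand using only the Python standard library: it builds a standard DNS query for an A record, sends it over UDP port 53 to one specific server, and reads the response code and answer count from the reply header.  The function names, the host name, and the IP address in the usage note are all hypothetical.

```python
import socket
import struct
from typing import Tuple


def build_a_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def response_summary(response: bytes) -> Tuple[int, int]:
    """Return (rcode, answer_count) from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def query_server(server_ip: str, hostname: str, timeout_s: float = 3.0) -> Tuple[int, int]:
    """Send one A query to a specific DNS server and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(build_a_query(hostname), (server_ip, 53))
        response, _ = sock.recvfrom(512)
    return response_summary(response)
```

    Calling query_server("192.168.1.50", "esx1.example.lan") against the NAS should return a response code of 0 and at least one answer if the slave zone populated correctly.  A timeout usually points back to the firewall rule in step 3.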

    In my case, I tried this out on the Synology NAS in my home lab, as well as on a newly deployed unit at work (serving the primary purpose of a Veeam backup target).  In both cases, it has worked exactly as expected, and allowed me to junk an old server at work running BIND.

    If the NAS lived on an isolated storage network that wasn’t routable, then this option wouldn’t work, but if you have one living on a routable network somewhere, then it’s a great option.  The arrangement reduces the number of components in the infrastructure while ensuring service availability.

    Even if you have multiple internal zones, you may want to have this slave server only handling your primary zone.  No need to make it more complicated than it needs to be.  You also may choose to set up the respective reverse lookup zone as a slave.  Possible, but not necessary for this purpose.

    There you have it.  Nothing groundbreaking, but a simple way to make a more resilient environment during power-up scenarios.

    Helpful Links:

    VMWorld 2012.  Virtualizing Active Directory Best Practices (APP-BCA1373).  (Accessible by VMWorld attendees only)
    http://www.vmworld.com/community/sessions/2012/
