April 30, 2010 22 Comments
Behind every great virtualized infrastructure is a great SAN to serve everything up. I’ve had the opportunity to work with the Dell/EqualLogic iSCSI array for a while now, taking advantage of all of the benefits that the iSCSI based SAN array offers. One feature that I haven’t been able to use is the built in replication feature. Why? I only had one array, and I didn’t have a location offsite to replicate to.
I suppose the real “part 1” of my replication project was selling the idea to the Management Team. When it came to protecting our data and the systems that help generate that data, it didn’t take long for them to realize it wasn’t a matter of what we could afford, but how much we could afford to lose. Having a building less than a mile away burn to the ground also helped the proposal. On to the fun part; figuring out how to make all of this stuff work.
Of the many forms of replication out there, the most obvious one for me to start with is native SAN to SAN replication. Why? Well, it’s built right into the EqualLogic PS arrays, with no additional components to purchase, or license keys or fees to unlock features. Other solutions exist, but it was best for me to start with the one I already had.
For companies with multiple sites, replication using EqualLogic arrays seems pretty straight forward. For a company with nothing more than a single site, there are a few more steps that need to occur before the chance to start replicating data can happen.
Decision: Colocation, or hosting provider
One of the first decisions that had to be made was if we wanted our data to be replicated to a Colocation (CoLo) with equipment that we owned and controlled, or with a hosting provider that can provide native PS array space and replication abilities. Most hosting providers use a mixed variety of metering of data replicated to charge. Accurately estimating your replication costs assumes you have a really good understanding of how much data will be replicated. Unfortunately, this is difficult to know until you start replicating. The pricing models of these hosting providers reminded me too much of a cab fare; never knowing what you are going to pay until you get the big bill when you are finished. A CoLo with equipment that we owned fit with our current and future objectives much better. We wanted fixed costs, and the ability to eventually do some hosting of critical services at the CoLo (web, ftp, mail relay, etc.), so it was an easy decision for us.
Our decision was to go with a CoLo facility located in the Westin Building in downtown Seattle. Commonly known as the Seattle Internet Exchange (SIX), this is an impressive facility not only in it’s physical infrastructure, but how it provides peered interconnects directly from one ISP to another. Our ISP uses this facility, so it worked out well to have our CoLo there as well
Bandwidth requirements for our replication was, and is still unknown, but I knew our bonded T1’s probably weren’t going to be enough, so I started exploring other options for higher speed access. The first thing to check was to see if we qualified for a Metro-E or “Ethernet over Copper” (award winner for the dumbest name ever). Metro-E removes the element of T-carrier lines along with any proprietary signaling, and provides internet access of point-to-point connections at Layer 2, instead of Layer 3. We were not close enough to the carriers central office to get adequate bandwidth, and even if we were, it probably wouldn’t scale up to our future needs.
Enter QMOE, or Qwest Metro Optical Ethernet. This solution feeds Layer 2 Ethernet to our building via fiber, offering the benefit of high bandwidth, low latency, that can be scaled easily.
Our first foray using QMOE is running a 30mbps point-to-point feed to our CoLo, and uplinked to the Internet. If we need more later, there is no need to add or change equipment. Just have them turn up the dial, and bill you accordingly.
Topology planning has been interesting to say the least. The best decision here depends on the use-case, and lets not forget, what’s left in the budget.
Two options immediately presented themselves.
1. Replication data from our internal SAN would be routed (Layer 3) to the SAN at the CoLo.
2. Replication data from our internal SAN would travel by way of a VLAN to the SAN at the CoLo.
If my need was only to send replication data to the CoLo, one could take advantage of that layer 2 connection, and send replication data directly to the CoLo, without it being routed. This would mean that it would have to bypass any routers/firewalls in place, and have to be running to the CoLo on it’s own VLAN.
The QMOE network is built off of Cisco Equipment, so in order to utilize any VLANing from the CoLo to the primary facility, you must have Cisco switches that will support their VLAN trunking protocol (VTP). I don’t have the proper equipment for that right now.
In my case, here is a very simplified illustration as to how the two topologies would look:
Topology using VLANs
One may introduce more overhead and less effective throughput when the traffic becomes routed. This is where a WAN optimization solution could come into play. These solutions (SilverPeak, Riverbed, etc.) appear to be extremely good at improving effective throughput across many types of WAN connections. These of course must sit at the correct spot in the path to the destination. The units are often priced on bandwidth speed, and while they are very effective, are also quite an investment. But they work at layer 3, and must in between the source and a router at both ends of the communication path; something that wouldn’t exist on a Metro-E circuit where VLANing was used to transmit replicated data.
The result is that for right now, I have chosen to go with a routed arrangement with no WAN optimization. This does not differ too much from a traditional WAN circuit, other than my latencies should be much better. The next step if our needs are not sufficiently met would be to invest in a couple of Cisco switches, then send replication data over it’s own VLAN to the CoLo, similar to the illustration above.
My original SAN array is an EqualLogic PS5000e connected to a couple of Dell PowerConnect 5424 switches. My new equipment closely mirrors this, but is slightly better; An EqualLogic PS6000e and two PowerConnect 6224 switches. Since both items will scale a bit better, I’ve decided to change out the existing array and switches with the new equipment.
Some Lessons learned so far
If you are changing ISPs, and your old ISP has authoritative control of your DNS zone files, make sure your new ISP has the zone file EXACTLY the way you need it. Then confirm it one more time. Spelling errors and omissions in DNS zone files doesn’t work out very well, especially when you factor in the time it takes for the corrections to propagate through the net. (Usually up to 72 hours, but can feel like a lifetime when your customers can’t get to your website)
If you are going to go with a QMOE or Metro-E circuit, be mindful that you might have to force the external interface on your outermost equipment (in our case, the firewall/router, but could be a managed switch as well) to negotiate to 100mbps full duplex. Auto negotiation apparently doesn’t work to well on many Metro-E implementations, and can cause fragmentation that will reduce your effective throughput by quite a bit. This is exactly what we saw. Fortunately it was an easy fix.
Stay tuned for what’s next…