Behind every great virtualized infrastructure is a great SAN to serve everything up. I’ve had the opportunity to work with the Dell/EqualLogic iSCSI array for a while now, taking advantage of all of the benefits that the iSCSI-based SAN array offers. One feature that I haven’t been able to use is the built-in replication feature. Why? I only had one array, and I didn’t have a location offsite to replicate to.
I suppose the real “part 1” of my replication project was selling the idea to the Management Team. When it came to protecting our data and the systems that help generate that data, it didn’t take long for them to realize it wasn’t a matter of what we could afford, but how much we could afford to lose. Having a building less than a mile away burn to the ground also helped the proposal. On to the fun part: figuring out how to make all of this stuff work.
Of the many forms of replication out there, the most obvious one for me to start with is native SAN to SAN replication. Why? Well, it’s built right into the EqualLogic PS arrays, with no additional components to purchase, or license keys or fees to unlock features. Other solutions exist, but it was best for me to start with the one I already had.
For companies with multiple sites, replication using EqualLogic arrays seems pretty straightforward. For a company with nothing more than a single site, a few more steps need to happen before you can start replicating data.
Decision: Colocation, or hosting provider
One of the first decisions that had to be made was whether we wanted our data replicated to a Colocation (CoLo) facility with equipment that we owned and controlled, or to a hosting provider that could provide native PS array space and replication abilities. Most hosting providers charge through some mix of metering on the data replicated. Accurately estimating your replication costs assumes you have a really good understanding of how much data will be replicated. Unfortunately, this is difficult to know until you start replicating. The pricing models of these hosting providers reminded me too much of a cab fare: you never know what you are going to pay until you get the big bill at the end. A CoLo with equipment that we owned fit our current and future objectives much better. We wanted fixed costs, and the ability to eventually host some critical services at the CoLo (web, ftp, mail relay, etc.), so it was an easy decision for us.
Our decision was to go with a CoLo facility located in the Westin Building in downtown Seattle. Commonly known as the Seattle Internet Exchange (SIX), this is an impressive facility not only in its physical infrastructure, but in how it provides peered interconnects directly from one ISP to another. Our ISP uses this facility, so it worked out well to have our CoLo there as well.
Bandwidth requirements for our replication were, and still are, unknown, but I knew our bonded T1s probably weren’t going to be enough, so I started exploring other options for higher speed access. The first thing to check was whether we qualified for Metro-E or “Ethernet over Copper” (award winner for the dumbest name ever). Metro-E removes the element of T-carrier lines along with any proprietary signaling, and provides internet access or point-to-point connections at Layer 2, instead of Layer 3. We were not close enough to the carrier’s central office to get adequate bandwidth, and even if we were, it probably wouldn’t scale up to our future needs.
Enter QMOE, or Qwest Metro Optical Ethernet. This solution feeds Layer 2 Ethernet to our building via fiber, offering high bandwidth and low latency that can be scaled easily.
Our first foray using QMOE is running a 30 Mbps point-to-point feed to our CoLo, uplinked to the Internet. If we need more later, there is no need to add or change equipment. We just have them turn up the dial and bill us accordingly.
Topology planning has been interesting to say the least. The best decision here depends on the use-case and, let’s not forget, what’s left in the budget.
Two options immediately presented themselves.
1. Replication data from our internal SAN would be routed (Layer 3) to the SAN at the CoLo.
2. Replication data from our internal SAN would travel by way of a VLAN to the SAN at the CoLo.
If my need was only to send replication data to the CoLo, I could take advantage of that Layer 2 connection and send replication data directly to the CoLo, without it being routed. This would mean that it would have to bypass any routers/firewalls in place, and would have to run to the CoLo on its own VLAN.
The QMOE network is built off of Cisco Equipment, so in order to utilize any VLANing from the CoLo to the primary facility, you must have Cisco switches that will support their VLAN trunking protocol (VTP). I don’t have the proper equipment for that right now.
In my case, here is a very simplified illustration as to how the two topologies would look:
Topology using VLANs
Routing the traffic may introduce more overhead and reduce effective throughput. This is where a WAN optimization solution could come into play. These solutions (SilverPeak, Riverbed, etc.) appear to be extremely good at improving effective throughput across many types of WAN connections. The units are often priced on bandwidth speed, and while they are very effective, they are also quite an investment. They also work at Layer 3 and must sit between the source and a router at both ends of the communication path, something that wouldn’t exist on a Metro-E circuit where VLANing was used to transmit replicated data.
The result is that for right now, I have chosen to go with a routed arrangement with no WAN optimization. This does not differ too much from a traditional WAN circuit, other than that my latencies should be much better. The next step, if our needs are not sufficiently met, would be to invest in a couple of Cisco switches, then send replication data over its own VLAN to the CoLo, similar to the illustration above.
My original SAN array is an EqualLogic PS5000e connected to a couple of Dell PowerConnect 5424 switches. My new equipment closely mirrors this, but is slightly better: an EqualLogic PS6000e and two PowerConnect 6224 switches. Since both items will scale a bit better, I’ve decided to change out the existing array and switches with the new equipment.
Some Lessons learned so far
If you are changing ISPs, and your old ISP has authoritative control of your DNS zone files, make sure your new ISP has the zone file EXACTLY the way you need it. Then confirm it one more time. Spelling errors and omissions in DNS zone files don’t work out very well, especially when you factor in the time it takes for the corrections to propagate through the net. (Usually up to 72 hours, but it can feel like a lifetime when your customers can’t get to your website.)
If you are going to go with a QMOE or Metro-E circuit, be mindful that you might have to force the external interface on your outermost equipment (in our case, the firewall/router, but it could be a managed switch as well) to negotiate to 100 Mbps full duplex. Auto negotiation apparently doesn’t work too well on many Metro-E implementations, and can cause fragmentation that will reduce your effective throughput by quite a bit. This is exactly what we saw. Fortunately it was an easy fix.
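For the DNS lesson, one way to do the “confirm it one more time” step is to diff the old and new zone files record by record. This is only a rough sketch under simplifying assumptions: it expects one record per line, does not handle multi-line records in parentheses or semicolons inside quoted TXT strings, and compares text exactly (so a difference in letter case will be reported for you to review). The function names are my own, not part of any DNS tool.

```python
def normalize_zone(text: str) -> set[str]:
    """Reduce zone-file text to a set of whitespace-normalized record lines."""
    records = set()
    for line in text.splitlines():
        line = line.split(";", 1)[0]      # drop trailing comments (simplistic)
        fields = line.split()             # collapse tabs and repeated spaces
        if fields:
            records.add(" ".join(fields))
    return records


def zone_diff(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (records missing from the new zone, records only in the new zone)."""
    old_records = normalize_zone(old_text)
    new_records = normalize_zone(new_text)
    return sorted(old_records - new_records), sorted(new_records - old_records)


if __name__ == "__main__":
    # Hypothetical records using documentation-only addresses.
    old_zone = "www   IN A 192.0.2.1 ; web server\nmail  IN A 192.0.2.2\n"
    new_zone = "www IN A 192.0.2.1\nmial IN A 192.0.2.2\n"
    missing, unexpected = zone_diff(old_zone, new_zone)
    print("Missing from new zone:", missing)
    print("Only in new zone:", unexpected)
```

A misspelled host name shows up as one entry in each list, which is the kind of error that is easy to miss when reading the file by eye.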
Stay tuned for what’s next…
22 thoughts on “Replication with an EqualLogic SAN; Part 1”
Nice explanation of your selection process. How much data are you syncing between environments on a daily basis after the initial synchronization?
The replication data that I show is after the initial seed replica occurs for the given volume. Perhaps I should have clarified that more. As for the total amount, I’m replicating about 100GB on a daily basis. That is with the replication partner still living inside of my primary facility. Once it gets relocated, I will be adjusting what will be replicated, and the frequency, to best match my bandwidth.
Did that answer your question?
Thanks for reading.
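To put the 100GB/day figure next to the 30 Mbps circuit described in the post, here is a back-of-the-envelope sketch. The function name is my own, decimal units are assumed (1 GB = 8,000 megabits), and the `efficiency` parameter is a placeholder for protocol overhead that you would have to measure yourself.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to send data_gb gigabytes over a link of link_mbps megabits/sec."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000               # decimal units: 1 GB = 8,000 Mb
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 100 GB of daily change over a 30 Mbps link at full line rate.
    print(f"{transfer_hours(100, 30):.1f} hours")  # prints "7.4 hours"
```

Even at full line rate that is roughly 7.4 hours of the link per day, which is why what gets replicated, and how often, has to be matched to the bandwidth.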
Hi there; thanks for the write-up.
You said you were replicating 100GB/day – do you know how much data-change actually occurs daily? I’m trying to get a gauge of the overhead that replication causes to try to decide if it’s right for my organization. It’s difficult to tell until you actually do it, but any kind of rough data would be helpful.
Great question Matt. It is indeed difficult to truly know how much you will be replicating until you actually start doing it. I’ve pestered Dell/EqualLogic for a “simulation feature” that would help in capacity planning for replication. Let’s hope they do something about that. Anyway, there are a few things that you can do though to get a rough idea.
Look at your snapshot sizes. For each SAN volume that you have (whether it is VMFS or native), delete one snapshot at a time, and see what the typical increase in available snapshot reserve is. If a volume frees up another 12GB of available snapshot space for every snapshot you delete, then that particular volume will probably be replicating about 12GB for whatever interval the snap had (e.g. once a day). After a week or so of doing this, you should get a pretty good feel for the behavior of each volume. Some will be amazingly consistent, while others might vary quite a bit. Keep track of your analysis in a spreadsheet, and come to your conclusions after a week or so.
Understanding how much data-change occurs can be very difficult for another reason. Massaging of that data (e.g., defrags, moving of files, and other file management techniques) will be interpreted by the SAN as changed block data, and thus will show up as a big increase when doing a snap or a replica. I’d look at your internal processes and make sure you aren’t undermining your efforts to determine the real amount of changed data.
Remember that replication and snapshot sizes are also a function of time since the last snap or replica. For the sake of minimizing variables, just do this once a day for right now.
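If a spreadsheet feels like too much work, the same bookkeeping can be sketched in a few lines. This assumes you have written down, per volume, how many GB were freed each time you deleted a daily snapshot. The volume names and numbers are made up for illustration, and the function name is my own.

```python
from statistics import mean


def summarize_daily_change(samples_gb: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Per-volume min, mean, and max of the GB freed by each deleted daily snapshot."""
    summary = {}
    for volume, freed in samples_gb.items():
        if not freed:
            continue  # nothing recorded yet for this volume
        summary[volume] = {"min": min(freed), "mean": mean(freed), "max": max(freed)}
    return summary


if __name__ == "__main__":
    # Hypothetical readings, not real measurements.
    observations = {
        "vmfs-01": [12.0, 11.5, 12.5, 12.0],   # a consistent volume
        "sql-data": [4.0, 20.0, 6.0, 10.0],    # a volume that varies a lot
    }
    for volume, stats in summarize_daily_change(observations).items():
        print(volume, stats)
```

The spread between min and max tells you which volumes are predictable and which ones need more headroom when you plan replication.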
If you are on the fence, I say go for it. It brings a tremendous sense of security, and works extremely well. If you are concerned about bandwidth, you can do other things to minimize how much of your bandwidth your replicas consume. Thanks for reading!
Wow thanks for the quick reply. I appreciate the info.
The “wait & see” angle is pretty scary for me at the moment. Basically we’re trying to do an ‘all-in-one’ shot from old decrepit physical hardware to a nice shiny new virtualized backend. Dell has an impressive solution so far, but the snapshot/replication sizes seem to be the “Achilles heel” of the entire setup. Our original idea was to use snapshots and offsite replication for our entire data protection scheme, keeping everything in VMFS format for ease of portability/recovery. Some of these horror stories about 200% overhead on change, and not having crash consistent snaps/replicas on DBs, have got me re-thinking that entire strategy and potentially leaning back to tape for backup/retention. Hopefully our Dell rep can get a loaner for us shortly and I can start trying to get a feel for the disk costs of what we want to do, and whether or not it makes sense over traditional offline backup.
Yes, I have a client looking to do about the same amount of replication between datacenters.
I’m interested to see how this plays out. What do you guys think?
I am trying to do the same here but I cannot get the replication to work. I can ping each PS, but when I try to start replication I get an error that it can’t find the partner. Any ideas? I will read the rest in hopes you cover your config on the router.
I assume that you are trying to replicate across a route, correct? If that’s the case, did you ever try replication with the target array living on the same SAN network? Was that successful? I found that step invaluable in working out the kinks with replication, and learning a few things along the way. From there I’d check your routers/firewalls to ensure that the ports for replication are open.
Also, what type of replication are you attempting? Through the Group Manager GUI, through ASM/VE, or through ASM/ME? ASM/VE can most certainly be the trickiest, so I’d start out with the Group Manager GUI, then go to ASM/ME (if you have a use-case for that).
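A successful ping only proves ICMP gets through, not that a firewall is passing the TCP traffic. A quick way to check from a host on the SAN network is a plain TCP connection test. This is a minimal sketch: port 3260 is the standard iSCSI port, and I am assuming that is what your replication partner traffic needs, so confirm the required ports against Dell’s documentation. The function name is my own.

```python
import socket


def tcp_port_open(host: str, port: int = 3260, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Call it with the partner group’s IP address, for example `tcp_port_open("192.0.2.10")` (a placeholder address). A `False` result while ping works points at a router or firewall rule rather than the arrays themselves.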
This sounds like a terrific product.
I determined my average replication bandwidth by hooking my DR SAN up locally and replicating to it there. I let it run for a few days replicating each volume hourly and amassed great statistics by looking at the replication history in the GUI. I could see and track exactly how much data was replicating per volume per interval and how long it took. In my case I average about 12GB an hour, but since each volume tries to replicate simultaneously at the top of the hour, the replication for each volume (17 or so) was completed by 3 minutes past the hour. This works out to around 65MB/sec replicating from a three array group to a single array group over a 1Gb connection. This did NOT work over a 100 Megabit connection though. Dell said there is some overhead associated with replication that makes using a 100 Megabit connection infeasible. Never trusting a vendor, I tried it anyway and they were right. You would think that a 100Mb connection would be 10% as fast as a Gigabit. It was more like 2% as fast (I don’t remember the exact numbers, but the link couldn’t keep up with the changes for the replication interval I needed). Hope this helps.
Thanks for the comments Matt. With regards to that 100 megabit connection, I’d look at any connection points in between the source and the target. If they are negotiating correctly, you should see up to 95% protocol efficiency. Many switches and routers will not autonegotiate correctly on anything less than a 1Gb connection. I had this happen to me during my internal testing where I introduced some cheap switches to emulate my physical topology (described in part 5 of my series). The same thing will often happen with Layer 2 Metro-E circuits. I’m not sure the people at Dell fully understood your situation. There is nothing magical about a 100Mb connection. I’ve been able to replicate at 10Mb, 30Mb, 100Mb, and 1Gb, all reflecting the throughput anticipated. Thanks for sharing.
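One way to tell a negotiation problem from a link that is simply too small is to compare the throughput you observe against the nominal line rate. Here is a small sketch of that comparison. The function names and the 50% warning floor are arbitrary choices of mine, not a vendor threshold. Note that MB/s means megabytes per second and Mbps means megabits per second.

```python
def link_efficiency(observed_mb_per_s: float, link_mbps: float) -> float:
    """Fraction of the nominal line rate actually achieved."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return observed_mb_per_s * 8 / link_mbps    # 8 bits per byte


def looks_misnegotiated(observed_mb_per_s: float, link_mbps: float, floor: float = 0.5) -> bool:
    """Flag a link whose sustained throughput is far below its nominal rate."""
    return link_efficiency(observed_mb_per_s, link_mbps) < floor


if __name__ == "__main__":
    # 65 MB/s on gigabit, the figure reported in the comment above.
    print(f"{link_efficiency(65, 1000):.0%}")
    # Hypothetical: a 100 Mbps link that only sustains 1.5 MB/s.
    print(looks_misnegotiated(1.5, 100))
```

If a link that should carry around 95% of its rate is delivering a small fraction of that, check speed and duplex settings at every hop before concluding the link is too slow for replication.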
When I initially commented I seem to have clicked the “Notify me when new comments are added” checkbox, and from now on each time a comment is added I get four emails with the exact same comment. Is there a means you are able to remove me from that service?
You can do this via your WordPress account. Unfortunately I do not have any visibility to that.
This thread might be old but it is still significant to replication. In our environment, we process replication in sizes of 100GB to 3TB. Even with a bandwidth of 100MB for the WAN, this does not cut it for us. With the introduction of the EqualLogic PS6610E/ES (larger capacity SAN), the replication process is still the same.
Any recommendation to make this faster and on time? We keep missing our replication schedule due to over-runs.
What you describe is quite common with EqualLogic based replication. One of the reasons is the extremely large page/block size they use (15MB). This means that for even small changes, one is sending very large chunks of data across the wire. Frankly, what I would do, and personally did do in my environment, is to abstract my data replication requirements away from the SAN array, and let them live at a software layer. So something like Zerto, or even to a lesser degree, Veeam, would work extremely well for this. One of the HUGE advantages of going this route is that since you would no longer be relying on array based functionality, the source and the target storage do not have to be the same. You could easily have some less expensive storage on the target side. Also, software allows for compression and dedupe to occur at the software layer as well. Storage manufacturers push their replication functionality not because it is the superior way of doing things, but because it sells more boxes. While it is appealing to try to use something that is included as “free” with what you already bought, array based replication has some limitations, and you’d be better served to let software handle this.
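To get a feel for why the page size matters, here is a worst-case sketch. It takes the 15MB page figure at face value and assumes every small write lands on a different page and that a whole page is sent for each one. Real workloads will cluster writes and fare better, so treat the result as an upper bound, not a prediction. The function names and the sample workload are made up for illustration.

```python
PAGE_MB = 15  # page size quoted in this reply


def worst_case_replicated_mb(scattered_writes: int, page_mb: float = PAGE_MB) -> float:
    """Upper bound on data sent if each write dirties its own page."""
    return scattered_writes * page_mb


def amplification(scattered_writes: int, write_kb: float, page_mb: float = PAGE_MB) -> float:
    """Ratio of worst-case data sent to data actually changed."""
    changed_mb = scattered_writes * write_kb / 1024
    return worst_case_replicated_mb(scattered_writes, page_mb) / changed_mb


if __name__ == "__main__":
    # Hypothetical workload: 1,000 scattered 64 KB writes (62.5 MB of real change).
    print(worst_case_replicated_mb(1000))   # 15000 MB sent in the worst case
    print(amplification(1000, 64))          # 240.0 times the changed data
```

A software layer that tracks changes at a finer granularity, and compresses on top of that, avoids most of this overhead.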