Minimizing Downtime When Switching IP Addresses?
GeekTek asks: "As we all know, prices for co-location have plummeted since the height of the dot.com era. We've been shopping around and found a solution that works for us. We have a small setup of about a dozen Debian boxen, a few Windows servers and we run our own name servers (BIND 8.x). Most of our domain names are managed through our OpenSRS account. My concern is switching all of our servers' IP addresses. I cannot have any downtime and I want to minimize the number of trips to the current co-lo (it's >2 hours away). What is the best way to do it? What experiences can you share in similar situations?"
what services are you running? (Score:3, Informative)
if it's just http, you can use redirects to mitigate the dns delay. https complicates this since you'll have to get certs for your temporary dns names (i.e., https://old/ redirecting to https://new/, which also responds to https://old/)
if it's just about anything else, you might consider setting up a vpn between your existing datacenter and your new datacenter, and set up both environments to answer to the same dns names and have your old environment tunnel to your new one.
in short, doing this with zero downtime is highly dependent on the services you are running. it's not a generic problem that has one solution.
Short TTLs + rinetd or similar (Score:5, Informative)
Then when you're done moving a machine, change the IP in DNS. When it seems solid, you put the TTL back to a reasonable (1 day) number.
During the transition, you can also keep a machine at the old IP address and forward the services to the new IP address using tools such as rinetd or xinetd. This ensures that all traffic goes to the correct machine (possibly through the old machine), and that the old IP address stays available during the move for clients with broken DNS resolvers that don't correctly honor DNS TTL values. The machine running rinetd/xinetd can easily be a temporary box, such as a laptop - it's not doing any real processing.
If you're also moving your DNS machines, move one a week before the big move, update whois, and make sure everything settles down. Then move the other a day or two after the big move.
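As a rough illustration of the forwarding idea above: rinetd reads rules of the form "bindaddress bindport connectaddress connectport", one per line. The sketch below just renders such rules for a list of ports. The function name and the addresses (taken from the documentation ranges) are made up for the example, and classic rinetd forwards TCP only as far as I know, so it would not cover UDP services such as ordinary DNS queries.

```python
def rinetd_rules(old_ip, new_ip, ports):
    """Render one rinetd forwarding rule per port.

    Each rule has the form: bindaddress bindport connectaddress connectport
    old_ip is the address the temporary box keeps answering on,
    new_ip is where the real service now lives.
    """
    return "\n".join(f"{old_ip} {port} {new_ip} {port}" for port in ports)
```

Calling rinetd_rules("192.0.2.10", "198.51.100.10", [80, 25]) gives two lines that could be pasted into rinetd.conf on the temporary box.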
TTL = half the time until switchover (Score:2, Insightful)
Decrease the TTL of the DNS records during the switchover. If your current TTL is a day, then at least one day earlier, change it to, say, 300 (5 minutes). You'll experience a higher DNS query rate during that time, but probably nothing you can't handle.
Actually, you can reduce the DNS query rate by continuously setting the TTL to about half the time until the switchover. For instance, 24 hours before the switchover, set it to 12 hours. Then keep decreasing the TTL until it's down to about five minutes. This way, you won't get a continuous flood of DNS requests during the day before the switchover.
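The halving rule above can be written down as a small schedule calculator. This is only a sketch: the function name and the 300-second floor are my own choices, and it assumes the TTL already in caches when you start is no longer than the lead time you pass in.

```python
def ttl_schedule(lead_seconds, floor_ttl=300):
    """Return [(seconds_before_switchover, ttl_to_set), ...].

    At each step the TTL is set to half the time remaining until the
    switchover, and the next change happens once that TTL has elapsed.
    The schedule stops when the TTL reaches floor_ttl.
    """
    schedule = []
    remaining = lead_seconds
    while remaining > 0:
        ttl = max(floor_ttl, remaining // 2)
        schedule.append((remaining, ttl))
        if ttl == floor_ttl:
            break  # hold the floor value until the switchover
        remaining -= ttl
    return schedule
```

For a 24-hour lead, the first two entries are (86400, 43200) and (43200, 21600): set 12 hours at T-24h, then 6 hours at T-12h, and so on down to the 5-minute floor.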
Re:TTL = half the time until switchover (Score:3, Informative)
Or you could use tinydns, which handles this automatically:
http://cr.yp.to/djbdns/tinydns-data.html [cr.yp.to]
Re:Short TTLs + rinetd or similar (Score:2)
mirror your servers (Score:2)
Re:mirror your servers (Score:1)
If possible, you can bring up the new IP address, make sure it's running okay, cut over DNS, then bring down the "old" address.
I've done it in various linux installations... In the routers themselves it's a little different (esp. when changing loopback addresses)
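One cheap way to "make sure it's running okay" before cutting over DNS is to check that the service accepts TCP connections on the new address. A minimal sketch, where the function name and the timeout are assumptions for illustration; it only proves the port is open, not that the application behind it is healthy.

```python
import socket


def service_answers(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Run it against the new IP for each port you serve (80, 25, and so on) before touching DNS.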
Fail over (Score:2)
Now mind you, the latency between the two would have to be low enough that your DB doesn't choke doing redundancy/synchronization between your two sites.
If that can be set up, just change your nameservers to point to the new colo and watch the traffic transition over. Once your old site hits no traffic, you are done.
There are prolly better ways.. maybe just plain other ways.. but this is off the top of my head.
Reduce TTL (Score:3, Interesting)
(Note: I'm assuming you have duplicate equipment since that's the only way to physically move with no downtime unless your configuration allows you to remove half of your stuff and still keep running.)
Depending on your needs and current design you can also play NAT/proxy games, i.e. set up a proxy server or use NAT to make your old IP contact your new servers to catch all the misdirected traffic until DNS propagates.
Last couple of times I did this it was fun to watch. I pulled the trigger on the DNS and could watch the load flow from the old to the new site (we were in the top 500 sites in traffic and did the move during the day so there was a statistically valid sample to work with).
Re:Reduce TTL (Score:2)
Either that or one heck of a transportable UPS and a kickass wireless setup!
Re:Only somewhat off topic... (Score:2)
Re:Only somewhat off topic... (Score:2, Insightful)
Re:Only somewhat off topic... (Score:1)
If you have a lower priority MX box out there, compare your logs with the higher priority one and look for machines that actually tried the higher priority one. It's usually kind of funny. I suppose there's a way to fingerprint servers from this behavior.
DNS then HTTP then SMTP (Score:2, Insightful)
A few different ways (Score:1)
We did all this when moving from 4 T1's to a DS3, last year.
Another part of the job.... (Score:2, Insightful)
Even though my home network is only two, sometimes three, machines, I administer IP addresses through DHCP. The server has a static IP, everything else gets its IP served from DHCP, with a static MAC-to-IP mapping. My DNS is on the same machine.
For your situation, switch the machines to DHCP at the old location, and have everything running. You would need a temporary machine to act as the DHCP/DNS machine at the new location. When you move your machines, they should simply come up. Watch out for hardcoded IPs in other configs.
I presume your servers are on a DMZ, and you could arrange one machine as a DHCP/DNS server. Heck, a WalMart $200 box could more than do the job.
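On the "watch out for hardcoded IPs in other configs" point, a quick scan of the config trees for the old addresses can catch stragglers before the move. A sketch only: the function name is made up, and you would want to point it at specific directories rather than a whole filesystem, since it opens every regular-looking file it finds.

```python
import os
import re


def find_hardcoded_ips(root, old_ips):
    """Yield (path, line_number, line) for lines under root that mention an old IP."""
    if not old_ips:
        return
    # The lookarounds stop 10.0.0.1 from matching inside 10.0.0.10 or 110.0.0.1,
    # while still allowing a trailing ":80" or a sentence-ending period.
    pattern = re.compile(
        r"(?<!\d)(?<!\d\.)(?:"
        + "|".join(re.escape(ip) for ip in old_ips)
        + r")(?!\d)(?!\.\d)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it
```

Something like list(find_hardcoded_ips("/etc/apache", ["192.0.2.10"])) would list every line that still names the old address (the path and the address here are placeholders).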
Re:Another part of the job.... (Score:1)
I'll admit I'm not an admin, except on my home LAN. But I am an early adopter, and was on the ground floor for the planning over a decade ago when our site began migrating from mainframes to workstations. We're now somewhere in the 5000 (SWAG, probably an underestimate) systems on the site lan, with probably 1/4 of those getting IP through DHCP, the rest static. So I don't see capacity being the problem.
One set of problems comes with department moves, and the coordination between physical and IP frequently gets messed up. That's what prompted my DHCP suggestion, because it sure could help around here at those times. I also know that what I suggest could lead to some large MAC-to-IP tables, but some admins like (anal?) control and this is one mechanism.
But I have no experience whatsoever in (probably heterogeneous) co-location. I'd assume you get floor space, electricity, an IP range, basic external DNS, and some form of bandwidth, either in the form of one/multiple ethernet or something more exotic that plugs into a router. Beyond that, I don't know what services you get from the data center and what you provide in your own square footage. If you're on your own in that floorspace, and if you're a small operator, then maybe it does start to look like my home LAN, just in a bigger room.
So help with a clue. I'm not about to quit my day job for it, and you'll probably say that's a good thing. Though anonymous, you're the most polite response. Generally, if someone clueless comes into my haunts asking obviously ignorant questions about chip design, I try to be polite. (Unless of course they're asking to get their homework done for them.)
Re:Another part of the job.... (Score:2)
DHCP certainly isn't a magic bullet but it does get some parts of planning the move into solid existence, and I don't see it causing any problems. If nothing else, the post is at least interesting because until I read it, I hadn't even considered using DHCP as part of a large scale move for machines with static addresses.
Re:Another part of the job.... (Score:1)
How would anything in your answer help him...?!?
Re:Another part of the job.... (Score:1)
Re:Another part of the job.... (Score:1)
Re:Another part of the job.... (Score:2)
We need people like you who can think outside of the box and come up with creative solutions to complex problems.
Send me an email and we'll talk.
Network Planning Cliff Notes (Score:1)
--Dan
A friend had that problem... (Score:2)
'Course if the old IP is completely dead, you've got problems. If you're physically moving the server, then I'm sure you can dig up an old 486 to run Apache on as a redirect.
Pure operational exercise (Score:5, Informative)
Lay out your priorities. You say "everything has to stay up" - maybe that's true, but I moved a rather large commercial site stuck in one colo elsewhere, in pieces, when we had a *lot* of money, and when cost analysis started being done, it turned out we could afford downtime.
Look at your traffic records, worry about what has to be up and what doesn't. Think *hard* about dependencies.
Perhaps you can afford two trips (which is what we did): we moved a skeleton crew to the new site (preconfigured and tested, of course), switched DNS (you did think about your TTL, yes?), waited for the change to be picked up from a site I knew had not cached the DNS, and completed the move.
Perhaps you can buy/borrow from the office/use spares (but be careful about occupying your spares!) for the move.
Perhaps you can offload the bulk of your traffic elsewhere (Akamai or something similar, to take the demand off your machines while you're doing it).
I can't speak to your situation, but there's always a way to make it work - like I said, it is pure operations. Analyse, plan, plan again, execute.
More hints -
- Before you're slouching in the colo breaking down the network, copy all data where it needs to be from the comfort of your office. Doublecheck you got it right.
- when disassembling equipment, label all interconnects, in order, unless every box is flat on a local net, with nothing hanging off of them. Don't forget routers, and don't assume it's stupid to label something obvious. Assume you're going to be brain dead when you put it back together - if something unexpected happens (someone flips the truck?), you will be brain dead. And even if you're not, it does help, esp. with messy SCSI configs, etc.
- Write out a timeline, and give yourself more time than you need. Make sure other people concerned know what it is.
- Oh, _back up your machines_. I know, it is obvious, but I know of one company that screwed this up royally.
- Bring one more person than you need. They might be helpful, and if not, they can at least fetch coffee and donuts when you need them.
- Bring snacks, lots of them.
- Convince your accountant insurance is worth it, if they don't believe it already. We were moving ~2M worth of gear, and I would have been even more freaked out than I was if we hadn't insured it while it was being transported.
- Have a wad of company cash/credit card on hand. You never know what comes up.
- Ditto for spares, whatever you can - is that disk that's been spinning for 4 years going to come back up? Cat 5?
- If you have heavy gear, think about whether or not you're going to move it yourself.
- Overplan it. You'll be glad. Think contingencies and fallback positions.
- Make sure your staff is well rested before you do it, and that they have whatever they need before you start.
Hope this helps.
-j
Re:Pure operational exercise (Score:2)
Umm... DECLINED?
Ditto for spares, whatever you can - is that disk that's been spinning for 4 years going to come back up?
That one is easy: NOTACHANCE.
Oh, and from personal experience, make sure there are no "flags" on your account at your current provider or they may try to prevent you from removing equipment.
Re:Pure operational exercise (Score:1)
What others have recommended works for me. What I'm now planning is:
1. Test new location and put in a box
2. Turn it into ns1 (import records,etc), turn off ns1 at old location
3. Update host records for ns1 with Tucows and Internic
4. Start decreasing ttl
Wait a day or two to shake things out
5. mirror sites/put on temp servers, etc. (planned accordingly)
6. Transition sites to temp servers
7. Go to old co-lo, with pre-configured redirect server
8. dismantle (label) equipment, bring up to new co-lo, set back up
9. Have some well deserved coffee
10. Transition sites back to regular servers
11. Shut down and move ns0 & redirect server
I'm REALLY nervous about running on one name server for the few hours between moving. I also have an irrational fear that when I transition ns1 to its temporary home, it will cause a rift in the time-space continuum making all of my sites inaccessible.
Is it advisable to bring ns0 up with the rest of the equipment and skip the redirect?
BTW, thank you guys, I'm pressed for time so have to respond to several postings at once.
- Eric (who is looking for a coffee bitch as he types)
Re:Pure operational exercise (Score:2)
3. Update host records for ns1 with Tucows and Internic
I'm REALLY nervous about running on one name server for the few hours between moving. I also have an irrational fear that when I transition ns1 to its temporary home, it will cause a rift in the time-space continuum making all of my sites inaccessible.
Is it advisable to bring ns0 up with the rest of the equipment and skip the redirect?
If I'm following you, I wouldn't be too concerned with running on one name server for a short time. The odds of something going wrong on it in that period are lower than the odds of something else going wrong. If it makes you really nervous, have the redirect server also do DNS for that period, although that slows you down - you'll have to wait for Tucows or whoever to make an additional change.
As for fears of transitioning ns1, what I'd do is not turn off the old ns1 until after you're sure that the new one is being used - wait until the host record update goes through and you see zero traffic on the old one to turn it off.
Another hint I thought of:
- if you can get away with it, institute a configuration freeze several days before the move, and reboot everything you can at the old place. I wish I had done this - several machines had very long uptimes, and people had made changes that caused problems when we brought them back up at the new location. If you can test them one at a time, you might save yourself some grief.
Good luck
-j
Re:Pure operational exercise (Score:2)
One last thought: any business that says "No downtime at all is acceptable!" has a philosophical problem more than a technical one. Sooner or later you may be forced to accept downtime whether you wanted it or not (backhoe? colo fire? worm infestation?), and you should be technically and psychologically prepared for it. There are very few businesses, I think, that absolutely CANNOT have downtime. (I mean, we hear that at my workplace. We sell children's books. I think if some people can't buy children's books in the middle of the night a couple times a year, it's not the end of the world, or even of the company.) If you are one of these, your management should be willing to accept the expense and complexity of truly redundant geographically distant systems. If they're not...we're back to a philosophical problem.
Simple (Score:2)
2) Change the DNS
3) When the DNS updates you're done.
4) Pat self on back.
5) Get back to work you damn code/admin monkey.
(This is how it worked at my job.)
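"When the DNS updates" can be checked rather than guessed. The sketch below asks the local resolver what a name currently resolves to and compares it with the new address. The function names and the injectable resolver argument are my own additions for illustration, and the result only reflects the view of whatever resolver the machine running it uses; other caches around the net may still hold the old record.

```python
import socket


def resolved_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the set of IPv4 addresses the resolver currently gives for hostname."""
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def cutover_visible(hostname, new_ip, resolver=socket.getaddrinfo):
    """True once the name resolves to the new address and nothing else."""
    return resolved_addresses(hostname, resolver) == {new_ip}
```

Running cutover_visible("www.example.com", "198.51.100.10") from a few machines on different networks gives a rough picture of how far the change has spread (both values are placeholders).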
Re:Simple (Score:1)
Reducing Your Time-to-Live (Score:3, Funny)
It's a proven fact your lifespan will actually decrease due to the stress involved in moving a network's IP address and the debugging that goes along with it.
What about Web Accounts Acquired? (Score:1)
How do you deal with a lot of different domains that are registered through various registrars, plus moving them to the new DNS server? That would introduce a lot more complexity than just a simple change of the IP addresses and DNS records. We're talking about a major movement of all domains from one DNS server to a new DNS server, plus new IP address allocation. What would you do to ensure the smooth movement of those?
-12: Flamebait (Score:3, Funny)
Find out what slashdot's admins did the couple of times they moved their servers.
Then don't do that.
I'm reminded of a familiar puzzle (Score:2)
The farmer (you) is taking his fox (servers), duck (DNS) and corn (customer web space) to market (colocation provider). He is currently stuck on the left bank of a major river (highway). The good news is that he has a boat available (pickup truck). The bad news is that the boat will only hold him and one other item for each crossing (changing over servers without service loss).
He dare not leave the duck alone with the corn as the corn would get eaten (no servers for DNS), or the duck alone with the fox as the duck would get eaten (no customer web space while swapping drives?). Also the farmer knows from prior experience that he cannot leave the corn alone on the right bank of the river (old location) since a large flock of crows (customers) is waiting to devour it (and you).
Can you help the farmer get everything across the river safely? (Ask Slashdot can!)
DNS Authoritative Servers (Score:2, Insightful)
To get around this, there are two scenarios:
1) Use outside nameservers as your authoritative servers for your domain. You may even be able to get your registrar to do this. Some registrars offer it as a feature and others may charge. In any case, having a separate set of nameservers means you can move from colocation facility to colocation facility with relative ease as mentioned in earlier posts.
2) Set up two servers at the new colo facility as DNS servers and set all of your TTLs etc. to the desired values. Register those IPs as nameservers with Network Solutions (you may be able to do this through your registrar). Then change the IP numbers of your nameservers for the domain names. Wait 48 hours for total propagation and proceed as has been outlined in previous posts.
Please note that you really should contact your registrar and find out what the procedure is for changing the IP address of a nameserver. I know that in the past when we had to do it, there was a template sent to Network Solutions specifically for this task. This is most likely easier now and probably different for each registrar.
I went through this... (Score:2)
ifconfig eth0:1 123.45.67.89 netmask 255.255.255.224 broadcast 123.45.67.95
That's it. Now the people at my friend's company have set up the DNS to report the new IP address and let it propagate through the 'net. One hour later or so all my domains targeted the new IP address, everything went fine, with zero downtime.
The best is: everything was done through ssh, I didn't have to move my lazy ass
Only one pitfall to be aware of: don't forget to tell your firewall! In my case it was simply adding eth0:1 to the list of firewalled interfaces in SuSEs
Now that everything works I could kick out the old IP address and stuff... but I'm lazy
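When adding an alias like the eth0:1 one above, the broadcast address has to agree with the address and netmask, and it is easy to get wrong by hand. A small sketch using Python's standard ipaddress module to derive it; the function name is made up for the example.

```python
import ipaddress


def alias_parameters(address, netmask):
    """Return (network_address, broadcast_address) for an address/netmask pair."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = iface.network
    return str(network.network_address), str(network.broadcast_address)
```

For 123.45.67.89 with netmask 255.255.255.224 this gives network 123.45.67.64 and broadcast 123.45.67.95.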
Re:I went through this... (Score:1)
1st step is maybe outsourcing DNS... (Score:1)
I've been real happy with zoneedit [zoneedit.com] for DNS services. In fact, from looking at the identical sign-up forms on the Verisign and Zoneedit websites, I think Verisign is reselling their services (not that what those guys do should necessarily be an inspiration to anyone).
This might help to eliminate one very hairy variable from your already complex equation.
Re:1st step is maybe outsourcing DNS... (Score:1)
I'd love to do that, but a lot of our money is made selling domain names and hosting tiny sites. It would get very expensive relative to our hosting rates to go with zoneedit, ultradns and other such services.
I wish we could do it though.. dns is a pita