
Best Practices For Infrastructure Upgrade?

An anonymous reader writes "I was put in charge of an aging IT infrastructure that needs a serious overhaul. Current services include the usual suspects, i.e. www, ftp, email, dns, firewall, DHCP — and some more. In most cases, each service runs on its own hardware, some of them for the last seven years straight. The machines still can (mostly) handle the load that ~150 people in multiple offices put on them, but there's hardly any fallback if any of the services die or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same). Services running on virtualized servers hosted by a single reasonably-sized machine per office (plus one for testing and a spare) seem to recommend themselves. What's your experience with virtualization of services and implementing fallback/load-balancing schemes? What's Best Practice for an upgrade like this? I'm interested in your success stories and anecdotes, but also pointers and (book) references. Thanks!"
  • Why? (Score:3, Informative)

    by John Hasler ( 414242 ) on Saturday November 21, 2009 @07:04PM (#30188872) Homepage

    Why virtual servers? If you are going to run multiple services on one machine (and that's fine if it can handle the load) just do it.

  • I'd say (Score:5, Informative)

    by pele ( 151312 ) on Saturday November 21, 2009 @07:04PM (#30188876) Homepage

    Don't touch anything if it's been up and running for the past 7 years. If you really must replicate, then get some more cheap boxes and replicate; it's cheaper and faster than virtual anything. But 150 users doesn't warrant anything in my opinion. I'd rather invest in backup links (from different companies) between offices. You can bond them for extra throughput.
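
    If you go the backup-link route, the failover half is easy to script. Below is a minimal sketch of an uplink watchdog on a Linux gateway; the interface names, gateway addresses, and probe host are placeholders, and actually bonding the two links for extra throughput needs cooperating endpoints on both ends, which is a separate exercise.

        """Toy uplink watchdog: move the default route to the backup link when
        the primary stops passing traffic. Interface names and addresses are
        placeholders for illustration only."""
        import subprocess
        import time

        PRIMARY_IF, PRIMARY_GW = "eth1", "192.0.2.1"    # hypothetical primary uplink
        BACKUP_IF, BACKUP_GW = "eth2", "198.51.100.1"   # hypothetical backup uplink
        PROBE_HOST = "203.0.113.10"                     # something only reachable over the WAN

        def link_alive(iface):
            # Three pings, 2 s timeout each, sourced from the given interface.
            return subprocess.call(
                ["ping", "-c", "3", "-W", "2", "-I", iface, PROBE_HOST],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

        def set_default(gw, iface):
            subprocess.check_call(
                ["ip", "route", "replace", "default", "via", gw, "dev", iface])

        if __name__ == "__main__":
            on_primary = True
            while True:
                if on_primary and not link_alive(PRIMARY_IF):
                    set_default(BACKUP_GW, BACKUP_IF)
                    on_primary = False
                elif not on_primary and link_alive(PRIMARY_IF):
                    set_default(PRIMARY_GW, PRIMARY_IF)
                    on_primary = True
                time.sleep(30)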

  • by lukas84 ( 912874 ) on Saturday November 21, 2009 @07:10PM (#30188946) Homepage

    Again, wrong approach. Ask the higher-ups what kind of availability they want. The cost is derived from their wishes.

  • Re:Why? (Score:2, Informative)

    by lukas84 ( 912874 ) on Saturday November 21, 2009 @07:24PM (#30189060) Homepage

    Virtualization does not automatically imply redundancy, and VM-level high availability will not protect you against application failures.

  • Re:Why? (Score:3, Informative)

    by nabsltd ( 1313397 ) on Saturday November 21, 2009 @09:19PM (#30189856)

    Just p2v his entire data center first,

    This brings to mind one other big advantage of VMs that helps with uptime issues: fast reboots.

    Some of those old systems might have to be administered following "Microsoft best practices" (reboot once a week just to be safe), and older hardware might have issues with that, plus it's just slower. Add in the fact that VMs don't have to do many of the things that physical hardware has to do (memory check, initialize the RAID, etc.), and you can reboot back to "everything running" in less than 30 seconds.

    Although you never want to reboot if you can avoid it, this one factor gives you some serious advantages. If you have to apply a patch that requires a reboot, you can do so just by making sure the server isn't being used right now, and it's likely that people won't even notice. Of course, you don't do this until after you have done the same thing on the test server, and know that the patch won't cause issues.
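
    If it helps, the "reboot only when nobody is on the box" check is easy to automate. Here's a rough sketch for a Windows guest, assuming quser and shutdown are available on the PATH; run it only after the patch has been proven on the test server.

        """Rough sketch: reboot a patched Windows guest only when no
        interactive sessions are active. quser/shutdown are assumed to
        be present; adjust to taste."""
        import subprocess

        def active_sessions():
            # quser exits non-zero when there are no sessions at all.
            try:
                out = subprocess.check_output(["quser"], text=True)
            except subprocess.CalledProcessError:
                return 0
            lines = out.strip().splitlines()[1:]   # drop the header row
            return sum(1 for line in lines if "Active" in line)

        if __name__ == "__main__":
            if active_sessions() == 0:
                subprocess.check_call(["shutdown", "/r", "/t", "0"])
            else:
                print("Users still logged on; postponing the reboot.")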

    then work on 'upgrades' from there.

    And the test environment is a big thing that VMs can provide to help those upgrades. Just p2v the system, then clone it to create the test version. Use snapshots and torture the test system as much as you want.
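
    The snapshot-and-torture loop is scriptable too. A minimal sketch using VMware's vmrun tool, where the .vmx path, snapshot name, and the torture step itself are placeholders:

        """Minimal sketch of snapshot -> torture -> revert on a cloned test VM
        using vmrun. Paths and names below are placeholders."""
        import subprocess

        VMX = "/vmware/test-clone/test-clone.vmx"     # hypothetical path to the cloned test VM
        SNAP = "before-torture"

        def vmrun(*args):
            subprocess.check_call(["vmrun"] + list(args))

        def run_torture_tests():
            # Placeholder: drive your patch installer or load generator here.
            pass

        if __name__ == "__main__":
            vmrun("snapshot", VMX, SNAP)              # freeze a known-good state
            try:
                run_torture_tests()
            finally:
                vmrun("revertToSnapshot", VMX, SNAP)  # roll back for the next round
                vmrun("start", VMX)                   # revert leaves the VM powered off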

  • Linux Vserver (Score:2, Informative)

    by patrick_leb ( 675948 ) on Saturday November 21, 2009 @10:13PM (#30190286)

    Here's how we do it:

    - Run your services in a few vservers on the same physical server:
        * DNS + DHCP
        * mail
        * ftp
        * www
    - Have a backup server where your stuff is rsynced daily. This allows for quick restores in case of disaster.

    Vservers are great because they isolate you from the hardware. Server becomes too small? Buy another one, move your vservers to it and you're done. Need to upgrade a service? Copy the vserver, upgrade, test, swap it with the old one when you are set. It's a great advantage to be able to move stuff easily from one box to another.
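
    For what it's worth, the nightly rsync to the backup box amounts to something like this sketch; guest names, paths, and the backup host are placeholders, and the backup server is assumed to accept rsync over SSH.

        """Sketch of a nightly vserver backup: mirror each guest's root
        to the backup host with rsync. Names and paths are placeholders."""
        import subprocess

        GUESTS = ["dns-dhcp", "mail", "ftp", "www"]   # one vserver per service
        BACKUP_HOST = "backup.example.com"            # hypothetical backup server

        def backup(guest):
            src = "/vservers/%s/" % guest
            dst = "%s:/backups/%s/" % (BACKUP_HOST, guest)
            # -a preserves permissions/times, -H keeps hardlinks, --numeric-ids
            # avoids uid remapping, --delete mirrors deletions so the copy
            # stays restorable as-is.
            subprocess.check_call(
                ["rsync", "-aH", "--numeric-ids", "--delete", src, dst])

        if __name__ == "__main__":
            for guest in GUESTS:
                backup(guest)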

  • by Anonymous Coward on Saturday November 21, 2009 @10:55PM (#30190520)
    We've probably dropped ~20K (w/o licensing) in our VMWare ESX cluster. Basically it's the "poor man's version" because of all of our purchasing restrictions, but here's about what it is:
    • Basically a box with some 15K SAS drives in RAID1+0. Cost ~$5k
    • Server with some SATA 1TB drives again in RAID1+0. Around 5K as well
    • 3x cluster nodes. Dual 771 with 8 or 16GB of RAM
    • Management node running Win Server 08

    It's not a super config, and a lot of people will argue that it's not a true setup, but it's sufficient for our needs. I think we hit 4% CPU utilization across all the nodes the other day.

    With VMWare, watch the 2TB filesystem limit. We ran into that with our SATA array. Basically you have to slice it into 2TB chunks to get VMware to accept it as a datastore.

    As far as networking goes, we have a couple of gigE switches running the traffic. Our SANs are redundant, as we clone all of the machines from our SAS "SAN" to our SATA. If the "production" SAN goes down we can start up the clone from the SATA box in minutes. After the primary SAN comes back up we can VMotion it across to the other data store.

  • Re:Why? (Score:2, Informative)

    by mysidia ( 191772 ) on Sunday November 22, 2009 @12:56AM (#30191042)

    That's where Windows 2008 MSCS, HAProxy, or Redhat cluster suite comes in.

    For example, if you want a highly-available web service, you would have two VMware servers, each running a webserver VM.

    Then you would have a diskless load-balancer running HAProxy to feed incoming web requests to a working web server.

    For database services... you'd have a MySQL or MSSQL VM on each host, plus a SAN or shared-storage block filesystem with a GFS-formatted LUN, and a quorum disk (Linux) or witness file share on a third physical host (for Windows 2008 MSCS). Clustering services are configured so the SQL process is active on only one host at a time, and only when quorum is met; if failure of another node is detected, a remaining node that can meet quorum will fence (kill) the other VM and then take over.

    So in this manner, you can meet HA in a virtualized environment.

    There are some considerations, though, like guest system clock accuracy, reliability of network connections (so an erroneous failure isn't detected during times of high load), and supported configurations for OS vendors' clustering capabilities.
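
    To make the load-balancer part concrete: this is not HAProxy itself, just a toy sketch of the idea it implements, which is handing each incoming request to a backend that actually answers. The backend addresses are placeholders; in production you'd use HAProxy's real health checks instead of this per-request probing.

        """Toy illustration of health-checked load balancing (a stand-in for
        HAProxy, not a replacement). Backend addresses are placeholders."""
        from http.server import BaseHTTPRequestHandler, HTTPServer
        import urllib.request

        BACKENDS = ["http://10.0.0.11", "http://10.0.0.12"]   # the two webserver VMs

        class Proxy(BaseHTTPRequestHandler):
            def do_GET(self):
                for base in BACKENDS:
                    try:
                        # Per-request probing stands in for periodic health checks.
                        resp = urllib.request.urlopen(base + self.path, timeout=5)
                    except Exception:
                        continue                    # dead backend, try the next one
                    body = resp.read()
                    self.send_response(resp.status)
                    self.send_header("Content-Length", str(len(body)))
                    self.end_headers()
                    self.wfile.write(body)
                    return
                self.send_error(503, "No healthy backend")

        if __name__ == "__main__":
            HTTPServer(("", 8080), Proxy).serve_forever()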

  • Re:Latest Trends (Score:3, Informative)

    by Z00L00K ( 682162 ) on Sunday November 22, 2009 @03:29AM (#30191740) Homepage Journal

    Any server that can offer a RAID disk solution would be fine. Blade servers seem to be overkill for most setups, and they are expensive.

    Then run DFS (Distributed File System) or similar to replicate data between sites. This will make things easier, and if replication is working well you can keep the backup system at the head office and not worry about running around swapping tapes at the local branch offices.

    Some companies tend to centralize email around a central mail server. This has its pros and cons. The disadvantage is that if the head office goes down, everyone is without email service. But the configuration can be more complicated if each branch office has its own.

    It's also hard to tell how to best stitch together a solution for a specific case without knowing how the company in question works. There is no golden solution that works for all companies.

    The general idea, however, is that DNS and DHCP shall be local. If they aren't, then the local office will be dead as a dodo as soon as there is a glitch in the net. Anyone not providing local DNS and DHCP should be brought out of the organization as soon as possible. And DNS and DHCP don't require much maintenance either, so they won't put much workload on the system administration.
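
    A quick sanity check for the "DNS must be local" rule is to query each office's own resolver directly and complain when it doesn't answer. A small sketch, where the resolver addresses and test name are placeholders and dig is assumed to be installed:

        """Sketch: verify every office's local resolver answers a known
        internal name. Addresses and the test name are placeholders."""
        import subprocess

        OFFICE_RESOLVERS = {
            "head-office": "10.0.1.53",
            "branch-a": "10.0.2.53",
            "branch-b": "10.0.3.53",
        }
        TEST_NAME = "intranet.example.com"

        def resolver_ok(server):
            # +time/+tries keep dig from hanging when the resolver is down.
            result = subprocess.run(
                ["dig", "@" + server, TEST_NAME, "+short", "+time=2", "+tries=1"],
                capture_output=True, text=True)
            return result.returncode == 0 and result.stdout.strip() != ""

        if __name__ == "__main__":
            for office, server in sorted(OFFICE_RESOLVERS.items()):
                status = "ok" if resolver_ok(server) else "NOT ANSWERING"
                print("%-12s %-12s %s" % (office, server, status))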

    There are companies (big ones) that run central DHCP and DNS, but glitches can cause all kinds of trouble - like providing the same IP address to a machine in Holland and one in Sweden simultaneously (yes - it has happened in reality, no joke) - and the work required to figure out what's wrong when multiple sites are involved in an IP address conflict can cost a lot. And if you run Windows you should have roaming profiles configured and a local server on each site where the profiles are stored.

    Local WWW and FTP servers can work, but check whether they are for internal or external use. Do you really need a local WWW and FTP server for each site? I would say no. And those servers should be on a DMZ. It can of course be one server handling both WWW and FTP. The big issue, especially with FTP servers for dedicated external users, is maintaining the accounts on those servers. Obsolete FTP server accounts are a security risk.
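
    One cheap way to keep on top of obsolete accounts is a periodic stale-account report on the FTP box. A hedged sketch, relying on the standard lastlog tool and an arbitrary 90-day cutoff:

        """Sketch: list accounts on the FTP server that have not logged in
        for 90 days, so someone can decide what to disable."""
        import subprocess

        DAYS = "90"

        if __name__ == "__main__":
            # lastlog -b N prints only records older than N days (including
            # accounts that have never logged in).
            out = subprocess.check_output(["lastlog", "-b", DAYS], text=True)
            lines = out.strip().splitlines()[1:]    # drop the header row
            print("Accounts idle for more than %s days:" % DAYS)
            for line in lines:
                print("  " + line.split()[0])       # username is the first column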

    And if you run Windows I would really suggest that you do set up WDS (Windows Deployment Server). This will allow your PC clients to do a network boot and reinstall them from an image. Saves a lot of time and headache.

    And today many users have laptop computers, so hard disk encryption should be considered to limit the risk of having business critical data going into the wrong hands. Truecrypt [truecrypt.org] is one alternative that I have found that works really well. But don't run it on the servers.

  • by TheLink ( 130905 ) on Sunday November 22, 2009 @12:16PM (#30193994) Journal
    I have vmware machines on one server at home. There are still benefits even though it's not a cluster. So it's not that stupid.

    It is easier to move the virtual servers to another machine or O/S. This is useful when upgrading or when hardware fails or when growing (move from one real server to two or more real servers). There's no need to reinstall stuff because the drivers are different etc.

    You can snapshot virtual machines and then back them up while they are running. Backup and restore is not that hard that way. So even if you have a single point of failure, if you have recent image backups, you could buy a machine with a preinstalled O/S, install vmware, and get back up and running rather quickly.

    And when power fails and the UPS runs low on battery, I have a script that suspends all virtual machines then powers the server down. That's more convenient too than setting up lots of UPS agents on multiple machines and hoping they all shutdown in time.
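
    The script boils down to something like this sketch; vmrun is assumed to be on the PATH, and the UPS monitor is assumed to call it when the battery runs low.

        """Sketch of the low-battery handler: suspend every running guest,
        then power off the host."""
        import subprocess

        def running_vms():
            # 'vmrun list' prints a count line followed by one .vmx path per line.
            out = subprocess.check_output(["vmrun", "list"], text=True)
            return [line for line in out.splitlines()[1:] if line.strip()]

        if __name__ == "__main__":
            for vmx in running_vms():
                subprocess.check_call(["vmrun", "suspend", vmx])
            subprocess.check_call(["shutdown", "-h", "now"])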

    DB performance sucks in a vmware guest though, so where DB/IO performance is important, use "real" stuff. Things may be better with other virtualization tech/software.
  • by TheLink ( 130905 ) on Sunday November 22, 2009 @12:36PM (#30194164) Journal
    In my uninformed opinion, blades are mainly a way for hardware vendors to extract more money from suckers.

    They probably have niche uses. But when you get to the details they're not so great. Yes the HP iLO stuff is cool etc... When it works.

    Many of the HP blades don't come with optical drives. You have to mount CD/DVD images via the blade software, which seemed to work reliably only with IE6 on XP. OK, so maybe we should have tried it with more browsers than IE8, but who has time? Especially since, as described below, you don't have time:

    So far I haven't seen any mention in HP documentation that the transfer rate of a mounted CD/DVD image (or folder) from your laptop, through the iLO software, to the blade you're trying to install is a measly 500 kilobytes per second. But that's what we encountered in practice. At that rate a 700MB CD image takes roughly 25 minutes, and a 4GB DVD image well over two hours.

    Yes you can attach the blade network to another network and install it over the network, but if you can do that, doesn't that make the fancy HP iLO stuff less important? You might as well just get a network KVM right? That KVM will work with Dell/IBM/WhiteBoxServer so you can tell HP to fuck off and die if you want.

    Which brings us to the next important point: Fancy Vendor X enclosures will only work with current and near future Vendor X blades. In 3-5 years time they might start charging you a lot more to buy new but obsolete Vendor X blades. Whoopee. What are the odds you can use the latest blades in your old enclosure? So you pay a premium for vendor lock-in and to be screwed in the future.

    I doubt Google, etc use blades. And they seem to be able to manage hundreds of thousands of servers. OK so most of the servers might be running the same image/thing... So that makes it easy.

    BUT if you are having very different servers do you really want them in a few blade enclosures? Then when you need to service that enclosure you'd be bringing down all the different blades...
  • by masdog ( 794316 ) <masdog@@@gmail...com> on Sunday November 22, 2009 @12:39PM (#30194192)
    VMWare converter is free, and it works with ESXi.

    Check it out here. [vmware.com]
