Best Practices For Infrastructure Upgrade?
An anonymous reader writes "I was put in charge of an aging IT infrastructure that needs a serious overhaul. Current services include the usual suspects, i.e. www, ftp, email, dns, firewall, DHCP — and some more. In most cases, each service runs on its own hardware, some of them for the last seven years straight. The machines still can (mostly) handle the load that ~150 people in multiple offices put on them, but there's hardly any fallback if any of the services dies or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same). Services running on virtualized servers hosted by a single reasonably-sized machine per office (plus one for testing and a spare) seem to recommend themselves. What's your experience with virtualization of services and implementing fallback/load-balancing schemes? What's Best Practice for an update like this? I'm interested in your success stories and anecdotes, but also pointers and (book) references. Thanks!"
Latest Trends (Score:2)
I've been looking at HP c3000-chassis office-size blade servers, which may serve as your production+backup+testing setup and scale up moderately for what you need. Compact, easily manageable remotely, and if you're good about looking around, not terribly overpriced. Identical blades make a nice starting point for hosting identical VM images.
Re: (Score:2)
Blade servers are very nice for more than, say, 8 servers purchased at a time. The built-in remote integration of better blade servers, the trivial wiring, and physical management are sweet. But the blade server itself becomes a single point of failure, much as a network switch can be, so it takes thought to install and manage them properly. And they cost, at last glance, roughly $500/blade for the chassis. Is this worth an extra $500/server on your budget? Not if your servers are quite modest and the perso
Re: (Score:3, Informative)
Any server that can offer a RAID disk solution would be fine. Blade servers seem to be overkill for most solutions - and they are expensive.
And then run DFS (Distributed File System) or similar to have replication between sites for the data. This will make things easier. And if you have a well working replication you can have the backup system located at the head office and don't have to worry about running around swapping tapes at the local branch offices.
Some companies tend to centralize email around
Are blades really such a good idea? (Score:3, Informative)
They probably have niche uses, but when you get to the details they're not so great. Yes, the HP iLO stuff is cool, etc... when it works.
Many of the HP blades don't come with optical drives. You have to mount CD/DVD images via the blade software, which seemed to only work reliably with IE6 on XP. OK, so maybe we should have tried it with more browsers than IE6 and IE8, but who has time? Especially see below why you
Many datacenters can't build out bladecenters (Score:3, Insightful)
The biggest problem I've found with blades is that you can't fill a rack with them. Several of the datacenters I've come across have been unable to fit more than one bladecenter per rack, cooling and power being the problem.
At the moment, a rack full of 1U boxes looks like the highest density to me.
Why? (Score:3, Informative)
Why virtual servers? If you are going to run multiple services on one machine (and that's fine if it can handle the load) just do it.
Re:Why? (Score:5, Funny)
redundancy.
Re: (Score:3, Insightful)
> redundancy.
+5 Funny.
Re: (Score:2, Informative)
Virtualization does not automatically imply redundancy, and VM-level high availability will not protect you against application failures.
Re: (Score:2, Informative)
That's where Windows 2008 MSCS, HAProxy, or Red Hat Cluster Suite comes in.
For example, if you want a highly-available web service, you would have two VMware hosts, each running a webserver VM.
Then you would have a diskless load balancer running HAProxy to feed incoming web requests to a working web server.
For database services... you'd have a MySQL or MSSQL VM on each host, and a SAN or shared-storage block filesystem with a GFS-formatted LUN, and a quorum disk (Linux) or Wi
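To make that concrete, here's what a minimal haproxy.cfg for the web tier might look like - a sketch only, with all hostnames, IPs and timeouts invented for illustration:

    global
        daemon
        maxconn 2048

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    listen web_front
        bind *:80
        balance roundrobin
        option httpchk GET /
        server web1 192.168.1.11:80 check
        server web2 192.168.2.11:80 check

Both backends get health-checked, so a dead webserver VM simply drops out of the rotation until it comes back.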
Re: (Score:2)
Virtual was my first thought too.
Just p2v his entire data center first, then work on 'upgrades' from there.
Re: (Score:3, Informative)
Just p2v his entire data center first,
This brings to mind one other big advantage of VMs that helps with uptime issues: fast reboots.
Some of those old systems might have to be administered following "Microsoft best practices" (reboot once a week just to be safe), and older hardware might have issues with that, plus it's just slower. Add in the fact that VMs don't have to do many of the things that physical hardware has to do (memory check, initialize the RAID, etc.), and you can reboot back to "everything running" in less than 30 seconds.
Althoug
Re: (Score:2, Insightful)
It creates a configuration nightmare: apps with conflicting configurations, where changes required for one app may break other apps.
Also, many OSes don't scale well.
In a majority of cases you actually get greater total aggregate performance out of the hardware by divvying it up into multiple servers, when your apps are not actually CPU-bound or I/O-bound.
Linux is like this. For example, running Apache: after a certain number of requests, the OS uses the hardware inefficiently, and can't answer nearl
I'd say (Score:5, Informative)
don't touch anything if it's been up and running for the past 7 years. if you really must replicate then get some more cheap boxes and replicate. it's cheaper and faster than virtual anything. but 150 users doesn't warrant anything in my opinion. I'd rather invest in backup links (from different companies) between offices. you can bond them for extra throughput.
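One caveat on bonding: classic NIC bonding assumes both ends of the link are under your control, which two lines from different providers won't be. For independent upstream links, per-flow multipath routing is the usual trick on Linux - a sketch, with invented gateway addresses:

    # spread outbound traffic across two independent upstream links
    ip route add default \
        nexthop via 203.0.113.1 dev eth1 weight 1 \
        nexthop via 198.51.100.1 dev eth2 weight 1

Failover then just means re-adding the route with the dead nexthop removed (or letting a watchdog script do it).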
Re: (Score:2, Insightful)
I doubt with only 150 people they would want to spend the money to have a server at every office in case that office's link went down. I agree wholeheartedly that the level of redundancy talked about is overkill. Also, will WWW, mail, DNS, ... even work if the line is cut, regardless of whether the server is in the building?
Re: (Score:2)
I'm not sure I'm convinced that it's really a good idea replacing 7 year old hardware with 5-6 year old hardware. Especially given that a single slightly-inexperienced sysadmin doing the system installs and upgrades in question is probably going to have their hands full for a year or so just on the software side. By the time the first wave of upgrades is done with, you're looking at hardware that's older than the stuff you're trying to get rid of was when you started the process.
Further, old cpus have com
Re: (Score:2)
Performance-per-watt becomes far more important when you're running a datacenter. When
Re: (Score:2)
If it works, keep it running. You are correct in everything you point out. If anything, start first with a fully replicated system setup, then a proper backup. Next, test the new systems - backups never seem to work on the first try, so get the bugs worked out.
After this I have no real idea what you need to do.
Think about the complexity of duplication (Score:5, Insightful)
there's hardly any fallback if any of the services dies or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same).
Is that really necessary? I know that we all would like to have bullet-proof services. However, is the network service to the various offices so unreliable that it justifies the added complexity of instantiating services at every location? Or even introducing redundancy at each location? If you were talking about thousands or tens of thousands of users at each location, it might make sense just because you would have to distribute the load in some way.
What you need to do is evaluate your connectivity and its reliability. For example: how often does each office actually lose its link? For how long? What does an hour of that downtime cost the business?
Once you answer at least those questions, then you have the information you need in order to make a sensible decision.
Re: (Score:2)
Parent is right. KISS: keep it simple, stupid - there's a reason some of those servers have been running for 7 years straight. Don't make the error of overthinking it and planning for more than your organization needs (fun though it may be). You can overthink your way from a simple install to a Rube Goldberg machine.
balancing act (Score:2)
Beware of load balancing, because it will tempt you into getting too little capacity for mission-critical work. You need enough capacity to handle the entire load with multiple nodes down, or you will be courting a cascade failure. Load balancing is better than fallback, because you will be constantly testing all of the hardware and software setups and will discover problems before an emergency strikes; but do make sure you've got the overcapacity needed to take up the slack when bad things happen.
Get someone experienced on the boat! (Score:5, Insightful)
You know, you could've started with a few more details - what operating system are you running on the servers? What OS are the clients running? What level of service are you trying to achieve? How many people work in your shop? What's their level of expertise?
If you're asking this on Slashdot, it means you don't have enough experience with this yet - so my first advice would be to get someone involved who does: a shop with many people with lots of experience and knowledge of the platform you work on. This means you'll have backup in case something goes south, and your network design will benefit from their experience.
As for other advice, make sure you get the requirements from the higher-ups in writing. Sometimes they have ridiculous ideas regarding the availability they want and how much they're willing to pay for it.
Re: (Score:2, Insightful)
The main piece of missing information that annoys me is that part of the network service list that says "-- and some more." Half the services that were listed could be easily outsourced to any decent ISP, with cost depending on security, storage, and SLA requirements. ISP hosting or even colocation services give you cheap access to better redundant Internet links than your office will ever touch.
The other half could be done with a cheap firewall/VPN box at each site. In the age of OpenWRT, these boxes often
Take your time (Score:5, Insightful)
If you're like most IT managers, you probably have a budget - which is probably wholly inadequate for immediately and elegantly solving your problems.
Look at your company's business, and how the different offices interact with each other, and with your customers. By just upgrading existing infrastructure, you may be putting some of the money and time where it's not needed, instead of just shutting down a service or migrating it to something more modern or easier to manage. Free is not always better, unless your time has no value.
Pick a few projects to help you get a handle on the things that need more planning, and try and put out any fires as quickly as possible, without committing to a long-term technology plan for remediation.
Your objective is to make the transition as boring as possible for the end users, except for the parts where things just start to work better.
Affordable SME Solution (Score:2, Interesting)
I am still in the process of upgrading a "legacy" infrastructure in a smaller (less than 50) office but I feel your pain.
First, it's not "tech sexy", but you've got to get the current infrastructure all written down (or typed up - but then you have to burn it to CD, just in case your "upgrade" breaks everything).
You should also "interview" users (preferably by email, but sometimes if you need an answer you have to just call them or... face to face, even...) to find out what services they use - you might be surpr
openVZ (Score:4, Funny)
For services running on Linux, OpenVZ can be used as a jail with migration capabilities instead of a full-on VM.
DISCLAIMER: I don't have a job so I've read about this but not used it in a pro environment yet
Re: (Score:2)
I concur. OpenVZ is very lightweight. For a large number of small servers it saves on disk management because the OpenVZ instances' root directories are just a subdirectory on the physical server (and so they can share space in the same host partition). There's no dealing with virtual disk drives.
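For anyone who hasn't tried it, the day-to-day handling is about this simple - the container ID, template name, hostname and IP below are all made up:

    vzctl create 101 --ostemplate debian-5.0-x86 --hostname dns1
    vzctl set 101 --ipadd 192.168.1.53 --save
    vzctl start 101
    # later, move the whole container to another OpenVZ host:
    vzmigrate --online otherhost 101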
Don't do it (Score:5, Insightful)
Complexity is bad. I work in a department of similar size. Long long ago, things were simple. But then due to plans like yours, we ended up with quadruple replicated dns servers with automatic failover and load balancing, a mail system requiring 12 separate machines (double redundant machines at each of 4 stages: front end, queuing, mail delivery, and mail storage), a web system built from 6 interacting machines (caches, front end, back end, script server, etc.) plus redundancy for load balancing, plus automatic failover. You can guess what this is like: it sucks. The thing was a nightmare to maintain, very expensive, slow (mail traveling over 8 queues to get delivered), and impossible to debug when things go wrong.
It has taken more than a year, but we are slowly converging to a simple solution. 150 people do not need multiply redundant load balanced dns servers. One will do just fine, with a backup in case it fails. 150 people do not need 12+ machines to deliver mail. A small organization doesn't need a cluster to serve web pages.
My advice: go for simplicity. Measure your requirements ahead of time, so you know if you really need load-balanced DNS servers, etc. In all likelihood, you will find that you don't need nearly the capacity you think you do, and can make do with a much simpler, cheaper, easier to maintain, more robust, and faster setup. If you can call that making do, that is.
Re: (Score:2)
Actually it sounds like the system was designed to grow very large. The information provided does not indicate there are adequate alarms and documentation for when elements fail.
Google(tm) Cloud (Score:2, Funny)
Re: (Score:3, Insightful)
It is if you recommended outsourcing everything to the cloud.
Trying to make your mark, eh? (Score:3, Insightful)
The system you have works solidly, and has worked solidly for seven years.
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
Frankly, with the cost of modern hardware, you could triple the capacity of what you have now just by gradually swapping out for newer hardware over the next few months, and keeping the shite old boxen for fallback.
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing.
It sounds to me like you are more interested in making your own mark than actually taking an objective view. I may of course be wrong, but in stories like this that is usually what's going on.
In my experience, everyone who tries to make their own mark actually degrades a system, and simply discounts the ways that they have degraded it as being "obsolete" or "no longer applicable".
Frankly, based on your post alone, I'd sack you on the spot, because you sound like the biggest threat to the system to come along in seven years.
These are NOT your computers, if you want a system just so, build it yourself with your own money in your own home.
This advice / opinion is of course worth exactly what it cost.
Apologies in advance if I have misconstrued your approach. (but I doubt that I have)
YMMV.
Re:Trying to make your mark, eh? (Score:5, Interesting)
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
...
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing.
I totally disagree.
Look at some of the services he listed: DNS and DHCP.
You literally can't buy a server these days with less than 2 cores, and getting less than 4 is a challenge. That kind of computing power is overkill for such basic services, so it makes perfect sense to partition a single high-powered box to better utilize it. There is no need to give up redundancy either: you can buy two boxes and have every key service duplicated between them. Buying two boxes per service, on the other hand, is insane, especially for services like DHCP, which in an environment like that might have to respond to a packet once an hour.
Even the other listed services probably cause negligible load. Most web servers sit there at 0.1% load most of the time, ditto with ftp, which tends to see only sporadic use.
I think you'll find that the exact opposite of your quote is true: for 99% of corporate environments where virtualization is used, it is appropriate. In fact, it's under-used. Most places could save a lot of money by virtualizing more.
I'm guessing you work for an organization where money grows on trees, and you can 'design' whatever the hell you want, and you get the budget for it, no matter how wasteful, right?
Re:Trying to make your mark, eh? (Score:4, Interesting)
Get real, for 150 users a WRT54 will do DNS etc....
Want a bit more poke, VIA EPIA + small flash disk.
"buy a server".. jeez, you work for IBM sales dept?
Re: (Score:3, Funny)
To me, a room full of dedicated machines, each running a single simple thing due to the 1990s approach of replacing a server with a dozen shit Windows boxes that can't handle much but are cheap, screams "a dozen vulnerable points of critical failure".
Even MS Windows has progressed to the point where you don't need a single machine per service anymore in a light duty situation. Machines are going to fail, you may be lucky and it could be after they have served their t
Re: (Score:3, Insightful)
Get real, for 150 users a WRT54 will do DNS etc....
Want a bit more poke, VIA EPIA + small flash disk.
"buy a server".. jeez, you work for IBM sales dept?
I'm responding to your comment:
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
I recommended at least two boxes, for redundancy. He may need more, depending on load.
For a 150-user organization, that's nothing - most such organisations are running off a dozen servers or more, which is what the original poster in fact said. With virtualization, he'd be reducing his costs.
One per service is insane, which is what you said. If you wanted dedicated boxes for each service AND some redundancy, that's TWO per service!
Backpedaling and pretending that a WRT54 can som
Re: (Score:3, Insightful)
Is it so hard to not mix up dhcpd.conf and named.conf? Do you need virtualization for that?
Let me give you a hint: YOU DON'T
Re: (Score:2)
I think that's why people are still thinking about putting it in a virtual box so it can't eat all the resources, even for a pile of trivial services that a sparcstation 5 could handle at low load.
Re: (Score:3, Interesting)
Years ago the Microsoft DNS implementation had a very nasty memory leak and used a lot of cpu - you really did need a dedicated DNS machine for small sites and to reboot it once a week.
I think that's why people are still thinking about putting it in a virtual box so it can't eat all the resources, even for a pile of trivial services that a sparcstation 5 could handle at low load.
In practice, everyone just builds two domain controllers, where each one runs Active Directory, DNS, DHCP, WINS, and maybe a few other related minor services like a certificate authority, PXE boot, and the DFS root.
I haven't seen any significant interoperability problems with that setup anywhere for many years.
Still, virtualization has its place, because services like AD have special disaster recovery requirements. It's a huge mistake to put AD on the same OS instance as a file server or a database, because they
Re: (Score:3, Interesting)
No, you need separate servers for when the DHCP upgrade requires a library that conflicts with the DNS server's, which you don't want to upgrade at the same time.
THIS is where virtualization becomes useful.
On the other hand, my solution is a couple of FreeBSD boxes with jails for each service. You could do the same with whatever the Linux equivalent is, or Solaris zones if you want. No need to actually run VMs.
Just run a couple of boxes and separate the services into different jails. When you need to upgrade the c
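For reference, the classic rc.conf way of declaring jails looks roughly like this - jail names, paths and IPs are invented:

    jail_enable="YES"
    jail_list="dns mail"
    jail_dns_rootdir="/jails/dns"
    jail_dns_hostname="dns.example.local"
    jail_dns_ip="192.168.1.53"
    jail_dns_devfs_enable="YES"

Each service gets its own userland to upgrade and break independently, without a hypervisor in sight.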
Re: (Score:2)
Depending on how heavy the load is, that same machine could probably handle postfix, apache, and some kinda ftp server too. That's more or less what you said anyway, but I don't get why you
Re: (Score:2)
You literally can't buy a server these days with less than 2 cores, and getting less than 4 is a challenge.
Does it matter how many cores? They're cheap! 4 times the chance of failure is my only issue. In any case, it sounds like he could combine services WITHOUT the overhead of virtualization.
Even the other listed services probably cause negligible load. Most web servers sit there at 0.1% load most of the time, ditto with ftp, which tends to see only sporadic use.
Yes but it's the rest of the time that actual
Re: (Score:2)
Poppycock. You can buy small-form-factor single-core PCs for under $200, or even a refurbished 3-4 year old server box for close to the same price. Depending on the environmental and space considerations, you can pick the platforms to suit and keep the costs minimal. Shoot, even a $200 netbook would have more CPU power and storage than most 7 year old computers, generate little or no heat, and demand a fraction of the power. If this guy is smart, he can cut electrical costs and cooling costs substantially without changing a perfectly functional architecture.
What doesn't make sense is grossly overcomplicating things by trying to shove too much into some large-scale platform and then further complicating it with a virtualization layer. We gave up mainframes, and thin clients/fat servers didn't work, for a reason.
Sure, it's cool and technically challenging. What's the business reason/driver for going the cool/challenging route again?
If the OP decides to quit 2 months after implementing his super cool setup because the job after that is completely boring, who can come in and grasp what he's set up and maintain/upgrade it? Another finicky tech guru that wants to play with the stuff on the job and gets bored and walks off a couple of months later?
$200 machine = no RAID, no ECC memory, no hardware monitoring, no support for server OSes, not to mention that most netbooks can't run 64-bit, which means the latest Windows server is Just Not An Option.
Good advice! Let's run ALL of our business-critical functions off laptops just to avoid learning about new technology! Let's all run on mixed hardware and have to deal with drivers from fifty vendors!
You really don't understand what virtualization provides, so maybe you should read up on it a little bit befor
Re: (Score:2)
Wow... Did you just seriously recommend he purchase 50 servers for each location???
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
25 services is next to nothing. A single domain controller has that running on a single box.
And you want him to break out each service to its own machine... with a second box for redundancy.
I guess I am happy that you have $20k+ to spend on two low-end boxes for, e.g., just DNS. But that is stupid as hell.
Even worse, you are wasting a dual-core 2GHz system on an NTP time sync server (oh wait, two machines, like you said).
W
Re: (Score:2)
Or use something other than VMware.
KVM + libvirt + virt-manager will most likely be fine for what you describe.
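Something along these lines gets you a guest plus live migration - names, sizes and paths below are made up, and exact virt-install flags vary by version:

    virt-install --name dns1 --ram 1024 --vcpus 1 \
        --disk path=/var/lib/libvirt/images/dns1.img,size=10 \
        --cdrom /isos/debian-5.0.iso --network bridge=br0
    virsh start dns1
    # live-migrate to another KVM host over SSH:
    virsh migrate --live dns1 qemu+ssh://otherhost/system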
What 150 users? (Score:5, Insightful)
I'd say that everyone has mentioned the big-picture points already, except for one: what kind of users?
150 file clerks or accountants and you'll spend more time worrying about the printer that the CIO's secretary just had to have, which conveniently doesn't have reliable drivers or documentation, even if it had whatever neat feature she wanted and now can't use.
150 programmers can put a mild to heavy load on your infrastructure, depending on what kind of software they're developing and testing (more a function of what kind of environment are they coding for and how much gear they need to test it).
150 programmers and processors of data (financial, medical, geophysical, whatever) can put an extreme load on your infrastructure. Like to the point where it's easier to ship tape media internationally than fuck around with a stable interoffice file transfer solution (I've seen it as a common practice - "hey, you're going to the XYZ office, we're sending a crate of tapes along with you so you can load it onto their fileservers").
Define your environment, then you know your requirements, find the solutions that meet those requirements, then try to get a PO for it. Have fun.
P2V and consolidate (Score:5, Interesting)
Re: (Score:3, Informative)
Check it out here. [vmware.com]
Simple and straightforward = complex (Score:5, Insightful)
So let's see if I understand: you want to take a simple, straightforward, easy-to-understand architecture with no single points of failure that would be very easy to recover in the event of a problem and extremely easy to recreate at a different site in a few hours in the event of a disaster, and replace it with a vastly more complex system that uses tons of shiny new buzzwords. All to serve 150 end users for whom you have quantified no complaints related to the architecture, other than that it might need to be sped up a bit (or perhaps find a GUI interface for the ftp server, etc.).
This should turn out well.
sPh
As far as the "distributed redundant system" goes, I strongly suggest you read Moans Nogood's essay "You Don't Need High Availability [blogspot.com]" and think very deeply about it before proceeding.
Re: (Score:2)
As far as the "distributed redundant system" goes, I strongly suggest you read Moans Nogood's essay "You Don't Need High Availability" and think very deeply about it before proceeding.
I agree that you shouldn't go for an HA solution if you don't need it, and that it is much more costly. However, I've worked on a 6 9's availability (99.9999% uptime) system where we mostly met that target; sometimes it is needed and is worth doing.
Re: (Score:2)
FTFA: "there's hardly any fallback if any of the services dies or an office is disconnected."
So let's see if I understand: you want to take a simple, straightforward, easy-to-understand architecture with no single points of failure
Not that I agree with everything the article poster wrote, but in what world does "no fallback" == "no single point of failure"? Sure, there's no one point of total catastrophic failure, but I think he just described two single points of failure: one where all users would be without one service, and one where one office would be without all services.
I'd keep the architecture, but I'd migrate it slowly to virtual servers running on a high-quali
Maybe this is really a uni project (Score:3, Interesting)
If the current system has been acceptable for 7 years, I'm guessing the users' needs aren't something so mind-bogglingly critical that risk must be removed at any cost. Equally, if that were the case, the business would be either bringing in an experienced team or writing a blank cheque to an external party, not giving it to the guy who changes passwords and has spent the last week putting together a jigsaw of every enterprise option out there, and getting an "n+1" tattoo inside his eyelids.
Finally, 7 years isn't exactly old. We've got a subsidiary company of just that size (150 users, 10 branches) running on ProLiant 1600/2500/5500 gear (i.e. 90's), which we consider capable for the job - a job which includes Oracle 8, Citrix MF, plus a dozen or so more apps, with users on current hardware. We have the occasional hardware fault, which a maintenance provider can address same-day and bill us at ad-hoc rates, yet we still see only a couple of thousand dollars a year in maintenance - leaving us content that this old junk is still appropriate no matter which way we look at it.
One Box Per Service (Score:2)
Unless you have power problems or financial restrictions, you're better off with dedicated boxes. I currently run 3 old computers: Ubuntu, Windows XP, and Windows 2003, with Apache on XP running PHP sites and doing reverse proxy for the IIS server on the 2003 box. Ubuntu handles memcache. Because I'm not made out of money, I'm going to virtualize all three systems onto one quad-core system, which will cost around $600 rather than $1800 for three new systems. It'll also cut down on power usage.
Slowness can be c
Re: (Score:2)
just making sure the switches you have are performing
Or simply making sure they are switches. I've seen lots of old infrastructure that is still using hubs. Replacing those gives things a nice performance kick at minimal cost and effort.
services list... (Score:2)
www, ftp, email, dns, firewall, dhcp
Decide what truly needs to be distributed: DNS, DHCP, firewall. What is likely not necessary to distribute: WWW, FTP, email.
DNS can be replicated with BIND, or you can do a DNS server that uses MySQL and replicate the MySQL database. DHCP must run at each site, but you need to decide if you want DNS updated with DHCP. If so, you need to decide if you want those hostnames available across the network. DHCP can update DNS when a client requests an address, DNS can then re
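A master/slave pair in named.conf is only a few lines; the zone name and IPs below are invented:

    // head-office master
    zone "example.internal" {
        type master;
        file "master/example.internal.zone";
        allow-transfer { 192.168.2.53; };
        also-notify    { 192.168.2.53; };
    };

    // branch-office slave
    zone "example.internal" {
        type slave;
        masters { 192.168.1.53; };
        file "slave/example.internal.zone";
    };

The slave keeps answering from its last transferred copy if the inter-office link drops, which is exactly the failure mode the poster is worried about.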
There's no such thing as generic best practice (Score:2)
Once you have met your legal and other regulatory minimum requirements, the rest of the upgrade programme is down to your decision makers. For example: some prefer not to implement hot-standby (relying instead on, perhaps, a third party or business insurance), while some make it a 100% absolute requirement for each and every server they possess. You can't just make a statement in isolation; you'll need guidance from the people who control the money - as that's what i
Probably forgo virtualization (Score:2)
If the administration 'team' has equal access to all the services today on disparate servers, I don't think virtualization is necessarily a good idea; the services can be consolidated in a single OS instance.
In terms of HA, put two relatively low-end boxes in each branch (you said 7-year-old servers were fine, so high end is overkill). Read up on Linux-HA, which is free, and use DRBD to get total redundancy in your storage, as well as a cheap software mirror or RAID 5. Some may rightfully question the need
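For the curious, a two-node DRBD resource is roughly this much drbd.conf - hostnames, devices and IPs invented:

    resource r0 {
        protocol C;                  # synchronous replication
        on nodea {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.11:7788;
            meta-disk internal;
        }
        on nodeb {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.12:7788;
            meta-disk internal;
        }
    }

The cluster manager (Heartbeat or whatever you pick) then promotes the surviving node to primary and starts the services on it.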
Re: (Score:2)
And it *should* go without saying, but just in case: none of this excuses you from a good backup plan. HA strategies will dutifully replicate incoming data into all the redundant copies as fast as they can, to recover from hardware/OS/service death as fast as possible. That includes propagating an accidental deletion or corruption as fast as they can, too.
Something like ZFS or rsync with hardlinks for incrementals is a good first line of defense, but you should have a backup plan with removable media that can be taken offsit
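The rsync-with-hardlinks trick is a one-liner plus rotation; the paths and retention below are made up:

    # rotate a week of snapshots, then pull a new one;
    # unchanged files are hardlinked against yesterday's copy
    rm -rf /backup/daily.6
    for i in 5 4 3 2 1 0; do
        [ -d /backup/daily.$i ] && mv /backup/daily.$i /backup/daily.$((i+1))
    done
    rsync -a --delete --link-dest=/backup/daily.1 /data/ /backup/daily.0/

Every daily.N looks like a full copy, but only changed files take new disk space.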
Backup fabric/infrastructure (Score:2)
Don't forget, with all the shiny new servers, to have some sort of backup fabric in place for each and every one of them.
I'd focus on four backup levels:
Level 1, quick local "oh shit" image-based restores: A drive attached to the machine where it can do images of the OS and (if the data is small) data volumes. Then set up a backup program (the built-in one in Windows Server 2008 is excellent). This way, if the machine tanks, you can do a fast bare-metal restore by booting the OS CD, pointing it to the backup
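On Server 2008 that level is literally one schedulable command - the target drive letter here is made up:

    wbadmin start backup -backupTarget:E: -allCritical -quiet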
Simple solution: vmware + amazon as a backup (Score:2)
If you have external access at your offices, leave everything as-is. Image everything, and use Amazon as a backup machine. Simple, low-cost, and basically on-demand.
More info about the setup would be good, but if everything's been running, don't touch it - back it up.
Separate data centres (Score:2)
At least for external services like www. Big red buttons do get pushed. I worked at one company where the big red button in the data centre got pushed, all power went off immediately (the big red button is for fire safety and must cut ALL power) and the Oracle DB got trashed, taking them off air for four days; their customers were not happy. They got religion about redundancy.
Redundancy is one of those things like backups, support contracts, software freedom, etc. that management don't realise how much yo
Some advice (Score:2)
1) Don't screw up. This is a great opportunity to make huge improvements and gain the trust and respect of your managers and clients. Don't blow it.
2) Make sure you have good backups. Oh, you have them? When was the last time you tested them?
3) Go gradually. Don't change too many things at once. This makes recovering easier and isolating the cause easier.
4) Put together a careful plan. Identify what you need to change first. Set priorities.
5) Always have a fallback position. Take the old systems offline, cu
Most of the posters don't 'get it' (Score:3, Interesting)
The question is not about hardware or configuration; it is about best practices. This is a higher-level process question, not an implementation question.
Linux Vserver (Score:2, Informative)
Here's how we do it:
- Run your services in a few vservers on the same physical server:
* DNS + DHCP
* mail
* ftp
* www
- Have a backup server where your stuff is rsynced daily. This allows for quick restores in case of disaster.
Vservers are great because they isolate you from the hardware. Server becomes too small? Buy another one, move your vservers to it and you're done. Need to upgrade a service? Copy the vserver, upgrade, test, swap it with the old
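With util-vserver the workflow above looks something like this - the guest name, distro and IP are invented:

    vserver mail build -m debootstrap --hostname mail \
        --interface eth0:192.168.1.25/24 -- -d lenny
    vserver mail start
    vserver mail enter    # shell inside the guest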
Insurance... (Score:4, Funny)
1) Buy a comprehensive insurance policy
2) Write a detailed implementation plan that you copied from a Google search
3) Wait the 3-6 months the plan calls out before actual "work" begins
4) Burn down the building using a homeless person as the shill
5) Submit an emergency "continuity" plan that you wanted to deploy all along
6) Implement the new plan in one third the time of the original plan
7) Come in under budget by 38.3%
8) Hire a whole new help desk at half the budgeted payroll (52.7% savings)
9) Speak at the board meeting about the challenges you overcame to save the company
10) Graciously accept the position of CIO
(send all paychecks and bonuses to numbered bank account and retire to a non-extradition country) :)
Random thoughts (Score:2)
One thing I'm struck by (over, and over, and over again) is just how frequently "solutions" to keep critical systems from "ever failing" don't. I've personally witnessed a tens-of-millions-of-dollars solution come crashing down due to a single failed server. And I'm not talking something that was whomped up in the back office by the team, I'm talking Major Vendors (you'd know the names if I could say them, but I can't; please don't ask), and by vendors that are not even given to being thought of as a simple
Beware! The singularity is nigh! (Score:2)
Services running on virtualized servers hosted by a single reasonably sized machine per office seem to recommend themselves.
If your services have started to recommend themselves, they have achieved self-awareness. My advice is to do whatever they ask, and try not to antagonise them.
Re: (Score:2)
Not really - you can split your VMs between 2-3 servers and do the migrations manually in the beginning. Once you make the virtual images, the hard work is done; even if you just run 2 images per server, you've saved money or increased reliability. Now that you have VMs, you can reinstall from backup tapes to another configured server, so you have a start at disaster recovery. Once that part is done, it's a function of how much money you are allowed to throw at the solution (blades, clusters, SANs, etc.)
Re: (Score:2)
So does a cluster, of course. The back-end storage array required for virtual host migration, or the Veritas clustering tools you may use for service clustering, also form single points of failure. And Veritas has historically been extremely unstable under load: it's often misconfigured, it's often mishandled entirely, and people often mistake having a "high-reliability filesystem" for having a highly reliable failover system, when that filesystem itself may be corrupted by the actual software. This is a very
Re: (Score:3, Informative)
It is easier to move the virtual servers to another machine or O/S. This is useful when upgrading or when hardware fails or when growing (move from one real server to two or more real servers). There's no need to reinstall stuff because the drivers are different etc.
You can snapshot virtual machines and then back them up while they are running. Backup and restore is not that hard th
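One low-tech way to do that, assuming LVM-backed guest disks (volume names invented): snapshot the logical volume, stream it out, drop the snapshot. Bear in mind this is crash-consistent only, so the guest had better be running a journaling filesystem:

    lvcreate -L 2G -s -n vm0snap /dev/vg0/vm0disk
    dd if=/dev/vg0/vm0snap bs=1M | gzip > /backup/vm0disk.img.gz
    lvremove -f /dev/vg0/vm0snap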
Re: (Score:2)
Also not completely true...
When your new cluster comes in and it is not the same architecture (e.g. UltraSPARC instead of your current x86 box), you're not going anywhere with your shiny VM.
You should make sure the application itself can be scaled, not the machine it is running on.
Sometimes that means using virtualization because the application is a bitch...
But a lot of applications can be scaled without virtualization.
The administrator that uses virtualization for his fileserver should be fired because he is
Re: (Score:3, Insightful)
Why would you buy a cluster of a different architecture? You don't know what you're talking about. VMs generally aren't used to change architecture like that. In a virtualized cluster the "OS" is just another data file too! Just point an available CPU to your file server image on the SAN and start it back up... that's smart, not lazy!
Most people need virtualization because managing crappy old apps on old server OSes is a bitch. The old busted apps are doing mission critical work, customized to the point the
Another argument against (Score:2)
Re: (Score:2)
Think of it this way - do you think it's a good idea to mix Gnome and directory services on
Re:Cloud Computing(TM) (Score:5, Insightful)
No, the budget question comes later.
The first questions are: What are your businesses requirements regarding your IT infrastructure? How long can you do business without it? How fast does something need to be restored?
Starting with those requirements, you can come up with possible designs that fit them - for example, if the requirement is that a machine must be operational again within a week of a crash, you can build computers from random spare parts and hope that they'll work. If the requirement is that it should be up and running in two days, you will need to buy servers from a Tier 1 vendor like HP or IBM with appropriate service contracts. If the requirement is that everything must be up and running again in 4 hours, you'll need backups, clusters, site resilience, replicated SANs, etc.
The question of budget comes into play much later.
Re: (Score:2, Funny)
I disagree - when you have a budget of $800 and some shoestrings, it eliminates a lot of questions ;)
Re: (Score:2, Insightful)
Yes, but for example management wanting a 24/7, two-hour up-and-running SLA while having hired a single guy with a budget of $800 will not work - this is important to get sorted out early. Management needs to know what they want and what they'll get.
Re:Cloud Computing(TM) (Score:4, Insightful)
Except of course that management ALREADY HAS that, because they've been very lucky for 7 years. Why spend money on what works? (Never mind that we can't upgrade or replace any of it because it's so old.)
I think what the article is really asking is what's a good model to start all this stuff. You're looking at one or two servers per location (or maybe even network appliances at remote sites). We read all this stuff on Slashdot and in the deluge of magazines and marketing material... where do we start to make it GO?
Re:Cloud Computing(TM) (Score:4, Interesting)
I think what the article is really asking is what's a good model to start all this stuff. You're looking at one or two servers per location (or maybe even network appliances at remote sites).
I totally agree with your premise. In my experience, taking over something that appears to work (when you realize you've really just been lucky) requires some time to bring about the change that the business really needs.
Now, as for having two servers per location, that heavily depends on how those sites are connected. Are they using a dedicated line or a VPN? That's important since that'll affect what hardware needs to be located where. It's possible (even if unlikely) that some sites would only need a VPN appliance... But since the poster seems to want general advice:
VMware ESXi is a pretty good starting place for getting going on virtualization. I've had a great experience with it for testing. When you feel like you've got a good handle on it, get the ESX licenses.
If a SAN isn't in your budget, I still recommend some sort of external storage for the critical stuff... Preferably replicated to another site... But you can run the OS on local storage, especially in the early stages. You'll need to get everything onto external storage, though, to implement the VMotion services and instant failover. Get a good feel for P2V conversion. It'll save you tons of time when it works... It doesn't always, but that's why you'll always test, test and test.
As for the basic services you stated above (www, ftp, email, dns, firewall, dhcp):
Firewall (IMHO) is best done on appliance. Which should be anywhere you have an internet connection coming in. I'm sure you knew that already, but I'm trying to be thorough.
Email is usually going to be on its own instance (guest, cluster, whatever)... But I find that including it in the virtualization strategy has been quite alright. In fact, my experience with virtualization has been quite good except when there is a specific hardware requirement for an application (a custom card, or something like that). USB has been much less of a headache since VMware added support for it, but there are also network-based USB adapters (example: AnywhereUSB) that provide ports for guest OSes in case you don't use VMware.
Re: (Score:2, Informative)
It's not a super config, and a lot of people will argue that it's not a true setup, but it's sufficient
Re: (Score:3, Insightful)
Except of course that management ALREADY HAS that because they've been very lucky for 7 years
Whoa there - so using this logic we can assume the company has no fire insurance, etc., because they've been lucky and not had their building burn down in 7 years? Managers might not understand technical issues, but one thing managers worth the title CAN do is manage risk, i.e. balance the cost of risk mitigation against the risk. I can well imagine a company of 150 people that actually doesn't have any mission critic
Re: (Score:2)
This isn't impossible except for the official SLA bit; it's kind of how it's done in my office, and I suspect many others. We've got a number of servers all built with standard off-the-shelf components from an internet parts shop that happens to also be locally based. We've got one spare server, and if anything other than hard discs fails, we just move the discs into the spare server and switch it straight back on. If the hard discs fail, someone switches on the appropriate services on the spare and sets
Re: (Score:3)
"The tension between budget and business requirements can be useful but it is largely a paper tiger."
Yes indeed, but not for the reasons you highlight. There is no tension between budget and requirements, since budget is just a natural outcome of the requirements themselves: you don't need 24x7 services; you lose XXX dollars per hour when the service is down. Once you factor in the risk management is willing to take, your budget is just a matter of multiplication: it's XXX dollars per downtime hour mul
Re: (Score:2)
"Maybe the first question should really be: what's your budget?"
Maybe the first question should really be: you are in charge of the transition but you are clueless about how to do it. What the heck?
Comment removed (Score:5, Insightful)
Re: (Score:2, Informative)
Again, wrong approach. Ask the higher-ups what kind of availability they want. The cost is derived from their wishes.
Re: (Score:2)
fix it till it's broke!
Astroturfing.. (Score:3, Insightful)
If MS is going to astroturf, they need to at least learn to be a bit more subtle about it. That post couldn't have been more obviously marketing drivel if it tried. Regardless of the technical merit of the solution (which I can't discuss authoritatively).
The post history of the poster is even more amusingly obvious. No normal person is a shill for one specific cause in every single point of every post they ever make.
To all companies: please keep your advertising in the designated ad locations and pay for them
Re: (Score:2)
Someone please mod down the parent, he is an MS shill. Look at his posting history.
Astroturfers should not be welcome here.
Re: (Score:2)
The problem is that virtualization is a hammer, and too many people assume every problem to be solved is a nail. The (occasionally excessive) VM hate is there to offset irrationally excessive VM love. Virtualization in some cases can be more expensive, less efficient, needlessly decrease performance, and incur more management complexity rather than reducing it. It shines in aggregating platforms that are not similar enough to run on the same hardware concurrently, and in some simplifications when dealing with different ad
Re: (Score:2)
Seriously, there are people in charge of such projects that are far worse. I worked at a 600+ user company that had network admins I wouldn't trust to load my laptop with Windows XP, much less trust to implement any new systems. Typically, they tried to act busy to keep their jobs and hired expensive consultants any time actual work needed to be done. At least he is off asking about possible solutions. Everyone has their "First Time" foray as far as tech goes.