Scheduling Large Scale Server Upgrades/Outages?
thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course, each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system, and the scheduler recommends the order in which they should be upgraded?"
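For illustration only, here is a minimal Python sketch of the kind of rule-based scheduler the question describes. It is a plain greedy pass, not any particular product's algorithm. The function name `schedule`, the tuple layout, the admin names, and the per-window capacity are all assumptions made up for the example.

```python
from collections import defaultdict

def schedule(servers, admins, per_admin_per_window=2):
    """Greedy sketch. servers is a list of (name, priority, windows) tuples.

    priority: 1 = most urgent (an assumed convention).
    windows:  sortable labels, e.g. ISO timestamps of allowed outage starts.
    """
    load = defaultdict(int)    # (window, admin) -> servers assigned in that window
    total = defaultdict(int)   # admin -> servers assigned overall
    plan, unscheduled = [], []
    # Most urgent first; among equals, the server with the fewest windows goes first.
    for name, priority, windows in sorted(servers, key=lambda s: (s[1], len(s[2]))):
        for window in sorted(windows):
            free = [a for a in admins if load[(window, a)] < per_admin_per_window]
            if free:
                # Spread the work: pick the admin with the lightest overall load.
                admin = min(free, key=lambda a: (total[a], a))
                load[(window, admin)] += 1
                total[admin] += 1
                plan.append((window, name, admin))
                break
        else:
            unscheduled.append(name)   # no allowed window had spare capacity
    return sorted(plan), unscheduled
```

Anything that lands in `unscheduled` is a server whose outage windows are already full, which is exactly the list you would take back to the business units to negotiate.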
Why micromanage this? (Score:4, Insightful)
You just pull the plug (Score:1)
My advice: (Score:5, Funny)
Fuck the users! They exist solely to bemuse the sysadmin! Odds are they've been getting uppity lately and need to be taught a lesson, anyway.
Re: (Score:2)
Some administrators believe that if a server dies for whatever reason, it should stay off, so they're sure to be aware of the outage. These folks will set the EEPROM to not automatically boot the box. After the power spins up the OBP, it stops at an "OK" prompt.
Others believe that the server should just come up after a crash - sadly, thi
Outage? What outage? (Score:1)
*Admin reboots server*
User: I'm getting an Outlook error.
Admin: Reboot your computer.
User: Okay, it's working now.
Admin: Must have been your workstation.
*Click*
Re: (Score:2)
DST? SLA? I don't think either of those is obscure... but I manage servers in a LSDC so maybe it's just part of that world.
Re: (Score:2)
I Googled. Daylight Savings Time? Dynamic Stress Test? Data Systems Test? Data Storage and Transfer?
Lovely. So now it's Google and boolean lesson time? You may have noticed that daylight saving time was changed this year in the US. Or maybe you didn't. It's kicking in at a different time than planned, you see. Systems know when they _think_ it is supposed to change, and that's not the right week under the recent legislation.
It's _fine_ if someone doesn't understand an acronym. Really, it is. What is pointless and a waste of time is posting a snarky comment whining that they don't know what it is
Procrastinate (Score:5, Funny)
Which reminds me... (Score:2)
Re: (Score:2)
The question is, was he not reading them or did he have someone else prepare them because he had no idea how his old secretary got them.
Re: (Score:2)
[not really...]
Run away, into a hole somewhere (Score:2, Insightful)
Somebody bought 7000 servers with no plan for upgrades?
(Patching for DST, get a new OS...)
Re: (Score:3, Insightful)
Sorry, friend, but every OS in the world that's used in the United States and implements automatic time shifting for Daylight Saving Time / Standard Time changes has to be patched.
The reason being, the start and stop dates changed.
Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror. Who really knows for sure, unless he just bought stock in the consumer electronics companies that stand to make a killing on n
Re: (Score:3, Interesting)
That being said, IMHO the whole DST thing is stupid and obnoxious.
Re: (Score:1)
I feel compelled to point out the 435 other someones had to act before the pen touched the paper.
Re: (Score:2)
What I find stupid, and what you may want to glom onto as the reason to find DST stupid, is that the root problem is the idea that we should get up at 7am, regardless of 7am's relationship to the sun itself. I also find it stupid that everybody has to get off of work at the exact same minute. If we were more flexible (for real, not just l
Re: (Score:2, Funny)
How could this have been modded insightful, when everyone knows that you turn it back at 2AM?
Re: (Score:2)
ie - for 1 hour, there's a 2 hour difference?
ehh.... sorry - thanks again - in this instance, both systems revert simultaneously.
One at 2AM, the other at 1AM - in this case, my office works out of CST, while the corporate is EST.
For what it's worth - almost everyone rolls their clocks back BEFORE THEY GO TO SLEEP, not at 2AM.
But that's okay, I forgive you. It's not often that you get to one up anoth
Used to be my problem (Score:5, Insightful)
The moral of the story: never try.
Re: (Score:2)
If I could mod this up, I would.
Use Windows Correctly (Score:4, Informative)
Sure, you'll spend a large amount of time sorting out which server[s] (server group[s]) should be patched when, but once that is done, you should be able to schedule them within your chosen solution.
Take WSUS for example. Organise your servers into groups, approve the update and set each group's Windows Update GP properties appropriately.
Re: (Score:2)
This post is right on the point. You don't need a spreadsheet or database, you need a good management box to handle it for you.
WSUS would work, but there are better products out there that would give you a lot more functionality. Hercules, from Citadel, is a good one and can handle 7000 boxes with a few systems in the right place. It is not limited to Windows-only patches; you can custom-write your own upgrades for any of the apps on the box. They have scheduling, an inventory (your database),
BladeLogic (Score:2, Interesting)
B
Re:BladeLogic (Score:5, Funny)
SCREW YOU! I'M GOING TO REVIEW YOU, AND IF I LIKE YOU, I'M GOING TO IMPLEMENT YOU, AND YOU'LL LIKE IT!
(Lameness filter says I have too many caps. But I think they were appropriate. Bah.)
Re: (Score:1)
Get your boss to forbid you to look at it... then look at it.
Script it? (Score:2)
MP2 (Score:2)
I don't know much about it, but I found one site that discusses it here. [datastream.net]
Daylight Saving Time patching ?! (Score:2, Funny)
Re: (Score:1)
SLA = Service level agreement
WSUS/Shutdown Command (Score:2)
http://www.microsoft.com/windowsserversystem/updateservices/default.mspx [microsoft.com]
That, along with proper scripting of "shutdown -r /m \\computername" should get you through it.
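A rough Python sketch of the scripting half, for what it's worth. It only builds the command strings quoted above for a list of hosts, grouped into batches so that a whole outage window does not reboot at once; it runs nothing. The function name `reboot_batches`, the host names, and the batch size are invented for the example.

```python
def reboot_batches(hosts, batch_size):
    """Yield lists of `shutdown` command lines, batch_size hosts at a time.

    Strings only: feeding them to a shell, and waiting between batches,
    is left to whatever job runner you already trust.
    """
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # Raw f-string keeps the two leading backslashes of the UNC name.
        yield [rf"shutdown -r /m \\{host}" for host in batch]


if __name__ == "__main__":
    # Hypothetical host names, purely for illustration.
    for number, batch in enumerate(reboot_batches(["web01", "web02", "db01"], 2), 1):
        print(f"batch {number}:")
        for command in batch:
            print("  " + command)
```

Printing the commands first and eyeballing them is a cheap dry run before anything actually reboots.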
Re: (Score:2, Informative)
Oh, and Java needs to be patched separately too. They store their timezones internally, instead of consulting the operating system.
You are fucked (Score:2)
The good news is most Linux systems don't require a reboot for this change, so they can be done sans outage.
Delegate (Score:4, Funny)
This request surprises me for this many machines. (Score:2)
There are also some GPL things that may work. Can't think of them right offhand. If these are *nix desktops/servers, you have plenty of time to write a Perl/bash/Python script to accomplish the task. Some other slashdot user is going to have to give
Re:This request surprises me for this many machine (Score:3, Insightful)
Hi, I'm "some other Slashdot user," and my advice for the Windows environment is the same as for Linux. Well...almost. If you are running Windows XP on the desktop or 2003 on the server (or
Re: (Score:2)
Let's just say the company I work for doesn't have more than 1% WinXP....
Re: (Score:3, Insightful)
Let's just say the company I work for doesn't have more than 1% WinXP....
Yes, the word is that there is a "patch" for Windows 2000. But since Windows 2000 is out of mainstream support Microsoft is only making it available to companies that have purchased extended support agreements for their Windows 2000 systems. Yes, it probably is part of Microsoft's strategy to push
Re:This request surprises me for this many machine (Score:2)
Nah. From personal experience I would say that most of them are pretty disorganized. And since they are very much cost driven they don't have cash for luxuries such as automated patch/upgrade tools. I mean, spreadsheets are free as is overtime for salaried employees, right?
Re: (Score:1)
Yeah, because the cost of WSUS ($0) is just too much to turn a profit when factoring in the jobs they are creating.
For the Acronym Illiterate: WSUS = Windows Server Update Services
Shameless plug (Score:2)
This is not a technical problem (Score:1)
The best you can do is come up with a realistic schedule for the actual timeframe you have available. And by realistic, I mean working off-hours. Then whoever is at the top of the chain tells everyone else that the upgrade happens at this time, and that's that.
I'm gonna get mod-bombed... (Score:3, Insightful)
Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Run a query against that that checks against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)
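To make that concrete, a small sketch using Python's built-in sqlite3 module as a stand-in for whatever database you already know. The table names, column names, sample rows, and the scoring formula are all assumptions for the sake of the example; swap in your own heuristic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (name TEXT, unit TEXT, priority INTEGER, patch_minutes INTEGER);
CREATE TABLE windows (unit TEXT, starts TEXT, ends TEXT);
INSERT INTO servers VALUES
    ('mail01', 'ops',   1, 30),
    ('web01',  'sales', 2, 20),
    ('db01',   'sales', 1, 45);
INSERT INTO windows VALUES
    ('ops',   '2007-02-24 22:00', '2007-02-25 02:00'),
    ('sales', '2007-02-25 22:00', '2007-02-26 01:00');
""")

# The calculated field: an invented heuristic where a lower score is patched
# sooner (priority dominates, shorter patches break ties).
QUERY = """
SELECT s.name,
       w.starts,
       s.priority * 1000 + s.patch_minutes AS score
FROM servers AS s
JOIN windows AS w ON w.unit = s.unit
ORDER BY w.starts, score
"""

rows = conn.execute(QUERY).fetchall()
for name, starts, score in rows:
    print(starts, name, score)
```

The join against the windows table is what ties each server to the slots its business unit allows; the ORDER BY is the recommended patch order inside each slot.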
You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?
How are you already handling this? (Score:3, Insightful)
What are you already using to patch your 7000+ servers? By the time you reach 7000+, this should have been a problem long solved. Hell, I'd expect it to be solved by the 100+ point.
What's so special about this DST patch that your current process can't handle it?
Because if the answer is "we have no process", you've long since lost, and good odds your systems are already seething piles of unpatched, compromised machines.
If you do have a process but it's inadequate, and Slashdot might actually be able to help you, you'll need to be a little more clear on exactly what the problem is, if it isn't "we have no process".
(What is it with people lobbing questions onto Ask Slashdot and almost, but not quite, never following up? Is the lead on Ask Slashdot so long that people die before it gets posted, or just give up? Obviously I ask this before I can tell whether "thesandbender" is one of the rare exceptions... as of this writing, no, unless (s)he's been modded into oblivion.)
7000 servers and (Score:2)
no redundancy?
If you have that number of servers, you can just take one, upgrade, test, move on to the next and keep on going. There should be 0% downtime.
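As a sketch of that loop in Python, assuming you can express "upgrade" and "test" as two callables. Both `upgrade` and `healthy` are placeholders you would supply, not real tools.

```python
def rolling_upgrade(servers, upgrade, healthy):
    """Upgrade one server at a time; stop at the first failed health check.

    Returns (done, failed): the servers upgraded successfully, and the
    server that failed its check (None if every server passed).
    """
    done = []
    for server in servers:
        upgrade(server)            # placeholder: apply the patch, reboot, etc.
        if not healthy(server):    # placeholder: your post-patch test
            return done, server    # halt so the untouched servers keep serving
        done.append(server)
    return done, None
```

Stopping at the first failure is the whole point: with redundancy, one broken box is an annoyance, while a broken patch rolled across the pool is an outage.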
However if you have crapware that cannot cope in such situations maybe you should be badgering the vendor so that it can be rolled out in a more sensible manner.
automated by hand (Score:3, Insightful)
You don't have the time to put in a system, but you can craft a one off solution.
Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.
No offense, but I have found scripting abilities in Unix/Linux shops to be of much higher quality than in Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty-gritty details.
Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.
Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.
Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.
Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.
I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.
Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).
Good luck!
the cure for your DST woes (Score:2, Interesting)
EDL (Score:1)
Umm... Minkowsky? Google Calendar? (Score:2)
Take a package like Minkowsky [r-goetz.de], or other group calendar package, enter each of the groups you have an SLA with, and block out their you-can't-do-maintenance-here windows as "meetings" for them.
Then try to schedule a "meeting" with as many of them as possible to do the upgrade, and a second meeting with as
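The same "find a meeting most people can attend" idea, sketched in Python without any calendar package. The group names, the hours-as-integers representation, and the function name `best_slot` are all assumptions for illustration.

```python
def best_slot(blackouts, candidate_slots):
    """Return (slot, free_groups) for the slot that the most groups can accept.

    blackouts:       {group: [(start, end), ...]} no-maintenance intervals
    candidate_slots: [(start, end), ...] proposed upgrade "meetings"
    Times are plain numbers (e.g. hours); intervals are half-open.
    """
    def is_free(group, slot):
        start, end = slot
        # Free if the slot overlaps none of the group's blackout intervals.
        return all(end <= b_start or start >= b_end
                   for b_start, b_end in blackouts[group])

    ranked = [(slot, sorted(g for g in blackouts if is_free(g, slot)))
              for slot in candidate_slots]
    # On a tie, max() keeps the earliest-listed slot.
    return max(ranked, key=lambda item: len(item[1]))
```

Groups that cannot make the winning slot are simply the input to the next round with the remaining candidate slots.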
Patch Management products are your answer. (Score:1)
Altiris (or any other vendor, this is just the one I am most familiar with) would probably LOVE to have the oppo
A script anyone? (Score:2)