Scheduling Large Scale Server Upgrades/Outages?

thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system and the scheduler will recommend the order in which they should be upgraded?"
This discussion has been archived. No new comments can be posted.

  • by djh101010 ( 656795 ) * on Thursday January 11, 2007 @06:43PM (#17564970) Homepage Journal
    I think if I had to do this, I'd establish a priority ranking of the systems, taking into consideration critical path and cascading dependencies, and then assign the highest priority ones first. When you finish that, come back to the pool for the next high priority job. When you're out of high priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of inter-related steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
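
One way to read the comment above is as a priority-aware topological sort: a server becomes eligible once everything it depends on is patched, and among eligible servers the most urgent goes first. The sketch below is only an illustration of that idea. The host names, the convention that 1 means most urgent, and the `depends_on` mapping are all assumptions, not anything from the original post.

```python
import heapq

def upgrade_order(priority, depends_on):
    """Order hosts so dependencies come first; among ready hosts, lowest priority number wins.

    priority:   host -> int (assumed convention: 1 = most urgent)
    depends_on: host -> set of hosts that must be patched before it
                (every dependency is assumed to appear in `priority` too)
    """
    remaining = {h: set(depends_on.get(h, ())) for h in priority}
    dependents = {h: [] for h in priority}
    for host, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(host)

    # Hosts with no outstanding dependencies form the initial pool.
    ready = [(priority[h], h) for h, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, host = heapq.heappop(ready)
        order.append(host)
        # Patching this host may release hosts that were waiting on it.
        for child in dependents[host]:
            remaining[child].discard(host)
            if not remaining[child]:
                heapq.heappush(ready, (priority[child], child))

    if len(order) != len(priority):
        raise ValueError("dependency cycle detected")
    return order
```

Each admin can then pull the next host off the front of the returned list whenever they finish a job, which matches the "come back to the pool" workflow described above.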
  • Then plug it back in real quick.
     
  • My advice: (Score:5, Funny)

    by Guppy06 ( 410832 ) on Thursday January 11, 2007 @06:51PM (#17565092)
    shutdown -h now

    Fuck the users! They exist solely to bemuse the sysadmin! Odds are they've been getting uppity lately and need to be taught a lesson, anyway.
    • Reboot, Deny, Deny works well.

      *Admin reboots server*

      User: I'm getting an Outlook error.
      Admin: Reboot your computer.
      User: Okay, it's working now.
      Admin: Must have been your workstation.

      *Click*
  • by mkcmkc ( 197982 ) on Thursday January 11, 2007 @06:54PM (#17565152)
    If you just put this off for a few months, the problem will probably just go away...
    • Once upon a time I worked in operations for a Very Large Telecommunications Company (TM). One of my primary duties was to compile an onerous weekly report on server uptimes and send it to one of the directors, via his secretary. One day I found out that his secretary was moving to a different department, so I stopped sending them, to see what would happen. No one ever asked me about those reports again.
      • by j-pimp ( 177072 )

        One day I found out that his secretary was moving to a different department, so I stopped sending them, to see what would happen. No one ever asked me about those reports again.

        The question is, was he not reading them or did he have someone else prepare them because he had no idea how his old secretary got them.

      • At least now I know why I got asked to start compiling that damned report. But don't worry, I know where you work ;)

        [not really...]
        • by mkcmkc ( 197982 )
          Yeah, but you don't know what I did six months ago...here...in the northern hemisphere... ;-)
  • >I have to schedule upgrades for 7000+ servers ... pile of spreadsheets ...

    Somebody bought 7000 servers with no plan for upgrades?

    (Patching for DST, get a new OS...)
    • Re: (Score:3, Insightful)

      by GuyverDH ( 232921 )
      "(Patching for DST, Get a new OS...)"

      Sorry friend, but every OS in the world, that's used in the United States, that implements automatic time shifting due to Daylight Savings Time / Daylight Standard Time changes, has to be patched.

      The reason being, the start and stop dates changed.
      Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror. Who really knows for sure, unless he just bought stock in the consumer electronics companies that stand to make a killing on n
      • Re: (Score:3, Interesting)

        by ErikTheRed ( 162431 )
        Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror.
        Bzzzzt. Wrong. Somebody told him it would reduce national energy consumption. But thank you for playing.

        That being said, IMHO the whole DST thing is stupid and obnoxious.
        • Re: (Score:3, Insightful)

          by GuyverDH ( 232921 )
          That.... was a joke, as was the line of bullshit that others fed the President. They told him it would reduce energy consumption? It's not going to reduce consumption, if anything, it will just increase energy consumption. What with all the goods that will have to be re-manufactured with new chips, plus all the overtime, burning the late night oil, patching all the boxes to make them work with the new standard. Then add in all the new hardware upgrades that will have to be purchased, or operating syste
          • by tsstahl ( 812393 )
            Just another idiotic idea, stupidly implemented, and signed into law by someone who seems to have the comprehension of a 12 year old.

            I feel compelled to point out the 435 other someones had to act before the pen touched the paper.
            • Notice I said "Signed into law" - I didn't claim that he made the law, or wrote it, just that he signed it - without signature statement I might add.
          • by Jerf ( 17166 )
            You need to be a bit more careful. DST incontrovertibly "saves" energy, vs. non-DST. It's statistical. You can look it up. The energy savings are in fact quite significant.

            What I find stupid, and what you may want to glom onto as the reason to find DST stupid, is that the root problem is the idea that we should get up at 7am, regardless of 7am's relationship to the sun itself. I also find it stupid that everybody has to get off of work at the exact same minute. If we were more flexible (for real, not just l
      • Re: (Score:2, Funny)

        by jazman_777 ( 44742 )
        This of course will require loads of admins to wait for 1AM, then push the time back to 12AM manually.


        How could this have been modded insightful, when everyone knows that you turn it back at 2AM?

        • Have you tried to keep databases in sync across multiple timezones, without shifting the time simultaneously?

          ie - for 1 hour, there's a 2 hour difference?

          ehh.... sorry - thanks again - in this instance, both systems revert simultaneously.

          One at 2AM, the other at 1AM - in this case, my office works out of CST, while the corporate is EST.

          For what it's worth - almost everyone rolls their clocks back BEFORE THEY GO TO SLEEP, not at 2AM.
          But that's okay, I forgive you. It's not often that you get to one up anoth
          • by ASCIIMan ( 47627 )
            And who keeps databases using local times? Seriously. UTC exists for a reason.
            • Not at the actual database layer, but in the field (or I guess column) layer - some of the programmers actually keep track of local time and tz - for what, I know not - I'm not a DBA, and thankfully don't have to wade through their reasoning.
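
For readers wondering what "the start and stop dates changed" means in practice: from 2007, US daylight saving time runs from the second Sunday in March to the first Sunday in November, where it previously ran from the first Sunday in April to the last Sunday in October. A small sketch of both rules, using only the standard library (the function names are made up for this example, and the old rule as coded only covers 1987 through 2006):

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    offset = (6 - first.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=offset + 7 * (n - 1))

def last_sunday(year, month):
    """Return the last Sunday of the given month."""
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    back = (next_month.weekday() + 1) % 7 or 7  # days back to the previous Sunday
    return next_month - timedelta(days=back)

def us_dst_bounds(year):
    """(start, end) dates of US DST; old rule assumed valid for 1987-2006 only."""
    if year >= 2007:
        return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
    return nth_sunday(year, 4, 1), last_sunday(year, 10)
```

An unpatched system keeps applying the old rule, so for the weeks between the two start dates (and again between the two end dates) its local clock is an hour off.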
  • by RabidMonkey ( 30447 ) <canadaboy.gmail@com> on Thursday January 11, 2007 @07:00PM (#17565240) Homepage
    This used to be my problem ... for the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.

    The moral of the story: never try.

    • "The moral of the story: never try."

      If I could mod this up, I would.
    • by pAnkRat ( 639452 )
      Homer: remember son, if something is hard to do, it is not worth trying.
    • Parent is right. A PM should be able to use their tool of choice to do this easily. For example, MS Project can import your spreadsheet data. Once imported, the PM should be able to assign priorities to groups of servers/patches, then manipulate the schedule and assign resources. Then, when the PM tells you that meeting your schedule with the available resources is completely impossible, you can both transfer somewhere warm and sunny.
  • by CelestialWizard ( 13685 ) on Thursday January 11, 2007 @07:02PM (#17565276)
    If you have 7000+ Windows Servers you should already be running a software patching solution such as SMS, WSUS, etc...

    Sure you'll spend a large amount of time sorting out which server[s] (server group[s]) should be patched when, but once that is done - you should be able to schedule them within your chosen solution.

    Take WSUS for example. Organise your servers into groups, approve the update and set each group's Windows Update GP properties appropriately.
    • This post is right on the point. You don't need a spreadsheet or database, you need a good management box to handle it for you.

      WSUS would work, but there are better products out there that would give you a lot more functionality. Hercules, from Citadel, is a good one and can handle 7000 boxes with a few systems in the right place. But it is not limited to Windows-only patches; you can custom write your own upgrades for any of the apps on the box. They have scheduling, an inventory (your database),

    • I may have missed it, but I don't think he said what OS he was using. These might be 7000 Unix or Linux boxes or, more likely, a mixture of all the above.
  • BladeLogic (Score:2, Interesting)

    by Webdude ( 5964 )
    I interviewed a while ago with a company called BladeLogic. They provide a suite of products for these types of tasks and all types of DataCenter management. I would defiantly give them a look; they could help out on this project and many, many in the future. http://www.bladelogic.com/ [bladelogic.com]
    B
    • by Wog ( 58146 ) on Thursday January 11, 2007 @10:10PM (#17567478)
      How do you defiantly look at a product?

      SCREW YOU! I'M GOING TO REVIEW YOU, AND IF I LIKE YOU, I'M GOING TO IMPLEMENT YOU, AND YOU'LL LIKE IT!

      (Lameness filter says I have too many caps. But I think they were appropriate. Bah.)
  • Ok, so you have spreadsheets with admins, windows, servers, priorities, etc., in them, and you're just looking for a way to schedule everything? Can you just export the spreadsheets to CSV and write a script to do it for you?
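
A rough sketch of what such a script could look like, assuming a CSV export with `host`, `priority`, and `windows` columns (all three column names, the `|` separator between windows, and the per-admin capacity figure are assumptions made for this example). It greedily places the most urgent servers first and spreads each window's work across the admins round-robin.

```python
import csv
import io
from collections import defaultdict

def schedule(csv_text, admins, per_admin):
    """Greedy assignment of servers to outage windows and admins.

    csv_text:  CSV with hypothetical columns host, priority, windows
               (windows are '|'-separated, in order of preference)
    admins:    list of admin names
    per_admin: how many servers one admin can handle in one window
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: int(r["priority"]))  # 1 = most urgent (assumed)

    capacity = len(admins) * per_admin
    load = defaultdict(int)  # window -> servers already placed there
    plan, unplaced = {}, []
    for row in rows:
        for window in row["windows"].split("|"):
            if load[window] < capacity:
                admin = admins[load[window] % len(admins)]  # round-robin
                load[window] += 1
                plan[row["host"]] = (window, admin)
                break
        else:
            # No allowed window has room left; a human has to decide.
            unplaced.append(row["host"])
    return plan, unplaced
```

The `unplaced` list is the useful part: it is the set of servers where the stated windows and the available staff cannot both be satisfied.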
  • by div_2n ( 525075 )
    Check out MP2. Our maintenance guys use it to schedule and track maintenance of everything in the plant. They swear by it. I believe you could use it for server maintenance, but I haven't tried it.

    I don't know much about it, but I found one site that discusses it here. [datastream.net]
  • Hi, I can't help you. I've no knowledge at all about this field. However, could someone make me a little bit less stupid and explain those acronyms to me? acronymattic has 197 TLAs for "DST" but I couldn't find the one which would fit for sure! SLA, that was standing for "Site Level Aggregator", right?
  • How about using WSUS?

    http://www.microsoft.com/windowsserversystem/updateservices/default.mspx [microsoft.com]

    That, along with proper scripting of "shutdown -r /m \\computername" should get you through it.
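
The scripting part might be as small as the sketch below, which turns a list of host names into the `shutdown -r /m \\computername` command quoted above. It defaults to a dry run that only prints the commands. The host-file format (one name per line, `#` for comments) is an assumption for this example.

```python
import subprocess

def reboot_commands(host_lines):
    """Build one remote-restart command per non-blank, non-comment host line."""
    hosts = [
        line.strip()
        for line in host_lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Same form as the command in the comment: shutdown -r /m \\computername
    return [["shutdown", "-r", "/m", "\\\\" + host] for host in hosts]

def reboot_all(host_lines, dry_run=True):
    """Print the commands, or run them one at a time when dry_run is False."""
    for cmd in reboot_commands(host_lines):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stops at the first failure
```

Feeding it one outage window's host list at a time keeps the reboots inside the agreed window.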

  • Hi. DST changeover is in early March. If you aren't already halfway done with your 7000 server project, and they all require downtime, you are hosed. Find a new job.

    The good news is most Linux systems don't require a reboot for this change, so they can be done sans outage.
  • Delegate (Score:4, Funny)

    by 4of12 ( 97621 ) on Thursday January 11, 2007 @07:38PM (#17565752) Homepage Journal
    When computers get overloaded with work like this (host lookups, for example) they ask for help from other computers. As my stupid first try, how about asking each sysadmin to run a spreadsheet column of hostnames through an md5hash and let them convert servers with a '1' on the first day, 'a' on the tenth day, etc.?
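
Taken at face value, the hashing idea looks something like this sketch: the first hex digit of each hostname's MD5 picks the day, so '1' lands on day 1 and 'a' on day 10. Mapping '0' to day 16 is an assumption added here so that every host gets a day. Note that this spreads the load evenly but ignores SLAs and outage windows entirely.

```python
import hashlib
from collections import defaultdict

def patch_day(hostname):
    """Map a hostname to a day 1..16 using the first hex digit of its MD5."""
    digit = hashlib.md5(hostname.encode("utf-8")).hexdigest()[0]
    return int(digit, 16) or 16  # '0' would be day 0; treat it as day 16 (assumption)

def days_for(hostnames):
    """Group hostnames by their assigned patch day."""
    buckets = defaultdict(list)
    for name in hostnames:
        buckets[patch_day(name)].append(name)
    return dict(buckets)
```

Because the hash is deterministic, every sysadmin running this against the same column of hostnames gets the same answer without any coordination.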
  • I would think that a company managing 7000+ servers would have an automated patch scheduling system similar to BMC Marimba [bmc.com], Altiris [altirispro.com], or Opsware [opsware.com]. You surely don't have time to purchase and install one of these monsters now, but it might be wise to pursue in the future.

    There are also some GPL things that may work. Can't think of them right off hand. If these are *nux desktops/servers, you have plenty of time to write a perl/bash/python to accomplish the task. Some other slashdot user is going to have to give
    • There are also some GPL things that may work. Can't think of them right off hand. If these are *nux desktops/servers, you have plenty of time to write a perl/bash/python to accomplish the task. Some other slashdot user is going to have to give you advice for a windows environment at this stage of the game you are in.

      Hi, I'm "some other Slashdot user," and my advice for the Windows environment is the same as for Linux. Well...almost. If you are running Windows XP on the desktop or 2003 on the server (or
      • With all of the comments, only ocbwilg knew or bothered to set this guy straight? The DST change is simple, even if you do not have a patch management system. Nothing a simple script, a file with all of the server names, and some time to let it run won't take care of. No reboot is required, so SLAs do not need to be considered here. I do agree that any company with seven thousand servers needs patch management. In fact, I call bullshit. There is no way in hell they even operate without one.
      • by karnal ( 22275 )
        Microsoft (from what I've heard from my desktop folks at work) do have a patch for Windows 2000 - it's just not exactly published yet.

        Let's just say the company I work for doesn't have more than 1% WinXP....
        • Re: (Score:3, Insightful)

          by ocbwilg ( 259828 )
          Microsoft (from what I've heard from my desktop folks at work) do have a patch for Windows 2000 - it's just not exactly published yet.

          Let's just say the company I work for doesn't have more than 1% WinXP....


          Yes, the word is that there is a "patch" for Windows 2000. But since Windows 2000 is out of mainstream support Microsoft is only making it available to companies that have purchased extended support agreements for their Windows 2000 systems. Yes, it probably is part of Microsoft's strategy to push
    • I would think that a company managing 7000+ servers would have an automated patch scheduling system

      Nah. From personal experience I would say that most of them are pretty disorganized. And since they are very much cost driven they don't have cash for luxuries such as automated patch/upgrade tools. I mean, spreadsheets are free as is overtime for salaried employees, right?
      • Mod +1 Snarky

        Yeah, because the cost of WSUS ($0) is just too much to turn a profit when factoring in the jobs they are creating.

        For the Acronym Illiterate WSUS = Windows Server Update Services

    • We are doing this with Tivoli for some very large shops. See capitalsoftware.com/Forums. We have written a compliance report GUI and we can queue 1000's of machines at one time. Also, for SUN and IBM all you have to do (in most cases) is update the TZD files using their utilities. You can contact me at john_williscapitalsoftware.com
  • Tablix [tablix.org] is a free software package for solving various types of scheduling problems. If you have enough time on your hands to write the necessary modules for your particular problem I'm sure it can schedule your upgrades in the most efficient way.
  • This is a political problem.

    The best you can do is come up with a realistic schedule for the actual timeframe you have available. And by realistic, I mean working off-hours. Then whoever is at the top of the chain tells everyone else that the upgrade happens at this time, and that's that.
  • by Short Circuit ( 52384 ) * <mikemol@gmail.com> on Thursday January 11, 2007 @09:30PM (#17567064) Homepage Journal
    I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)

    Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Run a query against that that checks against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)

    You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?
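
The same two-step idea (a calculated score, then a join against available time slots) can be sketched with SQLite standing in for Access. The table layout, the column names, and the scoring formula `urgency * 10 + criticality` are all invented for this illustration; the real heuristic would come from your own SLA data.

```python
import sqlite3

def candidate_slots(servers, slots):
    """List (host, slot, score) rows, highest score first.

    servers: iterable of (host, grp, urgency, criticality) tuples
    slots:   iterable of (grp, slot) tuples, one per allowed outage window
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE servers (host TEXT, grp TEXT, urgency INTEGER, criticality INTEGER)"
    )
    con.execute("CREATE TABLE slots (grp TEXT, slot TEXT)")
    con.executemany("INSERT INTO servers VALUES (?, ?, ?, ?)", servers)
    con.executemany("INSERT INTO slots VALUES (?, ?)", slots)
    rows = con.execute(
        """
        SELECT s.host, t.slot,
               s.urgency * 10 + s.criticality AS score  -- hypothetical heuristic
        FROM servers AS s
        JOIN slots AS t ON t.grp = s.grp
        ORDER BY score DESC, s.host, t.slot
        """
    ).fetchall()
    con.close()
    return rows
```

The output is a ranked list of every server paired with every window its group allows, which is most of the raw material a human scheduler needs.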
  • by Jerf ( 17166 ) on Thursday January 11, 2007 @10:21PM (#17567574) Journal
    Some people, as I post this, have sort of strongly hinted at this, but nobody else has directly asked this yet.

    What are you already using to patch your 7000+ servers? By the time you reach 7000+, this should have been a problem long solved. Hell, I'd expect it to be solved by the 100+ point.

    What's so special about this DST patch that your current process can't handle it?

    Because if the answer is "we have no process", you've long since lost, and good odds your systems are already seething piles of unpatched, compromised machines.

    If you do have a process but it's inadequate, and Slashdot might actually be able to help you, you'll need to be a little more clear on exactly what the problem is, if it isn't "we have no process".

    (What is it with people lobbing questions onto Ask Slashdot and almost, but not quite, never following up? Is the lead on Ask Slashdot so long that people die before it gets posted, or just give up? Obviously I ask this before I can tell whether "thesandbender" is one of the rare exceptions... as of this writing, no, unless (s)he's been modded into oblivion.)
  • no redundancy?

    If you had that number of servers you can just take one, upgrade, test, move onto the next and keep on going. There should be 0% downtime.

    However if you have crapware that cannot cope in such situations maybe you should be badgering the vendor so that it can be rolled out in a more sensible manner.
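
The one-at-a-time approach described above can be sketched as a loop that halts the moment a freshly patched host fails its check, so a bad patch never reaches more than one machine. The `patch` and `healthy` callables are placeholders for whatever your environment actually uses; nothing here is a real tool's API.

```python
def rolling_upgrade(hosts, patch, healthy):
    """Patch hosts one at a time, stopping at the first failed health check.

    patch:   callable(host) that applies the update (placeholder)
    healthy: callable(host) -> bool, the post-patch test (placeholder)
    Returns (hosts patched successfully, failed host or None).
    """
    done = []
    for host in hosts:
        patch(host)
        if not healthy(host):
            return done, host  # leave the rest of the pool untouched
        done.append(host)
    return done, None
```

With redundant pools, the remaining unpatched hosts keep serving traffic while the failure is investigated.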

  • automated by hand (Score:3, Insightful)

    by tsstahl ( 812393 ) on Thursday January 11, 2007 @11:49PM (#17568358)
    Managers must manage.

    You don't have the time to put in a system, but you can craft a one off solution.

    Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.

    No offense, but I have found scripting abilities in Unix/Linux shops to be of a lot higher quality than Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty gritty details.

    Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.

    Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.

    Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.

    Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.

    I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.

    Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).

    Good luck!
  • Move to Arizona, where we don't have Daylight Savings Time.
  • Scheduling using the earliest deadline late algorithm from the real time computing field might work. Based on the maintenance windows you should have different deadlines for different systems.
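
As a starting point, here is a sketch of the simpler and better-known earliest deadline first (EDF) ordering, which is a swapped-in relative and not the earliest deadline late algorithm named in the comment. It assumes a single admin working jobs back to back, with deadlines and durations expressed in hours from the start of the project; those units and the job tuples are assumptions for the example.

```python
def edf_schedule(jobs, start=0):
    """Run jobs in order of earliest deadline and report which finish late.

    jobs: iterable of (host, deadline, duration), all in the same time unit
    Returns (timeline of (host, begin, end), list of hosts missing their deadline).
    """
    clock = start
    timeline, late = [], []
    for host, deadline, duration in sorted(jobs, key=lambda job: job[1]):
        begin, clock = clock, clock + duration
        timeline.append((host, begin, clock))
        if clock > deadline:
            late.append(host)
    return timeline, late
```

A non-empty `late` list is the early warning that the deadlines and the available hands do not fit, before anyone has touched a server.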
  • I think folks are focusing too much on the patching mechanism (i.e. how do I patch 7000 machines), and missing the point of the scheduling of the upgrade (*when* should I patch each group of machines).

    Take a package like Minkowsky [r-goetz.de] , or other group calendar package, enter each of the groups you have an SLA with, and block out their you-can't-do-maintenance-here windows as "meetings" for them.

    Then try to schedule a "meeting" with as many of them as possible to do the upgrade, and a second meeting with as

  • Any decent, current PM system (Altiris PM, MS-WSUS, MS-SMS, etc.) - using a SQL or other database back end - should have a method to identify the devices to patch and build a collection and allow you to specify a certain time frame for applying the patch to the selected groupings on separate schedules and perform any necessary reboots all in an automated fashion. (Sorry about the run-on sentence).

    Altiris (or any other vendor, this is just the one I am most familiar with) would probably LOVE to have the oppo
  • You know you could...oh I don't know...maybe patch ONE server and then write a script that would sync the other 6999 servers with it!
