Required Practices for a Network Operations Center?

hayduke.com asks: "I've recently been assigned to a program that is designing a 'Network Operation Center (NOC)'. I started to look for books, online material and other sources to help define a baseline for the Services Level Agreement for our intended customers. Not having any customers yet we are trying to incorporate the design elements that will provide the best possible level of service to the largest number of customers. A search on my favorite search engine brings up a lot of articles that have companies boasting that they have been recognized for being 'Best Practice' leaders in their respective fields but there are no references as to what those practices are. As this will be a NOC (pro-active) as opposed to a Call Center (reactive), I would like to know what other people think that NOC should be at bare minimum or if there are 'standards' that all NOCs should be held to."

  • If you are being proactive, how about installing crash-reporting software similar to Bugtoaster on your machines? I haven't really looked at Bugtoaster's corporate offerings, but there should be something out there that reports all crashes on all machines to the NOC. (Though in actuality those things don't catch ALL crashes.)
  • by jo42 ( 227475 ) on Wednesday November 06, 2002 @11:19AM (#4607764) Homepage
    1) Lots of flat panel monitors showing network status, diagrams, graphs, etc.

    2) Lots and lots and lots of blinking lights.

    3) Biometric-based access, such as finger, palm and retinal scanners.

    4) Big, ugly dude guarding the front door.

    5) Hire Linux weenies from slashdot to run it all for you.

    • LCD's aren't enough. If you want a real 3733t NOC, rename it a "Command Center" and get bigger gas-plasma displays.

      Give the people working there military sounding titles for wargames. For example, the guy who watches the WAN is the "Night Distributed Network Watch Commander". The guy who watches the mainframe consoles is the "Enterprise Systems Surveillance Officer". The manager on duty is the "Command Post Commander in Chief".

      You also need an electronic map of the world, even if your company only operates in one city.
    • by green pizza ( 159161 ) on Wednesday November 06, 2002 @12:38PM (#4608531) Homepage
      If you're in Silicon Valley you'll want some industrial or retro furniture, but if your company is located anywhere else, you'll want the "me too" look that only Aeron chairs can provide.

      I know we're just joking about these requirements, but they're scarily familiar from three NOCs I've been involved with. I think there must be some unwritten ruleset that goes something along these lines:
      1) 50% of NOC budget must be spent on furniture and flat-panel displays.
      2) Trendy lighting in NOC must seriously interfere with trendy displays. (example: if room is equipped with halogen spot lighting, at least one non-movable light should be aimed at a projection screen).
      3) NOC must be located in the most inconvenient area of the most inconvenient building.
      4) Actual NOC computers must be running the latest, untested whiz-bang buggy software on the latest, untested whiz-bang buggy hardware.
      5) Half of the NOC staff must be completely unskilled, impersonal, and unwashed.
      6) The other half of the NOC staff must be anal, uptight, and permanently pissed off.

      Server Room layout is another story... but does match rule #3 quite well... the perfect server room is often located as far from the building's loading dock as possible. With a proper pallet jack, it should take at least 30 minutes to haul a crated SGI Origin or Sun Enterprise server from the loading dock to the server room. Smaller items should take no less than 15 minutes. Shaky ramps, cramped elevators, and narrow hallways are a plus.
    • make it look like that command center in Hackers [imdb.com]...man, i loved that movie, but what a joke the technical aspects were (especially the command center)...
  • Dear Slashdot (Score:3, Insightful)

    by Anonymous Coward on Wednesday November 06, 2002 @11:22AM (#4607796)
    I have been asked to do a job I know jack-all about. I don't know why they asked me, perhaps because I lied and said I knew something about it. Anyway, all that doesn't matter now, because if I don't find someone pretty damn quick to do my job for me, and for free, I'm going to be found out. So please, if a few of you nice slashdotters could submit a draft for the design of a Network Operation Center, I'd be ever so grateful. I'd still have a job, and I'll still be able to convince the dummies that I'm not one of them. Of course the only thing in it for you is the satisfaction of saving some dumb freeloader's butt and getting him out of a tight situation.

    Yours,

    L. Ardass
  • Good accounting! (Score:3, Insightful)

    by neitzsche ( 520188 ) on Wednesday November 06, 2002 @11:23AM (#4607798) Journal
    If you are providing and charging for backup services, make logs available to individual clients. At the very least, have SOME way to let me know how many bytes you backed up each night if there is an arbitrary monthly limit.

    Warnings on the first day of the month (if day 1's use * 30 will exceed limit) would be appreciated.

    Days with more than 50% above a normal day's use deserve a warning of some sort. Each client should be able to configure their own warning levels.

    If network bandwidth limits are known in advance, warnings at 90% and 95% would be very useful.
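A minimal Python sketch of the three warnings described above: the day-1 projection against the monthly limit, the "50% over a normal day" spike, and the 90%/95% bandwidth marks. The names `WarningLevels`, `backup_warnings`, and `bandwidth_warnings`, and the choice of "normal" as the mean of the earlier days, are assumptions made for illustration, not anything a real backup service is known to use.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WarningLevels:
    """Per-client thresholds; the default below is an illustrative assumption."""
    monthly_limit_bytes: int
    spike_ratio: float = 1.5  # "more than 50% above a normal day"


def backup_warnings(daily_bytes, levels, days_in_month=30):
    """Return warning strings for one client's month-to-date nightly backup sizes."""
    warnings = []
    if not daily_bytes:
        return warnings
    # Day-1 projection: the first day's use times the days in the month.
    projected = daily_bytes[0] * days_in_month
    if projected > levels.monthly_limit_bytes:
        warnings.append(f"projected {projected} bytes exceeds the monthly limit")
    # Spike check: compare the latest day with the mean of the earlier days.
    if len(daily_bytes) > 1:
        normal = mean(daily_bytes[:-1])
        if daily_bytes[-1] > normal * levels.spike_ratio:
            warnings.append(f"latest backup of {daily_bytes[-1]} bytes is a spike")
    return warnings


def bandwidth_warnings(used_bps, limit_bps, marks=(0.90, 0.95)):
    """Return the threshold marks (as fractions of the limit) that usage has reached."""
    return [mark for mark in marks if used_bps >= mark * limit_bps]
```

Because the thresholds live in a per-client object, each customer can carry their own `spike_ratio` and monthly limit, which is the configurability asked for above.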
  • by sql*kitten ( 1359 ) on Wednesday November 06, 2002 @11:50AM (#4608058)
    I started to look for books, online material and other sources to help define a baseline for the Services Level Agreement for our intended customers.

    Host a box at Exodus or Level3 and have a read of the SLA they give you. Better yet, just call them up and ask for a quote and a salesman to call, no need to spend any of your own money. It's probably copyrighted so you can't just use it for your own customers, but it'll give you an idea of where to start.
  • by FreeLinux ( 555387 ) on Wednesday November 06, 2002 @11:52AM (#4608079)
    I strongly recommend you read this [theregister.co.uk] definitive guide to establish your procedures and develop your SLAs.
  • by walt-sjc ( 145127 ) on Wednesday November 06, 2002 @12:15PM (#4608311)
    Not a flame or anything, but seriously. Get someone who knows what the hell they are doing to do some consulting for you. There are lots of them on the market, and you can get them cheap. Hey, you plan to make money with this, right? Don't want to lose your ass? Then you need someone with experience. You wouldn't start a business without consulting a lawyer and CPA, would you? While you don't mention what experience YOU have, I'm assuming that you have SOME, but not much based on your questions.

    If you are gonna provide an SLA, you want TECHNICAL advice and LEGAL advice. Most SLA's are actually toothless in real life. The lawyers give you enough outs that you will never have to pay up with most customers (a few have the talent to see through the crap and make changes to your contract to put teeth back in.) Even though lawyers are expensive, it pays for itself in the long run.

    The advice on slashdot is going to be spotty at best, especially given that so many NOCs are run poorly. Without experience on hand, you will run into the SAME traps / problems that most NOCs with inexperienced leaders run into.

    Well, here are a few things that you may need.
    A TESTED disaster recovery plan for servers, network, power, and cooling.
    A trouble ticket system customized for your needs usable by inside and out (internally generated tickets and customer generated tickets.)
    A network monitoring / management system that tracks not only subsystem availability but performance and keeps a history.
    A customer management system that can bring up EVERYTHING you need to know about a customer, their systems, their people, notification procedures, etc. (this is VERY non-trivial)
    A change control system - what happened, who approved it, who did it, how long did it take, what did they do, how did they do it, when was it active, how do you revert, etc.

    Sigh. Setting up a NOC correctly is one of the most difficult tasks in IT.
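As a rough illustration of the change control item in the list above, here is a Python sketch of a single change record that answers those questions (what happened, who approved it, who did it, how long it took, how to revert). The class and every field name are hypothetical; a real system would also need storage, approval workflow, and access control.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ChangeRecord:
    """One change-control entry; all field names here are hypothetical."""
    summary: str                 # what happened
    approved_by: str             # who approved it
    performed_by: str            # who did it
    steps: list                  # what they did and how they did it
    revert_steps: list           # how to revert
    started_at: datetime         # when the change went active
    finished_at: Optional[datetime] = None  # None while still in progress

    def duration(self) -> Optional[timedelta]:
        """How long the change took, or None if it has not finished."""
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at

    def is_reviewable(self) -> bool:
        """Auditable only if it names an approver and a way back."""
        return bool(self.approved_by and self.revert_steps)
```

Refusing to accept a record for which `is_reviewable()` is false is one simple way to make sure nobody touches production without an approver and a rollback plan on file.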
    • Hear, hear.

      Of all the posts so far, this is probably the best. If you're just starting out, I highly recommend a consultant with NOC experience. 800 pound gorillas like me are available for this. Send me email.
    • I'll agree here, and point out that disaster recovery is probably the best place to start, given that a NOC almost by definition is always doing disaster recovery: if not yours, then someone else's.

      A huge part of disaster planning is organizational, and planning, and documentation of the business practices, which will also apply to you.

      If you draw up a plan for every class of disaster you and your clients may have, then the lawyers will have something to work for.
      • Absolutely. AND ensure your disaster recovery plans include regular testing. I've seen sites where they assume that because they tested a tape restore of a test 10MB database two years ago when they first installed their SQL server, that means all the subsequently added multi-GB databases are also safe "because they're being backed up using exactly the same procedure". Perform test restores on a regular basis (say, once a month) as part of regular maintenance procedures.
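One way to make a monthly test restore more than a ritual is to check the restored files against a known-good copy. The Python sketch below compares SHA-256 checksums between two directories. It assumes you restore into a scratch directory and compare against a snapshot taken at backup time (a live database will have changed since); the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return problems
```

An empty list means every file in the snapshot came back byte-for-byte; anything else is a list of files to investigate before you actually need them.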
  • by anthony_dipierro ( 543308 ) on Wednesday November 06, 2002 @01:35PM (#4609123) Journal
    The only book you need is the US Code, Title 11 [cornell.edu]. Pay special attention to Chapter 13 [file-bankr...ter-13.com].
  • There are a couple of ideas that I believe any NOC engineer should hold close if he or she wants to keep it together:
    1. When trying to resolve an outage, don't believe any piece of information that anyone (vendor, peer, SA department) gives you until you can prove it to yourself. People make assumptions, which are often wrong.
    2. Don't close a trouble ticket until you know the problem is fixed and it isn't coming back. If your NOC has three or four tickets on the same outage within a week or two, then you've got two problems: one with your gear, and one with your process.
    3. Stay away from red bull. That crap is evil.
    4. Find your own niche, a particular set of NOC duties that you excel in. Find some subject matter on which you can be an expert.
    5. Just like they say you should check your problems at the door when you come to work, check your work at the door when you go home.
    6. Be nice to people. There is no reason to be a jerk or ugly with people whether they are a vendor or a phone monkey in the call center. Sometimes it is harder to be nice when there's a high-profile outage going on and you aren't getting instant results, but it will pay off when that vendor or TS supe or whoever pulls your hiney out of the fire later on.
    7. CSM aka head of lettuce is a pimp. :)
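The "three or four tickets on the same outage within a week or two" rule of thumb from point 2 can be checked mechanically. This is a hedged Python sketch: the `(component, opened_at)` ticket shape, the 14-day window, and the threshold of three are all assumptions chosen to match the comment, not the schema of any real ticket system.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def repeat_outages(tickets, window=timedelta(days=14), threshold=3):
    """Return components with `threshold` or more tickets inside any `window`.

    `tickets` is an iterable of (component, opened_at) pairs (assumed shape).
    """
    by_component = defaultdict(list)
    for component, opened_at in tickets:
        by_component[component].append(opened_at)

    flagged = []
    for component, times in by_component.items():
        times.sort()
        # A run of `threshold` tickets fits the window when the first and
        # last ticket of that run are at most `window` apart.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flagged.append(component)
                break
    return sorted(flagged)
```

Anything this returns is a candidate for reopening as a single problem record rather than closing yet another ticket.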
  • Sean Donelan wrote an excellent piece on requirements for various degrees of uptime in NOCs. It's not too specific, but it gives a good idea of the numbers involved.
    Read it here [donelan.com].
  • My company is often hired by other corporations to do exactly this. I'm bound by lots of pesky NDAs, so I can't offer advice unfortunately. I can mention that a lot of this is based on experience, and knowledge of what is done in a NOC. The best way to learn about one is to see one in action, or work in one that's established. Companies pay a lot of money to us to make sure they get it right the first time. Be prepared for a learning curve, and be willing to accept that your initial 'great ideas' may not be so great in practice. Be diligent and I'm sure you'll build a great NOC, just not in a day.
  • Here goes:
    - Don't use anything by CA
    - Check out NetCool for event trapping
    - If you have a development team, consider making your own app, since off the shelf stuff might be a hard fit, where you are always trying to customize it to meet your needs and it never quite solves all your problems...
    -
  • by PinglePongle ( 8734 ) on Thursday November 07, 2002 @09:23AM (#4615912) Homepage
    I've never worked in a NOC, but I've been a customer with a couple of big names, and the most important thing to a customer is not the SLA - if we have to even read it, things have gone seriously wrong, and rather than litigate, we'll just leave.

    The blinkenlights, CRM processes, trouble ticketing systems etc are all lovely, but the thing that makes a difference is the people. In one case, there were 2 network guys we spoke to - one was great, used his initiative to sort stuff out, never lied to us or tried to fob us off, and kept us in the loop with what was going on. The other guy was technically prob. better, but used all the company's processes to hide from us. He was reluctant to acknowledge problems, rarely responded to voice or email, and gave the impression we were not really important to him. They both worked to the same SLA, processes, standards, etc. One was good at his job, the other merely good at technology.

    So, I would suggest that instead of spending a lot of time on SLAs, you spend time finding good people. Monitor your performance not by "uptime" (one of our suppliers was monitoring our site using the internal network. They got 99.999% uptime, while anyone outside the NOC got "server not found"), but by customer satisfaction - contact your customers once every 3-6 months and ask them to fill out a satisfaction survey. Deal with issues they raise. Treat a customer who leaves you like a company crisis. Encourage your people to think about outcomes, not processes.

    Sure, you need to know how your network is performing, what your customer's uptime statistics are, and have the facilities we have come to expect (including blinkenlights). Just make sure you've also got some cat5, screwdrivers, and free drinks for your customers, and that you don't get carried away with all the fancy stuff.
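To put numbers on the uptime point above: what matters is where the probes run, and how little downtime a figure like 99.999% actually allows. A small Python sketch, assuming probe results are recorded as plain booleans (the function names are illustrative only):

```python
def availability(probe_results):
    """Fraction of probes that succeeded; `probe_results` is a list of booleans."""
    if not probe_results:
        raise ValueError("no probes recorded")
    return sum(probe_results) / len(probe_results)


def downtime_budget_minutes(target, period_days=365):
    """Minutes of downtime allowed per period at a given availability target."""
    return (1 - target) * period_days * 24 * 60


# Same site, same hour, probed from two vantage points (made-up data).
internal_probes = [True, True, True, True]    # checked over the NOC's own LAN
external_probes = [True, False, False, True]  # checked from outside the NOC

print(availability(internal_probes))              # 1.0
print(availability(external_probes))              # 0.5
print(round(downtime_budget_minutes(0.99999), 2)) # 5.26 minutes per year
```

The internal figure looks perfect while outside customers see the site down half the time, which is the trap described above; reporting the externally measured number is the honest one.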
  • 1. Simplicity - if you can't streamline all of your core operations you spend more time figuring it out when something breaks.
    2. Consistent Documentation/Knowledge Transfer among your technicians.
    3. Consistent equipment: stick with one vendor and develop excellent relations with that vendor. (Example: I stuck with Dell on a big purchase, got my servers and 2 extra computers, a box of posters, shirts, and other Dell junk and 16 switches for free just for "being cool")
    4. Adequate troubleshooting database. Find a help desk software suite that fits the nature of your users'/clients' needs. Make sure it is scalable and intuitive. Having a web based self-help package is also recommended... The more your clients can help themselves, the less your phone will ring.
    5. Document everything, from setup procedures to password databases. (don't post it on the web interface of your help desk software lol!)
    6. Make security a priority everywhere your NOC has a presence.
    7. Set up your equipment to talk to YOU. Most management software (openview, webtrends, help desk software, etc) can send you emails if not text messaging and whatnot... possibilities are endless, but it's nice to know something is broke before your users/clients do.

    That's it. I can go on and on.
  • from experience. (Score:3, Insightful)

    by GiMP ( 10923 ) on Thursday November 07, 2002 @09:40PM (#4622054)
    1. Good ticket system. I've used several and I can tell you that this can be a major issue. RequestTracker appears to be sufficient, although I've never used it for a larger datacenter. The tracking system and the PROCESS can make or break your datacenter.

    2. Phone calls sound like a good idea to clients but they can be a pain in the butt.. especially since many clients have difficult accents. They also require a lot of a technician's time. Consider having no telephone support, or only for large clients.

    3. You need a good customer management and billing systems.
    4. Good inventory system, tie to billing system.
    5. A web-based interface to your router(s) and switches is advised.. it is also advised to tie this into your other software, don't physically unplug suspended servers, just suspend them from the billing interface and have it automatically use SNMP to disable their port.
    6. Make sure the person handling your routing needs is sharp, I've seen datacenters where the 'router god' was learning on the job (not a bad way to learn, just a bad way to run a NOC)
    7. Make sure that you do proper cable management, keep the facilities in good order, working restrooms. I've seen datacenters which would make the homeless cry.
    8. Server monitoring system. For the convenience of your technicians, a projection or large LCD would be preferred.. one of those cheaper dot-matrix LED displays would be ok.
    9. I've been places where I've had to answer telephone calls, answer tickets, look up passwords, and monitor servers.. all from different webpages behind the LAN. INTEGRATE. The closer and easier something is to access, the more useful it becomes.

    Btw, I'm currently writing management software for a NOC; although it is proprietary software belonging to the NOC and I don't think they have any plans to sell it.
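For point 5 above (suspend from the billing interface and let SNMP disable the port), the usual mechanism is setting the port's `ifAdminStatus` from IF-MIB to down. The Python sketch below only builds the argument list for Net-SNMP's `snmpset` tool and does not run it. The function name, the SNMPv2c choice, and the `"private"` community string are assumptions; a real deployment would want SNMPv3 or at least a restricted write community.

```python
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"  # IF-MIB ifAdminStatus
ADMIN_UP, ADMIN_DOWN = 1, 2                   # values defined by IF-MIB


def port_admin_command(switch_host, if_index, suspended, community="private"):
    """Build a Net-SNMP `snmpset` argument list; nothing is executed here."""
    status = ADMIN_DOWN if suspended else ADMIN_UP
    return [
        "snmpset", "-v", "2c", "-c", community, switch_host,
        f"{IF_ADMIN_STATUS_OID}.{if_index}", "i", str(status),
    ]
```

The billing system would call this when an account flips to suspended (or back) and hand the list to something like `subprocess.run`, so nobody has to walk to the rack and pull a cable.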
  • 1. request tracker - problem tracking
    2. big brother - problem notification
    3. mrtg - bandwidth (and other stuff) monitoring
    4. find, grep, awk, sed, cat, less - problem troubleshooting

    links to these are on http://www.freshmeat.net
