Networking Technology

Testing Network Changes When No Test Labs Exist? 164

vvaduva writes "The ugly truth is that many network guys secretly work on production equipment all the time, or test things on production networks when they face impossible deadlines. Management often expects us to get a job done but refuses to provide funds for expensive lab equipment, test circuits, or reasonable time to get testing done before moving equipment or configs into production. How do most of you handle such situations, and what recommendations do you have for creating a network test lab on the cheap, especially when core network devices are vendor-centric, like Cisco?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    Whenever you're working in/on a production environment, only one rule matters:

    Don't fuck it up.

    • by symbolset ( 646467 ) on Thursday December 24, 2009 @07:53PM (#30548034) Journal

      Oh, no. We do this all the time. Around the holidays we rewire the production server racks so their ethernet cables droop over the aisles, so we can hang up Christmas cards. Jimmy has a script that blinks the blue UID lights for a festive holiday display.

      • You mean the IPMI LEDs? (Dell Poweredge servers have a dual-color LED that can flash (blue or orange) that signifies errors etc. Accessible via IPMI (along with all sorts of other goodies, like serial over Ethernet at a level higher than the OS))
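
        For the curious, poking those identify LEDs (and the serial-over-LAN console) remotely is ipmitool territory. A rough sketch, with a made-up BMC hostname and credentials:

            ipmitool -I lanplus -H bmc-host -U admin -P secret chassis identify 60   # blink the UID light for 60 seconds
            ipmitool -I lanplus -H bmc-host -U admin -P secret sol activate          # serial-over-LAN console, independent of the OS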

      • Oh, a variation on blinkenlights?

        ACHTUNG!
        ALLES TURISTEN UND NONTEKNISCHEN LOOKENPEEPERS!
        DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN! ODERWISE IST EASY TO SCHNAPPEN DER SPRINGENWERK, BLOWENFUSEN UND POPPENCORKEN MIT SPITZENSPARKSEN.
        IST NICHT FÜR GEWERKEN BEI DUMMKOPFEN. DER RUBBERNECKEN SIGHTSEEREN KEEPEN DAS COTTONPICKEN HÄNDER IN DAS POCKETS MUSS.

    • damn straight, that's why you get paid.

      in theory, theory and practice are the same; in practice, they're not. your job is to make it that way.

      replace "theory" with "lab" and you see the fundamental flaw in the false sense of security a lab provides.

    • by Bruha ( 412869 )

      It's called an FOA (first office application). You do what modeling you can, check what you're changing, and rule #1 is don't fuck with something if you know nothing about it. We do it in the middle of the night, and if it screws things up we just restore the changed equipment to the pre-change state. Networks are too complex, and even the best lab modeling does not catch all situations.

    • Either that, or redundancy, redundancy, redundancy. I always at least try to convince the bosses that hardware needs to be ordered in even numbers, so that we have onsite emergency replacements.

      That extra hardware can then be used to build test beds.

    • Re: (Score:3, Insightful)

      by lukas84 ( 912874 )

      Everyone has a test environment. Not everyone is lucky enough to have a separate environment for production.

  • by Lord Byron II ( 671689 ) on Thursday December 24, 2009 @07:20PM (#30547848)

    There are zero replies and the story is already tagged with "youreboned". That's the truth. If your higher-ups won't front the money for proper test equipment and expect you to roll out production-ready equipment on the first go, then you really are boned. Of course, you can mitigate this with simple pen-and-paper analysis. What should each piece of equipment do? Are the products we've selected appropriate for the roles we're going to put them in? These sorts of questions can find a lot of bugs without any sort of testing. If you think, "what would I do if it were the 1980s?" then you'll be fine.

    • Comment removed (Score:5, Insightful)

      by account_deleted ( 4530225 ) on Thursday December 24, 2009 @07:36PM (#30547938)
      Comment removed based on user account deletion
      • by BiggerIsBetter ( 682164 ) on Thursday December 24, 2009 @07:44PM (#30547980)

        Not all changes are a one-way trip. Having a rollback plan is also important. Should something very unexpected happen, be prepared to roll back any and all changes to undo what has just been done.

        Couldn't agree more, except to say, don't assume you'll be rolling back from a known state. I've seen roll-back plans that assume they're undoing the changes just put in, not reverting to the state before the changes. Yes, there's a difference between the two! E.g., if your install fails, maybe you can't simply uninstall. Yes, this might mean additional resources and the overhead of FS and DB snapshots, and complete copies of config files, but better that than the alternative.

        • by afidel ( 530433 ) on Thursday December 24, 2009 @08:36PM (#30548244)
          This is networking equipment; other than transitory information like peer maps and MAC tables that can be re-learned, you should always be able to revert to the previous state as far as the software and configuration go.

          My comments are that out-of-band management is the network guy's best friend, and POTS is the best OOB available. Also, learn how to change the running config without affecting the saved config; that way the worst case is that you have to power cycle (which can be done with the correct OOB setup, or you can pre-schedule a reboot that you cancel if everything goes well). Oh, and downtime windows might seem like a luxury, but unless you are Google or Amazon the business needs to be made aware that they are necessary and critical to the smooth functioning of its IT infrastructure, so you should be making these changes during a downtime window where everyone is aware that things might break.
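
          On Cisco gear, the pre-scheduled-reboot safety net looks roughly like this (a sketch; the 10-minute timer and the hostname are arbitrary):

              core1# reload in 10
              (a reboot is now scheduled as a dead-man switch; nothing has been saved to startup-config)
              core1# configure terminal
              core1(config)# ... make the risky changes here; they only touch the running config ...
              core1(config)# end
              core1# reload cancel
              (still reachable and everything looks sane, so keep the change)
              core1# copy running-config startup-config

          If you lock yourself out instead, the box reboots in ten minutes and comes back up on the untouched startup config.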
          • Re: (Score:3, Informative)

            by karnal ( 22275 )

            You bring up a good point regarding changing the running config vs the saved config.

            What I'll do if I'm changing a remote system - POTS or no - is set up a reboot of the device in 15 minutes (after verifying the clock). Then, if something in the config causes an unforeseen issue, you just need to wait a little for the switch/router to come back online with its original config.

            Obviously, this can extend the outage window - however, always plan for worst case...

            • by afidel ( 530433 )
              My favorite ultimate backup for rebooting a device is a DTMF-controlled PDU: call into the OOB number, hit a magic number sequence, and the device reboots =)
            • Regarding running config and saved config: some time ago I wrote an iptables script that would test a new rule chain for a specified amount of time, then revert to the previous one. It has saved me a lot of time on many occasions, and actually a couple of times I locked myself out of the machine (which was a remote one, obviously).
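
              The shape of such a script, as a rough sketch (not the original; the five-minute window and the new-rules path are made up):

                  #!/bin/sh
                  # Try a new iptables ruleset, reverting automatically unless this
                  # script is killed (i.e. the change is confirmed) before the timer expires.
                  iptables-save > /root/iptables.known-good     # snapshot the working rules
                  sh /root/new-rules.sh                         # apply the candidate ruleset
                  sleep 300                                     # five minutes to test, or to notice a lockout
                  iptables-restore < /root/iptables.known-good  # still here? put the old rules back

              Run it under nohup so the revert still fires even if the new rules kill your SSH session.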
          • Re: (Score:2, Informative)

            by Grail ( 18233 )

            If you truly believe that a simple reversion of a configuration will cause a reversion to a previous state, you're sorely mistaken.

            Once the device you're working on starts misbehaving, other devices around it will start misbehaving too. As an example, one change to a network I'm involved with was supposed to simply prioritise VoIP traffic for one customer. The change was successful, the engineer went home. Then three hours later a major network router failed, because the higher priority voice traffic which

            • by afidel ( 530433 )
              And it's also a change that a lab would have been unlikely to predict, some of the time things break, life happens, and that's why you need dual paths and multiple datacenters to achieve more than about 4.5 9's in the real world.
    • Re: (Score:2, Informative)

      by Anonymous Coward

      Not pushing Juniper gear, but the commit functionality in JUNOS, and commands like "rollback", are serious things to consider in these scenarios. JUNOS also does things like refusing to perform a commit if you've done something obviously stupid (it does basic checking of your config when you commit).

      Label me a shill. Whatever. JUNOS is a lot better from an operator POV.

      • Re: (Score:2, Informative)

        by mysidia ( 191772 )

        My personal favorite thing about JunOS is "commit confirmed 10"

        This can be a lifesaver: if you fat-fingered something and broke even your ability to access the device, your transaction rolls back automatically in 10 minutes.

        If nothing goes wrong, you have 9 minutes to do some simple sanity checks, make sure your LAN is still working, and then get back to your CLI session and confirm the change.
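
        A sketch of the workflow (prompts, address, and timer are illustrative):

            [edit]
            admin@router# commit confirmed 10
            (the change is live, but with a 10-minute dead-man timer)
            admin@router# run ping 192.0.2.1
            (quick sanity check from the box itself)
            admin@router# commit
            (still happy, so the change becomes permanent)

        Miss that second commit and JUNOS rolls back to the previously committed configuration on its own.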

        • Re: (Score:2, Informative)

          by POTSandPANS ( 781918 )

          On a Cisco, you can just do "reload in 10" and "reload cancel". If you don't know about those commands, you really shouldn't be working on a production network unsupervised.

          As for the original question: Either use similar low end equipment, or use your spares. (please say you keep spare parts around)

          • Re: (Score:2, Interesting)

            by mysidia ( 191772 )

            "reload in 10" on a core router or switch (eg a massive switch that also has routing duties) is insane, and will probably impact the entire network, for 20-30 minutes, if you accidentally lock yourself out (but don't otherwise impact anything) and fail to cancel that reload.

            In addition, reload is risky, and the equipment may fail to come back up correctly.

            Sorry, it's not anywhere close to comparable to the configuration management features in JunOS.

            "Reload in X" is a bad answer, and should never be

    • As long as the downtime that will result is acceptable.

    • by eggoeater ( 704775 ) on Thursday December 24, 2009 @10:06PM (#30548554) Journal
      I'm a call-center telephony engineer. Kinda the same thing as a network engineer, in that you're routing calls instead of packets.
      Back around '01, I was working for First Union (which later became Wachovia). They had this massive corporate push for anyone and everyone in IT to roll out a standardized Software Configuration Management [wikipedia.org] process, and of course we were included. The big problem was the lab. The corporate standard was to test changes in a lab environment and then move to production (duh).
      For a telephony environment, we had a pretty good lab that could duplicate most of our production scenarios, but not all. Another problem was that there were a LOT of people with their fingers in the lab, since so many groups were involved: e.g., the IVR team is in there because you have to have IVRs in the system. Same with call routing, call recording, desktop software, QA, etc., etc.
      So the lab was in a constant state of flux, with multiple products, multiple teams, different software cycles, and endless testing always occurring. We made it work by testing the stuff we weren't sure about in the lab, only doing changes in prod after hours, and having really good testing and back-out plans.
      So when the corporate overlords started telling us we couldn't make any changes to production without running everything through the lab first, we basically laughed and told them we'd need around $500 million for the lab and dedicated resources to run it. I ended up telling them that to duplicate the production environment, we'd need another bank as our "test bank", and we could test changes on the test bank and then put them in the production bank.

      As with so many things in that IT department, it went from being a priority to fading away when something else became a priority.
      • Lemme guess... the priorities changed when a new manager was hired/one was let go, and someone decided they wanted to make a name for themselves?
    • by Stripe7 ( 571267 )
      Pen and paper analysis may not find out all the issues. We had a weird one that flummoxed a bunch of network engineers. It was an IOS upgrade to the built in fiber bridge on a blade server. The old IOS worked fine, the new one worked until you tried to jumpstart a blade. Jumpstarts worked fine with the old IOS but not on the new one. As we rarely jumpstarted the blades, this issue was not caught until after the bridges on all the blade servers were upgraded.
  • Could be worse (Score:4, Insightful)

    by 7213 ( 122294 ) on Thursday December 24, 2009 @07:22PM (#30547858) Homepage

    The best bet is to be ready to blame the vendor when things go south ;-)

    Seriously, I'm right there with you. If management does not want to provide a test lab & reasonable time to test, then it's clear they've made a 'business decision' that the network is not of sufficient value / the risk is not great enough for such investments.

    This may change quickly once something goes south (assuming they understand why it did) but you're gonna be talking to a brick wall until then.

    It could be worse, you could have management that are afraid of their own shadows & who freak out at the idea of replacing redundant components after a HW failure. (Ever had to get VP approval to replace a failed GBIC? Oh, I have & yes, I hate my life).

    • Re: (Score:2, Interesting)

      by mysidia ( 191772 )

      See how much approval you have to get when the network is down because of a failed GBIC.

      Redundancies against component failure are very good for the enterprise, but also make it harder for engineers to do their job, since "nobody notices that something has gone wrong".

      Perhaps the real redundancies should be reserved for the absolute most business-critical things.

      Make sure less important things are non-redundant and arranged in a way so that if any link or GBIC does fail, something noticeable to

      • Re: (Score:3, Insightful)

        by hazem ( 472289 )

        That reminds me of an article by Nelson Repenning, "Nobody ever gets credit for fixing problems that never happened". It's quite an interesting read... The guy who "saves the day" during an emergency always seems to get credit and reward, but what about the guy who keeps the emergency from ever happening?

        • by martyb ( 196687 )
          That reminds me of an article by Nelson Repenning, "Nobody ever gets credit for fixing problems that never happened". It's quite an interesting read... The guy who "saves the day" during an emergency always seems to get credit and reward, but what about the guy who keeps the emergency from ever happening?

          Hey! Thanks for that!!!!!

          I'd heard variations on it several times but assumed it was just folklore or [un]conventional wisdom. Your post prompted me to search and find the article you mentioned: here'

        • by RMH101 ( 636144 )
          Case in point: last year we had a server room outage here at a big retailer. The UPS tripped and the whole lot went down, including the 24/7 supply chain etc - millions lost per hour. Cue some phone calls to a few IT people who happened to be out on the beer that night, who came in and eventually sorted it out after about 6 hours of downtime. This was sold as a triumph of IT's dedication and professionalism - no one asked "why did the bloody DC only have a single UPS and single-phase power?"
  • Virtualization? (Score:5, Interesting)

    by bsDaemon ( 87307 ) on Thursday December 24, 2009 @07:22PM (#30547860)
    It's perhaps not the best solution, as a lot of the problems I've faced since I started getting more into networking than software configuration and web server administration have been related to bad cables rather than bad IOS settings, but virtualization can help you create test situations on the cheap. Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc., all running on hypervisor technology.

    You can also use QEMU to create virtual network nodes. If you have enough RAM, this can help at least get the logical issues worked out and the software configurations squared away. Then you just need to do the real work :) I'm still pretty new to networking myself, and I use it to make little test labs for myself when I need to do more than I can with the two 3600- and 2600-series routers I got to take home for experimenting. I actually copied the IOS images off of them via TFTP and can then replicate them as many times as I need to, and I can claim I have whatever interfaces I need; plus it will (thankfully) simulate the ATM switch for me as well.
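
    For a feel of what's involved, a Dynamips/dynagen topology file (which is what GNS3 manages under the hood) looks roughly like this - the image path, RAM figure, and router names are placeholders:

        [localhost]
            [[7200]]
            image = /opt/images/c7200-ios.bin
            ram = 160

            [[ROUTER R1]]
            f0/0 = R2 f0/0

            [[ROUTER R2]]
            # the link is already defined by the line under R1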
    • Re: (Score:2, Informative)

      by loki_ninboy ( 992401 )
      I'm using the GNS3 software with some IOS images to help prepare for the CCNA exam. Sure beats paying the money for extra hardware lying around the house just for learning and testing purposes.
      • by afidel ( 530433 )
        Almost as importantly, with a simulator you don't have to POWER all that equipment; my CCNP lab almost maxed out a 20A circuit.
    • Re:Virtualization? (Score:5, Informative)

      by value_added ( 719364 ) on Thursday December 24, 2009 @07:41PM (#30547964)

      Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc, all run on hypervisor technology.

      For anyone unfamiliar with GNS3, a link to the website [gns3.net]. There are versions available for Windows, Linux, and OS X. FreeBSD already has it in ports.

      As a side note, I'd add that maintaining a home lab (to the extent practicable and useful) is one way to side-step limitations of what your employer provides. Consider it a combination of "Ongoing Professional Education" and "Proactive Job Security Measures" (i.e., "I better test this shit to save my ass tomorrow").

    • Re: (Score:3, Informative)

      by Bios_Hakr ( 68586 )

      If you work in a pure Cisco environment, talk to your Cisco guy about getting Packet Tracer. It emulates a few routers and a lot of switches, and it works really well. Plus, 5.1 adds virtual networking: you can design several networks on several laptops and then join those networks over a virtual internet.

  • Granted, it's not really an ideal solution, but it may wind up being the only way to avoid using production equipment.

  • by jdigriz ( 676802 ) on Thursday December 24, 2009 @07:27PM (#30547884)
    Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to the business if various components fail. Write a plain-language executive summary calling out the risks.
    Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
    Step 3) If possible, test network changes on the production equipment at 2am so that the impact on users will be less.
    Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until the test lab is approved.
    Note: step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
    • Re: (Score:2, Insightful)

      by Renraku ( 518261 )

      If you get fired for failing to do a job for which you were not equipped (and they know you aren't equipped for it), you might be able to sue because they created a hostile work environment. Hostile work environment lawsuits aren't just for sexual harassment, folks.

    • There's a potential hitch or two in your plan.

      If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf. They feel vindicated. If it goes poorly they'll assume you didn't really try because you wanted to prove yourself right.

      • by SethJohnson ( 112166 ) on Thursday December 24, 2009 @07:55PM (#30548056) Homepage Journal

        If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf.

        Hence, you have the plug to the main router beneath your own desk. When the sailing looks smooth, you kick out the cord. While everyone freaks out, you open up a terminal window and begin typing nonsensical commands. Say, "Ahaaah!" as you plug the router back in.

        Job security.

        Seth

        • Sounds like Bastard Operator From Hell to me. But it could be the only defense against Incredibly Incompetent Manager From Hell.

        • Please tell us all how you convinced an electrician to install dual L6-30 208V plugs beneath your desk. And how kicking said twist lock plugs -- both of them -- will cause the plug to come loose.
        • by mybecq ( 131456 )

          Say, "Ahaaah! As you re-plug in the router."

          With your feet? You ARE talented!

      • by jdigriz ( 676802 )
        Are we in the same business? No one ever notices IT when things go well.
        • by itwerx ( 165526 )
          On the consulting side people do notice when things are running well because they aren't getting billed!
    • by Keruo ( 771880 ) on Thursday December 24, 2009 @07:47PM (#30548000)

      Step 3) If possible, test network changes on the production equipment at 2am so that the impact on users will be less

      Been there, done that. Sadly, the only way to see how your setup works is to try it in production.
      Sure, it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in the real network when you roll something out.
      Just make sure you can clock those a.m. hours as overtime/night work.
      And remember to back up the running config twice so you can restore the production network if something goes fubar.
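
      For example (server address and file names are placeholders), one copy off-box and one on local flash:

          router# copy running-config tftp://192.0.2.10/core1-before-change.cfg
          router# copy running-config flash:before-change.cfg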

      • by dkf ( 304284 )

        Been there, done that. Sadly the only way to see how your setup works is to try it in production.

        The other thing to mention is to be honest with the other technical staff about whether you've actually made a change, even a "trivial" one. This is because sometimes when you modify something, you can end up dumping them in the shitter accidentally, e.g., by putting a critical service on the wrong side of an internal firewall so that no packets get routed to it at all. In fact, I saw that once, and networking stonewalled for a week before admitting that indeed they had made a small modification "that shouldn't have

      • by dbIII ( 701233 )

        Sure, it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in the real network when you roll something out.

        That means your experimental model is not good and needs to be refined.
        You see, all those guys that did a six-month course and call themselves "engineers" could have had some benefit from a real engineering education or the experience of working with real engineers.
        Meanwhile I have idiots learning about routing or DHCP on production systems because they can't be bothe

      • yep, and the other fail here is that a lot of production environments are 24/7. there is NOT a slow point, ever.
    • Re: (Score:3, Interesting)

      by Anonymous Coward

      Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.

      And that's what happened to me.

      I was forced into making changes in the production environment, and caused an outage that affected 2 people. Once I realized what had happened, I quickly fixed it; however, due to internal politics I was terminated the next day.

      Initially I was in shock. 10 years, 2 months employed at a single company. Gone. I have a stay-at-home wife and 3 kids, which made things look even bleaker.

      In hindsight, it may be one of the better things to happen to me. I had spoken with a recruiter a

    • Re: (Score:3, Interesting)

      by SharpFang ( 651121 )

      3) If possible, test network changes on the production equipment at 2am so that the impact on users will be less

      That's dangerous. You leave it apparently running and crawl back to sleep at 4:30AM, to get an angry call at 7:05AM when the first users to log in report something essential is fucked up.

      Prepare and test at 2AM, then roll back to original. Then re-apply around lunch break and wait with your fingers on roll-back for the first reports of failure.

    • 3) If possible, test network changes on the production equipment at 2am so that the impact on users will be less

      You're a network guy, right? How well do you know the applications that use your network? How sure are you that the applications behind, or in front of, the change you're making don't need a restart after losing connectivity? Maybe your late-night tests are causing all sorts of problems and expense when the apps guys come in to find the system inexplicably down, having visible outages, and have to start raising support requests against vendors to find a solution to their non-reproducible high-severity defect

    • by mcrbids ( 148650 )

      4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until test lab is approved. Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.

      And this is the part that SUCKS.... A while back, I was part of a three-way integration project, with myself (representing a vendor), another vendor, and the ultimate customer. In advance, I'd talked through everything with the other vendor so we had a clear plan, including a verification st

  • by tchdab1 ( 164848 ) on Thursday December 24, 2009 @07:27PM (#30547886) Homepage

    I call my buddies at RIM and test my mods on their system.

  • I would suggest asking your vendors for demo or evaluation equipment. Cisco, Juniper and 3Com have pools of demo equipment as do the resellers like PC Connection and CDW.

    I've done deployments of new switching infrastructure based on work I've done with loaners from my vendors. It can be tough because the typical evaluation period is 30 days, although you can get 45 and even 60 days.

    If you have a good relationship with your sales rep, it would be easy to push them to get the necessary items to do basic testi

  • Packet Life (Score:3, Informative)

    by z4ns4stu ( 1607909 ) on Thursday December 24, 2009 @07:40PM (#30547960)
    Stretch, over at Packet Life [packetlife.net], has a great lab [packetlife.net] set up that anyone who needs to test Cisco configurations can sign up for and use.
  • Tools (Score:5, Informative)

    by Tancred ( 3904 ) on Thursday December 24, 2009 @07:43PM (#30547976)

    Here are a few tools:

    GNS3 - http://www.gns3.net/ [gns3.net] - free network simulator, based on Dynamips Cisco emulator
    Opnet - http://www.opnet.com/ [opnet.com] - detailed planning of networks, from scratch
    Traffic Explorer - http://packetdesign.com/ [packetdesign.com] - plan changes to an existing network

  • by wintermute000 ( 928348 ) <{ua.moc.sserpxetenalp} {ta} {redneb}> on Thursday December 24, 2009 @07:45PM (#30547988)

    Older Cisco equipment can function just as well as newer gear for 95% of lab scenarios. You are very unlikely to need all the newer features.

    Anything that can run IOS 12.3 and is newer than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though in production it's 2800s, 3800s, etc.
    Granted, there are features for which you do need the newer kit, especially where the syntax changes (e.g. IP SLA commands, newer NetFlow commands, class-map based QoS, to name three off the top of my head), but none of the core routing and switching features/commands has changed much since the introduction of CEF - they all do ACLs, route maps, OSPF, BGP, EIGRP, VLANs, spanning tree, rapid spanning tree, IPsec VPNs. I'm speaking from an enterprise POV, not a service provider, but I'd imagine that if you are in a telco environment you wouldn't be lacking gear.

    For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, everything will bounce and come back (just remember NOT TO COPY RUN START lol).

    Then there are the sleight-of-hand methods:
    - always ordering more for projects than you really need. Par for the course, really, especially as most project managers haven't a clue when it comes to the nuts and bolts of a big Cisco order.
    - pushing for EOL replacements as early as possible; intentionally conflate end-of-sale with end-of-life.
    - getting stuff in for projects as early as possible; then you have a month or two to use it as test gear.
    - remembering that your lab need not mirror reality, so scale down as much as possible. E.g., to simulate a pair of 4506 multilayer switches running VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licensing, but for test scenarios, meh).

  • It doesn't save you from doing stupid things, but putting your device configurations under revision control, using something like RANCID [shrubbery.net], can make rolling things back easier, as well as generally encouraging sanity around device configuration.
  • Go virtual! (Score:3, Informative)

    by leegaard ( 939485 ) on Thursday December 24, 2009 @07:58PM (#30548064) Homepage

    If you are unable to recycle old equipment into your test lab, you should go virtual.

    For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will let you build a large and complex topology that will satisfy most experiments and even support you when doing certification preparation. It only works for routers, so switch-based platforms are out (like the 3750, 6500 and 7600). The good news is that the features are more or less the same and they more or less behave the same way. If "more or less" is not close enough, you need a replica of your production network, or at least a few devices of each, to test what can be labelled as critical.

    For Juniper routers, google Juniper Olive. It will run a Juniper router the same way Dynamips runs a Cisco router.

    In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.

    In the end it comes down to the level of risk your management is willing to take. Ask them if they will allow the network to be less up since you are unable to properly test your changes before implementation.

  • by anti-NAT ( 709310 ) on Thursday December 24, 2009 @07:58PM (#30548066) Homepage

    For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making changes in an "untested" environment. So, you make very few changes rather than lots, you make sure after each change that it has had the desired effect, and you have backout plans.

  • Borrow a lab! (Score:4, Interesting)

    by jimpop ( 27817 ) * on Thursday December 24, 2009 @08:02PM (#30548092) Homepage Journal

    Cisco have many (large) labs located around the world. Sign up for some time in one of them.

  • by Anonymous Coward

    Been there, done that (A LOT!!).
    But it has failed quite a few times too.

    If no money is available for test labs, make good plans... Tell the dudes that wanted the changes (or, if you are the dude that wants the changes, inform the correct people that you will be doing stuff). Agree on a service window. Have backup plans. Have all configurations saved. Let all users know that after 10pm on that Saturday the network will be down for 10 mins, etc.

    Have tons of contingency plans, and let the 'responsible' people know

  • by Anonymous Coward

    You do not mention that this has ever made the shit hit the fan. I conclude that so far it has not occurred.

    Consequently, you have proved that you are able to work without expensive test equipment through a combination of motivation and elbow grease. Congratulations!

    Now, what is the logic for someone with a finite pool of money to provide equipment for someone who obviously does not NEED it? Yes, none at all!

    You can therefore:
    1) Wait until the shit hits the fan and say "well, that's what happens when we don't have test eq

  • Paper Trail (Score:4, Interesting)

    by tengu1sd ( 797240 ) on Thursday December 24, 2009 @08:24PM (#30548176)
    >>>refuse to provide funds for expensive lab equipment, test circuits and for reasonable time to get testing done before moving equipment or configs into production.

    Make sure that every change request implementation documents that the change is being placed into the production environment for testing. Document impact ranging from total network failure to moderate inconvenience, and include rollout timetables. The rollout needs to include travel time, such as driving to site B or flying cross-country.

    Of course the downside of this is that management may go out and hire someone who knows, or at least pretends to know, how to drop changes into place without whining about ignorance and making customers uncomfortable.

  • It depends a lot on your environment and the complexity you are dealing with. Test labs are wonderful things, but typically you end up in a situation where either your network is so limited that a lab won't help much, or your network is simply too complex to create a sane lab environment without dedicated staff and a huge budget.

    Building a full scale lab is a large undertaking. It takes time and effort. You will need taps (for routing information), traffic generators, topology management and more. In my experience it

  • Don't forget SOX (Score:3, Informative)

    by jackb_guppy ( 204733 ) on Thursday December 24, 2009 @09:04PM (#30548348)

    1) You should not be making any direct changes to the network without correct design, testing and sign-off.

    2) You should already have a redundant network structure, so "half" can be lost without any loss of network operations. This way the change can be tested in parallel.

    3) You should always report to the SOX officer when a request outside correct operations and management is made. That makes it their responsibility to solve the legal issues, for not following their written standards, before you begin.

    • by butlerm ( 3112 )

      Not that this isn't a good idea for other reasons, but how exactly does this requirement flow from the Sarbanes-Oxley Act? I mean, the whole thing is about financial controls, accuracy, disclosure, and reporting. I suppose your network could impair the timeliness of some reports, but it is hard to see in general how it is going to affect their accuracy.

      • Access Control - VLANs and "firewalls" can be broken by errors.
        Business Continuity - the system is unavailable for *any* work.
        Damage - the system needs to be reloaded to return to working status, with loss of intermediate work.
        Notification - is there something that may be wrong?
        Missed Postings - with today's separated servers, one being "down" could cause *lost* postings and revenue.

        That is just a short list. SOX opens up a whole array of issues; best to let the SOX team worry about it.

        The network is now the system!

      • If what you know about Sarbanes-Oxley comes from Slashdot, you probably think it is significantly wider in scope than it actually is. You're correct: Sarbanes-Oxley is only tangentially related to IT, but somehow Slashdot thinks that it's this big huge-ass problem. See here for more information: http://en.wikipedia.org/wiki/Information_technology_controls#IT_controls_and_the_Sarbanes-Oxley_Act_.28SOX.29 [wikipedia.org]

        Specifically this:
        "The 2007 SOX guidance from the PCAOB[1] and SEC[2] state that IT controls should only

  • polish your resume.

  • Download an ISO from Vyatta and build a test network with old PCs and spare NICs for testing. Sure, it's not exactly the same as Cisco, but if they're too cheap to buy the real thing for a test lab then you'll at least be somewhat close.

    Then, once you realize what you're not getting for your money with Cisco, you can buy $1000 1U servers and build your own routers (or buy them prebuilt from Vyatta for about $2000) to replace the Ciscos, and make a profit selling the used Ciscos on eBay.

    I do NOT work for nor
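
    To give a sense of it, bringing up a basic Vyatta router on a spare box goes roughly like this (interface names and addresses are placeholders, and exact commands vary a little by release):

        vyatta@vyatta:~$ configure
        vyatta@vyatta# set interfaces ethernet eth0 address 192.0.2.1/24
        vyatta@vyatta# set interfaces ethernet eth1 address 198.51.100.1/24
        vyatta@vyatta# set protocols static route 0.0.0.0/0 next-hop 198.51.100.254
        vyatta@vyatta# commit
        vyatta@vyatta# save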

    • Re: (Score:3, Funny)

      by itwerx ( 165526 )
      Tired of the VI vs EMACS war? Try the new Vyatta vs pfSense conflict instead! :) (pfSense is great...)
  • The UNH-IOL is a neutral, third-party laboratory dedicated to testing data networking technologies through industry collaboration.

    http://www.iol.unh.edu/ [unh.edu]

  • Make your objections in writing: email the manager demanding the change you believe places production at risk, with the risks clearly outlined in bullet points. If he then insists you proceed, make him send you the request in writing/email, print out a duplicate, keep it in a safe place, and then make his change. This way he owns the failure, not you. Paper trails exist for a reason, to cover arses, and arse covering is often a worthwhile exercise.
  • As you already said, we secretly test on production in such cases.

  • Management hates paying for double the equipment, but for any production environment it should be the cost of doing business. It minimizes risk and provides hot spares faster than an HP (or whatever) tech can show up. You should get some duplicate hardware for staging.

    If you can't do that, then refer to the earlier post [slashdot.org] - don't fsck up.

  • Re: (Score:2, Insightful)

    Comment removed based on user account deletion
  • ...You're as guilty as - if not MUCH more so than - they are here....

    Quoting you: "Management often expects us to get a job done but refuse to provide funds for expensive lab equipment"

    Well, have you considered that it might be that you never informed management from the start about what to expect down the road? If there is ONE THING that management does well and knows better than most of us, it is how to EARN and KEEP money; they trust YOU to do your job and know everything about it so it doesn't have t

  • You hire professional services from a lab/test equipment manufacturer (Spirent, Ixia, BPS) or a dedicated testing company (EANTC or others). Most of them will agree to work during the night, so you need to get a "maintenance" window where they can inject traffic. I do that all the time, from the tester's side. It's stupid to do, by the way, because you should always test *before* production.

    But that's really dangerous and the best way is still to test in the "lab". A lab can be a temporary rack where you pu

  • Not a cure-all by any means, but one more trick for the toolbox. Very useful during a maintenance window. Obviously Cisco specific.

    (tftp/scp/etc the new-config file to the router)

    router# reload in 2
    router# copy flash:new-config running-config

    (something along those lines, this is off the top of my head; basically, copy your new config onto the running config)

    if it works, cancel the reload ("reload cancel") and write the config to startup; if you get disconnected, wait 2 minutes for the router to reboot and automatically load the previous startup-config. Adjust
  • It's highly distressing to encounter these people, but many, tech and manager alike, actually think there's nothing wrong with working on production systems. To them that's just how it's done. They know no other way. Trying to educate them is met with blank stares and sometimes even harsh resistance.

  • Seriously, buy a new router to replace a 'broken' one at a location, and then somehow fix the broken one for your lab/office.

    The truth is that sometimes you not only lack the equipment for lab testing, but also the real-world usage scenario. I am often stuck in a situation where I must back up a config and then experiment with production equipment, and so am forced to do this outside of business hours. I usually get a chance to do some functional testing offline but can't really put new systems through ther

"Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

Working...