Networking Technology

Testing Network Changes When No Test Labs Exist? 164

vvaduva writes "The ugly truth is that many network guys secretly work on production equipment all the time, or test things on production networks when they face impossible deadlines. Management often expects us to get a job done but refuses to provide funds for expensive lab equipment and test circuits, or reasonable time to get testing done before moving equipment or configs into production. How do most of you handle such situations, and what recommendations do you have for creating a network test lab on the cheap, especially when core network devices are vendor-centric, like Cisco?"
This discussion has been archived. No new comments can be posted.

  • by jdigriz ( 676802 ) on Thursday December 24, 2009 @07:27PM (#30547884)
    Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to business if various components fail. Make a plain-language executive summary calling out risks.
    Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
    Step 3) If possible test network changes on the production equipment at 2am so that impact on users will be less.
    Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired.
    Repeat until test lab is approved. Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
  • Re:Virtualization? (Score:2, Informative)

    by loki_ninboy ( 992401 ) on Thursday December 24, 2009 @07:32PM (#30547914) Homepage
    I'm using the GNS3 software with some IOS images to help prepare for the CCNA exam. Sure beats paying for extra hardware to have lying around the house just for learning and testing purposes.
  • Packet Life (Score:3, Informative)

    by z4ns4stu ( 1607909 ) on Thursday December 24, 2009 @07:40PM (#30547960)
    Stretch, over at Packet Life [packetlife.net], has a great lab [packetlife.net] set up that anyone who needs to test Cisco configurations can sign up for and use.
  • Re:Virtualization? (Score:5, Informative)

    by value_added ( 719364 ) on Thursday December 24, 2009 @07:41PM (#30547964)

    Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc, all run on hypervisor technology.

    For anyone unfamiliar with GNS3, a link to the website [gns3.net]. There are versions available for Windows, Linux, and OS X. FreeBSD already has it in ports.

    As a side note, I'd add that maintaining a home lab (to the extent practicable and useful) is one way to side-step limitations of what your employer provides. Consider it a combination of "Ongoing Professional Education" and "Proactive Job Security Measures" (i.e., "I better test this shit to save my ass tomorrow").

  • by Anonymous Coward on Thursday December 24, 2009 @07:42PM (#30547970)

    Not pushing Juniper gear, but the commit functions in JUNOS, and commands like "rollback", are serious things to consider in these scenarios. JUNOS also does things like refusing to perform a commit if you've done something obviously stupid (it does basic checking of your config when you commit).

    Label me a shill. Whatever. JUNOS is a lot better from an operator POV.

  • Tools (Score:5, Informative)

    by Tancred ( 3904 ) on Thursday December 24, 2009 @07:43PM (#30547976)

    Here are a few tools:

    GNS3 - http://www.gns3.net/ [gns3.net] - free network simulator, based on Dynamips Cisco emulator
    Opnet - http://www.opnet.com/ [opnet.com] - detailed planning of networks, from scratch
    Traffic Explorer - http://packetdesign.com/ [packetdesign.com] - plan changes to an existing network

  • by wintermute000 ( 928348 ) <bender@plane t e x p r ess.com.au> on Thursday December 24, 2009 @07:45PM (#30547988)

    Older Cisco equipment can function just as well as newer for 95% of lab scenarios. You are very unlikely to need all the newer features.

    Anything that can run IOS 12.3 and is less than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though in production it's 2800s, 3800s etc.
    Granted there are features that you do need the newer kit esp when syntax changes (e.g. IP SLA commands, newer netflow commands, class map based QoS to name three off the top of my head) but none of the core routing and switching features/commands has changed much since the introduction of CEF - they all do ACLs, route maps, OSPF, BGP, EIGRP, vlans, spanning tree, rapid spanning tree, IPSEC vpns. I'm speaking from an enterprise POV not a service provider but I'd imagine if you are in a telco environment you wouldn't be lacking gear.

    For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, everything will bounce and come back (just remember NOT TO COPY RUN START lol).

    Then there's the sleight of hand methods:
    - always ordering more for projects than you really need. Par for the course really esp as most project managers haven't a clue when it comes to the nuts and bolts of a big cisco order.
    - pushing for EOL replacements as early as possible, intentionally conflating end of sale with end of life.
    - getting stuff in for projects as early as possible, then you have a month or two to use it as test gear.
    - remember that your lab need not mirror reality, scale down as much as possible. e.g. to simulate a pair of 4506 multilayer switches running VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licencing but for test scenarios, meh).

  • by Keruo ( 771880 ) on Thursday December 24, 2009 @07:47PM (#30548000)

    step 3) If possible test network changes on the production equipment at 2am so that impact on users will be less

    Been there, done that. Sadly the only way to see how your setup works is to try it in production.
    Sure it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in the real network when you roll something out.
    Just make sure you can clock those am hours as overtime/nighttime work.
    And remember to back up the running config twice so you can restore the production network if something goes fubar.
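
    The "back up the running config twice" advice above can be sketched as a small script. This is only an illustration under assumptions: it presumes you have already pulled the running config off the device as text by whatever means you normally use, and the function name, file naming scheme and directories are all hypothetical.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone
from pathlib import Path


def backup_config(config_text: str, device: str, dest_dirs: list[Path]) -> list[Path]:
    """Write one config snapshot to every directory and verify each copy.

    `config_text` is assumed to be the running config, already fetched
    from the device. `device` is only used to build the file name.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    payload = config_text.encode("utf-8")
    expected = hashlib.sha256(payload).hexdigest()

    written: list[Path] = []
    for dest in dest_dirs:
        dest.mkdir(parents=True, exist_ok=True)
        path = dest / f"{device}-{stamp}.cfg"
        path.write_bytes(payload)
        # Read the copy back so a truncated or failed write is caught now,
        # not in the middle of a restore.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            raise OSError(f"backup verification failed for {path}")
        written.append(path)
    return written
```

    Passing two directories on different disks or hosts gives you the "twice" part; the read-back checksum is there because a backup you have never verified is not much of a backup.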

  • Go virtual! (Score:3, Informative)

    by leegaard ( 939485 ) on Thursday December 24, 2009 @07:58PM (#30548064) Homepage

    If you are unable to recycle old equipment into your testlab you should go virtual.

    For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will allow you to build a large and complex topology that will satisfy most experiments and even support you when doing certification preparation. It will only work for routers, so switch-based platforms are out (like the 3750, 6500 and 7600). The good news is that the features are more or less the same and they more or less behave the same way. If "more or less" is not close enough, you need a replica of your production network, or at least a few devices of each type, to test what can be labelled as critical.

    For Juniper routers, google Juniper Olive. It will run a Juniper router the same way Dynamips runs a Cisco router.

    In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.

    In the end it comes down to the level of risk your management is willing to take. Ask them if they will accept lower network uptime, since you are unable to properly test your changes before implementation.

  • by anti-NAT ( 709310 ) on Thursday December 24, 2009 @07:58PM (#30548066) Homepage

    For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making "untested" changes to the environment. So you make very few changes rather than lots, you verify after each change that it has had the desired effect, and you have backout plans.
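
    The "few changes, check each one, have a backout plan" discipline above could look roughly like this as a driver loop. It is a sketch, not a real automation tool: the `Change` structure and `run_changes` function are hypothetical names, and the three callables stand in for whatever you actually use to push, check and undo a change.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    name: str
    apply: Callable[[], None]    # pushes the change
    verify: Callable[[], bool]   # True if the change had the desired effect
    backout: Callable[[], None]  # undoes this one change


def run_changes(changes: List[Change]) -> List[str]:
    """Apply changes one at a time and stop at the first failed check.

    Only the failing change is backed out; earlier changes already passed
    their own verification and are left in place.
    """
    log: List[str] = []
    for change in changes:
        change.apply()
        if change.verify():
            log.append(f"ok: {change.name}")
            continue
        log.append(f"failed: {change.name}")
        change.backout()
        log.append(f"backed out: {change.name}")
        break  # do not pile further changes on top of a failure
    return log
```

    Stopping at the first failure is a deliberate choice here: it keeps the set of things that could have broken the network as small as possible.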

  • by mysidia ( 191772 ) on Thursday December 24, 2009 @08:20PM (#30548160)

    My personal favorite thing about JunOS is "commit confirmed 10"

    This can be a lifesaver: if you fat-finger something and break even your ability to access the device, your transaction should roll back in 10 minutes.

    If nothing goes wrong, you have 9 minutes to do some simple sanity checks, make sure your LAN is still working, and then get back to your CLI session and confirm the change.
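
    For anyone who has not used it, the behaviour of "commit confirmed 10" described above can be modelled in a few lines. This is a toy model of the semantics only, not Junos code and not a Junos API; the class and method names are made up, and time is passed in explicitly as seconds so the logic is easy to follow.

```python
from typing import Optional


class ConfirmedCommit:
    """Toy model of a commit that rolls back unless confirmed in time."""

    def __init__(self, active: str) -> None:
        self.active = active
        self._previous: Optional[str] = None
        self._deadline: Optional[float] = None

    def commit_confirmed(self, candidate: str, minutes: float, now: float) -> None:
        # Activate the candidate but remember what to fall back to.
        self._previous = self.active
        self.active = candidate
        self._deadline = now + minutes * 60

    def tick(self, now: float) -> None:
        """Roll back automatically once the confirmation window has expired."""
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous if self._previous is not None else self.active
            self._previous = None
            self._deadline = None

    def confirm(self, now: float) -> bool:
        """Make the change permanent.

        Returns False if nothing is pending, for example because the
        window already closed and the rollback happened.
        """
        self.tick(now)
        if self._deadline is None:
            return False
        self._previous = None
        self._deadline = None
        return True
```

    The point of the model is the failure case: if the change cuts you off and you never get to call `confirm`, the old config comes back on its own.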

  • by karnal ( 22275 ) on Thursday December 24, 2009 @08:41PM (#30548264)

    You bring up a good point regarding changing the running config vs the saved config.

    What I'll do if I'm changing a remote system - POTS or no - is set up a reboot of the device in 15 minutes, after verifying the clock. Then, if something in the config causes an unforeseen issue, you just need to wait a little for the switch/router to come back online with its original config.

    Obviously, this can extend the outage window - however, always plan for worst case...

  • Don't forget SOX (Score:3, Informative)

    by jackb_guppy ( 204733 ) on Thursday December 24, 2009 @09:04PM (#30548348)

    1) You should not be making any direct changes to the network without correct design, test and sign off.

    2) You should already have a redundant network structure, so "half" can be lost without any loss to network operations. This way the change can be tested in parallel.

    3) You should always report to the SOX officer when a request outside correct operations and management is made. It makes it their responsibility to solve the legal issues of not following their written standards before you begin.

  • by POTSandPANS ( 781918 ) on Thursday December 24, 2009 @10:01PM (#30548528)

    On a Cisco, you can just do "reload in 10" and "reload cancel". If you don't know about those commands, you really shouldn't be working on a production network unsupervised.

    As for the original question: Either use similar low end equipment, or use your spares. (please say you keep spare parts around)
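
    The "reload in" / "reload cancel" safety net mentioned in this thread boils down to an ordering rule: arm the reload first, change the running config, and only cancel (and optionally save) once you have verified things still work. A minimal sketch that just builds that command sequence follows; the function name is hypothetical, the interactive confirmation prompts that "reload" produces are not modelled, and nothing here talks to a device.

```python
from typing import List


def guarded_change(config_lines: List[str], minutes: int = 10, save: bool = False) -> List[str]:
    """Return the CLI lines for a change protected by a scheduled reload.

    The startup config is deliberately left alone unless `save` is True,
    so that the scheduled reload brings back the old configuration.
    """
    if minutes < 1:
        raise ValueError("minutes must be at least 1")

    session = [f"reload in {minutes}"]  # arm the safety net first
    session.append("configure terminal")
    session.extend(config_lines)
    session.append("end")
    # Everything below should only be entered after the change is verified.
    session.append("reload cancel")
    if save:
        session.append("copy running-config startup-config")
    return session
```

    Leaving `save` off by default mirrors the warning earlier in the thread: if you copy run to start before you are sure, the reload will faithfully bring back the broken config.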

  • Re:Virtualization? (Score:3, Informative)

    by Bios_Hakr ( 68586 ) <xptical@g3.14mail.com minus pi> on Thursday December 24, 2009 @10:43PM (#30548688)

    If you work a pure Cisco environment, talk to your Cisco guy about getting Packet Tracer. Emulates a few routers and a lot of switches. It works really well. Plus, 5.1 adds virtual networking. You can design several networks on several laptops and then join those networks over a virtual internet.

  • by tlhIngan ( 30335 ) <[ten.frow] [ta] [todhsals]> on Friday December 25, 2009 @01:49AM (#30549436)

    In some environments, that is frustrated by other (lazy) technical staff, who immediately start automatically blaming _every_ problem they find for the next few weeks, on that one change, without even doing any helpful troubleshooting, or finding any reason at all to suggest it might be the case.

    The problem is unrelated and would happen anyways, but because they heard of a recent change, there is a cognitive bias towards immediately suspecting the new change, just because it's a change they know about.

    "I didn't change anything, so if I just started getting a few problem reports it must be your change"

    Which is why you announce the change will happen on X, but wait a week or two before actually committing the change. Then any bellyaching that happens, you can file as their problem. If any real issues happen, you can even hold off doing the change in case your change might aggravate the problem.

    It's the same when new cell towers or other equipment are installed - people will complain of headaches and other crap caused by the tower right after it's "turned on", when in reality, it's been running months beforehand, or hasn't even been turned on yet.

  • by Grail ( 18233 ) on Friday December 25, 2009 @08:30AM (#30550430) Journal

    If you truly believe that a simple reversion of a configuration will cause a reversion to a previous state, you're sorely mistaken.

    Once the device you're working on starts misbehaving, other devices around it will start misbehaving too. As an example, one change to a network I'm involved with was supposed to simply prioritise VoIP traffic for one customer. The change was successful, the engineer went home. Then three hours later a major network router failed, because the higher priority voice traffic which was now flowing over the router tripped some magic number of MACs that it could remember, at which point the card had to keep referring routing decisions back to the CPU.

    The router's CPU became overloaded, other routes started dropping packets, and we ended up trying to resolve the problem by rebooting that router (because that's what was broken). On starting up, the router was immediately overloaded and crashed again. Overall, it took about four hours to get the problem resolved, which required reverting the VoIP change and turning off some customer networks to allow the core router to start up without the huge packet load. The customer networks were down for about three and a half hours.

    In this instance, simply reverting the change to the VoIP services would not have resolved the problem. Once the camel's back was broken, removing a straw would not have fixed it.
