Networking Software Linux

Trying to Help a Troubled Network with Linux? 68

vmehta asks: "I was recently put in a situation where I am trying to help a troubled network with many students accessing it. There are issues with broadcast packets and random outages which seem to be plaguing the network. What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?"

  • Assess the problem (Score:5, Insightful)

    by madaxe42 ( 690151 ) on Tuesday October 25, 2005 @07:34PM (#13876740) Homepage
    First step isn't to blunder in and migrate - the first step is to work out what's causing the outages etc. Use ethereal or some other packet sniffer to establish where the broadcast floods are coming from, use nmap to find insecure hosts, and investigate what kind of routers are being used and what rules are being employed.

    Basically, OSS/Linux are great, but don't rush in without establishing the issues first.
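
A rough sketch of the "establish where the broadcast floods are coming from" step: count broadcast frames per source MAC in saved text output from tcpdump run with -e (link-level headers). The line layout in the comment below is an assumption about how your tcpdump prints frames, and `broadcast_talkers` is a made-up helper name, so adjust the pattern to what you actually see.

```python
import re
from collections import Counter

# Assumed line shape from `tcpdump -e` (link-level header printed first):
#   12:00:01.000001 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: ...
# Some builds print the destination as the word "Broadcast" instead of the MAC.
LINE_RE = re.compile(
    r"^\S+\s+(?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+>\s+(?P<dst>[^,\s]+)",
    re.IGNORECASE,
)
BROADCAST_NAMES = {"ff:ff:ff:ff:ff:ff", "broadcast"}


def broadcast_talkers(lines):
    """Count broadcast frames per source MAC; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("dst").lower() in BROADCAST_NAMES:
            counts[match.group("src").lower()] += 1
    # Busiest sources first.
    return counts.most_common()


# Hypothetical capture lines, for illustration only.
sample = [
    "12:00:01.000001 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request",
    "12:00:01.000200 00:11:22:33:44:55 > Broadcast, ethertype ARP (0x0806), length 60: Request",
    "12:00:01.000300 00:aa:bb:cc:dd:ee > 00:11:22:33:44:55, ethertype IPv4 (0x0800), length 74: unicast",
    "12:00:01.000400 00:AA:BB:CC:DD:EE > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: DHCP",
    "garbage line",
]

for mac, frames in broadcast_talkers(sample):
    print(mac, frames)
```

The MAC at the top of the list is only a starting point; you still have to trace it to a switch port and a physical box.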
    • And then clean up the windows boxes. It sure sounds like there are many pwned machines.

      • by moro_666 ( 414422 )
        You can attempt a scan&sniff at first, plenty of stuff to choose from,
        but if your 100mbit network is being overhauled, it's quite difficult
        to isolate single responsible instances.

        I guess that probably you will end up doing that :

        1) get rid of cheap hubs (made in paiwan) and get some real network switches in place, like those from SMC. Having an old buggy hub talking to several cheap NICs in several machines ends up in massive packet collisions, resulting in a network that doesn't carry much but is totally jam
    • by tverbeek ( 457094 ) on Tuesday October 25, 2005 @08:32PM (#13877080) Homepage
      Did you read the part of the question where he explained that he was looking for tools to analyze and fix the problem? And did you notice that he didn't mention or imply any kind of migration?

      Here's an idea: Before you blunder in with an answer, the first step is to work out what the question is. :)

      • He did specifically mention 'Linux'

        Chances are, he's not going to install linux under vmware to solve the problem ;)

        • He did specifically mention 'Linux'
          Chances are, he's not going to install linux under vmware to solve the problem ;)

          Maybe he just happens to have a Linux laptop that he's willing to plug into the network to do the scan/diagnostic stuff. Maybe he wants OSS because he doesn't want to use / can't afford expensive software solutions.

          I use Linux at work, while most of my coworkers have either WinXP or OSX. Although sometimes a task is better accomplished from one of my linux boxes, I'm not rushing

        • You know what's worse than someone who answers a question without understanding it?

          Someone who tries to justify the first person's cluelessness by trying to posit some alternate universe where it would make sense.

    • Find out what is connected to what and how. More than 90% of the "network problems" I encounter are basic cable issues.

      Remember, when a NIC is connected to a switch, they only auto-negotiate if both are set to auto-negotiate. If someone sets them to a certain configuration, but doesn't get the pair correctly matched, you will have a lot more collisions and such.

      Make sure that your collision domain is set up correctly. Pay attention to the length of the cables. This is where the physical map comes in. You can che
    • I'm replying to this comment but my response is directed toward the OP

      I agree with madaxe42: first things first. Diagram the network. Figure out where hubs and switches are. Figure out where the firewalls are. Figure out how packets traverse the network(s). If it's a single network with a single point of access to the internet this should be (relatively) easy. If you are looking to save the day with Linux, what you could do is set the switches to use "port mirrors" to capture every packet on the network
  • by Anonymous Crowhead ( 577505 ) on Tuesday October 25, 2005 @07:45PM (#13876812)
    Almost any time I see this, it's some random box flooding the network. Just go to your switches...the light that is on solid continuously will point you in the right direction.
    • by Anonymous Coward
      Ok, I found it and unplugged it.

      Now people are shouting at me, something about an Oracle.

      Who the fuck is this Oracle dude and why has he hacked into our network? Is he like that Mitchick character? I hope they never let him out of prison!

    • Actually, another cause could be a looped network connection. We have problems with students who will connect two network jacks together, thus creating a loopback in the switch they are connected to. Generates a whole lot of network traffic. Basically they were doing this when they had exams requiring computers, because bringing down the network ensured no exam...
  • by Usquebaugh ( 230216 ) on Tuesday October 25, 2005 @07:54PM (#13876874)
    No use fixing symptoms; go after the root cause.
  • OSS? Linux? WHY? (Score:4, Interesting)

    by Gothmolly ( 148874 ) on Tuesday October 25, 2005 @08:03PM (#13876940)
    What's next, "How do I produce PDF files, using Linux and Open Source?" "How can I leverage Open Source to surf the web?"

    Christ, this is like the late 90's, when everything suddenly had "e" in front of it. Dude, get Ethereal, slap it on any Windows box, and be done. No need to get nerdy with Linux. If you know enough that it's broadcast traffic, you're halfway there.
    • " when everything suddenly had "e" in front of it. Dude, get Ethereal,"

      you mean eThereal don't you :-)
    • Re:OSS? Linux? WHY? (Score:3, Informative)

      by Anonymous Coward
      From the readme.win32:

      If you want to use a PC running Ethereal to monitor 802.11 traffic to or from other machines, rather than using Ethereal only to look at traffic to and from the machine on which you're running Ethereal, you should seriously consider running it on a recent version of Linux or of one of the free-software BSDs, rather than on Windows.

    • by smartin ( 942 )
      I have a better idea. Get Linux and slap it on all your windows boxes and be done. For good.
  • by Webmoth ( 75878 ) on Tuesday October 25, 2005 @08:07PM (#13876960) Homepage
    The first step in troubleshooting is knowing the network topology. How are network segments separated? How are they connected? Where are routers, hubs, switches, etc.? Which switches are managed, and how are the VLANs set up on them? Where are the DHCP servers, and what do they serve? Where are all your network drops?

    Do your network segments have multiple subnets attached to them?

    Is everything subnetted properly?

    The first set of questions are ones YOU should be able to answer. After all, it's YOUR network, and YOU should know how it's set up. The last two are harder to deal with, because these settings may be on computers not in your control.

    Answer the first questions first, then when you are looking at packet traces, TCP/IP dumps, logs, etc. and you see a problem, you'll have a better idea where the problem is physically located, saving much time and energy.

    And then there's the "dumb questions" I shouldn't have to ask: Do you have a loop? Are your cables wired to T568A or T568B standards? Are all your cables in good repair?
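
For the "is everything subnetted properly?" question, here is a minimal sketch using Python's standard `ipaddress` module. It assumes you have already collected each host's address and mask somehow (DHCP leases, a walk-around, whatever); the host names, addresses, and the `subnet_mismatches` helper are all made up for illustration.

```python
import ipaddress


def subnet_mismatches(expected_cidr, hosts):
    """Return (name, reason) for hosts whose address or mask disagrees with the expected subnet."""
    expected = ipaddress.ip_network(expected_cidr)
    problems = []
    for name, addr_with_mask in hosts.items():
        iface = ipaddress.ip_interface(addr_with_mask)
        if iface.ip not in expected:
            problems.append((name, "address outside " + str(expected)))
        elif iface.network != expected:
            problems.append((name, "mask gives " + str(iface.network)))
    return problems


# Hypothetical inventory for one segment that is supposed to be 192.168.10.0/24.
hosts = {
    "lab-pc-01": "192.168.10.21/24",
    "lab-pc-02": "192.168.10.22/16",  # right address, wrong mask
    "lab-pc-03": "192.168.11.5/24",   # belongs to a different subnet
}

for name, reason in subnet_mismatches("192.168.10.0/24", hosts):
    print(name, "-", reason)
```

This only checks the settings you feed it; machines outside your control still have to be found on the wire.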
    • Are your cables wired to T568A or T568B standards?

      It makes no functional difference which standard you use for a straight-thru cable. You can start a crossover cable with either standard as long as the other end is the other standard. It makes no functional difference which end is which. Despite what you may have read elsewhere, a 568A patch cable will work in a network with 568B wiring and 568B patch cable will work in a 568A network. The electrons couldn't care less.
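
To make that concrete, here is a small sketch that maps the pins of one plug to the other using the standard T568A and T568B colour orders. The helper names `pin_map` and `cable_type` are invented for this example, and the crossover check only covers the 10/100 case discussed here.

```python
# Wire colours by pin number (index 0 is pin 1) for each standard.
T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]

# A 10/100 crossover swaps the transmit and receive pairs: 1<->3 and 2<->6.
CROSSOVER_10_100 = {1: 3, 2: 6, 3: 1, 4: 4, 5: 5, 6: 2, 7: 7, 8: 8}


def pin_map(end_a, end_b):
    """Map each pin on end A (1-8) to the pin on end B that carries the same wire."""
    return {i + 1: end_b.index(colour) + 1 for i, colour in enumerate(end_a)}


def cable_type(end_a, end_b):
    """Classify a cable from the colour order seen at each plug."""
    mapping = pin_map(end_a, end_b)
    if all(a == b for a, b in mapping.items()):
        return "straight-through"
    if mapping == CROSSOVER_10_100:
        return "crossover (10/100)"
    return "miswired"


print(cable_type(T568A, T568A))  # both ends the same standard
print(cable_type(T568B, T568B))
print(cable_type(T568A, T568B))  # one end of each
print(cable_type(T568B, T568A))  # which end is which does not matter
```

Anything that comes back "miswired" is worth re-terminating before you look further up the stack.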

      • Just make sure that you're only using two twisted pairs and not all four, or else you'll have all sorts of apparently random problems. To keep it simple, you want a pair on each end of the plug, another in the middle, and a pair straddling that one. The wiring really is important: when done incorrectly you will have problems, even if it does seem to work
  • I agree with one of the earlier posters; it is probably an infected system or 10.

    The best thing you can do is use a tool such as Ethereal to find the IP of the system or systems causing it, and subject them to a good cleanup.

    For a good toolset, check out the Auditor Security Tools LiveCD for a collection of tools you can take with you wherever you go...

    Auditor tools [remote-exploit.org]

  • It's a NIC (Score:4, Insightful)

    by Fished ( 574624 ) <amphigory@gma[ ]com ['il.' in gap]> on Tuesday October 25, 2005 @08:27PM (#13877058)
    Without any more information, you've got a bad NIC, almost certainly. Look on the switch for the port whose light is always on. As you've described it, software has almost nothing to do with it. This is a NIC, or a bad switch, or bad cabling, or something.
    • Bingo. First thing I thought of when I heard this.

      While it's *possible* this is a virus (as others have said), I'd look at hardware first. A bad transceiver will generate more bad traffic than a virus could ever hope to.
    • To the parent:

      Or it could be arp flooding, or it could be a virus, or it could be a greedy student downloading music, or it could be too much bittorrent traffic, or it could be a million other things.

      Troubleshooting these things for a living, trust me, nothing is certain until you've figured out what it is.

      To the poster:

      Use ethereal and watch where the traffic is coming from. Use management built into your switches to watch for ports going down when there are outages. Use traceroutes to find a dead hop (if
      • I strongly qualified my response in several ways. And, for what it's worth, I've been diagnosing networks for 15 years, so I feel qualified to have a strong opinion. When I see a network exhibiting the kind of erratic behavior described by the questioner, first thing I check is for a bad NIC, because 90% of the time that's the problem.

        It certainly could be any of the things you mention. With the vagueness of the original post, it could even be a layer 7 problem (i.e. a crappy Windows server.) But with

  • See man command for further info on these commands.

    Use ping ip-address to see if you can get to the router and beyond. Make sure "allow ICMP" is enabled in the router.

    Use traceroute -n ip-address to see where the traffic is failing.

    Is it a DNS problem? Try host some.host.name to make sure you can resolve names.

    Is it a DHCP problem? Try dhclient to see if you can get an IP address. (maybe pump on some systems.)

    Connect a hub (not a switch) to some strategic place on the network. Give yours
    • Connect a hub (not a switch) to some strategic place on the network. Give yourself an IP address and check for excessive traffic with iptraf. This will give you a breakdown of what bandwidth is being used by what services.

      I'm only a student, not a systems administrator so I wouldn't pretend to suggest I know what's acceptable and what's not, but this would piss me off if I knew someone was doing this to me. I imagine this kind of behaviour should be kept under one's hat

      Further, random unplugging of cables
      • I'm only a student, not a systems administrator so I wouldn't pretend to suggest I know what's acceptable and what's not, but this would piss me off if I knew someone was doing this to me. I imagine this kind of behaviour should be kept under one's hat

        This would be because you're a student. Students tend to think they have some right to a network and every network resource they can imagine. They don't. On the other hand, the administrator has the responsibility of making sure the network and its resourc

        • Provided my usage is within the policies handed down from the administrator, I am granted the right to private Internet usage.

          If my activities are suspect, an administrator can and should investigate and this should be mandated in the policy.
  • map, isolate, trend (Score:5, Informative)

    by grattwood ( 533456 ) on Tuesday October 25, 2005 @08:38PM (#13877120) Homepage
    Step 1) Map the network both logically (which networks, what is the routing, etc.) and physically... the "tug test". Label everything, and put it all in a spreadsheet. Tools are nmap, pen and paper, and a label printer. Access to the routers, or being friendly with the router admin, is a must.

    Step 2) Isolate the problem protocols and hosts. Be on the lookout for appletalk, IPX, or old netbios. All very chatty protocols. Look for old hubs and replace them with switches. Look for compromised boxen. Try to VLAN things logically (by department or usage, whichever is best for the environment). Tools are snort, ethereal, ntop, and syslog (any managed switches should be sending to a syslog server; I've used syslog-ng).

    Step 3) Trend as much as you can. Even before the network is cleaned up, start to collect statistics from the switches, and/or hosts on your network. Any gateways should be monitored as well. This will let you see if there are problems correlated to a particular time of day, if you're going over your bandwidth, etc. Tools are MRTG, or for more in depth try Cacti http://www.cacti.net/ [cacti.net]
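
If you want to sanity-check what the graphing tools show, the arithmetic behind a traffic trend is just the difference between two readings of an interface octet counter divided by the polling interval. The sketch below assumes 32-bit counters that wrap back to zero (as classic SNMP interface counters do) and at most one wrap between polls; the function names and sample readings are made up.

```python
def counter_delta(prev, curr, bits=32):
    """Difference between two readings of a wrapping counter (assumes at most one wrap)."""
    return (curr - prev) % (1 << bits)


def bits_per_second(prev_octets, curr_octets, seconds, bits=32):
    """Average throughput over the polling interval, in bits per second."""
    return counter_delta(prev_octets, curr_octets, bits) * 8 / seconds


# Hypothetical readings taken 300 seconds apart.
print(bits_per_second(0, 3_750_000, 300))
# The counter wrapped between these two polls; the modulo handles it.
print(counter_delta(4_294_967_000, 704))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the numbers come out too low; polling more often or using 64-bit counters where the device offers them avoids that.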

    There is much more after you get to this point, but people will be much happier the faster you get here.

    Good luck
  • Take a step back (Score:3, Informative)

    by tmasky ( 862064 ) on Tuesday October 25, 2005 @08:45PM (#13877158)
    You're attempting to help diagnose a (presumably) large network. Very honourable, but attempting to do this gung-ho with a few responses from slashdot is very silly.

    Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.

    Get them to poke around on your network. Provide all documentation you have available.

    After the first day, you should have all the information necessary to write up a document regarding your existing issues. Make notes while he's using tools to investigate. From there you work with the consultant to come up with a separate document for resolutions with a criticality rating.

    From there, you want systems in place to monitor the health of your network. Have a chat to him about it, but I'd be inclined to build a solution which was centered around using Nagios.

    While consultants can (and frequently do) suck when you come to specifics, they are a valuable resource for pointing you in the right direction. And experience counts! They've done this stuff before, they know the pitfalls and proven solutions.
    • Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.

      You should read between the lines. He said: I was recently put in a situation...
      Which means he is the consultant. Of course, thanks to a fake CV made by the sales representative of the consultancy firm, they sent him even though he has no clue about network administration.
  • by Anonymous Coward
    The 10 step Universal Troubleshooting Process
    1. Get the Attitude [troubleshooters.com]
    2. Get a complete and accurate symptom description [troubleshooters.com]
    3. Make damage control plan [troubleshooters.com]
    4. Reproduce the symptom [troubleshooters.com]
    5. Do the appropriate general maintenance [troubleshooters.com]
    6. Narrow it down to the root cause [troubleshooters.com]
    7. Repair or replace the defective component [troubleshooters.com]
    8. Test [troubleshooters.com]
    9. Take pride in your solution [troubleshooters.com]
    10. Prevent future occurrence of this problem [troubleshooters.com]
  • Low-tech (Score:4, Funny)

    by twoflower ( 24166 ) on Tuesday October 25, 2005 @11:10PM (#13877825)
    Low-tech is often a faster and more efficient way to find these sorts of problems. For surveillance and diagnosis, I recommend walking around and watching over students' shoulders. For corrective measures, a couple of taps with a ball-peen hammer usually suffices.
  • Start using tcpdump along with ethereal. Put the Linux box on different parts of the network to see what is happening. If you're in a switched environment, you will see mostly broadcasts. Some broadcasts are required and good (like the necessary ARP requests and possibly DHCP requests when a computer boots and initializes its network devices). However, unnecessary broadcasts are very bad for network performance and can cause "packet storms" which cause outages.

    Start tracking those broadcasts down and find o
  • Fuck (Score:4, Insightful)

    by jericho4.0 ( 565125 ) on Tuesday October 25, 2005 @11:43PM (#13877956)
    You should not be 'helping' anyone with a network.

    Go on, mod me 'insightfull' or mod me 'flamebait', it's one or the other.

    • Shame there's no 'doesn't make the first bit of sense' mod ;)
      • I think his point was that if the network admin was competent, the questioner wouldn't even be needed.

        Of course, if they were competent, there would be no market at all for conslutants in the first place.
    • I work in a school as a catch-all tech guy, hell desk to maintaining the damn *xchange/file cluster. It sounds like he was hired for one thing, and has been told this 'help' is his real job.... it happened where I am at, but not to me :-/ I was twisted enough to be already involved.

      You are right though, he should seek assistance.

  • The first step is to document and baseline the systems.

    For baselining, I'd enable SNMP for all the managed devices. Then use something like MRTG with RRD Tool and chart every port for every switch for a week or so.
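
Once a week or so of per-port samples exists, one simple way to use the baseline is to flag readings that sit far above the port's normal range. This is only a sketch of that idea, not what MRTG or RRD Tool do internally: the mean-plus-k-standard-deviations rule, the default k of 3, and the sample numbers are all assumptions you would tune.

```python
from statistics import mean, pstdev


def baseline(samples):
    """Summarise a port's normal behaviour as (average, standard deviation)."""
    return mean(samples), pstdev(samples)


def is_abnormal(value, base, k=3.0):
    """True if a reading is more than k standard deviations above the average."""
    average, spread = base
    return value > average + k * spread


# Hypothetical utilisation samples (percent) for one switch port.
week = [10, 12, 11, 9, 13]
normal = baseline(week)

print(is_abnormal(14, normal))  # within the usual range
print(is_abnormal(40, normal))  # well above it
```

A port that trips this regularly at the same time of day is a good candidate for the outages people are complaining about.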

    While that's happening in the background, start mapping your LAN. Use something like Visio on a laptop and start visiting switches and routers. Confirm the connections between all the routers and switches. Then use good labels (no, not scotch tape and paper) to document those connections with F
  • top 75 list (Score:2, Informative)


    More tools than you could learn in a reasonable timeframe can be found here: http://www.insecure.org/tools.html [insecure.org]

    I would have posted sooner, but T-Mobile's data coverage has been spotty since Wilma hit. Still no power or fuel, but at least I can get my geek-fix now. :) (at least until my battery dies)
  • What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?

    These are some of the tools to consider, in no particular order:

    • Nagios [nagios.org]
    • Snort [snort.org]
    • ethereal [ethereal.com]
    • dsniff [monkey.org] (not updated in ages)
    • ncat [sourceforge.net]
    • nmap [insecure.org]
    • nessus [nessus.org] v 2 (or one of the forks of version 3)
    • SARA [www-arc.com]

    You'll have to read the descriptions to decide which ones to try.

    • I'll add ntop [ntop.org] to the list. Plug a box running that into a monitor port and watch the traffic for a while.

      As others have said, good documentation of the network is a must. I was thrown into a similar situation a year or 2 back at my high school (I graduated in 94, so it wasn't as a student). After doing a walk through of the network and finding every single hub (there were 2 switches) and what was attached to it, we could then easily locate some of the problems. In some cases they have hubs chained
  • read your network (Score:3, Informative)

    by graf0z ( 464763 ) on Wednesday October 26, 2005 @04:39AM (#13878983)
    Troubleshooting a network is a matter of experience, not of some particular tools. But these things help:

    * Put your box on the monitor/mirror/analysing port of the switch and read the traffic with tcpdump/tethereal/ethereal (If you just want to check the broadcasts, it does not have to be a monitoring port). Edit the packet filter expression until you do not see the legal/uninteresting traffic anymore but only the suspects. (They are students? Have fun filtering all the p2p traffic ;-) Let ethereal make statistics over the traffic.

    * Watch out for ICMP errors, especially ICMP-redirects. Watch out for TCP-resets. Watch out for fragments. Watch out for malicious Spanning-Tree packets. Watch for SMTP to many IPs (spamming trojans), IRC (zombies), weird packets eg. fragmented UDP (zombies attacking a target)

    * Check the MAC addresses in the etherframe-header ('tcpdump -e'): are they constant? If there are packets IP_A<-->IP_B, are the corresponding MACs really MAC_A<-->MAC_B, or MAC_A-->MAC_B and MAC_B-->MAC_C instead?

    * Install an arpwatcher. Stealing the default-gateway's MAC is an effective DoS attack on a network.

    * Put 2 NICs into a fast linux box, bridge ('brctl') them together, put this linuxbridge in front of the default-gateway. Dump again. Install a snort on it and let it see the traffic - what does the snort log say?

    * Do the switches have the feature to log to a remote syslog daemon? Do so and read those logs! Check all the snmp-variables on the switches, especially the "errors". Read the logs of the default-gateway.

    * Watch the amount of traffic (snmpget the port-counters of the switches and make mrtg-graphs of the results). Maybe the problem only strikes if some switch ports are under high load?

    * Scan the network with nessus. Maybe you'll find some bindshells.

    * ...

    Hope this helps.

    g.
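
The MAC-consistency and arpwatcher points in the list above boil down to remembering which MAC each IP was last seen with and complaining when it changes. A minimal sketch of that bookkeeping follows; it is not how any particular arpwatch tool is implemented, and the `arp_changes` helper and the sample pairs are invented. It assumes you already have (IP, MAC) pairs extracted from a capture.

```python
def arp_changes(observations):
    """Return (ip, old_mac, new_mac) for every time an IP shows up with a different MAC."""
    last_seen = {}
    changes = []
    for ip, mac in observations:
        mac = mac.lower()
        old = last_seen.get(ip)
        if old is not None and old != mac:
            changes.append((ip, old, mac))
        last_seen[ip] = mac
    return changes


# Hypothetical (IP, MAC) pairs in the order they were seen on the wire.
seen = [
    ("10.0.0.1", "00:11:22:33:44:55"),   # default gateway
    ("10.0.0.7", "00:aa:bb:cc:dd:ee"),
    ("10.0.0.1", "00:11:22:33:44:55"),
    ("10.0.0.1", "00:DE:AD:BE:EF:00"),   # gateway IP now claimed by another MAC
]

for ip, old, new in arp_changes(seen):
    print(ip, "moved from", old, "to", new)
```

A change on the default gateway's IP is the one to chase first; a change on a DHCP client address can be perfectly innocent.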
  • Get someone who does this for a living. I am sure there are a few in your local linux shop. Someone who works at an ISP should have experience with the problems you cite.

    Step 2

    Follow his/her recommendations (which will probably be splitting the network in more l3 domains) get a 6500, or a few 3750, or if you really can't afford much a few 3550 switches (which will leave you out of luck when ipv6 starts getting used, but otherwise is a fine choice).

    This is about having L3 switches closer to the end user than
    • So, let me quickly summarize your solution:


      1. Get a consultant.


      2. Blow $50K in Crisco hardware (yah, you heard me, Crisco, not Cisco)


      3. Put a bunch of snot-nosed barely literate retards, err, sorry, students on a L3 network where they can run fscking kazaa all day.



      I haven't laughed this hard in a while :)


  • Make sure your students haven't started looping back the cables from one network socket to another. Always make sure an unconnected network point isn't connected at the patch panel/switch end - it's just asking for trouble. The more physical restrictions you have on your network, the easier the rest will be to manage. Rogue access points may also be a downfall of your network; check for them!
  • Always check the physical layer first.

    Just this summer I tracked down an error that was caused by a cisco wireless access point trying to pull electricity from the cat5. It was UNPLUGGED from the power! It took down a whole segment of the network.

    The way we found it was from the solid light on the switch.
  • If you are going to try to write up docs....

    If they run Cisco equipment, a show cdp neighbor will help you a lot. Keeping up to date documentation on a network (especially a large one) is a difficult task, but it will make solving future problems much easier.
  • Here are some tools I use for just about the same thing you're about to do, and brief reasons why I use each. Start with one, then once mastered move to another.

    • ethereal - good for interactive network monitoring and also for analyzing captures from tcpdump
    • tcpdump - You can use this to capture the network traffic. Have the router bridge a port so you can monitor everything. Can be a fundamental component in a network recorder to record while you're not there.
    • snort - if a packet matches known goofy stu
