
Best Linux Hardware Diagnostics? 42

An anonymous reader asks: "I've been running Linux for a little while and usually hardware problems have shown up quite easily - kernel panic, no module, no networking, etc. - but recently I've encountered some problems with network disk access causing very high load, which I think might be hardware related. Under Windows I'd fire up SANDRA or the like and run a full system scan. I did a quick search and nothing really stood out. I was wondering if any Linux gurus out there would like to share their expertise on Linux diagnostics?"
This discussion has been archived. No new comments can be posted.

  • Well, /var/log/messages is a good place to start. Check your nfs/smb logs, and if all else fails, use a kernel debugger. I think this is what you are asking, if not please clarify.
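As a rough sketch of that first step, the function below pulls likely hardware complaints out of a syslog-style file. The function name and the keyword list are illustrative assumptions, not a complete catalogue of kernel messages, and the log path varies by distribution.

```shell
#!/bin/sh
# scan_hw_errors FILE
# Print lines from a syslog-style file that hint at disk or network trouble.
# The keyword list is an assumption chosen for illustration; extend it to taste.
scan_hw_errors() {
    grep -i -E 'i/o error|not ready|lost interrupt|timed out|not responding' "$1"
}
```

Typical use would be `scan_hw_errors /var/log/messages | less`, usually as root since the log is often not world-readable.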
    • perhaps you could recommend a kernel debugger for him
      • Once again, if you need to ask...... Well..

        OK, gdb and kdb are the only debuggers I know of - not to say there aren't more but these are the old standards. Take a gander here [kernelhacking.org] for more information. There is too much to go over in a post.
      • To mod this as funny, or insightful

        Why dear god won't they put a real debugger into Linux - it is the only thing missing to make it a world class operating system

        • by karnal ( 22275 ) on Wednesday August 03, 2005 @05:25PM (#13234921)
          Actually, I would think "World Class" would mean the terminal would scream at you if you had an issue:

          "HEY, YOUR HARD DRIVE IS FUCKED!"

          That would truly rock. Instead, today you can only sift through "/dev/hda: device not ready" errors in the logs... :)
          • Aw, come on, the /dev/hd? errors are a pretty good clue the drive is screwed. :)

            We've been doing data center moves this month, and checking/refurbishing every machine as it gets shipped over. I take any /dev/hd? error as "the machine got dropped, and the drive is dead". If it doesn't turn on or it kernel panics for no apparent reason, I consider it a dead motherboard. For most of our machines, that's a fairly good guess, since everything's integrated.

            We have no expectation
            • So, uh, whatcha doin' with the old hardware...? Need a place to get rid of it? :)
              • I'm gathering a nice pile of 500MHz machines with 256MB RAM. I'm thinking of making a beowulf cluster in my living room. :) We're keeping anything newer than 1GHz online. That's my cutoff for this upgrade cycle.

                Actually, I have plenty of things to try out on 'em. Sometimes it's nice to have a few dozen old boxen laying around. The power company loves it, and my girlfriend ... well ... she has words for my own personal server farm in the living room. I can't repost most of them here. :
            • "That array was carried on an airplane, to be "sure" it was safe. It had more damage than any other piece of hardware, even the ones that UPS drop-kicked. I'm amazed by some of the damage I've seen. I haven't figured out how they bent some of the metal, since the boxes looked fine, and they were well packed."

              Are you sure you don't want to work for UPS? I'm sure [slashdot.org] some companies pay for better reviews than "was drop-kicked"

              (yes, I'd agree with you for every parcel carrier I've used!)
              • They're pretty lucky I'm not posting the pictures of the damage we've gotten. By far, the internal damage done by the TSA has been the worst, but some of it's pretty good.

                The Promise VTrak 15100 arrays are very large, heavy, and sturdy boxes. They have very thick plastic handles on the front to ease putting it in the rack (theoretically). In all reality, you can lift them from the floor by the handles, but unless you can manage to hold over 100 pounds horizontally just by the handles, y
          • "Actually, I would think "World Class" would mean the terminal would scream at you if you had an issue: "HEY, YOUR HARD DRIVE IS FUCKED!"

            I actually managed to derive that information from watching a friend's Windows PC boot... was right as well, but people always assume it's a software problem.
        • I know. It would make diagnosing a strange network problem of mine (which only appears on Linux, not Windows or *BSD) much, much easier. I remember reading that Linus doesn't want it built-in for some reason or another. *sigh*
          • I can't understand why you would want to use a kernel debugger to troubleshoot network problems. Wouldn't a sniffer be a better idea? That way you can check out what's happening on the network level, where the problem seems to be.
            • No. The problem is with the kernel (ie, the networking stack). The exact same hardware has no problems with other OSes, as I said.
              • You could be right. Another possibility: the NIC drivers. Excuse me for asking, but did you check things like network speed and duplex settings? They can differ from one operating system install to the next.
                My point was that if you use a sniffer you can see the communication on the network itself and where and why it fails. That's more informative than staring at kernels.
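For the speed and duplex question, one hedged way to check is to summarize what `ethtool` reports for the interface. The helper name `link_summary` and the interface name `eth0` are assumptions for illustration; the helper only parses the "Speed:" and "Duplex:" lines that `ethtool` prints.

```shell
#!/bin/sh
# link_summary: read `ethtool IFACE` output on stdin and print
# "<speed> <duplex>" taken from the "Speed:" and "Duplex:" lines.
link_summary() {
    awk -F': *' '
        /^[[:space:]]*Speed:/  { speed = $2 }
        /^[[:space:]]*Duplex:/ { duplex = $2 }
        END { print speed, duplex }
    '
}
```

Something like `ethtool eth0 | link_summary` (as root) on each OS install would show whether one of them negotiated half duplex while the switch port runs full.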
  • There isn't really a "suite" that I know of like SANDRA; however, using your system logger (sysklogd, metalog, etc.) you can get a really good view of what is going on with your system. You may want to enable some debugging settings in kconfig and recompile the kernel so you get more info in the log. If you have any programming experience you can try profiling the kernel, and any problems you find you can attempt to correct or post to the LKML. But really it sounds to me like you're in need of a distro change. Try debia
  • I thought the article was titled "Best Linux Distribution" when my eyes passed by it. Wouldn't that have been a fun discussion?
  • /var/log/messages or dmesg

    Both should display kernel messages from boot-up. Kernel boot messages usually contain the information you need to track down IRQ conflicts.

    MemTest86

    Not really a Linux program, but something I usually stick as a boot option in GRUB. Does a great job at detecting bad RAM. MemTest86 can also be booted from a floppy.

    BadBlocks

    This utility can be used to find bad blocks on a disk partition. I've used it before to check disks.

    You might also want to check out some syst

    • LILO used to be the best check against bad kernel entries. It seems like GRUB is now the standard, though you used to just run "lilo". If your entries are bad, you'll know ASAP from the slew of errors in your telnet window.

      • I had to switch a RedHat machine from GRUB to LILO. That was a lot of fun. The version of lilo that they included didn't actually work right. For some reason, it had no concept of SCSI drives. I don't ask why... I replaced it with one I had compiled from scratch and packaged when I was trying to build my own distro. Sometimes that practice comes in very handy. :)

        Really, I like lilo much better, just as long as you don't do something silly like "lilo ; shutdown -r now". Always make sur
        • Really, I like lilo much better, just as long as you don't do something silly like "lilo ; shutdown -r now". Always make sure it kicked back a friendly message. :)

          Duh. That's why you do "lilo && shutdown -r now".
    • BadBlocks - This utility can be used to find bad blocks on a disk partition. I've used it before to check disks.
      Use smartmontools [sourceforge.net] to get S.M.A.R.T. disk info (smartctl -a /dev/hdX). Nowadays hard disks substitute unreadable sectors with spare ones, transparently to the I/O subsystem.
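Because that remapping is invisible to the I/O subsystem, the drive's own counter is the place to look. A minimal sketch, assuming the drive reports the attribute under the name `Reallocated_Sector_Ct` (the usual smartmontools label for attribute 5; some drives name or omit it differently) and that the raw value sits in the tenth column of the attribute table. The function name is made up for illustration.

```shell
#!/bin/sh
# reallocated_sectors: read `smartctl -a` output on stdin and print the raw
# value of the Reallocated_Sector_Ct attribute, i.e. how many sectors the
# drive has already swapped for spares. Prints nothing if the attribute
# is not present in the input.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}
```

Usage would be along the lines of `smartctl -a /dev/hdX | reallocated_sectors`, with `/dev/hdX` standing in for the real device. A count that keeps growing between runs is generally a reason to replace the disk.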
  • Heard of knoppix? (Score:5, Informative)

    by mnmn ( 145599 ) on Wednesday August 03, 2005 @05:21PM (#13234879) Homepage
    We keep Knoppix CDs just for this purpose: hardware diagnostics. dmesg and /var/log/messages provide information that is otherwise hard to obtain from Windows 2000 or XP, especially if you can't boot Windows.

    Another crucial thing is lspci, which is absent from Windows. Say you do a fresh install of Windows, which does not detect the network card. How do you know what card it is, so you can obtain the drivers for it? In Windows you can't get the PCI information so easily. Enter Knoppix.

    I have also used memtest in Knoppix and found memory issues before, where Windows simply acted up. The problem with Windows is you have to boot the entire OS, take ~130MB of RAM, and resolve all IRQs before you can run SANDRA or the like. Memory issues, disk issues, or IRQ issues can prevent you from even booting.

    Knoppix, when booted in single-user mode, takes little memory, and you can boot it not to use ACPI, not to use the HLT instruction, not to detect SCSI that might freeze the system, etc. Then you can diagnose the system. Just get a CD and read the man pages of the various tools on the CD.
    • I remember having to dig into the registry entries to get PCI IDs of devices, then looking them up on sourceforge [sf.net]. But those days, astonishingly, are past. Windows XP's Device Manager has a nice bit where it has a "Details" tab, with "Hardware IDs". For instance, the 3C905-TX in this computer reads as PCI\VEN_10B7&DEV_9200&SUBSYS_100010B7&REV_6C. And yes, it does this for unknown devices as well. So no more registry digging.

      The problem with their "recovery mode" being seriously weaker than the e
  • If network access is causing "load" and not "CPU usage", you need to look at kernel stuff - drivers, TCP/UDP windows, Ethernet statistics, etc.
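For the Ethernet statistics part, the kernel's per-interface counters live in /proc/net/dev. The sketch below prints receive errors, transmit errors, and collisions for each interface. The function name is hypothetical and the column positions assume the standard two-line header layout of that file.

```shell
#!/bin/sh
# nic_errors [FILE]: print "iface rx_errs tx_errs collisions" for every
# interface listed in FILE (default: /proc/net/dev).
nic_errors() {
    awk 'NR > 2 {
        sub(/:/, " ")   # "eth0:123" and "eth0: 123" then split the same way
        print $1, $4, $12, $15
    }' "${1:-/proc/net/dev}"
}
```

Running `nic_errors` twice a few minutes apart under load shows whether the counters are climbing. Rising collisions on a switched network often point at a duplex mismatch rather than a dying card.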
  • good tools (Score:4, Informative)

    by krakrjak ( 227602 ) <`krakrjak' `at' `gmail.com'> on Wednesday August 03, 2005 @05:57PM (#13235228) Homepage
    lspci
    cat /proc/cpuinfo
    lsusb
    cat /proc/scsi/scsi
    ls /dev (if using udev)
    dmesg|less (or more depending on your PAGER)
    free

    These usually are enough to determine if the BIOS thinks your hardware exists. And also this should help determine if the kernel has loaded a driver and given a device node to your hardware. If you need to know if a hard drive is bad (or a partition) you can use the old standby:
    dd if=/dev/ of=/dev/null

    That will tell you if you can read all the data on the device or not. Hope that helps.
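A small wrapper around that dd read test might look like the following. It is only a sketch: the function name `read_test` is made up, the 64k block size is an arbitrary choice, and the caller supplies the device or partition path.

```shell
#!/bin/sh
# read_test PATH: read PATH from start to end and throw the data away.
# Returns 0 if dd could read everything, non-zero if dd reported an error.
read_test() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "OK: $1 read completely"
    else
        echo "FAIL: read error on $1" >&2
        return 1
    fi
}
```

It would be called as root with whatever device you suspect, and the kernel log should name the failing sector if the read stops early. Note that a clean pass only proves every block was readable at that moment, not that the drive is healthy.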
  • by runswithd6s ( 65165 ) on Wednesday August 03, 2005 @06:12PM (#13235367) Homepage
    Just to add this to the suggested list of applications:
    smartmontools: control and monitor storage systems using S.M.A.R.T.
    lmbench: utilities to benchmark UNIX systems
    memtest86: test your memory on x86 platforms
    nictools-nopci: diagnostic tools for many non-PCI ethernet cards
    nictools-pci: diagnostic tools for many PCI ethernet cards
    lm-sensors: utilities to read temperature/voltage/fan sensors
    mbmon: hardware monitoring without kernel dependencies (text client)
    sensord: hardware sensor information logging daemon
    crashme: stress tests operating system stability
    fuzz: stress-test programs by giving them random input
    spew: I/O performance measurement and load generation tool
    stress: a tool to impose load on and stress test a computer system
    cpuburn: a collection of programs to put heavy load on CPU
    ltp: the Linux Test Project test suite
  • UBCD (Score:4, Informative)

    by Jsutton1027w ( 757650 ) on Wednesday August 03, 2005 @06:34PM (#13235583) Homepage
    The Ultimate Boot CD [sourceforge.net]: It's basically a compilation of different boot disks, all put in a nice menu system on a freely-downloadable ISO image. While it's not really Linux (though it contains a number of Linux-based boot disks), it is one of the best utility CDs that I've ever encountered for testing hardware.

    Also, Knoppix is another one that I would suggest, though I use it more for data recovery these days. ;)
  • Eurosoft do a product called PC-Check. It's not cheap (£150) but it works very well. You get a bootable floppy (which you can copy) that tests just about everything in your PC.

    Best of all, you just slap it in a machine, let it run for an hour, and come back to see the results.
    • Bootable floppy is great for those that still have a floppy drive.
      • We keep a couple of USB floppy drives on hand.

        You can also make bootable CDs quite easily from a bootable floppy disk (in Nero anyway). The only downside to a bootable CD is that you can't save and later print the log files.
  • less /var/log/debug
    less /var/log/dmesg or dmesg | less

    Various files under /proc

    I prefer less as it gives more options, such as moving, searching, etc.

    Also, you can write your own custom script to dig out information not just from one Linux server but from other Linux/BSD servers, and email/page back the results.
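A bare-bones version of such a script could look like this. Everything specific in it is an assumption for illustration: the function name, the choice of commands to run remotely, and the use of passwordless ssh as the transport. The `REMOTE` variable exists so the transport can be swapped out.

```shell
#!/bin/sh
# collect_reports HOST...: run a few read-only diagnostics on each host and
# print one labelled report per host on stdout.
REMOTE=${REMOTE:-ssh}                               # transport; assumes key-based ssh login
REMOTE_CMD='uname -a; uptime; dmesg | tail -n 20'   # illustrative command set

collect_reports() {
    for host in "$@"; do
        echo "===== $host ====="
        # Keep going if one host is down or refuses the connection.
        $REMOTE "$host" "$REMOTE_CMD" 2>&1 || echo "(could not query $host)"
    done
}
```

From cron, something like `collect_reports web1 db1 | mail -s "hardware report" admin@example.com` would email the results; the host names and address here are placeholders.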
