Software Operating Systems

Why Do Computers Still Crash?

geoff lane asks: "I've used computers for about 30 years, and over that time their hardware reliability has improved (though not that much), while their software reliability has remained largely unchanged. Sometimes a company gets it right -- my Psion 3a has never crashed despite being switched on and in use for over five years, but my shiny new Zaurus crashed within a month of purchase (a hard reset, losing all data, was required to get it running again). Of course, there's no need to mention Microsoft's inability to create a stable system. So, why are modern operating systems still unable to deal with and recover from problems? Is the need for speed preventing the use of reliable software design techniques? Or is modern software just so complex that there is always another unexpected interaction that's not understood and not planned for? Are we using the wrong tools (such as C) which do not provide the facilities necessary to write safe software?" If we were to make computer crashes a thing of the past, what would we have to do, both in our software and in our operating systems, to make this come to pass?
  • crashes? (Score:4, Interesting)

    by Moridineas ( 213502 ) on Tuesday May 20, 2003 @09:03PM (#6003338) Journal
    Well, of the computers that I manage, we've got an OpenBSD server that never crashes (max uptime is around 6 months -- when a new release comes out) and a FreeBSD server that has never crashed -- max uptime has been around 140-150 days, and that was for system upgrades/hardware additions.

    On the workstation side they are definitely not THAT stable, but since we've switched to XP/2K on the PC side, those PCs regularly get 60+ days of uptime. Just as a note -- I had an XP computer the other day that would crash about two or three times a day. The guy who was using it kept yelling about Microsoft, etc etc etc. Turned out to be bad RAM. After switching in new RAM it's currently at 40 days of uptime (not a single crash).

    For some reason the Macs we have get turned off every night, so their uptime isn't an issue, but from what I hear OS X is quite stable.
  • Touchy subject (Score:5, Interesting)

    by aarondyck ( 415387 ) <aaron AT ufie DOT org> on Tuesday May 20, 2003 @09:04PM (#6003346) Homepage Journal
    I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo? Of course, with a PC platform (or even Mac, or whatever else) there are problems of unreliability. His idea is that this is because of sloppy programming. The reason we were having this conversation is that I had a piece of software (brand new, I might add) that would not install on my computer. You would think that a reputable software company (and this [sierra.com] was a reputable company) would test their product on at least a few systems to make sure that it would at least install! The end result was that I ended up never playing the game (not even to this day), nor have I purchased another title from that company since. Perhaps that is the solution to the root problem?
  • by Hanji ( 626246 ) on Tuesday May 20, 2003 @09:04PM (#6003347)
    Scientific American actually had an article [sciam.com] on a similar topic. Basically, they seem to be accepting crashes as inevitable, and were focusing on systems to help computers recover from crashes faster and more reliably...

    They also propose that all computer systems should have an "undo" feature built in to allow harmful changes (either due to mistakes or malice) to be easily undone...
  • by IvyMike ( 178408 ) on Tuesday May 20, 2003 @09:05PM (#6003363)

    People upgrade for new features. That computer/OS/gizmo you have today does a lot more than the one from 10 years ago. That's a lot more code that needs to be written, and thus a lot more opportunity for errors. It's that simple.

    (I'm actually ok with that. I'd rather have a moderately crashy Windows XP box capable of playing GTA:Vice City than the hypothetical alternative: a super-stable Windows95, capable only of playing "Doom 2".)

  • Re:Simple ... (Score:5, Interesting)

    by The Analog Kid ( 565327 ) on Tuesday May 20, 2003 @09:07PM (#6003377)
    Yes, on my parents' computer, which has 2000 on it (tried Linux; it didn't work for them). I set most of the services that aren't needed to manual. Disabled Auto-update. Put it behind a router, of course. The only problem that remained was Internet Exploder; well, I just installed Mozilla with an IE theme and haven't noticed a difference. I think killing most of the services keeps it up. Haven't had a problem with it. This was done before KDE 3.1.x, so who knows, Linux might work after all.
  • Re:OS X (Score:1, Interesting)

    by aarondyck ( 415387 ) <aaron AT ufie DOT org> on Tuesday May 20, 2003 @09:08PM (#6003389) Homepage Journal
    I've crashed OS X. It wasn't even that hard, really. I just did a bit of extremely intensive stuff with Adobe InDesign and it died. I found that it was also far more resource-intensive than most other Operating Systems I use. Perhaps it's just the way I use it, but I think that the only OS I haven't been able to crash is DOS.
  • by dsanfte ( 443781 ) on Tuesday May 20, 2003 @09:09PM (#6003401) Journal
    The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

    Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.

    Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA analysis, if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.

  • by Anonymous Coward on Tuesday May 20, 2003 @09:11PM (#6003420)
    "How often does your 4 function pocket calculator crash?"

    Well, maybe never, but...

    My calc professor once put a question on an exam that was designed to crash a TI-89.
  • by KentoNET ( 465732 ) on Tuesday May 20, 2003 @09:12PM (#6003427)
    Windows 2000 crashed just as much as most modern Linux distributions. The Zaurus's problem is more likely due to the crappy GUI they decided to use (Qtopia). I'd be willing to bet that the jerk-off never attempted pinging the Zaurus to see if it had actually crashed. Mine has had no crashes yet, after 11 months of use. Its WLAN capabilities are quite nice, too.
  • by hawkstone ( 233083 ) on Tuesday May 20, 2003 @09:15PM (#6003454)
    I'm sure it's harder to accomplish this for kernel-level code (it's primarily OSes being pointed at here), but you can think everything is working hunky-dory and not realize something is going wrong under the covers.

    Most errors of this kind can be found with testing under tools like valgrind [kde.org] or Rational's Purify [rational.com]. I'm sure there are others (I've heard of ParaSoft Insure++, ATOM Third Degree, CodeGuard, and ZeroFault), but the quality of these tools really matters.

    The issue is that tiny errors can cause crashes intermittently, and not immediately. For example:
    uninitialized memory reads -- usually not a problem, but if this value is ever actually used, it will be.
    array bounds reads -- never acceptable, but depending on the structure of memory, may not always cause an immediate crash.
    array bounds writes -- like ABRs, may not be immediately fatal, but these are going to crash your code sooner or later.

    Since they don't always cause an immediate crash, these errors are likely to creep into released code without the use of one of these tools. And if you want to know why we shouldn't always run programs in an environment that checks these kinds of things, try it once; you'll notice a speed hit of usually an order of magnitude. C/C++ is a perfectly acceptable language -- not all debugging has to be done by the compiler/interpreter or only after you notice a problem.
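
    (To make those three classes concrete, here's a tiny, purely hypothetical C fragment -- not from any real codebase -- that a checker like valgrind or Purify flags on every run, even though it may appear to "work" most of the time:)

        #include <stdlib.h>
        #include <string.h>

        int main(void) {
            int flag;                      /* never initialized                    */
            char *buf = malloc(4);         /* room for 3 characters plus the NUL   */

            if (flag)                      /* uninitialized memory read            */
                buf[0] = 'x';

            char c = buf[4];               /* array bounds read: one past the end  */
            strcpy(buf, "hello");          /* array bounds write: 6 bytes into 4   */

            free(buf);
            return c;                      /* may behave today, crash next release */
        }

    None of these is guaranteed to crash on the spot, which is exactly why they slip out the door.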

    Anyway, hope that wasn't too pedantic....
  • by Samir Gupta ( 623651 ) on Tuesday May 20, 2003 @09:18PM (#6003495) Homepage
    In the world of games, especially console games, a crash immediately spoils the user's gameplay experience, and it's doubly so if you don't have a mechanism to patch games as in the PC world.

    On the GameCube, crashes are alleviated by having only a thin OS layer between the hardware and the game, and by restricting execution to a single task at a single privilege level of the CPU, which avoids context switches and the back-and-forth between user and kernel mode that introduces complexity and can wreak havoc if malicious data is present.

    Furthermore, we have a set hardware configuration, running a well defined consistent set of drivers, which are again, minimal, and this eliminates another factor that often leads to crashes in the PC world.

    The most important thing, though, is robust software design. In our games, we all code exception handlers for the software, so that a single errant NULL pointer doesn't bring the whole thing down with a "Segmentation fault" message the way PC users seem to experience with their software. Instead, we gracefully recover, perhaps immediately rolling back to the previous iteration of the game loop and "moving" the player a bit -- for instance, in an FPS where the player has entered an area in an orientation that happens to create a divide-by-zero error due to numerical imprecision.
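
    (The "catch the fault and roll back one frame" idea can be sketched in plain C on a POSIX system with a signal handler and siglongjmp. This is only a toy illustration of the technique under those assumptions, not anyone's actual engine code:)

        #include <setjmp.h>
        #include <signal.h>
        #include <stdio.h>

        static sigjmp_buf frame_start;            /* last known-good point in the loop */

        static void on_fault(int sig) {
            siglongjmp(frame_start, sig);         /* unwind back instead of dying      */
        }

        /* stand-in for one frame of game logic; faults when x == 0 */
        static int update_world(int x) {
            return 100 / x;                       /* raises SIGFPE on divide by zero   */
        }

        int main(void) {
            signal(SIGFPE,  on_fault);
            signal(SIGSEGV, on_fault);

            for (int x = 2; x >= -1; x--) {
                if (sigsetjmp(frame_start, 1) != 0) {
                    puts("fault caught: rolling back to the previous frame");
                    continue;                     /* skip the bad frame, keep running  */
                }
                printf("frame ok, result = %d\n", update_world(x));
            }
            return 0;
        }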

    In the future with CPU and memory speeds increasing, we are investigating new designs, such as microkernel based architectures where individual game entities are separate protected "processes" that communicate via some fast IPC mechanism such as shared memory or a "tuplespace", so that a bug in one entity doesn't bring the whole universe crashing to a halt, and I hope that such techniques are adopted by the general computing world.
  • Re:And (Score:1, Interesting)

    by Anonymous Coward on Tuesday May 20, 2003 @09:19PM (#6003508)
    FACT:

    People who lack facility with English generally write shitty interfaces that people loathe using, even if the code is "clean".

  • by Christopher Thomas ( 11717 ) on Tuesday May 20, 2003 @09:20PM (#6003519)
    The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

    Unfortunately, this only works if you can distinguish between buggy and non-buggy code produced by the algorithm. You can do tests, but no test suite will be exhaustive (otherwise we'd just use it on human-developed code to find the bugs).

    Perfect software can only be produced if a formal proof of correctness is possible. Even then, you're limited by the assumptions the proof makes.
  • Re:Microsoft (Score:5, Interesting)

    by VTS ( 673706 ) on Tuesday May 20, 2003 @09:20PM (#6003520)
    Some time ago I would have agreed with you, but not anymore. If Media Player crashes while playing some video, then the whole system becomes unstable, and then even doing something like sending a file to the Recycle Bin freezes the UI...
  • by jone1941 ( 516270 ) <jone1941@nOsPAM.gmail.com> on Tuesday May 20, 2003 @09:21PM (#6003525)
    There are a lot of moving parts in a working Linux system (I'm talking CLI here); however, it seems to be less prone to crashing. As someone previously mentioned, software that is larger and more complex is more likely to have a bug. The point I'm getting at is that the design principles of *nix dictate many small programs to create a large working system. When a program is small, it can be designed and developed with care. This leads me to my final thought: modern operating systems with GUIs are less stable because they are generally designed as large monolithic systems.

    I'm going to claim that the prime reason systems with GUIs (and I'm including everyone) are unstable is that no one has come up with a rock-solid base for such a system. X is not solid, nor is Windows Explorer, nor Mac OS X's application manager -- no one has it right.

    The one thing I am leaving out is that drivers also tend to be a major cause of instability. I cannot run the nvidia driver on my Gentoo box, and certain USB events can bring a system to a screeching halt. What needs to happen is better design around the unstable interfaces, such that in the worst-case scenario, things can still be recovered.
  • by Jeremi ( 14640 ) on Tuesday May 20, 2003 @09:24PM (#6003545) Homepage
    The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.


    That only works if you can write a fitness algorithm that can tell whether the program did the correct thing or not -- otherwise, you have no way to decide what to "breed" and what to throw away. And for many types of program, that fitness algorithm would be more difficult to write than the program you are trying to auto-generate...


    Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.


    All you've done is replace a hard problem ("write a program that does X") with a harder problem ("write a program that teaches a computer to write a program that does X"). No dice.


    Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA Analysis if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possiblity.


    For most modern programs, there isn't nearly enough time left before the heat-death of the universe to do this. Hell, for programs other than simple batch processors, the number of possible inputs and outputs is infinite (since the program can perform an arbitrary number of actions before the user quits it).

  • by callipygian-showsyst ( 631222 ) on Tuesday May 20, 2003 @09:32PM (#6003616) Homepage
    Of course, there's no need to mention Microsoft's inability to create a stable system
    My Windows XP box, which is my fileserver, has been up for 5 months so far.


    My OS X box, which I use for web browsing and word processing, crashes about once every three days.


    Now, I certainly have some bones to pick with Microsoft, but Apple is no better.

  • by zrm8y5m02 ( 662887 ) on Tuesday May 20, 2003 @09:39PM (#6003664)
    Instability is inevitable with fast evolution. A stable system means it's not evolving fast enough, or that its evolution has slowed.

    Much software has been evolving so fast that there's been no time to perfect the existing features before adding new ones. At some point in the lifetime of an individual piece of software, it reaches a point where it's somewhat "stable" in the sense that no more major features are needed. For example, TeX reached its relative maturity during the '80s and, IIRC, has no known bugs at this point.

    If all software were given enough time, it would all reach that kind of maturity. The problem is that not all of it can survive that long -- usually it becomes obsolete before it becomes stable...
  • by noda132 ( 531521 ) on Tuesday May 20, 2003 @09:40PM (#6003667) Homepage

    Use OpenZaurus [openzaurus.org], and while crashes may still appear (I assume 3.2 will crash eventually, though I haven't had a full crash since it first came out), they will not lose all your data, since it's written to flash.

    Also, my Linux box hasn't crashed this year, and I can't recall any crashes last ye-- no, wait, there was one slew, but it was an icky driver which I got rid of. I'd say a pretty good track record for a system built almost entirely from CVS.

    Can't remember any crashes this year or last on any other Linux boxes I manage that I can think of (8 boxes off the top of my head).

  • Turing showed this (Score:4, Interesting)

    by martin-boundary ( 547041 ) on Tuesday May 20, 2003 @09:41PM (#6003671)
    A crashed computer is a computer that's stopped. Alan Turing proved in 1936 that the halting problem is unsolvable. So, it's impossible to know when and how a computer is going to crash or not under all possible circumstances (inputs).

    Accept it. It's a fact of nature.

  • by dirk ( 87083 ) <dirk@one.net> on Tuesday May 20, 2003 @09:44PM (#6003696) Homepage
    When can we finally give up the FUD of "MS crashes all the time"? Anyone who has used a later MS OS (Win2k or XP) can easily see they crash very rarely. I have had my Red Hat install have more problems than my Windows install in the past 6 months, and on the MS system most of the problems have been 3rd-party software, while on Linux most of the problems have been the OS itself. The reason systems crash is that there are many pieces, written by many different people, interacting with each other. This is the same whether the OS is Linux or Windows. The harping on the instability of Windows does nothing but hurt the Linux cause, since anyone who actually uses a newer version of Windows knows that such claims have no basis in reality.
  • Time is Money. (Score:5, Interesting)

    by Rimbo ( 139781 ) <rimbosity@sbcgDE ... net minus distro> on Tuesday May 20, 2003 @09:53PM (#6003744) Homepage Journal
    I think this is basically the right answer.

    A couple of months ago, the company I worked for spent a lot of time and effort developing a robust testing methodology. We had a software product that, through blood, sweat and tears, would not crash unless you basically blasted the hardware in some way.

    But that led to two problems. First, we only had so many people working, and resources spent testing and bugfixing were not being used to add new features. Second, the time it took to get it that robust delayed the product's release beyond the point where we could recover the investment. [Time developing] * [Cost of operating] was greater than [expected number of units sold] * [price per unit].

    What ended up happening was that we lacked the features to justify the price and number of units we needed to sell to cover the cost of developing it. We had no bugs -- and we could be certain of it -- that would crash the machine.

    As of last month, the company could no longer afford to pay me. I'm not there any more.

    The moral of the story is that trying to make a bug-free product will bankrupt your company, especially a startup. Software tools have improved, but the benefit largely goes towards adding new whiz-bang features that sell the product for more money, not to being able to fix more bugs.

    What we should do as engineers and managers of software products is not be afraid of getting the product out the door with a few bugs in it if we want our company to do well; this business reality is ultimately why bugs will be a big part of software for the foreseeable future.
  • by Blkdeath ( 530393 ) on Tuesday May 20, 2003 @09:58PM (#6003787) Homepage
    It takes significantly more skilled developers and more testing (i.e. expensive) to make systems that don't crash, and consumers (including you) won't pay for them.

    Will there come a time when the development methods and quality control learned from these large, mission-critical projects find their way to the consumer product market? If not, why not?

  • by Jerk City Troll ( 661616 ) on Tuesday May 20, 2003 @10:02PM (#6003820) Homepage
    My OS X box, which I use for web browsing and word processing, crashes about once every three days.

    The Ti PowerBook G4 I am writing this post on is running Mac OS X 10.2.x. It goes in and out of sleep on an irregular basis, and not always when it is idle. I swap PCMCIA cards in and out. It hops from network to network. I do a lot more than browsing and word processing.

    According to my Konfabulator uptime widget, I have 83 days, 23 hours, 20 minutes. My load average at the moment is 1.7. It has not been rebooted since I installed OS X (I did it myself after buying it just for messing around purposes).

    You sir are either lying, have bad hardware, or you've severely corrupted your installation. This operating system (which is BSD) is solid as a rock.
  • by Anonymous Coward on Tuesday May 20, 2003 @10:06PM (#6003843)

    The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info.

    You made a little "user error" there yourself-- the article says that 34%=software error and 15%=hardware error.

    Oh, and those figures are just for Web applications, not software applications in general.

    It's an interesting article. Unfortunately, they're not very clear about what constitutes a "user error." I've filled out Web forms that gave me an "error" when I included hyphens in my phone number or credit card number. That's not a user error; it's just poor user interface design.
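
    (A trivial C sketch of the friendlier alternative -- normalize the input instead of bouncing it back as a "user error". The function name and the example number are made up purely for illustration:)

        #include <ctype.h>
        #include <stdio.h>

        /* Copy only the digits of src into dst (dst holds at most cap bytes, NUL included).
           Accepts "555-867-5309", "(555) 867 5309", and so on, instead of rejecting them. */
        static void normalize_phone(const char *src, char *dst, size_t cap) {
            size_t n = 0;
            for (; *src != '\0' && n + 1 < cap; src++)
                if (isdigit((unsigned char)*src))
                    dst[n++] = *src;
            dst[n] = '\0';
        }

        int main(void) {
            char clean[32];
            normalize_phone("555-867-5309", clean, sizeof clean);
            printf("%s\n", clean);                 /* prints 5558675309 */
            return 0;
        }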

    In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.

    Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.

  • by bgalehouse ( 182357 ) on Tuesday May 20, 2003 @10:15PM (#6003914)
    Yeah, OK, code crashes most often because it is written incorrectly. And cars and planes crash most often because people drive them incorrectly. Entirely true, but not at all useful.

    My pet explanation is that computer code is in many ways like legal code, with computers playing the part of honest criminals. They follow the law exactly, and walk through loopholes without even thinking about it.

    So you patch the loophole. This, oddly enough, seems to make the code bigger. To further contribute to the code bloat, at different times different legislators/developers have different opinions about what the goal of the code is, and different areas which they own. Small patches are thought of as being safer, but interactions with other bits lead to... more loopholes. This is also why both Windows 2000 and the US tax code seem to take up a lot of storage space.

    This then makes clear the value of refactoring, not that I really expect the tax code to be replaced with something sane anytime soon. Following this line of reasoning we also see why careful encapsulation is so important -- so that one can rewrite one module of the system without affecting the others.

    Advanced language features such as garbage collection and strong typing don't eliminate bugs. However, they do eliminate certain classes of bugs (segmentation faults) and so reduce the bookkeeping required to produce bug-free code. Since this class of bugs is one of the most expedient for hackers to take advantage of, there are also disproportionate security benefits to using safe languages.

    On a final note, testing is a defense against bugs, but I believe that testing, especially black-box integration testing, should be a final defense that rarely sees action. If the developers can't find all but a few of the bugs with their own testing, the developers have lost their perspective on how the code works. If a separate QA team finds 1000 bugs, in my experience, the development process has failed and the system will always seem buggy.

  • Re:Speed (Score:4, Interesting)

    by mackstann ( 586043 ) on Tuesday May 20, 2003 @10:16PM (#6003921) Homepage

    Well said, I would have to agree with the majority of your post. The only thing I have an issue with is:

    I'm not quite an old fogie yet in the software world, but can at least claim to have been around (professionally) for about a dozen years and worked for about half a dozen Fortune 500 companies (plus a few startups).


    [..]

    It's soured me so much that I'm actually considering ditching the IT world outright.

    It took you over a decade? I've been working in "light" IT for about 4 months, and I have already come to this unfortunate conclusion. Writing commercial software just isn't fun: not only do you have to write software that you may not find all that interesting, but you are also denied the opportunity to use your skills to the fullest and create something that you are truly proud of. Corners are cut, and in the end you realize that it's just a "product" or an in-house "app"; it only needs to work "good enough", never mind whether the code needs cleaning up or whatever other issues there are (they don't (seem to) exist if you're not staring at the code!).

  • by jonadab ( 583620 ) on Tuesday May 20, 2003 @10:20PM (#6003961) Homepage Journal
    Computers crash (and have any number of other problems) largely
    because almost all software is still developed using third-generation
    ("high-level") languages. These languages place on the programmer
    the burden of such fiddly details as allocating and freeing memory
    and checking the size of allocated memory to see that it's adequate
    for the data being copied in.

    *Most* of the time when an application crashes seemingly at random,
    it's a memory allocation problem of one kind or another: a buffer
    that was allocated too small and gets overrun, or a pointer error,
    or something of that nature. When an application (or your whole
    system) grows more sluggish the longer you leave it running, that's
    usually a memory leak: something was allocated and not released
    properly -- repeatedly. All of these problems result from a lack
    of excruciating vigilance on the part of the programmers when using
    a language that requires it. In a large project, maintaining that
    ceaseless caution is a nightmarish prospect.

    Languages (both interpreted and compiled languages) have been around
    for over a decade that handle these things, freeing the programmer
    to concentrate on developing the more high-level features of the
    software, but because this checking imposes some overhead (in terms
    mostly of CPU time and sometimes some memory footprint), they don't
    get used for most applications. Yet.

    The time is coming, though. The value of VHLLs is beginning to be
    recognised, *finally*. When software is written in a language with
    built-in memory management, problems like segmentation faults (core
    dumps in Unix; in the Windows world these are known as Illegal
    Operations, formerly known as General Protection Faults) and buffer
    overruns go away entirely.

    Add proper garbage collection (not reference counting like Perl5
    does, but real gc, which I hope we will get in Perl6), and you
    also dispense with memory leaks once and for all.
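
    (For anyone wondering why reference counting doesn't count as
    "real gc": a cycle defeats it. A contrived C sketch of the
    failure mode -- the counting here is hand-rolled purely for
    illustration:)

        #include <stdlib.h>

        typedef struct node {
            int          refs;             /* naive reference count            */
            struct node *other;            /* link to the partner node         */
        } node;

        static node *retain(node *n)  { n->refs++; return n; }
        static void  release(node *n) { if (--n->refs == 0) free(n); }

        int main(void) {
            node *a = calloc(1, sizeof *a);
            node *b = calloc(1, sizeof *b);
            a->refs = b->refs = 1;         /* one reference each: the locals    */

            a->other = retain(b);          /* a points at b ...                 */
            b->other = retain(a);          /* ... and b points back: a cycle    */

            release(a);                    /* drop the locals; both counts fall */
            release(b);                    /* to 1, never to 0, so both nodes   */
            return 0;                      /* leak. A tracing collector would   */
        }                                  /* see that neither is reachable.    */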

    It's coming. Applications are *beginning* to be developed in this
    next generation of languages, but it takes time, because all the
    existing apps are mostly C and C++, and you have to throw them out
    and start over, which nobody wants to do for obvious reasons.

    There will of course always be room for a certain amount of
    inherently low-level code written in C or one of its kin: code
    that absolutely can't spare a nanosecond per run, code that has
    to run on the bare metal (kernels, bootloaders, ...), and code
    needed to bootstrap the VHLL tools (compilers and whatnot). But
    when C is no more common than assembly language is today, then
    you'll be done with random crashes.

    Applications will of course still have bugs -- circumstances
    wherein they don't perform as they ought. And you'll still have
    hangs, because nobody's figured out how to design a compiler or
    interpreter that can detect an infinite loop, and nobody except
    Mel[1] has coded up an implementation for completing an infinite
    loop and passing on to what follows. Perhaps quantum computing
    will one day change this, but that's outside of the foreseeable
    future. But crashes of the sort where the app suddenly terminates
    should be mostly a thing of the past within twenty years, ten if
    we're quite lucky.

    [1] Google for "The Story of Mel, A Real Programmer".
  • by ncc74656 ( 45571 ) <scott@alfter.us> on Tuesday May 20, 2003 @10:20PM (#6003963) Homepage Journal
    From 1986 or '87 until about '94 or '95, all my word-processing/database/spreadsheet stuff got done on an Apple II (first a IIe, then a IIGS) running several versions of AppleWorks, up to v3.0. Even with some 3rd-party addons (mainly SuperFonts and UltraMacros), AppleWorks never crashed.

    I'm willing to concede that the codebase was considerably smaller. It had to be, in order to produce an executable that would fit in 800K (the size of a 3.5" double-density floppy) and would run reasonably well on a 1-MHz 8-bit processor with as little as 128K of RAM...but I don't find myself doing sufficiently more advanced stuff in Word or Excel than I used to do in AppleWorks (actually, AppleWorks was probably doing more sophisticated stuff with UltraMacros added to it). I would be willing to wager that 95% of Office users use no more than 5-10% of its features. All that extra code that keeps getting added in with every new release means there's that much less time spent making sure the core functionality (and all of the chrome added in previous releases) is bug-free.

    (I'll admit that I haven't had much trouble with Office...but then you've noticed that I don't push it particularly hard either.)

  • by ConceptJunkie ( 24823 ) on Tuesday May 20, 2003 @10:20PM (#6003968) Homepage Journal
    ...because we aren't willing to wait for, or pay for, software that has been adequately tested to any reasonable level of reliability.

    With something like Windows XP, no amount of testing could eliminate every conceivable bug, but there is no doubt in my mind that Microsoft, along with almost every other software company in the world, rushes poorly designed, inadequately tested products to market to meet customer demand.

    Remember, a product's success is due largely to a check list of features created by the marketing people. A product with 90% reliability and 100 features will sell better than a product with 98% reliability and 10 features. Otherwise, how can you explain the success of Microsoft Office? OK, bad example, MS Office is successful because it's been bundled with so much hardware, but you see my point.

    The bottom line is that computers are now a commodity. They have become so ubiquitous and cheap that I can go down to the Salvation Army and purchase what would have been considered a supercomputer 10 years ago for $50. Software is quickly reaching the same state. How much software can you buy for $10 or less? A lot. And not all of it is bad, though most is. On the other hand, you can drop hundreds or thousands of dollars on software that is just as quirky, hard to use and even just as buggy.

    Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or whatever DC's do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell. I'm sure there would be /. articles and Ars Technica articles for weeks if a console game came out that crashed, but when PC games are released that have those kinds of problems, it's hardly news.

    Kinda makes me wonder...

  • by dghcasp ( 459766 ) on Tuesday May 20, 2003 @10:22PM (#6003973)
    Think of the systems used by the telcos, or NASA. Are they perfect? No, but they are much, much more stable than Win32, or Mac, or Linux. The reason is simple, the owners demand them to be.

    This reminds me of a story I read in the internal magazine of a telecommunications equipment supplier that I used to work for. It was about an international toll switch somewhere in the U.K. that had been up for 17 years (or something extreme like that). Furthermore, this included having all of its hardware upgraded and replaced. Twice.

    Just stop and think about that for a while in PC terms... "I replaced my motherboard with the power on without rebooting my system, while it was serving 10,000 web pages a second."

    Granted, this is a higher level of hardware with full redundancy, but it still boggles my mind.

  • by Chris Carollo ( 251937 ) on Tuesday May 20, 2003 @10:25PM (#6004001)
    Jets are complex too. So is the Space Shuttle. Cruise ships. CARS are pretty complex.
    Then again, if one of the overhead bin latches gets stuck, or my overhead light burns out, or my seatbelt gets stuck, the entire plane or car doesn't instantly explode. The issue isn't complexity, it's fragility.

    Software is incomprehensibly fragile -- any single thing can cause a crash, taking the whole system or application down. And even those critical parts of things like airplanes have multiple redundancies, something that's hard to build into software. You can do things like catching exceptions, but you typically can't recover as gracefully as if there was never a problem at all.

    The shuttle is actually not a bad analogy -- it's also very fragile due to the stresses it endures. And we've effectively had two crashes in 100 runs. Most software is more stable than that.
  • by LazloToth ( 623604 ) on Tuesday May 20, 2003 @10:35PM (#6004060)
    These are true statements:

    -In our server room, which, admittedly, is a little crowded, a Windows 95 box was disconnected from the network but accidentally left running. It stayed up for more than a year. No load, of course, but it stayed up. It made the hair on my neck stand on end.

    -In the same server room, a clone PC running Suse Linux 7.0 ran for just short of two years without a reboot. It would have gone longer had the old, 2 gig hard disk not died a clunking death. Fortunately, the web data was on a different disk. We loaded another system drive and had our departmental web/Samba server up in minutes.

    -We have a Compaq Prosignia 200 running NT4 and Raptor 6.0 Firewall. It has seen uptimes exceeding 9 months on more than one occasion. Would have gone longer, I think, were it not for some memory leaks in the Raptor management console snap-in.

    I point these things out so as to ask the question: how stable is stable? Hey, *nix has been my passion for years, but I've seen for myself that NT4 and, now, Windows 2000 can perform well if they are set up by someone who knows what s/he is doing. I believe impressive uptimes can be attributed to many things, but I do not always blame the OS code for the bad things that happen. We all know what bad firmware and drivers can do. I'll take NT4 on an AlphaServer over Linux on a Packard Bell any day.

    Of course, Linux on the AlphaServer is better yet . . . . : )

  • Re:Microsoft (Score:4, Interesting)

    by CognitivelyDistorted ( 669160 ) on Tuesday May 20, 2003 @10:47PM (#6004140)
    Yes, NT5+ is very stable. MS is working on the driver problem. SLAM [microsoft.com] is a tool for verifying drivers. Given a requirement, e.g., after acquiring a kernel lock the driver must release it exactly once on all control paths, and some driver source code, SLAM can find all the ways the driver can fail the requirement. They have specifications for various driver types and are using them to test some drivers. It's a research project by the Software Development Tools group in MSR, but they're working on getting it stable and powerful enough to verify more drivers. If they can get it to work well enough, they'll supply it to hardware vendors.
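
    (The kind of rule such a verifier checks is easy to show as code. Here's a hypothetical user-space analogue -- not SLAM itself and not real driver code -- with the classic violation: one path returns without releasing the lock it acquired:)

        #include <pthread.h>
        #include <stdbool.h>

        static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Rule under check: every path that acquires dev_lock releases it exactly once. */
        static int handle_request(bool fast_path) {
            pthread_mutex_lock(&dev_lock);

            if (fast_path)
                return 0;                  /* BUG: early return, lock never released */

            pthread_mutex_unlock(&dev_lock);
            return 1;
        }

        int main(void) {
            return handle_request(false);  /* the buggy path is only taken when true */
        }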
  • by Anonymous Coward on Tuesday May 20, 2003 @10:51PM (#6004166)
    Byte magazine had a great article [byte.com] too, with a nicely done chart [byte.com].

    Memorable line: Men are from Mars. Women are from Venus. Computers are from hell.
  • Re:and (Score:2, Interesting)

    by |Cozmo| ( 20603 ) on Tuesday May 20, 2003 @10:53PM (#6004177) Homepage
    I don't remember having any games screw up my system since I stopped playing half-life. I built a new system a couple months ago and it hasn't crashed once.
    I had a win98 system last a bit past 30 days with regular use once and it was terribly hosed by the time I rebooted. Win2k or XP can last until your power goes out, you kick the surge protector, or you need to reboot to install drivers/software/hotfixes ;)
  • by deathcow ( 455995 ) on Tuesday May 20, 2003 @10:55PM (#6004189)
    We had a Cisco router wigging out the other week. Our Network Admin decided to reset it, and it offered this up:

    Kodiak_Rtr uptime is 6 years, 9 weeks, 3 days, 10 hours, 43 minutes

    System restarted by power-on

  • by shaitand ( 626655 ) on Tuesday May 20, 2003 @11:14PM (#6004294) Journal
    Although I certainly have no problem with Apple or Macs in general, I do have a problem with their user interfaces. Personally, I don't think that denying the user the option of defining any settings which could cause a malfunction is the answer. The reason? Well, it's pretty simple: when set properly, those same settings give flexibility, added functionality, and performance (at least one, sometimes two, often all three of the above).
  • by ChrisCampbell47 ( 181542 ) on Tuesday May 20, 2003 @11:17PM (#6004308)
    Interesting that the first two posts in the thread had English syntax errors in their first sentences. We can still understand it, but compilers/CPUs would have problems. Seems that the real problem is the difference in the natures of wetware and hardware.

    Actually, "syntax errors" like this DO cause a problem for wetware systems -- they cause the brain (well, mine at least) to kind of glaze over and take the remainder of the sentence/thought much less seriously. Kind of like aborting/returning out of a subroutine.

    Here in the Slashdot world of "definately" and "righting", I've learned that any posted comment that makes high-school-level grammatical or spelling errors is not worth my time and I immediately skip the post. I've been doing this quite rigorously lately -- blah blah blah "seperate" PAGE DOWN.

    OK now, everybody nod and think I'm talking about someone else's posts ...

  • by Blkdeath ( 530393 ) on Tuesday May 20, 2003 @11:31PM (#6004377) Homepage
    Actually... BMW has had some problems with car-puters crashing, causing serious problems with the car's functionality.

    Guess whose OS they used.

    Call it reciprocity. ;)

    Otherwise, the functionality of cars and their safety mechanisms have evolved, and that evolution has made it from the $79,990 cars to the $13,990 cars that are being mass produced. Otherwise, who'd feel comfortable driving 160KPH in something that costs a mere 3-4 months' salary?

    Probably one of the sources of problem in the software development industry, I'd say, is duplication of effort. Rather than take existing code and improve upon it, people seem either egotistically or somehow legally (copyright++) bound to constantly re-invent the wheel.

    The GPL development model is great in theory, however in practise it tends to lead to "My camp is better than your camp" rather than "Our camp is approaching perfection".

  • by shaitand ( 626655 ) on Tuesday May 20, 2003 @11:35PM (#6004399) Journal
    I intend this as a genuine comment and not a shameless chance to bash Microsoft or vote pro-Linux. This is where an open source system such as Linux excels... it does so because a lot of the same code that goes into making those critical platforms goes into the mainstream releases, thus carrying over to the average user at home. This is a big part of why Linux is so stable, even on desktops.

    The desktop applications for Linux are less stable, but they benefit from similar development models and sometimes have the same coders involved, so they tend to be more stable than the competition. After all, after a hard day of coding stable server code, that programmer goes home and listens to MP3s. He runs the same platform at home that he uses at work (Linux)... but at home he's running the GUI and playing MP3s, and one day he decides to scratch an itch because a feature he would like is missing. This gets him looking at the code, and like many other coders he can't stand to see instability... especially since that is what he does. He invariably fixes things and adds a patch for whatever feature he wanted to add.

    ok now here comes the shameless plug:

    This is why I believe Linux will continue to grow and be accepted as the dominant platform. Current software in other areas is stagnating, and has been for a while; some applications cannot significantly improve without major revamps in technology (IMs come to mind). A slow, steady approach to development (and yes, it is slow considering the number of man-hours spent on open source... there are just so many more men to spend hours that it amounts to rapid development) leads to fewer bugs in the final code that faces the test of time... more code faces the test of time because it was done right (or closer to right) the first time and thus gets the bugs ironed out of it. Open source development is free... it has no pressure to release final versions, no pressure to release features until they are stable... In the course of time (maybe 5 yrs, maybe 50) it's an eventuality that this will win, because it cannot be killed; there is nothing to fight, after all, no business to put out.

  • by Anonymous Coward on Tuesday May 20, 2003 @11:46PM (#6004456)
    My experience is quite different. I have 7 computers that run 24/7 playing a popular MMORPG.*

    The systems themselves are a collection of spare parts and old workstations purchased off eBay. At the low end is a typically configured Pentium II 400 and at the high end is a typically configured Duron 900. All of them are running Windows 98 SE.

    The game and the scripts keep all the systems at or very near 100% cpu utilization at nearly all times. The only time they are not working is when the game servers are down or my internet connection is down. Both of those are not very frequent.

    Even under that somewhat heavy load, I go months without rebooting them. In fact, the only time they are rebooted is when I lose power or I'm leaving on an extended vacation. One of them is an exception to that rule and has blue-screened on occasion, perhaps 3 or 4 times in the past year.

    Of course, on the system I actually use (not one of the seven described above), I left Windows 98 a long time ago, and I remember being plagued with BSODs, lock-ups, and constant reboots to keep things working.

    What explains these two opposing performance comparisons? I have no idea really, but I have a guess...

    On systems I use, I am constantly adding/installing software and hardware. On the systems that just macro 24/7, I don't do any of that. There is nothing but the bare essentials installed. Perhaps that has something to do with it.

    Anyhow, back to the main point, I disagree that Windows based systems crash even if they are not doing anything. I have a whole bunch that work hard all day and they don't have that problem.

    *No, that wasn't a typo; scripts on the computers "play" the game. It is known as macroing in the MMORPG world.
  • by Anonymous Coward on Tuesday May 20, 2003 @11:47PM (#6004466)
    You may not like it, but the original "because it doesn't matter to you" comment is actually correct in most respects. Even with all the money and people at Microsoft's disposal, they can only do so much in each product release. That means they need to make trade-offs. Fairly often, the trade-off is between quantity and quality.

    MS has a long history of success built on choosing to add new features rather than focusing on quality. This strategy started long before MS had any monopoly power and has always been successful. At least until recently, people have been willing to *pay* for new features while also fairly willing to overlook mediocre quality. This is understandable given the rapid advance in computer capabilities.

    But that was the past. I actually think this strategy could be the eventual downfall of MS. It is so ingrained in their culture, and with such a dominant market share there isn't much pressure to change. But eventually, MS will find a hostile market if they are not able to adapt to consumers who put more importance on quality.
  • [OT] your sig (Score:2, Interesting)

    by Mr Z ( 6791 ) on Tuesday May 20, 2003 @11:48PM (#6004470) Homepage Journal

    About your sig: Actually, I currently write games on a machine with about 1.5K of memory and an 895kHz CPU. [spatula-city.org] And I am grateful.

    --Joe
  • Re:Human Error (Score:5, Interesting)

    by JohnsonWax ( 195390 ) on Wednesday May 21, 2003 @12:00AM (#6004548)
    "All programs (for the most part) must be written by people. ... Computers crash because people cant catch that one little fatal error in 10,000 lines of code."

    All bridges (for the most part) must be built by people. Bridges collapse because people can't catch that one little fatal error in one or two million components.

    The shit coders put out there, I swear... The reason software crashes is that by-and-large it's hacked together, not engineered. You hack a bridge together, and yes, it'll fail. You engineer software, and yes, it will run reliably. It's not fun to do - no easter eggs, no cool tricks, no cramming features in weeks before ship.

    I'm stunned at the amount of code that goes out that was written by interns, by inexperienced coders, by people who just don't have a clue. The software industry really has no concept of best practices, no leadership, no authority body. The fact that buffer overflows still happen is stunning.

    It's not small projects that work well because out of dumb luck they happen to not fail, or larger projects that work okay because we have 34,000 people looking at the code. If that's 'best practices', then we're doomed.

    "Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this."

    Uh huh. Let's translate that to my car: "Hi. Yeah, I'd like to report a bug. I have a Saturn Ion, version 1.1v4. Yeah, when I turn on the left turn signal and then turn on the lights, the car catches on fire. You might want to fix that in the next version. Just thought you might want to know. Bye."
  • by freeweed ( 309734 ) on Wednesday May 21, 2003 @12:06AM (#6004582)
    Windows 9x actually has a bug in it that would lock the computer after 46 days of uptime, but it took years to catch it because no one ever got close to that mark.

    Bullshit, bullshit, bullshit. This urban legend deserved to die years ago.

    I ran several Windows95 OSR2 systems with uptimes approaching 90+ days, and had no problems with them locking up. Sure, 9x wasn't HAPPY with this, and if you ran a lot of applications odds are you won't hit this, but I did it many times in my former employment.

    When the '45 days' (as I heard it first) rumor started going around, I set up a bunch of idle 95 machines for fun, and on days 45-50 watched for anything going on. Not one crashed.

    Hell, for all I know, Microsoft themselves are reporting this, just to cover their asses based on some average uptime limit they worked out, but I will swear on a stack of bibles that I've had Win95 machines go at least twice this supposed limit without locking up.
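
    (Whatever the right day-count is, the arithmetic usually cited for this family of bugs is a 32-bit millisecond tick counter, which wraps after 2^32 ms -- about 49.7 days. A quick C illustration of the rollover, and of the unsigned-subtraction idiom that survives it:)

        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            /* A 32-bit millisecond counter wraps after 2^32 ms: */
            double days = 4294967296.0 / (1000.0 * 60 * 60 * 24);
            printf("wraps after %.1f days\n", days);              /* 49.7 */

            /* Elapsed time stays correct across the wrap when computed with
               unsigned subtraction instead of comparing the raw values: */
            uint32_t start = 4294960000u;                         /* just before the wrap */
            uint32_t now   = 5000u;                               /* just after it        */
            printf("elapsed = %u ms\n", (unsigned)(now - start)); /* 12296, not negative  */
            return 0;
        }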
  • Re:Try the UML (Score:4, Interesting)

    by Billly Gates ( 198444 ) on Wednesday May 21, 2003 @12:18AM (#6004645) Journal
    Architects and engineers use extremely detailed drawings. Have you ever taken any drafting courses in high school or college? Every piece, even the size of every screw, is detailed as accurately as possible. It takes forever to get anything done because the precision is more important. It drives some people like myself crazy.

    The blueprint is the actual prototype of the product being designed.

    The problem is that if you document every step and algorithm in exact detail, you will spend weeks, months, and yes, years without a single line of code!

    This is unacceptable in today's business world, where all the projects are due yesterday and your bosses demand to know, percentage-wise, how much of the code has been developed. If you spend a month planning and not a single line of code is developed, you're canned.

    My father took over a project from a clueless IT manager who got the job because she slept with the CIO. Anyway, she went to a seminar which claimed that flowcharting everything would be the wave of the future. She then had all the programmers draft every single algorithm, down to the very if statements themselves, on paper. After 4 months and not a single line of code, my old man took over. From there he finished the project within 3 weeks!

    My point is that drafting programs is too time consuming. In a way your drawing is the program, and changes can be made as you go. It's essential to have good flowcharts and notes, but they need to be generalized. If there is an error in one, you can delete the line and fix it. In engineering you would have to disassemble the actual product and redesign it. Because that would cost time and money, it is not accepted. In software that limitation is not there, or is not as severe.

    UML tries to be the blueprint of all software programs but instead is only used to explain certain subsystems and algorithms. Mostly, flowcharts are used so all the developers have a sense of how the program will work and how to invoke different pieces of the program.

    I do not think this is going to change unless there is a quick and easy way to debug UML charts. Logic errors are killers, and if it's perfect I suppose you could compile the UML directly into the language of choice.

    Hmmm, in fact this might be the way to do it in the future.

  • by doorbot.com ( 184378 ) on Wednesday May 21, 2003 @12:54AM (#6004811) Journal
    I wish there was consumer demand for low-power desktop computing.

    My mail/web server would run fine off of something ridiculously small, like a Sharp Zaurus. Here are my requirements, and I will pay for one if it is available.

    1. Non-x86 hardware designed for lower power -- extra speed is nice, but not required; Pentium 200 speeds or better
    2. Low power, with 9V or AA-based battery backup (changeable while system is running)
    3. 3" - 4" LCD (with manual switch to turn off) at 640 x 480, or some sort of LED array/VFD, because all I really need is a low power terminal supporting 80 x 24 characters.
    4. USB port for keyboard
    5. Serial port
    6. Two or three 10/100 NICs
    7. Full (Debian) Linux support of all hardware
    8. Some sort of expansion (PCMCIA maybe, or via USB)
    9. Support for CompactFlash for backups
    10. Hardware encryption would be a nice goodie but not required


    Yes, I could probably build this with PC104 components, but I want a pre-built product, and I'm willing to pay for it (maybe $300 - $400).
  • Re:and (Score:4, Interesting)

    by sheldon ( 2322 ) on Wednesday May 21, 2003 @01:08AM (#6004865)
    Interesting.

    I play RTCW quite a bit on my WinXP box with no issues. RTCW occasionally crashes, and I have to hit CTRL-ALT-DEL to bring up task manager and kill it, but the system remains stable.

    When I first built this box I had some issues, after a while it would lock up. Turned out it was because the video card was overheating. The system itself wasn't locking up, just the video card. Put the system in a new Antec SX-835II case with better cooling and haven't had a problem since.
  • by pwl256 ( 317110 ) on Wednesday May 21, 2003 @04:21AM (#6005561)
    While the constraints may be cost etc perhaps something I took from a PL/1 book - ;-0 years ago may be relevant.
    'The Meaning of Correctness

    1. The program contains no syntax errors that can be detected by the compiler.
    2. As for 1, and it can be run.
    3. There exists a set of test data for which the program will yield the correct answer.
    4. For a typical (i.e. reasonable) set of data the program returns the right answer.
    5. For a deliberately difficult set of data the program returns the right answer.
    6. For all sets of data valid with respect to the specification, the program returns the right answer.
    7. For all possible sets of valid test data, and for all likely conditions of erroneous input, the program returns a correct (or at least reasonable) answer.
    8. For all possible input the program gives the correct, or reasonable answers.

    Most programmers work at level 3 or 4
    Users at 8.'

    (I am sorry but I have lost the reference to the original book)
  • by smeenz ( 652345 ) on Wednesday May 21, 2003 @04:27AM (#6005571) Homepage
    In the vast majority of cases, it's simply not economic to release bug-free code.

    1. Any programmer knows that 90% of the code is written in the first 90% of the time, and the other 10% of the code is written in the other 90% of the time. (no typo). That is to say, it takes a lot more time, effort, and hence money, to move a project from "working well" to "working perfectly".

    2. Many software companies these days make very little profit on the 1.0 release of their software, and make huge amounts of money through ongoing support charges. Microsoft is a classic example of this type of company.

    3. If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly. The unfortunate truth is that nobody is going to buy version 2 if they can do everything they want with version 1, and they're not getting constantly frustrated by crashes. The only carrot you have in this situation is to think up some really great ideas for version 2 in order to encourage people to upgrade - In fact, some of those ideas may have been deliberately left out of version 1 just so that they could be added later. Version 3 is more difficult still, and version 5 is right out. By comparison - how many versions of office are we up to now ?

    A notable exception to this business model is games writers. Companies like Valve and id Software consistently produce very nearly bug-free code that works well and generally impresses the masses.

    In all the years since half-life was released, there have been relatively few patches and fixes, and many of those were to prevent ingenious new methods of cheating, or to add support for hardware that didn't exist when the game was first released. The unreal engine had a similar history.

    People buy new games because they crave the excitement or challenge of exploring and interacting with them. That's not something that could really be said about Excel or Word, so those sorts of products have to rely on the "draw out the profit over many releases" strategy described above.

    Another (big) factor is people's expectations -- most people expect that Word will crash from time to time, and given Microsoft's past history, they have little reason to expect that to change. On the other hand, gamers have an expectation that the latest game from id Software will be as solid as a rock, and that the few problems that do crop up after the release will be fixed quickly.

    If a games company didn't spend that "other" 90% on the last 10% of development, and released something that crashed as often as explorer, their reputation would be mud within days, and people would stop buying their games.

    And lastly, choice.

    People have a choice as to which games they want to buy. It's a competitive market out there, with many people having little disposable income to spend on games. On the other hand, despite what Linux advocates say (I can't believe I'm saying this on Slashdot), most people use MS apps and operating systems because they don't have a choice -- say, due to corporate rules.

    You might think that it is the end user that gets the sharp end of the stick here, but the people that really get screwed are the dedicated and talented programmers, who are working for companies that don't care too much if they release code before it has been fully tested.

  • by anubi ( 640541 ) on Wednesday May 21, 2003 @05:03AM (#6005654) Journal
    "If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly."
    Geez! You stated exactly what happened to me. The company I used to work for bought some really neat DOS software for circuit analysis, schematic capture, and PCB layout. It worked flawlessly. Very easy to use. No frustrating DRM/Licensing issues to deal with. User-definable libraries. Nice file structures. In short, what I would have done if I did it myself.

    When they transitioned from an Engineering Company to a Management Company, they surplussed all this neat software. Me, along with my software, was excessed. I was first in line to buy it from the company, being I knew exactly what it was and how I could run it on anything I could get my hands on. The company no longer exists, but I still run the software daily, albeit in another company.

    Here it is, nearly 20 years later. I *still* prefer to use these programs. They are blindingly fast on a Pentium, allow me to update their libraries with all the latest parts I use, and still work perfectly.

    By this time, I understand exactly what these programs do and am quite fast with them.. they are so familiar by now that I no longer have to concern myself with how to get the system to do what I want... now that I have finally perfected a simple DOS-based system that's ready for work about 13 seconds after I turn on the power. I still fail to see what everyone is carrying on about over these finicky new design programs. I *try* to use them but soon become so frustrated with them that I keep reverting to the simple one.

    It kinda bugs me when I have way too many choices -- like, do I really care what font or centering options the resistor values show up in on the schematic I am preparing to feed to the SPICE simulator or the PCB layout proggie? Just put the value where I place it and I'm happy. I just want it done NOW. I don't wanna dicker with it. If it's gonna get published, I'll dump it into a .DXF file and let the AutoCAD and Photoshop guys gussy it up all they want.

    See? There's anecdotal evidence supporting your claim. They did the software right, and never sold another copy to me. All the companies that made the software are now out of business (one got bought out, the other two are just gone).

    The favorite concern of the company I now work with is that I am using completely unsupported software. But then, I used a completely unsupported hammer when I built my doghouse. Big deal. If it works, what do you need support for?

  • Paranoia (Score:3, Interesting)

    by Detritus ( 11846 ) on Wednesday May 21, 2003 @05:08AM (#6005669) Homepage
    Many years ago, I had the experience of reading the source code for the device drivers in a multi-user DEC operating system. It was very enlightening. The engineers who wrote the drivers assumed that all of the hardware was buggy, unreliable junk. They wrote code that expected the hardware to fail or lock up, and took the appropriate corrective action. If an operation timed out, the driver would reset the controller and reissue the command.

    UNIX had the opposite philosophy. The hardware was expected to work perfectly. This led to situations where a DEC operating system would run reliably on a particular machine for months at a time and UNIX would crash within minutes on the same hardware.
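
    The pattern being described is roughly the following (a minimal sketch; the controller/request types and the issue_command, wait_for_completion and reset_controller functions are hypothetical stand-ins for whatever the real hardware interface provides):

    #define MAX_RETRIES 3
    #define TIMEOUT_MS  500

    struct controller;
    struct request;

    void issue_command(struct controller *ctl, struct request *req);
    int  wait_for_completion(struct controller *ctl, struct request *req, int timeout_ms);
    void reset_controller(struct controller *ctl);

    /* Assume the hardware is buggy, unreliable junk: bound every wait,
       and treat a timeout as "the controller is wedged" rather than
       hanging or panicking. */
    int paranoid_io(struct controller *ctl, struct request *req)
    {
        int attempt;

        for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
            issue_command(ctl, req);
            if (wait_for_completion(ctl, req, TIMEOUT_MS) == 0)
                return 0;               /* device answered normally */

            /* Timed out: reset the controller and reissue the command. */
            reset_controller(ctl);
        }
        return -1;                      /* give up; report the error upward */
    }

    The point is not the details but the posture: the driver never trusts the device to respond, so a flaky controller degrades performance instead of taking the whole system down.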

  • by tagevm ( 152391 ) on Wednesday May 21, 2003 @06:08AM (#6005838)
    I bet the piece of code causing this looks something like this:

    /* Check every second...
       Maybe GetTickCount wraps, but I don't care,
       something else will probably break before 49 days anyway. */

    if (GetTickCount() > m_dwLastTick + 1000)
    {
        DoSomeThingImportant();
        m_dwLastTick = GetTickCount();
    }

    GetTickCount returns the number of milliseconds since reboot; after about 49 days it wraps and starts over, so lazy programmers using code like the above will have a problem.
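
    (For what it's worth, the usual defensive idiom is to compare an elapsed time computed with unsigned subtraction, which stays correct across the wrap. A minimal sketch -- the state variable and the DoSomeThingImportant callee are just placeholders:)

    #include <windows.h>

    void DoSomeThingImportant(void);

    static DWORD s_lastTick;    /* initialised with GetTickCount() at startup */

    void PollOncePerSecond(void)
    {
        /* DWORD arithmetic is modulo 2^32, so this elapsed value is correct
           even after GetTickCount() wraps at roughly 49.7 days of uptime. */
        DWORD elapsed = GetTickCount() - s_lastTick;

        if (elapsed >= 1000)
        {
            DoSomeThingImportant();
            s_lastTick = GetTickCount();
        }
    }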
  • by jo_ham ( 604554 ) <joham999@noSpaM.gmail.com> on Wednesday May 21, 2003 @06:39AM (#6005941)
    I got fed up with just that sort of thing and changed computing platform. I'm not saying that the Mac never crashes, but it's certainly been a massive, massive step in the right direction.

    A quick trip to the terminal reports my uptime as "11:35AM up 57 days, 12:42..." This is by no means a long time by Unix standards, but for a laptop (iBook 600MHz) that I use every day, sleeping, waking, starting and stopping multiple programs, working on all sorts of stuff, burning CDs, browsing the net etc., I'd say it was very good.

    The longest I could go on my Windows 2000 box before I'd have to reset was about a week - it wouldn't crash, it would just get confused and start swapping icon images over, so Word would have the Excel icon, and so on.

    The only time I reboot my iBook is for system updates. Very few programs "Unexpectedly Quit" on me (Camino used to do it occasionally, every 2 weeks or so, but I'm using Safari right now). I've never had a kernel panic in 10.2.x (I had two in 10.1.5, but I traced it to the well known Classic environment and a USB device panic bug that was fixed).

    If you want your software to crash less, buy a Mac.
  • by tommck ( 69750 ) on Wednesday May 21, 2003 @08:26AM (#6006275) Homepage
    I'm tired of hearing this. There is nothing unsafe about C.


    I agree completely... This is the same kind of thinking that people use to try to outlaw guns... "If someone can use it to commit a crime, we should just eliminate it!".


    I would say that poor development, insufficient design, (obviously) insufficient testing and a focus on features rather than security are MUCH more to blame for software quality issues than which language was chosen for the implementation.


    I still think we should be able to moderate the whole article as a Troll...

    T

  • by wiredog ( 43288 ) on Wednesday May 21, 2003 @08:33AM (#6006318) Journal
    From the April 1998 [byte.com] (!) issue of Byte (back when it was an excellent printed magazine):

    "The fundamental concept of the personal computer was to make trade-offs that guaranteed PCs would crash more often...The first PCs cut corners in ways that horrified computer scientists at the time, but the idea was to make a computer that was more affordable and more compact."

    "Having 15 million lines of code isn't as bad as having 15 million lines of new code"

    Millions of PC users would be overjoyed with an MTBCF of just one day. Yet mainframes are big, complex systems that often have clusters of CPUs, gigabytes of main memory, and thousands of users. What makes them so reliable?

    Mainframe experts say that it's a matter of priorities. ... When a mainframe crashes, however, it's a major catastrophe. It's General Motors calling up IBM to demand answers.


    It's interesting how little has really changed in the past 5 years...

  • by TheCarp ( 96830 ) * <sjc@NospAM.carpanet.net> on Wednesday May 21, 2003 @10:17AM (#6007044) Homepage
    Impossible.

    How can a "user error" cause a crash. Software should do proper bounds checking and should act appropriately (which may mean giving and error message) no matter what input it is given.

    About the only crash due to user error that I can imagine really being due to user error would be the user killing the process with killall or pkill or its moral equivalent.

    Other than that, it's just bad bounds checking, and blaming it on user error is really bad form.

    Part of the problem IMNSHO is the commodity desktop. There are so many machines, and they are all cheap, and it's more important to get the work done than it is to make sure the crash doesn't ever happen again.

    On real systems, if the system crashes, crash dumps are sent off to the OS vendor and they track down the problem and fix it. I know, we have had to collect and send off crash dumps in the past.

    Each round of that makes the system more stable.

    That's one of the advantages of Linux, and why there are some systems that don't crash (my Linux boxes pretty much only crash when the power goes out and the UPS battery drains). That is, OSes like Linux and BSD are used in real environments and there are people committed to fixing the problems... so even the lowly common desktop user reaps the benefits.

    See, there is the difference.. Windows, even the "server" versions, grew out of a desktop OS with a desktop way of doing things: "Oh, the server crashed, well let's reboot and hope it doesn't happen again", whereas Linux and BSD come from the land of the server down to the desktop: "Oh, the server crashed? Get DEC on the phone" or "Get out those crash dumps".

    -Steve
  • by junkgoof ( 607894 ) on Wednesday May 21, 2003 @10:41AM (#6007215)
    I think this brings up a good point. Hardware may have improved, and software development tools may have improved, but the people writing software have gotten much worse. A few years ago most people who were in the computer industry were there because they knew something. Now they are there because they wanted money, some HR droid picked their CV out of a pile because of the acronyms, and some manager does not know enough to fire them. Layoffs haven't helped either; generally the knowledgeable people with higher salaries get booted first. Security vulnerabilities are up (including old stuff that has not been patched) and successful projects are down.
  • by lpq ( 583377 ) on Saturday May 24, 2003 @06:34PM (#6032545) Homepage Journal
    As someone said before -- no product liability -- you have to pay money just to report a bug ...

    Training of Software Engineers. With point and click interfaces you have people with an average reading ability of a 5th grader writing code. Even hinting that someone wasn't a good writer of code was considered "unprofessional" at some workplaces (i.e. -- you are not a 'team player').

    Capitalism -- it's not cost effective to fix bugs until a customer finds them.
    Even in code for secure OSes evaluated under the Common Criteria CAPP/LSPP, vendors aren't required to fix bugs that are not discovered by the independent evaluator or the customer. So even if the product manager knows of bugs in an OS that is intended for 'high security' government projects, there is no law saying he has to list them or fix them (unless they are found by a third party or the customer). Spending time fixing bugs that are NOT found by the customer is not only not cost-effective, it is considered not working on "assigned priorities" and can be grounds for lower reviews.

    This isn't pessimism -- it's reality. Quality doesn't pay when you can sell customers faulty products then charge the customers to fix the faulty product you sold them in the first place -- one might argue that it pays to have more bugs in the code -- you can charge more for service contracts and rack up more incidents that you then charge the customer, per incident, to handle.

  • by lpq ( 583377 ) on Saturday May 24, 2003 @08:50PM (#6033047) Homepage Journal
    When I was in college in Computer Science (how many programmers today have a formal degree in computing, vs., say, a liberal arts degree?), sophomore year, University of Midwest -- CS201, required for Computer Science majors -- was beginning assembly language in Compass (the CDC assembler).

    The price of perfection is taught early -- an early lesson was when for a final project we were to work with 2-3 other people to make a final program. The deadline was approaching and our program still wasn't running. Turning it in late was a letter grade drop/day. Two of us felt we were close and didn't want to turn in a non-running program. The third wanted to turn it in. They also felt that they'd done their part and there were no problems in it.

    The third turned in the project with his name on it. My partner and I spent another day cleaning up his code to get it to work and turned it in. We got a "C" on the project, with a downgrade for bad coding practice in his section of the code and for being a day late. He got a "B" even though his version didn't work. In the final grades both he and my partner got "D"s while I got a "C", which sorta sucked for my major -- but it turns out that 60% of the class got "D"s and "E"s. There was a big stink about the course material being too difficult, and the teacher made a public 'booboo' comment: "It was the same material he'd taught before, it was just an exceptionally dumb class." Major ire of parents.

    Anyone who got a "D" or "E" had it stricken from their academic record. It was the only "C" I got in my comp-sci curriculum (str8 A's in 300-level and above classes). But on that project, I learned that deadlines were more important than code quality.

    Spin forward 15 years -- at a small startup before Xmas. The deadline for a demo was approaching, and I and the other team member had parties to go to that evening. He was programming a DSP chip (he was a PhD wizard), and I was handling the drivers on the 286 DOS box. I checked my code backwards and forwards and he swore it couldn't be his stuff. Finally, I displayed the output he was sending and it was 'wrong'. Unfortunately, my party had been out of town and I'd already missed the deadline for getting there, because it had been emphasized to me how important the project was to complete before leaving. When the problem was discovered in his code -- guess what -- he couldn't stay to fix it (I didn't know anything about the DSP chip he was using) because, the VP told me, he was married and his wife was gonna leave him if he missed the party (I don't think he was serious, but maybe). I had no such excuse -- only a partner who went to the party alone.

    Again -- what do I learn? Personal relationships take precedence over product and code quality; so far we have code quality ranked below deadlines and below personal relationships (though the latter has largely disappeared in the modern world).

    more later...
    -l
  • by lpq ( 583377 ) on Saturday May 24, 2003 @09:23PM (#6033166) Homepage Journal
    The core of the problem was delineated in the book "Weird Ideas That Work: 11 1/2 Practices for Promoting, Managing, and Sustaining Innovation" [amazon.com]. In it the author makes the main point that the people who are most creative are the people who don't do things the "normal way". They are the 'loners' -- the 'slow adopters of company culture'. They aren't the team players, and they are slow to be programmed with the company way of doing things. As a result, they see problems differently than those who have been trained in the "correct way" to do things.

    Those who spend time going to lunch, drinking beer together, palling around together -- they begin to think alike -- they develop synergy -- but they also develop a closed system. The ones who don't pal around come up with the completely off-the-wall ways of doing things because they haven't been indoctrinated into the 'normal way' of doing things. Quite often these ideas are shot down because of their eccentricity. But the personal computer idea Steve Jobs presented to HP -- shot down by corporate culture -- was a brilliant success. The author gives countless examples of the most brilliant people generally not being very good with "people skills".

    A corollary of this is that those who push for perfection far past the 'norm' are going to be unpopular outsiders -- they are the nitpickers, the ones who aren't team players. Again, they might be the ones who would nitpick the code to perfection, given the chance, but the larger group says "enough" -- it's "good enough, it boots, let's ship it".

    In both instances the people most likely to increase quality in software are those who have the least political clout and are often least liked by their peers. Their peers often feel that the 'nitpicker' has a prideful, superior attitude, and sometimes go out of their way to sabotage work that might otherwise have turned the company around and saved millions.

    I specifically was involved in a group that had to choose between two vendors of Microsoft-compatible software. I became the lone supporter of company B. I was adamantly opposed to "A" for reasons I couldn't articulate at the time -- my gut told me "A" was untrustworthy but I couldn't tell why. I was overruled, and 4-5 months into the project "A" sued MS for non-cooperation, effectively killing our project. It was too late to go with company "B", whose price had doubled now that they were the only game in town. It turns out "A" had been having trouble with MS all through the negotiations with us, but no one picked up on it. Reminding anyone of the decision made me decidedly unpopular. But it was precisely because I hadn't gone out and been wined and dined by "A", and hadn't formed a "good ol' boy" relationship with them, that I could see something was amiss. It was precisely the fact that I wasn't a hobnobber/political animal that let me catch the 'off' vibes. Those who were "good team employees" went along with the majority decision and the friendly team from "A" who came onsite to woo us. It's the same principle at work.

    Those who make the world work are also those most likely to compromise, and most likely to compromise quality. It's because of their willingness to compromise that they are liked by many, but it's the same compromise that results in compromised code -- both in terms of bugs and security.

    I sure as heck don't know the answer. Successful combinations are highlighted in the book mentioned above, where one person understands the almost anti-personal nature of the 'idea' person and handles the media and external interactions, but it's rare to find groups that work well like that.

    It has often been said that the best software doesn't come out of committee but out of 1 or a few people -- while companies like to think that 9 women can have a baby in 1 month, it ends up more often that the 9 women argue over who

