Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Software Operating Systems

Why Do Computers Still Crash? 1533

geoff lane asks: "I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged. Sometimes a company gets it right -- my Psion 3a has never crashed despite being switched on and in use for over five years, but my shiny new Zaurus crashed within a month of purchase (a hard reset losing all data was required to get it running again). Of course, there's no need to mention Microsoft's inability to create a stable system. So, why are modern operating systems still unable to deal with and recover from problems? Is the need for speed preventing the use of reliable software design techniques? Or is modern software just so complex that there is always another unexpected interaction that's not understood and not planned for? Are we using the wrong tools (such as C) which do not provide the facilities necessary to write safe software?" If we were to make computer crashes a thing of the past, what would we have to do, both in our software and in our operating systems, to make this come to pass?
This discussion has been archived. No new comments can be posted.

Why Do Computers Still Crash?

Comments Filter:
  • Simple ... (Score:4, Insightful)

    by Vilim ( 615798 ) <ryan.jabberwock@ca> on Tuesday May 20, 2003 @08:57PM (#6003281) Homepage
    Well, basically as software systems get more complex there is more things to go wrong. That is why I like the roll-your-own-kernel of linux. Don't compile the stuff you don't need and fewer things can break.
  • by zedge ( 133214 ) * on Tuesday May 20, 2003 @08:59PM (#6003295)
    Don't allow people to use languages that allow you to access memory not assigned to you or to access array positions that don't exist. This would fix 95% of software problems.
  • by drink85cent ( 558029 ) on Tuesday May 20, 2003 @08:59PM (#6003301)
    As I've always have heard with computers you can't prove something works, you can only prove it doesn't work. As long as there are an almost astronomical number of states a computer can be in, you can never test for every possible case.
  • Human Error (Score:5, Insightful)

    by Obscenity ( 661594 ) on Tuesday May 20, 2003 @09:00PM (#6003302) Homepage
    All programs (for the most part) must be written by people. People crash, they're buggy and they dont have a development team working on them. Computers crash because people cant catch that one little fatal error in 10,000 lines of code. Smaller programs are less succeptable to errors and big scary warning messages that make even the most world-hardend geek worried about his files. Yes, it's getting better with more and more people working on something at once. Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this. But even with that in place, there is always that small human error, that will screw something up.
  • Re:Simple ... (Score:5, Insightful)

    by Transient0 ( 175617 ) on Tuesday May 20, 2003 @09:01PM (#6003316) Homepage
    More specifically... As hardware gets more complex, software gets more complex to fill the available space. More complex software not only means more things to go wrong but also means that the hardware never really gets a chance to outpace the needs of the software.

    Also, as I'm sure someone else will point out, it is very hard to right code that will not crash under any circumstances. Even if you are running a super-stripped down linux kernel in console mode on an Itanium, you can still get out of memory errors if someone behaves rudely with malloc().
  • It's bugs! (Score:5, Insightful)

    by madprof ( 4723 ) on Tuesday May 20, 2003 @09:01PM (#6003317)
    Applications are getting bigger. Code is growing in size. As computing power grows so does the complexity of the code that is written. This means there is a greater chance of bugs.
    You can write in any language you like, but bugs will still get through. Lack of proper planning, non-anticipation of working conditions etc. all combine.
    If you can make all programmers perfect then you may eliminate this problem. Otherwise I'm afraid we're going to be stuck with bugs.
  • Speed (Score:5, Insightful)

    by holophrastic ( 221104 ) on Tuesday May 20, 2003 @09:01PM (#6003318)
    Why spend time testing and debugging and designing for situations which are too rare to be profitable?

    I program my applications as properly as I can. But when the client wants to save money by testing it themselves and ignoring non-fatal bugs, they save money and are happy doing it.

    So in other words, economy.
  • by ziggy_zero ( 462010 ) on Tuesday May 20, 2003 @09:01PM (#6003320)
    ...I remember my teacher saying "Computers do exactly what they're told, not necessarily what you want them to do."

    I think the root of the problem is time. Microsoft doesn't have the time to spend going through every possible software scenario and interaction, or every possible hardware configuration. If they did do that, it would probably take a decade to pump out an operating system, and by that time hardware's changed, and it's a neverending cycle.....

    We just have to accept the fact that the freedom of using the hardware components we want and the software we want, all made by different people, will result in unexpected errors. I, for one, have come to grips with it.

  • by woodhouse ( 625329 ) on Tuesday May 20, 2003 @09:02PM (#6003327) Homepage
    Because reliability is inversely proportional to complexity. Systems these days are generally a lot more complex than those of 10 years ago, and in complex systems, bugs are much harder to find. The fact that you say stability hasn't changed is in fact a pretty impressive achievement if you consider how much more complex hardware and software is nowadays.
  • by Jeremi ( 14640 ) on Tuesday May 20, 2003 @09:02PM (#6003331) Homepage
    It's the need for new features. Every feature that gets added to a piece of software is a chance for a bug to creep in.


    Worse, as the number of features (and hence the amount of code and number of possible execution paths) increases, the ability of the programmer(s) to completely understand how the code works decreases -- so the chances of bugs being introduced doesn't just rise with each feature, it accelerates.


    The moral is: You can have a powerful system, a bug-free system, or an on-time system -- pick any two (at best).

  • by MightyTribble ( 126109 ) on Tuesday May 20, 2003 @09:03PM (#6003335)
    Some crashes aren't the fault of the OS. Bad RAM, flaky disk controllers, CPU with floating-point errors (Intel, I'm looking at *you*. Again. *cough* Itanium *cough*)... all can take down an OS desite flawless code.

    That said, some Enterprise-class *NIX (I'm specifically thinking of Solaris, but maybe AIX does this, too) can work around pretty much any hardware failure, given enough hardware to work with and attentive maintainence.
  • by mrjah ( 574093 ) on Tuesday May 20, 2003 @09:04PM (#6003354)
    ...but humankind has not.
  • by Anonymous Coward on Tuesday May 20, 2003 @09:05PM (#6003360)
    I'm writing thas as anon because I refuse to enter passwords on a computer I don't trust (internet cafe). But if you must know, my nick is TheMMaster.

    I think you misunderstand the problem, using pointers in C/C++ to unallocated memory only occurs with sloppy programing. It is not a "feature" of the language itself. You could easily do the same with visual basic even, if you wanted to. I DO admit that doing stuff wrong is easier with C/C++ (think of a copier in the wrong place).

    People that write bad code will always write bad code, the point is that C/C++ gives you more power to create better code than other programming languages do, because they are much more flexible.

    thanks for your time
  • It's expected. (Score:4, Insightful)

    by echucker ( 570962 ) on Tuesday May 20, 2003 @09:05PM (#6003361) Homepage
    We've lived with bugs for so long, they're a fact of life. They're accepted as part of the daily dealings with computers.
  • by T5 ( 308759 ) on Tuesday May 20, 2003 @09:05PM (#6003364)
    It's all about the bits. There are just so many more of them now, and a great deal more pressure in the marketplace to bring ever newer software and hardware to market. Back in the day of the IBM 360 and the VAX, even though we were mesmerized by the capabilities of these machines, they were years and years in the making, debugged much more thoroughly than we can hope for today, and much, much simpler.

    And let's not forget that this was the exclusive realm of the highly trained engineer, not some wannabe type that pervades the current service market. These guys knew these machines inside and out.
  • by Zach Garner ( 74342 ) on Tuesday May 20, 2003 @09:05PM (#6003366)
    Read "No Silver Bullet: Essence and Accident of Software Engineering" by Brooks. A copy can be found here [berkeley.edu].

    Software is extremely complex. Developed to handle all possible states is an enormous task. That, combined with market forces for commercial software and constraints on developer time and interest for free software, causes buggy, unreliable software.
  • by bravehamster ( 44836 ) on Tuesday May 20, 2003 @09:05PM (#6003368) Homepage Journal
    I've found in my years of repairing pc's that the majority of software problems have their root cause in hardware. A bad stick of memory, corrupt hard drive sectors, overheating components, cosmic radiation causing bit flips-all of these things cause random, bizarre errors. It's pretty easy to tell the difference too. Software errors are repeatable. The exact same situation should produce the exact same error. So all I'm trying to say is that I doubt we will ever reach the point that computers won't crash, because at some point there has to be interaction with the physical world. And no matter how perfect your program is, it's not going to survive a two year old stuffing pennies into the back of the power supply.

  • Re:Simple ... (Score:2, Insightful)

    by MikeXpop ( 614167 ) <mike@noSPAM.redcrowbar.com> on Tuesday May 20, 2003 @09:07PM (#6003384) Journal
    Exactly. I expect that when I enter in [9], [+], [3], [=] on my calculator it will respond with "12", not "ERROR". I expect if I do the same thing on the calculator.app it will do the same thing, agains sans-crash. However, if I'm trying to download a huge file while opening and closing lots of windows, programming some web pages, uploading them to the web, listening to some tunes, talk to 80 different people on AIM, and enjoying a flash animation at the same time, the computer might crash. After all, those are two very different things.
  • Microsoft (Score:5, Insightful)

    by eht ( 8912 ) on Tuesday May 20, 2003 @09:08PM (#6003388)
    Microsoft has made an extremely stable OS, it's called Windows 2000, as long as you use MS certified drivers the OS should never crash, individual programs may crash under Windows, but you can hardly blame Microsoft for that. I have had Windows machines with months of uptimes and no problems, went down 8 days ago due to power failure too long for my UPS's to handle, which also took down my FreeBSD machines, uptime is matched for all of them, and will one day again be measured in months.

    Yes I should probably patch some of my Windows machines, but I have my network configured in such a way that for the most part I don't need to worry and you don't have to worry about my network spewing forth slammer or other nasty junk.
  • Economics? (Score:5, Insightful)

    by iso ( 87585 ) <slash@warpze[ ]info ['ro.' in gap]> on Tuesday May 20, 2003 @09:09PM (#6003395) Homepage
    While it's not the whole story, something definitely has to be said about the fact that while people are willing to pay for features, they're rarely willing to pay more for stability. Quite frankly there's little economic incentive to make software that doesn't crash.

    If your market will put up with the ocassional crash, and never expects software to be bulletproof, why bother putting the effort into stability? Until people start putting their money into the more stable platforms, that's not going to change.
  • actually both (Score:1, Insightful)

    by Anonymous Coward on Tuesday May 20, 2003 @09:12PM (#6003431)
    First complexity is an issue. Today's systems are multiple orders of magnitude more complex than those of yesteryear. This by it's very nature causes problems. Also the complexity of today's programs means that large teams are required to get the work done. Every additional team member introduces a new variable into the equation bringing with him (or her) his own set of propensities for certain types of bugs until you have a whole universe of different bug types appearing in your product.

    Second the capitalist market rewards the "just good enough" software. In a pure economic sense stability is not near as important as new features, ease of use, time-to-market, etc. This may horrify engineers and software architects, Lord knows it horrifies me but it's the practical truth of the matter that ROI is largest on systems that are fast to market as long as crashes are completely catastrophic. Also we've solved the problem of crashes in more cost effective ways at the enterprise level. Rather than spending tons of money fixing all the small bugs we advocate backups. As a "backup" they make sense but all to often we use them to cover problems that could/should? be fixed in code. Again the economics say it's cheaper to buy hardware storage than to pay a skilled coder/architect. I don't know if it's "right" or not but at a very practical real-world level economics have to outweight perfect design/production. At least I hope the companies I hold stock in see it that way.

    There are many reasons that software sucks but you've nailed 2 of the biggest...complexity and economics. Let's hope the economics one holds...I don't want my rates coming down like the price of RAM any time soon ;-).
  • Re:Simple ... (Score:5, Insightful)

    by fishbowl ( 7759 ) on Tuesday May 20, 2003 @09:13PM (#6003436)

    "However, if I'm trying to download a huge file while opening and closing lots of windows, programming some web pages, uploading them to the web, listening to some tunes, talk to 80 different people on AIM, and enjoying a flash animation at the same time, the computer might crash."

    Was it, or was it not, designed to be used in this way? If it was not, why does the system let you try it?

  • by joshtimmons ( 241649 ) on Tuesday May 20, 2003 @09:13PM (#6003440) Homepage
    Sure, hardware is complex and today's software is huge, multi-featured, multithreaded, and event-driven and all of these factors make writing good software hard, but I think that the reason we don't see higher quality OS's is simply that the bar isn't set very high by the market leader. We tolerate applications that freeze, computers that need to be rebooted, or crash, etc. That low bar sets consumer expectations and the result is that companies (and programmers) only work to a certain level of reliability - then they work more on more features instead of more work on stability.
  • by Jeremi ( 14640 ) on Tuesday May 20, 2003 @09:14PM (#6003442) Homepage
    I've found in my years of repairing pc's that the majority of software problems have their root cause in hardware.


    Wow, your experiences are much different from mine, then. I'd say 95%+ of my computer problems are caused by software bugs.


    Software errors are repeatable. The exact same situation should produce the exact same error.


    For a significant percentage of software errors, that statement is false (at least misleading), because it's nearly impossible to reproduce "the exact same situation". For example, take any multithreaded program with a race condition bug -- the chances of the two threads getting the exact same time-slices on two different executions of the program are approximately zero. The result: a crash that happens only sometimes, at random, even given the exact same starting conditions.

  • by PseudononymousCoward ( 592417 ) on Tuesday May 20, 2003 @09:14PM (#6003445)
    The number of bugs is smaller. Think of the systems used by the telcos, or NASA. Are they perfect? No, but they are much, much more stable than Win32, or Mac, or Linux. The reason is simple, the owners demand them to be.

    There are costs associated with fixing bugs and reducing crashes. The more stable an operating system is to be, the more time and money that must be devoted to its design and implementation. PC users are not willing to pay this amount for stability, either in explicit cost, or in hardware restrictions or in trade-offs for other features.

    As Linux evolves over time, its stability will always improve, but it may still never reach the stability of, say, VMS. Why? Because even with the open source model of development, there are still tradeoffs to be made, tradeoffs between new features and stability, mostly. And successive bugs are harder and harder to fix, requiring greater and greater amounts of time. At some point, the community/individual decides that they would rather spend their time going after some lower-hanging fruit.

    Just my $0.02

    Actually, IAAE.
  • Uhhh.... (Score:3, Insightful)

    by swagr ( 244747 ) on Tuesday May 20, 2003 @09:14PM (#6003449) Homepage
    I'm lazy so I haven't bothere to read what others have said. At the risk of repeating what others may have said:

    Isn't this just a matter of economics?
    I bet if you get everyone on the planet, and every company to purchace software solely by merit of stability, you'll start to see a lot more stable software. But as long as people are shopping for *featureful* apps, *fun* games, and eye candy, it's not going to happen.
  • Obligatory anti-MS (Score:5, Insightful)

    by cptgrudge ( 177113 ) on Tuesday May 20, 2003 @09:18PM (#6003500) Journal
    Of course, there's no need to mention Microsoft's inability to create a stable system.

    What exactly is the purpose behind this? Why was it put in here? People are going to need to grow up if people in "our" circle want to be taken seriously. I've used Windows 2000 and Windows XP both. They crash as much as my Red Hat and Debian boxes do. Never. They are all rock solid.

    I work for a public school system. We have a class at the High School that teaches and certifies for A+ (I know, I know). They have all sorts of problems getting stuff to work and to get a system stable. In Windows and Linux.

    It isn't because they are high schoolers.

    It isn't because they are "just learning".

    It's because they buy really shitty hardware. They look for the best cost, and they get their hardware from some loser manufacturer that has fucked up drivers and horrible quality control.

    Properly maintained boxes with quality hardware in them just don't crash anymore. Programs maybe, but not systems.

    Christ, people, this has been beat to death! Microsoft has a great product for an OS now! Get back to making something better than them instead trying to convince yourself that Microsoft is delusional.

    Mod me Flamebait, I don't care.

  • Crash? What crash?

    up 582 days


    Reboot? What reboot?

    Now, when was the last time you tested those init scripts? :)

    -= Stefan
  • Three Words. (Score:2, Insightful)

    by coday ( 628350 ) on Tuesday May 20, 2003 @09:19PM (#6003504)
    "Time To Market". For commercial software developers they are always trying to "balance", quality and getting into the market ASAP. Unfortunately MS (and others) have made it acceptable to release service packs after the "final" product has already shipped. Get it out there now, fix it later is commonplace.
  • by pnatural ( 59329 ) on Tuesday May 20, 2003 @09:22PM (#6003538)
    But this shouldn't be an issue. If your HAL is done properly, there is no possibility of crashes with different software/hardware combinations, because the hardware doesn't matter. If libraries etc are managed properly, and memory space is isolated properly, then there should be no software-software issues.

    And this, ladies and germs, is precisely why computers crash. One system depends on another, and each layer is presumed to be solid. It's the presumption that things at the lower level cannot go wrong that gets most coders into deep do-do.

    The reactionary solution is to code defensivly. Defensive programming has it's place, but it's rarely done correctly (IMO) and it leads to cruft and maintainance nightmares. The solution (again IMO) is to account for failures at the design level.
  • Why (Score:3, Insightful)

    by pjdepasq ( 214609 ) on Tuesday May 20, 2003 @09:25PM (#6003556)
    Massive complexity (even for simple apps) + enless possibilities of user interactions + rush to market + no sliver bullet = likelyhood of crashing
  • Re:Simple ... (Score:3, Insightful)

    by Fulcrum of Evil ( 560260 ) on Tuesday May 20, 2003 @09:25PM (#6003560)

    Even if you are running a super-stripped down linux kernel in console mode on an Itanium, you can still get out of memory errors if someone behaves rudely with malloc().

    It's not crashing if you handle the error gracefully. Sure, the app crashes, but the system remains stable. Now, if you run an embedded system of some sort, you'll be writing that app, and being rude with malloc() is a no-no.

  • by Dr. Bent ( 533421 ) <<ben> <at> <int.com>> on Tuesday May 20, 2003 @09:26PM (#6003564) Homepage
    Back in the Middle Ages, when the Catholic Church wanted a Cathedral built, they would pay a bunch of Freemasons to do it. The Freemasons viewed themselves as creative artisans, and they closely guarded the secrets they used to construct these impressive houses of worship.

    The method they used, however, was less than impressive. Typically, they would start with a general design, and piece together stone and mortar until something collapsed, which happened quite [thinkquest.org] often [heritage.me.uk]. Then they would patch the section that collapsed and keep on going until something else fell down, or they finished. Given the level of understanding with regards to Physics and Material Science, those Freemasons has no other choice than to build them this way.

    Now fast forward to the 21st century. The engineering disasters on par with those medieval collapses can be counted on one hand (Tacoma Narrows Bridge and the Hyatt Regency walkway collapse are the only two I can think of). This is directly due to the fact that a civil engineer can determine if a design is structurally sound before they build it.

    Contrast this with modern day software development. We can't even tell if a system is flawed after we build it, let alone before. So software gets written, deployed, and put into the marketplace that has no assurances whatsoever of actually doing what it's supposed to do (hence the 10,000 page EULA).

    You can't have Civil Engineers until you have Physics. And you won't have 100% bulletproof software until you have Software Engineering. And you won't have that until someone can figure out a way to prove that a given peice of software will perform as it's supposed to. JUnit [junit.org] is a step in the right direction, but there's still a long way to go. It's going to take a breakthrough on the order of Newton to make Software Engineering as reliable a discipline as Civil Engineering.
  • by WasterDave ( 20047 ) <davep@z e d k e p.com> on Tuesday May 20, 2003 @09:28PM (#6003580)
    Thank you, at least somebody got it fucking right.

    Software doesn't have to crash, but for a given quantity of development resources there's a fairly simple tradeoff between feature-richness and stability.

    You want reliable? Strip back features left right and centre, design an elegant architecture, then unit test properly.

    Dave (in a ranty mood)
  • by Anonymous Coward on Tuesday May 20, 2003 @09:28PM (#6003582)
    The difference between a system that crashes and one that doesn't is the development and testing. When you buy something from M$ or a zaurus PDA you are getting a consumer product(i.e. cheap). It takes signifigantly more skilled developers and more testing (i.e. expensive) to make systems that don't crash, and consumers(including you) won't pay for them. You pay for features not stability. If you had said why does my Solaris, OpenVMS, engine management system, air traffic control system, life support sys etc. crash you might have had a point but you are talking about consumer products that emphasise features over stability so you got what you payed for.
  • So what Kernel is that you are running? Hmmm. If it's a linux box that would barely by 2.4. More likely 2.2.

    (Digging through my pile of vulnerabilities...)

    Say, could we get an address on that box? Muhuahahahaha

    My uptime is largely limited by kernel upgrades and the fact I cycle the power once per month to prevent the drive head from sticking.

  • ... is deadlock. Lets say you have two IO devices, for ease we'll call them disk drives, which give exclusive access. Process A grabs one disk drive, then loses their processor turn (happens many times per second). Process B grabs another disk drive, then requests the drive Process A has, and 'blocks'. Process A then requests the drive Process B has, and 'blocks'. This is a very simple example of deadlock. Now if one of these processes is an OS process, well too bad.

    There are mitigation strategies, but in short the all suck. You can constantly monitor every piece of hardware to see who has rights to what, and flat out deny access to people when a deadlock may occure. This is slow and isn't very nice to processes who now have to trap twice as many errors for many IO operations.
    Another method (in avoidance) is to require all processes to request hardware in a certain order. This prevents all deadlock, but is unrealistic to how a program may function, and may require a programmer to hold onto a hardware device for much longer than actually needed.
    The last method is perhaps worst of all: restrict every process to one hardware device at a time.

    Can you think of a better strategy? Patent it and make a few billion. The strategy taken by *nix, Mac and Windows is... well to completely ignore it because it very rarely happens, but as processors in the future become faster and faster, they are more apt to run more and more processes at once, increasing the problem significantly.

    Note this problem only occures for hold-and-wait devices. Usually any number of programs can read a file for instance, and there is no conflict at all. I find that Operating Systems Concepts (Silberschatz, Galvin, Gagne) covers this topic well, and plenty of other hotspots.
  • by Christopher Thomas ( 11717 ) on Tuesday May 20, 2003 @09:31PM (#6003611)
    There are several reasons why software keeps crashing, and they aren't going away any time soon. These reasons are:

    • You can't prove that most software works.

      Except for a restricted set of cases, you can't prove that a given piece of code works or doesn't work. A truly exhaustive set of tests would be impractical to perform, and formal proofs of correctness place strong limits on the type of code you can write and the environment in which you can write it.

      The result is that code is assumed correct when no bugs are found. This only means that there probably aren't _many_ bugs left. Thus, it may still crash (or have a security hole, or what-have-you).

    • Software is very complex.

      Software has been complex for a long time. It just tends to be bigger now. A larger system has more opportunities for unexpected high-level interactions between components, but even a smaller system will have enough twists and turns that formulating a really good test suite, or checking the code by inspection, is very difficult. Bugs will be missed. As was discussed above, many of these missed bugs will slip through testing and reach the world.

      • Nobody wants to pay for perfect software.

        As more effort is applied, you can get asymptotically closer to a bug-free system. However, this is far past the point of diminishing returns on the cost/benefit curve. For sufficiently constrained systems, you can even try proving it correct, but this tends to lead to cutting out a lot of functionality, speed, or both.

        In situations where reliability must be had at any cost - aerospace control systems, vehicle control systems, medical equipment - the money will exist to produce near-perfect code, but even then there are bugs that occasionally bite. With commercial software, the buyer would rather have an application that crashes now and then than an application that costs ten times as much and comes out several years later.


      Free and/or open software avoids some of this by staying in development longer, which allows more of the bugs to be caught, but even free and/or open software evolves. Every change brings new bugs to be squashed. As long as there are new types of software that we want, it isn't going to end.
  • by jabber01 ( 225154 ) on Tuesday May 20, 2003 @09:32PM (#6003614)
    Software crashes because it's complex, yes, but that's just part of it.

    Jets are complex too. So is the Space Shuttle. Cruise ships. CARS are pretty complex.

    While all these things do suffer catastrophic failure from time to time, it is far from the norm. Defective cars get recalled. Space shuttles ALL get grounded at the mere possibility of defect.

    If Q/A as stringent as this was applied to software, Microsoft - and in fact most of the software industry - would be out of business. Can you imagine a Windows recall?

    There is software out there that does not fail. Mind-bendingly complex software of the sort that "drives mere mortals mad" to boot. It is tested and retested, through all possible situations - not just the "likely 80%" of them. It is proved correct, and then verified again.

    COTS software is crap because neither the market nor the regulatory forces (such as they are, but that's a separate discussion) do not require it to be. Nor could they.

    A 747 Jumbo costs a whole lot, and while much of that cost is in the manufacture of the "big and complex thing" that it is, a significant chunk of that cost is also due to the design process, the testing, the modeling and simulation of it.

    Software is easy to scale, everyone can have a copy of the product once one is built. Cake. But spread out the cost of an error free design - tested to exhaustion, passed through V&V and so on, and you have a completely different market landscape with which to contend.

    Consumers, in the COTS context, don't mind "planned obsolescence" in their software. The current state of things proves this. People would rather have pretty features on a flaky system, than a solid system.
  • Re:Touchy subject (Score:2, Insightful)

    by cryosis ( 125841 ) <cryosis&llamaporn,org> on Tuesday May 20, 2003 @09:32PM (#6003621) Homepage
    Wait a second. You're saying that because you can't get a modern game that was designed to run on a damn near infinite number of hardware configurations plus a wide variety of software configurations and have that game always run perfectly every time that the programmers are sloppy?

    You can't expect that programmers predict every condition of every system that their software might run on. It would take decades for a new package to be released and even then it would be huge.

    How can you compare a Super Nintendo where all that games written for it are within very strict guidelines to a PC game where the programmer knows next to nothing about the systems that the game is going to be run on? The PC programmer can only try their best to quash the bugs that they can find. And there is no way that they can stop them all. I don't think that this is due to laziness on their part, it's more due to the fact that they're being expected to ship the product on time. If consumers would tolerate longer development schedules and higher program costs, then I think that software would get more stable. But everyone wants newer, faster, better *now*. Oh, and cheap too.

    The only way you could have complete software stability is to ensure that every system is exactly the same, down to the RAM manufacturer and the library versions. And you're never going to get that. Not everyone wants the same computer as Bob next door.
  • by anoopiyer ( 153786 ) on Tuesday May 20, 2003 @09:33PM (#6003627)
    Is the need for speed preventing the use of reliable software design techniques?

    No, it's the need to keep costs low and time-to-market pressures that are preventing the use of reliable software design techniques.

    If all vendors had a large number of programmers and could select their own timeframe for releases, code would perhaps get more reliable.

    But on the other hand, Microsoft does have a large number of programmers, and they pretty much decide their own release schedules. So the above obviously doesn't hold for Microsoft. I guess that's because all their releases add new features, which introduce bugs...

    That's true for other vendors and other platforms too, isn't it? If all feature enhancements to say RedHat or SuSE Linux were stopped overnight and all future releases were only bug fixes, then said distro would be 100 percent bug-free at some hypothetical point in the future. But they have to add features to compete and evolve, and alas, said distro will never be bug-free.

    The low barriers to software updates also make software a less rigorous practice than hardware design. In hardware design, it takes millions of dollars to tape out a new rev of a chip to fix a bug; not to mention all the bad publicity the vendor gets (Intel fdiv bug, anyone?). Hence rigor in design and validation is much higher for hardware when compared to software.

  • by AmVidia HQ ( 572086 ) <{moc.em} {ta} {gnufg}> on Tuesday May 20, 2003 @09:35PM (#6003645) Homepage
    I'll paraphrase a comment that was said before; I don't remember where I read it:

    "We've been building bridges for thousands of years, but only started writing software for a few decades."

    To combat increasing bugs in increasingly complex software, we need better tools. From the low level (more reliable memory handling) to the high level (more abstraction to reduce human programming errors) in software languages and compilers.

    You can't build the Golden Gate with shovels and not expect it to fall apart, can you? (no, I'm not a terrorist)
  • STFP (Score:5, Insightful)

    by rice_burners_suck ( 243660 ) on Tuesday May 20, 2003 @09:36PM (#6003651)

    Software crashes because: Software is an immature field. Good software takes time. Software is unobvious to business managers who want the job done yesterday.

    Businessmen generally do not understand the internal workings of software. They are in a "big-picture" sort of world where software is but one pesky detail that will be taken care of. A computer crash that causes so many thousands of dollars in damages is no different than a truck crash. There is simply a risk to every element of business. If the risk is relatively low, the big shots don't care about it. Grocery stores in earthquake prone areas continue to place glass jars on the edges of shelves. Sure, there will be an earthquake one day, but it's a calculated part of business risk, and the risk is relatively low (the Earth doesn't shake every five minutes).

    Software bugs are a similar risk. It needs to look like it works. It needs to crash (and lose data) infrequently enough that the software will still sell. The business is not concerned with stamping out software bugs. It is concerned with releasing the software and making money. If the need arises, the business will improve the software and make more money. More often than not, this means adding features and shiny graphics. Fixing bugs is not very important to companies because customers do not pay for bug fixes. By the consumer, bugs are viewed as defects and their fixes should be free. By the company, bugs are viewed as a minor risk and fixing them would cost too much to justify. So you'll reboot once in a while or lose an hour's work once in a while. If it fries your hard disk, well, you should have backed up your data.

    Software is also one of the newest fields of human endeavor. Buildings have been built, ships have sailed and farms were farmed, all for thousands of years. No matter how much progress happens in these fields now, they have come so close to "perfection" that continued improvement serves to lower cost, improve safety and increase convenience. It's not a matter of, "Gee, how can we make buildings that actually stand without falling down three times a week?" It's just a matter of, "How wide, how deep, how tall and what color glass do you want on the outside?" You pay X dollars, wait Y months and voila, there is a building. But programming has been around for how long, 50 years? It's an increasingly important but very immature field.

    Buildings, bridges, ships... they're obvious. Everyone knows that if enough lifeboats aren't put on an unsinkable ship, it'll sink on purpose, just to piss you off. Everyone knows that if a 100 story building is going to stand, it has to take 10 years to build it. Everyone knows that a dam has to be pretty damn strong or it'll break and flood half the countryside. The building, shipyard and dam businesses aren't progressing at light speed. It is easy to justify 10 years for an outrageous building design because people KNOW what is involved. But software... Now that's totally unobvious. Software is an idea. It's abstract. It's a bunch of curse words that look like gobbledygook to the uninitiated. A bunch of "noise" characters on a broken terminal. Something done by a bunch of skinny, pimply faced geeks who got beat up in high school, took the ugly girl to prom and didn't have any friends. Why should a manager bother to care that fst_jejcl_reduce() causes a possible NULL pointer in the outer loop if case 32 is activated, which happens if the previous re-sort encountered two items with similar Amount fields, all of which will take a whole day to find and fix and will only happen, say, 2% of the times this particular feature is invoked by the user, which isn't that often? Why should anybody justify spending 2 years to develop some bulletproof program that can be banged out in 3 months, with bugs? What's the problem? Construction workers are risking their lives, moving heavy things, sweating all day in the hot sun... While geeks are sitting in offices just punching crap on a keyboard. How difficult could it possibly be?

  • by nick_davison ( 217681 ) on Tuesday May 20, 2003 @09:43PM (#6003683)
    ...I remember my teacher saying "Computers do exactly what they're told, not necessarily what you want them to do."

    D&D summed it up for me, years ago, with the wish spell: At its purest, it's too powerful to give to players - they'll unbalance and destroy the game. However, it can be balanced by giving them exactly what they ask for.

    "A demon lord approaches you out of the shadows."
    "I cast 'wish' - I wish for a +100 sword of almighty vorpal type slayingness."
    "The sword appears in the demon's hand. He thanks you for it, then hits you."

    Writing good code is like making a good wish. All you can do is try to cover as many eventualities as possible. The problem is, code gets really slow to run and even slower to write when you have to add out of bounds checks on every argument, error handling and reporting, garbage collection and all the rest. Even then, there'll always be some twisted scenario that you didn't know could exist so didn't plan for. So most people just give up, wish for the damn sword and hope the PC/Dungeon Master doesn't have too evil an imagination this time.
  • by Surazal ( 729 ) on Tuesday May 20, 2003 @09:44PM (#6003690) Homepage Journal
    Consumers, in the COTS context, don't mind "planned obsolescence" in their software. The current state of things proves this. People would rather have pretty features on a flaky system, than a solid system.

    This is not necessarily true... it's a bad generalization besides. Most people I work with in the IT industry would give their arm, leg, spleen, right lung, part of their left lung, lower intestine, and maybe even their occipital lobes for a reliable system that WORKS. Features are secondary.

    The "features over stability" myth is just that: a myth. Show me an admin that prefers only the latest and greatest in "features" and I'll show you an admin that will lose all her/his hair within six months (a little after all their hair turns white).

    Well, ok, I work primarily with IT people admittedly. Perhaps the folks in management are a little different. But I've noticed that IT people have ways of making management's lives miserable (in ways that are downright creative) when a bad decision is made with software purchases. I've done it, myself. ;^)
  • Re:Human Error (Score:5, Insightful)

    by Malcontent ( 40834 ) on Tuesday May 20, 2003 @09:45PM (#6003700)
    "People crash, they're buggy and they dont have a development team working on them. Computers crash because people cant catch that one little fatal error in 10,000 lines of code. "

    While this statement is true, it's also a cop-out. In the last twenty years there have been tremendous advances in computer science and languages, and yet everybody still programs in C.

    That is the reason why programs crash. Why don't people use languages that make programs more failsafe and make programmers more productive?

    It would be interesting to do a study of the "bugginess" of programs written in Python, Java, Scheme, Smalltalk, Lisp, etc. My guess is that programs written in C crash the most.

    Where are all the programs written in Scheme or Smalltalk or ML?

    Use better languages and crash less.
  • Re:Speed (Score:5, Insightful)

    by mooman ( 9434 ) on Tuesday May 20, 2003 @09:48PM (#6003715) Homepage
    I think this is one of the few responses to hit upon the crux of the issue.

    Most of the other comments all revolve around code and computers, or perhaps even the human nature of the programmer itself, but I think they all neglect the "one ring that rules them all": Management.

    I'm not quite an old fogie yet in the software world, but can at least claim to have been around (professionally) for about a dozen years and worked for about half a dozen Fortune 500 companies (plus a few startups).

    In every single case I could go on and on about corners that were cut, testing schedules that were compressed, last minute features that were added without proper design, and an alarming number of times when programmers expressed concern about the quality of some module/functionality only to have it ignored by management.

    It's soured me so much that I'm actually considering ditching the IT world outright. In a land where deadlines control everything, you are going to have stability or quality issues - I guarantee it.

    Now, I've known a few apathetic programmers that I personally wouldn't trust with a pencil, much less the code that my company is selling, but by and large, I think programmers all have the potential to make almost entirely stable code, or at least as stable as the tools/libraries that they have to work with. But this relies on a rather more liberal amount of testing than most companies seem willing to invest in.

    Most companies seem to use a sort of 80/20 rule. If they get 80% of the bugs out, that's good enough. They can get their product to market and ship patches later. Having a more robust application just isn't as strong a selling point, and it's even harder to prove. How do you show, objectively, that your app is more stable than your competition's? Most companies would flinch from even opening this can of worms, because then you have to fess up to a certain non-zero percent failure rate on your own product.

    This whole issue touches on the "featuritis" problem as a whole. In order to maintain a revenue stream, software companies have to do one of two things:
    1) Keep making new versions of your app(s) *and* convince everyone they need the newest one [often when they really don't] -or-
    2) Try to make revenue through support contracts.
    The latter mostly goes away if your code is as good as you claim it is, making it a tough sell, or your customer base just doesn't have the budget for an ongoing contract where they really aren't getting anything new over time but still have to pay for it!

    Thus, the push for "features", and with the push for features you have deadlines, and with deadlines you have management doing whatever it takes to meet them, even if those choices are detrimental to the product.

    Ugh. This is why I hold on to the hope that open source apps, frequently hobbyist in nature, will continue to gain strength. In general, the contributors to those projects are chasing a vision and not market share. These guys and gals seem more willing to keep plugging away at something before they call it "done" and foist it on users. Of course, most of these projects are never "done" by their own admission, and will spend their lives in perpetual development. But at least I know that some of the effort is toward bug fixing and not just new features.

    But when Microsoft could have tried to make Word 95 better and more stable, they instead came out with 97.. and instead of making it more stable, they decided the market needed Word 2000, and then... Yawn. You get my drift. I would have been happy with a bulletproof Word 97 myself...

    [disclaimer: In the above I've made an outright embarrassing number of generalizations and I hope they are identified as such and that I don't get slammed with a litany of examples supposedly to the contrary. Thank you.]
  • by dschuetz ( 10924 ) * <.gro.tensad. .ta. .divad.> on Tuesday May 20, 2003 @09:51PM (#6003735)
    Face it -- if our cars broke down as frequently as Windows (or Linux or whatever), we'd be suing the auto industry out of business.

    If our VCRs ate every tenth tape and only played tapes from the same manufacturer as the VCR with any quality, they'd all be returned to Circuit City.

    But for software, we grit our teeth and say, well, I just don't understand computers, and reach for the power switch.

    Until we, as consumers, start fighting for software that works without crashing, we'll continue to get the lowest possible quality -- just as we have for years. Once the customer starts demanding a quality product, the quality (and whatever software development practices, languages, testing procedures, etc., are needed) will follow.

    Bottom line -- there's no real incentive. Microsoft makes billions with buggy software, the increase in profit for selling non-buggy software is pretty small.

  • by innosent ( 618233 ) <jmdorityNO@SPAMgmail.com> on Tuesday May 20, 2003 @09:54PM (#6003752)
    Not exactly. Assuming that the hardware is OK, you can prove that a system is reliable for any given finite input (including, most importantly, all possible finite substrings of inputs; it is not possible to test all possible inputs, since some of them are infinite); it's just that doing so in large systems takes enormous amounts of time, and of course, time = money. Take Microsoft, for example. It takes a team years to develop a product like Windows XP, run a few test cases, and fix the major bugs. But just think how long it would take to go through every possible input substring of a given length (and by substring/string I am including non-character inputs [mouse, network, etc.]).

    Consider a simple program that inputs 10 short strings of text and does some computations on those strings. Say, for example, that the system has only a keyboard as input, that all input functions are guaranteed only to input A-Z (caps only), the space bar, and 0-9 (regex ([A-Z]*[0-9]*)*( )*), not to overflow, and that there are 10 inputs with exactly 10 characters each (spaces fill the end of the string). This means that there are 37 possibilities for each character, totaling 37^100 unique possible inputs, about 6.61E156 possibilities, each 100 characters. Typing a million characters per second would take 2.094E145 years! Keep in mind that this is an extremely simple system.
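    The arithmetic above is easy to sanity-check. A quick sketch in C (double precision is plenty for order-of-magnitude figures this size):

    ```c
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double inputs = pow(37.0, 100.0);            /* 37 symbols in each of 100 positions */
        double chars  = inputs * 100.0;              /* every input string is 100 characters */
        double secs   = chars / 1e6;                 /* typing a million characters per second */
        double years  = secs / (365.0 * 24 * 3600);  /* ignoring leap years */
        printf("unique inputs: %.3e\n", inputs);     /* about 6.61e156 */
        printf("years to type: %.3e\n", years);      /* about 2.1e145 */
        return 0;
    }
    ```

    Which is a long way of saying the test matrix is dead on arrival even for a toy program.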

    Therefore, it is not possible to test ALL input cases of any nontrivial program, only a few selected cases, which most will agree is far from proving a program correct. Instead, developers should have detailed mathematical descriptions of how a program is to behave at each incremental step, and verify that the program follows those descriptions accurately. Programs can only be proven correct in the same manner that any discrete mathematical concept can be proven correct, with one of the most common methods of a functionality proof being mathematical induction. Based on a few basic assumptions (like that the functions you call work as documented), the rest of the system can be proven by proving the trivial parts and cases first, and then constructing a complete proof based on the trivial parts.

    The problem with this is that a small change can have a big impact on the proof, and nobody actually takes the time to verify that everything still works. Companies don't often spend money on making their software 100% correct, they just need to add the nifty new features that their customers want before their competitors do. I'd be willing to bet that 90% of the bugs found in XP can be traced to a "nifty new feature" that broke code that may have been proven correct at some point.

    In other words, the short answer is yes, if you can test every state, you can prove a program correct, but since that's usually impossible, it becomes the developers' responsibility to incrementally prove the system, which is far easier if all functionality is planned ahead of time, but still too time/money consuming for most software companies to bother with. Microsoft doesn't care if your computer crashes, you'll probably still pay them, and as much as I'd like to think otherwise, OSS isn't much different (although it's usually more time than money there).
  • by twitter ( 104583 ) on Tuesday May 20, 2003 @09:54PM (#6003755) Homepage Journal
    this was the exclusive realm of the highly trained engineer, not some wannabe type that pervades the current service market.

    Let's hear it for the "wannabes". I'm not a highly trained engineer by a long shot, but I've got computers that don't go down except for power outages. Then they come right back up. As ESR is so fond of pointing out, complexity kills traditional software. Closed source can't keep up.

    Free software has the answer. Debian [debian.org] has 8,710 packages available to do anything commercial software does, mostly better. Not just one or two pieces of it, every piece. My systems never crash under their stable release and I run all sorts of services. How is this? It's easy. Free code gets used, fixed, improved and reviewed all the time. The pace of improvement is astounding. I could go on and on about things free software does that common commercial code does not. Code that never sees the light of day is dead.

  • by Anonymous Coward on Tuesday May 20, 2003 @10:00PM (#6003807)
    A commonly held notion, but not really well thought through.

    Sloppy programmer accesses through bad pointer in C. OS traps task.

    Sloppy programmer accesses beyond array bounds in MySafeLanguage. Runtime system traps tasks.

    In either case, your program "crashes", and the user isn't going to be any happier if you tell them that it's the "MSL virtual run time environment" that painted the blue screen of death than if it's the "operating system". The crappy program still ate my data.

    The two actual causes, IMO:

    1) People always code on the bounds of manageable complexity. Think about the programs people wrote 25 years ago. Nice as they were at the time - and they were on the bounds of manageable complexity - they have what would now be considered a laughable number of features and capabilities. As tools and processes and programmers get better, you don't get a better version of the same old thing you always had. You get something new and different that's just now become possible.

    2) Users (customers) get what they deserve. I have yet to meet a real customer that will actually wait longer and pay more for a higher quality system. Instead, they'll pay less to the guy that gets there cheaper or sooner. Everyone rants about quality, but they turn around and reward time-to-market and corner-cutting on development. If any significant proportion of users really insisted on quality, they'd get it, and probably at a much higher price. (Some, but not all, embedded development falls into this category.) Instead, they want it now and cheap, and the company that takes longer and cost more simply goes out of business.
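    To make the first point concrete, here's a toy sketch in C of what a bounds-checking runtime effectively wraps around every array access (the function name and error reporting are invented for illustration; real runtimes raise an exception, and an unhandled exception kills the task just as surely as a wild pointer does):

    ```c
    #include <stdio.h>

    #define N 4

    /* Toy stand-in for the check a "safe" runtime performs on every access.
       In a real managed runtime a failed check raises an exception; if nothing
       handles it, the task dies -- from the user's chair, the same crash a
       C wild pointer produces. */
    static int checked_get(const int *a, int len, int i, int *trapped) {
        if (i < 0 || i >= len) {
            *trapped = 1;   /* the runtime "traps" the access */
            return 0;
        }
        *trapped = 0;
        return a[i];
    }

    int main(void) {
        int a[N] = {10, 20, 30, 40};
        int trapped;
        printf("a[2] = %d\n", checked_get(a, N, 2, &trapped));   /* in bounds: 30 */
        checked_get(a, N, 7, &trapped);                          /* out of bounds */
        printf("trapped: %s\n", trapped ? "yes" : "no");
        return 0;
    }
    ```

    The check changes *where* the program dies, not *whether* the user's data survives.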

  • by intermodal ( 534361 ) on Tuesday May 20, 2003 @10:01PM (#6003813) Homepage Journal
    In my years of repairing PCs and professionally testing software, I find that the majority of software problems are a direct result of the QA department having no say in the company; of the fact that, more often than not, QA is looked down upon (and therefore staffed with contractors who, once they know the product and company networks inside and out, are at the end of their contract); and of the fact that home users don't maintain their computers (defrag, remove dust from inside the case) or pay attention when fans burn out. Or they bought shoddy hardware, like that of the Dell single-processor Xeon Precision Workstations or an Emachine.
  • Re:Microsoft (Score:3, Insightful)

    by Foolhardy ( 664051 ) <`csmith32' `at' `gmail.com'> on Tuesday May 20, 2003 @10:02PM (#6003824)
    I have found that the drivers you use in Windows are the biggest factor in stability. Usually the drivers that come on the CD are the most stable, but they are not the best option for some devices. Microsoft-supplied video drivers usually have almost no features and sometimes are quite incompatible, especially with games. Some companies produce great drivers while others seem to be really cheapo.

    Sometimes, different companies will make completely different drivers for the same device. For example, the VIA AC'97 audio controller: VIA has their own drivers, and so does Realtek. I think that the Realtek drivers are vastly superior to the VIA drivers, in terms of functionality and stability.

    I know it's easy to blame Microsoft for every crash on a Windows system, but in my opinion bad drivers seem to be the culprit most of the time.

  • yeah, it's people. (Score:3, Insightful)

    by twitter ( 104583 ) on Tuesday May 20, 2003 @10:09PM (#6003867) Homepage Journal
    Computers crash because of people, especially some people in Redmond. You know, the folks who push that OS with a binary registry (one bit changes and it dies), an email client that automatically gives mail root access to hardware, a web browser with similar problems, and a kernel that may still not keep track of background processes.

    It's never the user's fault. No matter what the user does, the program should recover gracefully. Code that crashes is pathetic. Take my wife. She's managed to uncover all sorts of bugs and flaws in software, but she and my baby girl have had a hard time busting Debian.

  • "We know that i+1 > i"

    Are you so sure? Depending on various circumstances, you might find that a little while after you get to 127 or 32767 (or thereabouts), i+1 has become less than i...
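    The wraparound can be shown in a few lines of C. A sketch (the wrap to -128 is what two's-complement machines do; strictly speaking, the result of overflowing a narrow signed type on assignment is implementation-defined):

    ```c
    #include <stdio.h>

    int main(void) {
        signed char i = 127;    /* the largest value an 8-bit signed type can hold */
        signed char j = i + 1;  /* wraps to -128 on two's-complement hardware */
        printf("i = %d, i+1 = %d\n", i, j);
        printf("is i+1 > i? %s\n", j > i ? "yes" : "no");
        return 0;
    }
    ```

    So "we know that i+1 > i" holds right up until the moment it quietly doesn't.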

    Tim
  • by Blkdeath ( 530393 ) on Tuesday May 20, 2003 @10:13PM (#6003899) Homepage
    No, cost.

    Cop-out. BMW designs safe cars that are expensive. However, budget cars employ many of those safety features because the research has been done and the knowledge is now available.

    I could come up with dozens of analogies from countless industries, but it all comes back to this; why do poor coding methods continue to be employed, and where's the QC?

  • A lot of the posts here have posited answers to why computers crash (people, complexity, unsafe languages, etc.), but most everyone seems resigned to it.

    It should not be acceptable that they crash.

    Personally, I'm shocked every time I use a computer as to how primitive they are and how little has changed. Is it, or is it not the year 2003?

    All of these posited problems are solvable.

    Unsafe Languages? Stop using them. Someone please design hardware and an OS that disallows their use and disallows unsafe behavior.
    There are safe languages that compile and provide performance today (Lisp comes to mind, perhaps C#, Java's getting faster every day, and there are safe subsets of C++). Start using those. And then someone go write something better.

    I earnestly believe that if the hardware/OS had good protection at the lowest level then performance would not necessarily have to suffer. If the OS is written in a language where the API is solidly contracted, then _true_ safety can be enforced at compile time, and not slow down the system at runtime.

    People? Users should _never_ be able to crash their machine. The person riding the elevator should have _no_ way, no matter how contrived, of making the elevator crash. And if the problem is programmers, then kick them out of the loop by forcing them to use safe languages, libraries and tools.

    Complexity? Well, this is the kicker, isn't it? "You can't foresee all the possible conclusions." But we don't need to see all possible conclusions to stop crashes. And if we lay a foundation of solid transactions on solid APIs with solid languages, then complexity will be reduced and there will be fewer dark "unknown" spaces. Maybe it'll even be easier to write software with fewer bugs.
  • Re:Human Error (Score:5, Insightful)

    by Uller-RM ( 65231 ) on Tuesday May 20, 2003 @10:15PM (#6003918) Homepage
    Java programs can still crash -- and believe me, grade homework for undergrad CS students for a few years and you'll see plenty of it. The only difference is that Java tosses an exception that isn't handled, and C either asserts and calls exit(-1) or segfaults.

    I don't think it's fair to say that any one language is "safer" than another -- once you reach a certain level of expertise, one can write a stable and robust program in C or C++ or Java or Haskell (my preference) with equal effort. The effort is mental: being persistent enough to define solid logical definitions for each part of the program, failure conditions, etc., and then execute them to the letter in the language of choice. If the program behaves logically, you can prove that it works using logical principles -- induction and so on. (And if you ever do govt contracting or any other project that calls for requirement traceability, you'll need to.)

    The difference between languages is merely the way the code is expressed. Java and C++ have exceptions; C does not. For some situations, return codes are better than exceptions, and for some situations the opposite is true. Java has robust runtime safety -- C and C++ do not. C and C++ have templated containers -- Java's just now getting such genericity. All languages and all approaches to problems have tradeoffs: the mark of a good programmer is knowing those tradeoffs and picking which is best for the situation.
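    As a concrete instance of the return-code style mentioned above, here is a small C sketch (`safe_dup` is an invented helper, not a standard function): every failure mode gets an explicit, checkable result instead of being left to crash.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Invented helper: duplicate a string, reporting every failure mode
       through a return code rather than crashing or silently continuing. */
    static int safe_dup(const char *src, char **out) {
        if (src == NULL || out == NULL)
            return -1;                   /* bad arguments rejected explicitly */
        *out = malloc(strlen(src) + 1);
        if (*out == NULL)
            return -2;                   /* allocation failure reported, not ignored */
        strcpy(*out, src);
        return 0;                        /* success */
    }

    int main(void) {
        char *copy = NULL;
        if (safe_dup("hello", &copy) == 0) {
            printf("copied: %s\n", copy);
            free(copy);
        }
        if (safe_dup(NULL, &copy) != 0)
            printf("bad input rejected, no crash\n");
        return 0;
    }
    ```

    In a language with exceptions, the same contract would be spelled as throws instead of return codes; the discipline of naming every failure condition is what carries over.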
  • by abirdman ( 557790 ) * <[abirdman] [at] [maine.rr.com]> on Tuesday May 20, 2003 @10:16PM (#6003925) Homepage Journal
    I'm afraid if a user error causes the program to crash, I've got to call it a software error. It's not that hard to write the error handling routines; it's just never in the budget. And the users are invariably able to discover new frontiers of errors the programmer(s) never dreamed of. No matter. If clicking the wrong box, entering the wrong data, plugging in the wrong mouse, or installing the wrong screensaver causes a program to crash, it's not the user's fault (bless them, for they know not); it's the programmers' and design engineers' fault.

    Hardware errors are another problem altogether. Luckily, they're usually quick to diagnose, and it's usually cheaper to replace hardware than software. It's great how I've been using Microsoft error reporting for about 6 months now, and it's never been their fault. They must be getting better. <snicker>
  • by Anonymous Coward on Tuesday May 20, 2003 @10:18PM (#6003949)
    This would fix 95% of software problems.

    It also means that you throw away 95% of all existing software. Away goes Slashdot (MySQL) along with the rest of the Web (Apache, IIS) and the Internet in general. Not that it matters, because you don't have any more operating systems (Linux, Windows, OS X, etc.), some of which are doubly bad because they are partially written in assembly language, which lets you do (gasp!) anything! What, precisely, are we to actually get done in your computing utopia? Read Pascal code by candlelight?

    And what do you mean, "don't allow" people to use languages you don't like? Keep your laws off my computer, mein Fuhrer.
  • by nolife ( 233813 ) on Tuesday May 20, 2003 @10:23PM (#6003980) Homepage Journal
    The methods are here now. The difference is that the amount of testing done on the non-consumer applications is much, much greater. Consumers rarely need very stable robotic arm welders, nuclear reactor protection systems, silicon etching machines, or flight control systems at home.
  • by 1001011010110101 ( 305349 ) on Tuesday May 20, 2003 @10:23PM (#6003983)
    Quality control practices and good methodologies are known to many; as you said, the research has been done. The only problem is... time and money. Time as in "this much testing should be enough, let's make the release date," and money as in "I don't have the budget for a full testing team on this project."

  • by |Cozmo| ( 20603 ) on Tuesday May 20, 2003 @10:26PM (#6004009) Homepage
    If only it was as simple as you say.
    Creating drivers that run devices made by many different manufacturers means you have to take all of their differences into account in order to get the same behavior from all devices. For example: if one chipset powers down a certain device during a reboot/standby/hibernate/etc. and another chipset doesn't, you can run into strange behavior. Throw laptops and ACPI into the mix and we're lucky things work as well as they do :) I think we're probably just lucky most of these things are isolated from end-users.

    I have several machines that won't even POST with certain configurations of USB devices plugged in. I think it is a BIOS issue... probably trying to fiddle with devices to do HID support or booting from storage devices, and it is probably either hanging the BIOS or the hardware.
  • by WindBourne ( 631190 ) on Tuesday May 20, 2003 @10:28PM (#6004020) Journal
    Wow. You used "M$." How insulting. That proves the level of your maturity and is a cohesive summary of the fact that companies--surprise, surprise--try to make money.
    Why do you say it is so insulting? MS is all about money. That is all that they have made successfully. Their software is total crap, but they have been able to parlay one monopoly into a multi-level monopoly. Personally, I abhor their behavior and software, but admire their business sense. MS is about money, and they do it well. So no, M$ is not an insult. It is praise.
    BTW, I would have to say that if Sun and HP had the business sense that M$ had, they would be 10x their size. Gates has never developed anything new or interesting, just stolen it. But he has normally figured out where the future lies and taken advantage of it. Companies like Sun and HP have never really gotten it. It is too bad, as both have good engineers, but horrible business ppl.
  • by ThreeToe ( 411692 ) on Tuesday May 20, 2003 @10:32PM (#6004048)
    You make a very insightful analogy and I think it is quite revealing.

    You state that we won't have Software Engineering until "someone can figure out a way to prove that a given piece of software will perform as it's supposed to."

    Alas, this is known to be an impossible task in the general case: this is Turing's halting problem. There's no Newton-caliber breakthrough waiting in the wings here.

    Unit testing works because testers know their software systems intimately and can specialize testing code to work in a narrower number of cases. State modeling languages such as ASML [microsoft.com] can help improve the situation, but seasoned testers know that no tool will help them achieve 100% block and arc coverage of their code.

    I'll throw this out for discussion, then: the underlying principles of a software system's design dictate its fundamental physics. It is difficult (sometimes impossible?) to make distinctions between a piece of software's functionality and its substrate.

    In an ideal world, developers would find a technique by which they could _always_ separate the two and hence categorize a common physics. The choice of language is part of the physics, but it isn't the sum total: C++ apps can have radically different underlying structures.

    Thoughts?

  • by PetoskeyGuy ( 648788 ) on Tuesday May 20, 2003 @10:33PM (#6004052)

    Software crashes because it's acceptable and information about how to make programs that don't crash is sometimes hard to come by.

    There are programmers out there who have spent years coding and learned how to avoid buffer overflows, check return codes, and fail safe if something unknown happens. But these things are not taught in school and even if they are, someone is going to make a mistake.

    Software Engineering never advances. We don't follow the blueprints; we send out the construction workers and make sure something is standing ASAP so it looks like we're working. Boss is coming, put some drywall up - we'll wire it later. Some guy worked out a really safe way to build the stairways, but his last company patented it, so we'll have to do something else this time.

    As an industry we don't learn from our mistakes. We reinvent the wheel time and time again, but this time it's transparent, chrome, glow-in-the-dark, and square. Things are moving too fast, and the old can't teach the young to avoid their mistakes because they are considered dinosaurs after a few short years. So we make the same mistakes on the "new" systems over and over.

    Plus the system feeds itself this way. This software sucks, I better upgrade.

    We would need something like standard building codes and inspectors. When real buildings fail, people can get hurt or die, but when a computer fails you reboot. It's just not worth it economically to make a program that never crashes. It would be obsolete by the time it's done.

  • Several Factors (Score:3, Insightful)

    by null etc. ( 524767 ) on Tuesday May 20, 2003 @10:34PM (#6004058)
    There are several causes of software crashes. Let's address the obvious ones:
    • race conditions. From the FreeBSD Developers' Handbook: "A race condition is anomalous behavior caused by the unexpected dependence on the relative timing of events. In other words, a programmer incorrectly assumed that a particular event would always happen before another."

      Race conditions are particularly difficult for developers to address, since they propagate at many levels within the system (hardware level, OS-assigned resource level, application instruction level, etc.) Also, only realtime operating systems or simple embedded systems guarantee the relative ordering of certain events. Complexity has a direct correlation to the inability to guarantee timing.
    • deadlocks. Deadlock occurs when multiple processes compete for limited resources. From Sun's Java Classes: "The simplest approach to preventing deadlock is to impose ordering on the condition variables." Sometimes, it is difficult or impossible to guarantee cooperation among competing resources.
    • unsafe application environments. An operating system can establish limitations upon applications, such that those applications never exceed certain safety boundaries (e.g. access to areas of the filesystem, system resources, etc.)

      Most operating systems that thoroughly employ these limitations are considered "user-unfriendly." More user-friendly operating systems, such as Microsoft Windows, inherently eschew these safeguards by default, allowing applications to perform actions that potentially result in a crash. Application environments such as Sun's Java do a good job of "sandboxing" an application's access to resources, such that system crashes are unlikely.
    • unsafe hardware architecture. A computer's hardware consists of a primitive architecture that is unable to guarantee proper operation. The current PCI bus and "IRQ" interrupt scheme is particularly susceptible to computer crashes, if hardware drivers are programmed incorrectly.
    • third-party software and hardware. The support for third-party software and hardware results in an operating system environment which is open and generalized enough to be susceptible to crashes. For example, if you allowed anyone to come into your house and plug any manner of devices into your power outlets, you could conceivably experience a power outage as the circuit breaker kicks in to prevent electrical damage. That's the danger of exposing your outlet to strangers.

      In order to create a system that enables applications to perform tasks as complex as controlling the entire computer (e.g. screen savers, hotkey programs, power toys, etc.), applications must be given the theoretical power to perform tasks that can crash the computer. The result is that the computer crashes when the application works improperly.
    • application complexity. Regardless of how smart a developer is, the developer's ability to guarantee the functional correctness of a system decreases in proportion to the complexity of that system. Simple systems are therefore much less likely to crash than complicated systems. Whether they do or not depends on the safeguards that were put in place to augment the developer's ability to guarantee the functional correctness of a system. NASA's procedures for programming mission-critical systems rely on any number of safeguards to ensure functional correctness of those systems.
    That's a good starting point, for now.
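The deadlock item above quotes Sun's advice to "impose ordering." As a sketch (illustrative Python, not from the original post), here is what a global lock ordering looks like: every thread acquires locks in the same fixed order, so a cycle in the wait-for graph, and hence deadlock, cannot form.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer_ab():
    # Acquire in the global order: a before b.
    with lock_a:
        with lock_b:
            return "a-then-b"

def transfer_ba():
    # Even though this operation "wants" b first, it still takes a first,
    # respecting the global ordering and making deadlock impossible.
    with lock_a:
        with lock_b:
            return "b-then-a"

t1 = threading.Thread(target=transfer_ab)
t2 = threading.Thread(target=transfer_ba)
t1.start(); t2.start()
t1.join(); t2.join()
print("no deadlock")
```

If `transfer_ba` instead took `lock_b` first, two threads could each hold one lock and wait forever for the other, which is exactly the failure mode the comment describes.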
  • Re:It's bugs! (Score:3, Insightful)

    by Billly Gates ( 198444 ) on Tuesday May 20, 2003 @10:35PM (#6004063) Journal
    Part of the problem is programmers do not use or do not have access to good libraries/objects, and instead write their own.

    True, the APIs could contain bugs, but chances are they will contain fewer bugs than your own, because they are tested.

    In fact, as Microsoft fixes the bugs in later versions of the Win32 API, MFC, and COM/DCOM, the number of Windows-based application bugs has gone down. It's still high, but better. .NET will improve the situation, both for VB and for C++ programmers. Qt has helped a lot in Unix development as well. You can write very large programs with fewer bugs given good APIs and language features.

    Java is a perfect example. People mainly use it for its rich libraries, not for the language. Unfortunately, many companies have standards like "only use C++ or VB in every situation." Companies like Caterpillar even use C++ to write programs for scripting instead of awk/sed or Perl! It's silly, costly, and causes bugs. The point of standardizing on C++ is that you do not have to reinvent the wheel. Since IT spending is down, companies are no longer buying third-party API suites.

    Anyway, without good APIs and object-oriented code reuse in your language, you're bound to have more bugs, because you have to write everything yourself.

    Object-oriented programming is supposed to help reduce bugs, and it will if you have a lot of objects at hand to use.

  • by ergo98 ( 9391 ) on Tuesday May 20, 2003 @10:38PM (#6004093) Homepage Journal
    It takes significantly more skilled developers and more testing (i.e. expensive) to make systems that don't crash, and consumers (including you) won't pay for them

    I beg to differ. While there are variances based upon the quality standards and initiatives at an organization, a large correlation can be made between the complexity of software and the incidence of bugs (i.e. Bugs = (1.0 / Quality_Standards) X (Lines_of_Code / Years_In_Active_Maintenance)). There is _no_ comparison between a piece of life-support equipment whose lines of code can often be measured in the hundreds, and something like Windows XP, where there are tens of millions of lines of code. Features come with a cost.

    The number 1 way of assuring quality code is by removing everything until you're left with the absolute essential functions.
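The comment's back-of-envelope formula can be written out directly (a hypothetical Python sketch; the inputs below are illustrative numbers, not measurements):

```python
def estimated_bugs(quality_standards, lines_of_code, years_in_maintenance):
    # The parent comment's heuristic: bug count scales with code size and
    # shrinks with quality practices and time in active maintenance.
    return (1.0 / quality_standards) * (lines_of_code / years_in_maintenance)

# A tiny, mature life-support routine vs. a sprawling young desktop OS:
print(estimated_bugs(2.0, 500, 10))        # small, old, carefully built
print(estimated_bugs(1.0, 40_000_000, 2))  # tens of millions of lines, new
```

However rough the constants, the point survives: the lines-of-code term dominates, which is why "remove everything non-essential" is the cheapest quality lever.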
  • Bravo! (Score:2, Insightful)

    by dcavanaugh ( 248349 ) on Tuesday May 20, 2003 @10:39PM (#6004095) Homepage
    Now that Microsoft has been marketing fragile products for 20+ years, it should be no surprise that we have Comp. Sci. faculty with a tolerant view of instability. Some of them grew up with this stuff. If M$ "state of the art" is really good enough, then maybe software has become so commoditized that we just relegate everything to the H1Bs and let it go at that.

    True story: My wife was in the hospital maternity ward. This is a modern US hospital, not some third-world tent. For about 24 hours, they had her connected to all kinds of sensors which were connected to a Dell PC running a data collection/graphing program on what appeared to be Win2k. The application was a joke. The nurses fumbled and bumbled with it; it crashed at least once. Fortunately, the important things went well (it's a boy), but no thanks to our friends in Redmond. Had there been a problem that those sensors were supposed to detect, we would have been screwed. As an expectant father, my primary job (at delivery time) is to reassure Mom that all is well. Seeing this Windows app sputtering along made my job a bit tougher. Let's hope things are a little better in the ER or ICU.
  • by Anonymous Coward on Tuesday May 20, 2003 @10:44PM (#6004119)

    The issue here is that we're talking about a product sold in stores to a large number of customers. And not a particularly cheap product, for that matter.

    Customers have every right to expect value for their hard-earned dollars. If the customer's computer meets the specs printed on the box, the game should install and run. Period.

    The publisher should give refunds to anyone who bought the game in good faith and couldn't run it.

  • by Minna Kirai ( 624281 ) on Tuesday May 20, 2003 @10:47PM (#6004146)
    Most people I work with in the IT industry would give their arm, leg, spleen, right lung, part of their left lung, lower intestine, and maybe even their occipital lobes for a reliable system that WORKS.

    No, that's the myth!

    Show me one of these voluntarily maimed admins, who carved out all his organs hoping for improved software. They don't exist.

    (More realistically, show me one who sacrificed 30% of his annual salary for better software. He also doesn't exist)

    True, from day to day, everyone wishes their jobs were easier.
    • "I wish customers would read the web page, instead of calling me for phone support"


    • "I wish we had a train to Chicago instead of me driving this truck for 7-hour stretches"

      "I wish the servers I maintained didn't crash"
    However, if those people were fully rational, they'd understand that as soon as their wish comes true, they're out of a job. (An enlightened person will welcome the change as better for the world at large; a Luddite would whine, scream, and throw boots in the gears.)

    And anyway, IT admins are not the consumers of software. They're not the ones who drive the buyer-seller economy. The actual consumers are other people in the company- and from their perspective, the IT staff are an expense attached to buying the software.
  • Actually (Score:4, Insightful)

    by Sycraft-fu ( 314770 ) on Tuesday May 20, 2003 @10:50PM (#6004164)
    One of the biggest barriers to stability for something like Linux (or Windows) is the fact that it must accommodate new software and hardware configurations all the time. If you take a Lucent 7R/E phone switch, it will run on a given hardware (the 7R/E). It will run Lucent's OS, and it will do only what it was designed to do (switch phone circuits). There is no putting new hardware in it unless it's Lucent-approved, there is no loading of new apps to make it do things unless they're Lucent-approved, and so on.

    If you want an open OS that will run with hardware by whoever happens to want to make it and software by whoever happens to want to write it, you cannot have a verified design that is 100% reliable. Unforeseen interactions WILL happen, and crashes or other malfunctions will result.
  • by El Cubano ( 631386 ) on Tuesday May 20, 2003 @10:53PM (#6004178)

    Don't allow people to use languages that allow you to access memory not assigned to you or to access array positions that don't exist.

    It always bugs me how quick people are to blame crappy coding on the language. This would be tantamount to a carpenter saying, "if my hammer weren't so damned versatile I could build a higher quality product and not break my thumb open." People would look at him like he was crazy. Or better yet, an inexperienced apprentice saying, "That hammer is just too powerful for me to use."

    That being said, C and C++ are the hammer that was designed by carpenters (OS experts) for use by carpenters (OS experts). Don't blame the problems on a bunch of kids who were never properly educated in the use of the tool.
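For reference, what the "safer language" side of this argument is asking for is runtime bounds checking: an out-of-range index becomes a well-defined, catchable error instead of a silent read of adjacent memory (the classic C buffer-overflow failure mode). A minimal, hypothetical Python sketch:

```python
def safe_get(buf, i):
    # In a bounds-checked language, a bad index raises a defined error.
    # We can contain it locally and fail safe instead of corrupting state.
    try:
        return buf[i]
    except IndexError:
        return None  # sentinel: caller sees "no value", not garbage memory

buf = [10, 20, 30]
print(safe_get(buf, 1))   # in range
print(safe_get(buf, 99))  # out of range: contained, not a crash
```

Whether to return a sentinel or re-raise is a policy choice; the point is that the language guarantees the error is detected at all.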

  • Re:Simple ... (Score:5, Insightful)

    by DarkZero ( 516460 ) on Tuesday May 20, 2003 @11:03PM (#6004230)
    Was it, or was it not, designed to be used in this way? If it was not, why does the system let you try it?

    Your microwave isn't designed to let you put an AOL CD or a piece of tinfoil in it and turn it into a box-shaped firecracker, but it still lets you try it. So the simple answer would be that it lets you do it because it can't control absolutely everything that it interacts with. A download manager isn't designed to be run at the same time as an MP3 player, AIM, ten browser windows, an IRC client, and downloads in other programs at the same time, but it still lets you try it because it has no control over those programs, no different than the microwave's lack of control over your hand and your AOL CD.
  • Re:linux crashes (Score:2, Insightful)

    by Mr Z ( 6791 ) on Tuesday May 20, 2003 @11:08PM (#6004264) Homepage Journal

    Well, I have 309 days uptime on one of my boxes (the webserver/squid cache/bittorrent box). I've got about 192 days left until the jiffies rollover -- I was planning to move into a new house before then. Better start looking. :-)

    It says something that people notice a jiffies rollover bug at 497 days...

    --Joe
  • by maynard ( 3337 ) on Tuesday May 20, 2003 @11:08PM (#6004267) Journal
    I used to leave all sorts of machines running 24/7 in my apartment. Several Suns, a couple PCs running Linux and BSD, an SGI, blah blah blah. I did take care to turn monitors off, though. I kept this up until I turned off all my systems (except the mail server) for a two-week vacation: I was shocked to discover the next electric bill arrived a good $80 cheaper. I've since cut back to a single machine which I turn off at night. No more crazy uptimes, but honestly - I'll take the money. I wish there was consumer demand for low-power desktop computing. I guess I'll just have to migrate to a good laptop for the low-power option. But you're absolutely right: a few computers can suck up a lot of power, with damaging results to one's electric bill. --M
  • by dragontooth ( 604494 ) on Tuesday May 20, 2003 @11:15PM (#6004298) Homepage

    Sorry, that is a pile of shit. You pay a lot of money for M$ and other software too. I can almost understand it with Linux, with so many different developers and no real set standard as of yet... and still it crashes less frequently than Microsoft's. People would pay for stability if they had the option. Would you pay for a car that doesn't always start? Or decides every so often to stop in the middle of the expressway? I think not, and these too are consumer products.

    What a cop out. You must work for M$. I don't believe that anyone who has bought Windows has got what they paid for. Since NT, MS has always put "more reliable" and "more secure" in their advertising pitches. "Windows XP... the reliability of Windows 2000 with the game-playing power of Windows ME." It is true that it IS very expensive to properly test and create code that does not crash. However, if I am paying several hundred dollars for a piece of consumer software, I would expect that I can somewhat rely on it.

    Sorry, but if I had some mod points today I would be modding down this parent. You're just making excuses for them. The trouble is not the expense of writing the software; it is the time pressure of getting the next piece of crap out to the store shelves.

    END RANT HERE

  • by Minna Kirai ( 624281 ) on Tuesday May 20, 2003 @11:23PM (#6004337)
    It's going to take a breakthrough on the order of Newton to make Software Engineering as reliable a discipline as Civil Engineering.

    The reliability of today's Civil Engineering comes not from deep theoretical understanding à la Newton - it's really just the same "build, crash, repeat" method the Freemasons have been using for 1000 years.

    Now that we've had centuries of experience at building similar kinds of structures, most of the kinks have been worked out. Those rare CivEng projects that break new ground still have a high risk of unexpected failures. (A 4000% cost overrun is a failure [boston.com])

    Civil Engineering still uses empirical testing to decide if a new technique is reliable, as does "Software Engineering". You just notice it more in SE because that field has more opportunities for innovation and far fewer penalties when an experiment fails.

    JUnit is a step in the right direction, but there's still a long way to go.

    JUnit is a step down a curving road to a dead-end. It won't take us to an ultimate solution (but it will provide benefit in the near-term future). That's because it's not a system to help formally prove code is correct (which some unpopular languages support to small degrees)- instead, Unit Testing is just a way to automate "build, crash, repeat" empirical testing.
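To make the "automated build, crash, repeat" point concrete: a unit test in the JUnit style encodes one such experiment permanently, so a regression is caught on every run. A hypothetical Python/unittest sketch (`parse_port` is an invented example function, not from the thread):

```python
import unittest

def parse_port(s):
    # Tiny function under test: parse a TCP port string, rejecting junk.
    n = int(s)  # raises ValueError on non-numeric input
    if not 0 < n < 65536:
        raise ValueError("port out of range")
    return n

class ParsePortTest(unittest.TestCase):
    # Each test is one frozen "experiment": run it forever, for free.
    def test_valid(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_out_of_range(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_garbage(self):
        with self.assertRaises(ValueError):
            parse_port("not-a-port")

if __name__ == "__main__":
    unittest.main()
```

Note what this does and doesn't buy: it proves the code behaves correctly on these inputs, not on all inputs, which is exactly the empirical-versus-formal distinction the comment draws.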
  • by jhoffoss ( 73895 ) on Tuesday May 20, 2003 @11:30PM (#6004373) Journal
    You could never write software that was perfect, because you can never account for every situation.

    The solution most non-CSci people ask next is "Can't you write a program that checks for errors?" Intriguing to think about if you've never actually pondered it, but the answer unfortunately is no. You can't write a finite-state machine that can detect or correct an infinite number of states.

    To do so would be similar to calculating the "best" route from NY, NY to LA, CA. You could choose any number of roads and paths from coast to coast, with or without loops (finding them would be quite a bitch) possibly traversing every road in the US. If you don't understand why you can't calculate this, ask your neighborhood CSci major.

    The best we can do instead is safeguard the software we write as well as possible, which requires time (and therefore money) and computing power to do things like bounds-checking on arrays, handling interrupts properly, and managing memory thoroughly, to name a few major problems in any software. Languages like Java come a long way in some respects, but are very slow. But this isn't a good enough solution, and frankly, most programmers aren't good enough to produce fully error-free code.

    As revolting as it may sound to the hacker-coders out there, great programmers, software engineering, business processes, documentation, and management of the whole product are necessary to produce truly good software.
  • Re:Simple ... (Score:3, Insightful)

    by Marc2k ( 221814 ) on Tuesday May 20, 2003 @11:41PM (#6004435) Homepage Journal
    Computers are just like liquor...the less his parents drink vodka, the less likely they'll be to notice a difference.
  • Scope and Features (Score:2, Insightful)

    by jkichline ( 583818 ) on Tuesday May 20, 2003 @11:42PM (#6004439)
    I think the issue with crashing software is a combination of problems. Obviously cost is the biggest issue. Economics is another. And time is never on the developer's side. The fact is, it is not economically advantageous to write rock-solid code. Why?

    First, it costs a lot of money to test and it is very difficult to keep your new code under wraps (from competition) and still offer a truly well tested system. Open source solves this problem by somewhat reducing competition since the code is free and can be tested by many people in various stages of testing. (Probably why Open Source is more stable)

    Don't forget boredom. Once a developer gets something "working" he or she doesn't want to continue to stare at the code for hours contemplating its every possible flaw. We'd rather be reading slashdot.

    Second, if your software was 100% bug free, people would never have a reason to upgrade. Guaranteed, if Windows 98 didn't crash so dang much I would never have installed Win2k. My dad had an old Compaq Presario with Windows 3.1 on it and it never crashed. He reluctantly had to upgrade to experience things like MP3's and AOL. (and crashes) I did downgrade from WinXP (Piece of doggie doo) back to Win2k.

    Third, time is of the essence. Many times I am pressured to get the code done. It is better to have a software application that works pretty well and start using it than to have it absolutely perfect and never use it. This is an exponential scale: it takes more and more time to make the software fractionally more stable. And sometimes you find a rewrite is in order. There is a balance to be obtained.

    Some other things to consider: scope and methodology. The comparison was made between cars and code. I think this is an unfair evaluation because the scope of a car is well defined. You know certain parameters, such as the size of the road and the speed it can travel. You have certain benchmarks it must meet, safety regulations. Software, on the other hand, has few of these. Operating systems run on an incredible variety of hardware and can be configured in an infinite number of ways. I've found that pcAnywhere, when installed with some other, unrelated software, can just blow up a machine. The problem is that scope is not, and most notably cannot be, contained WITHOUT limitations. This is the reason why a Linux server running in terminal mode with two daemons on it can run FOREVER. The scope is well defined; crap is not compiled into the kernel.

    Lastly, methodology is the best answer. The comparison of computer code to legal code is a very good one. The reason good lawyers write good legal docs is because they have a good methodology. They know how to cover their bases. Programming language developers should consider a development methodology and set up limitations. Java and other type-safe languages set up these limitations, and the result is safer code. Consider narrowing this even more. But realize that limiting what the developer can do has economic effects. What good is the world's tightest coding methodology if VBScript still exists and can do the same thing? (and break)

    In all, we are held in the balance. Yin and Yang. We cannot have one without the other. You add features, you add bugs. You create limitations, your code doesn't get used. You increase your time to market, you watch your competition buy you out. This is the way of things. A chasing after the wind.
  • by pauljlucas ( 529435 ) on Wednesday May 21, 2003 @12:02AM (#6004554) Homepage Journal
    video games (like PS2, GameCube, etc...) are at least an order of magnitude less complex than your average PC.
    Hmmm... 3D video, A/D converters for hand-held controls with multiple inputs, multiple of said controls, must respond in real-time. Gee... doesn't sound all that less complex to me. All a typical PC has to deal with is a single keyboard and mouse.
    Firstly there's only ever one application running on them at any given time, and it's often single threaded (modulo support processors and simple interrupts).
    How soon you forget. Old versions of both Windows and MacOS also only ever had one process running at a time and were single-threaded -- and yet they still crashed. And it wasn't until very recently that both OSes got stable (ironically, from the point of view of your argument, by running multiple processes).

    Game boxes just have different complexity.

  • by Anonymous Coward on Wednesday May 21, 2003 @12:04AM (#6004573)
    To get some idea of how much time testing really takes:

    I can write a simple code change in 0.5 hours.
    Step-through debugging, where I watch every line of code execute: 0.5 hours.
    Informal testing: I do unit testing, change a database table, run the code, change the database table back to its original value: 2 hours.
    Extreme Programming testing: write code to do the same thing, automatically, for the life of the product: 8 hours.
    Extreme Programming system testing: run the program with sample files that will break in expected ways, and run in expected ways: 32 hours.

    Unless you are using Sun or IBM mainframe software, you aren't going to get that kind of commitment to the product from upper management. Unless you're lucky.
  • Re:Touchy subject (Score:3, Insightful)

    by Exantrius ( 43176 ) on Wednesday May 21, 2003 @12:14AM (#6004619)
    Well, I've actually caused my PS2 to crash quite a bit, such as playing Gauntlet with 4 players (it tries to reference negative RAM addresses or something like that)...
    But in general, you're right. It's very difficult to recall a console game...
    However, it only has to run on one (or at least few) set of hardware. It's the reason Macs seem to never crash -- if they had to program for every piece of hardware out there, there'd be a lot of "crap" that happens, and things would get messy...

    If PCs were uber-standardized -- this processor, this amount of RAM, this and that -- then there would be no problem. I'm working tech support (for a *gag* FoxPro program) and one in 100 customers gets extreme slowdown (like a report taking 72 hours to run when it's supposed to take only 10ish minutes) all the time. We have been hunting for it for the past months, and it isn't the data... It seems possibly hardware-related, but there's so much hardware out there, and so many different layouts for it (Win9x vs. ME vs. 2k vs. NT vs. XP).

    It's a nice belief that they try it on a bunch of systems, but chances are, if it's anything like the jobs I've run, you've got one guy that collects all the files, then at the end, he runs it around the office, and maybe to a "test room" with generic pcs of varying speeds and makeups, which he tests it on.

    Did you ever look at Sierra's help stuff? I never had a problem installing their stuff. (MicroProse, on the other hand, cost me six months' allowance because it hard-killed Win 3.1, and I had to bring the machine to the store to get it reinstalled. You know, when your parents didn't trust you to touch the damned thing, even though you couldn't do any more damage than you already did? Ahhh, memories...)

    Uh, that's all I have to add. Good points, just a bit more insight... /ex
  • Re:Human Error (Score:5, Insightful)

    by GlassHeart ( 579618 ) on Wednesday May 21, 2003 @12:16AM (#6004629) Journal
    It would be interesting to do a study of the "bugginess" of programs written in Python, Java, Scheme, Smalltalk, Lisp, etc. My guess is that programs written in C crash the most.

    Even that is a worthless statistic. Assuming that bad programs are written by bad programmers, the language that more bad programmers choose will appear the highest in your study as the buggiest language. Bad programmers choose the language du jour, thinking it will land them a cushy job.

    You'll have to disprove the assumption to correctly blame the language.

    Use better languages and crash less.

    Try dividing by zero in your better language of choice.
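Taking up that challenge: in a checked language, division by zero is a well-defined, catchable exception rather than a processor fault or undefined behavior. A hypothetical Python sketch (the infinity/NaN sentinels are one policy choice among several, not the language's mandate):

```python
def safe_divide(a, b):
    # Python raises ZeroDivisionError here; in C the same integer division
    # is undefined behavior and can kill the whole process.
    try:
        return a / b
    except ZeroDivisionError:
        # Fail safe with IEEE-style sentinels instead of crashing.
        if a > 0:
            return float("inf")
        if a < 0:
            return float("-inf")
        return float("nan")

print(safe_divide(6, 3))  # ordinary division
print(safe_divide(1, 0))  # contained: a sentinel, not a crash
```

The crash is not eliminated so much as converted into a value the rest of the program can reason about, which is the parent's point: the language decides whether a mistake is recoverable.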

  • by stwrtpj ( 518864 ) on Wednesday May 21, 2003 @12:21AM (#6004657) Journal
    Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or whatever DC's do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell.

    This is one point (and a good one), but the truth is that games get a lot more testing than other software. The main reason for this is that most games could be considered a realtime system, whereas your spreadsheet program is not.

    What I mean by this is the fact that a program that needs to respond instantly to user input while at the same time spewing out millions of triangles a second of 3D graphics data has a much lower tolerance for error than your spreadsheet program that spends 90% of its time just sitting there as you type stuff into cells.

    Spreadsheet fubar'ed because of some odd value you input? Oops. Oh well, reload from the autosaved copy and try again.

    Your game fubar'ed because of some object collision detection glitch? Arrggghh, my character got killed!! I had the game's ultimate superpowered megaboss down to 1 friggin' hit point!! NOOOOOO!!!

    Perhaps this example also makes a statement about the priorities we place on how excited we get over games vs productivity software :)

  • Complications (Score:3, Insightful)

    by Latent Heat ( 558884 ) on Wednesday May 21, 2003 @12:21AM (#6004658)
    I got my life-sciences PhD sis the book "Complications" for Christmas and had a chance to read a couple of chapters before wrapping it as a gift. What I got out of it is the big cultural divide between the routes taken by surgeons and by anesthesiologists.

    The surgeons have these Q&A sessions where they confess to their mistakes behind closed doors -- they get browbeaten by their peers, but I suppose they have some "immunity from prosecution." They essentially talk about the mistakes they made, how they ended up killing patients, and how they aren't going to do THAT again.

    The anesthesiologists used to kill a lot more patients than they do now -- I don't know if the current rate is "acceptable," but anesthesia apparently has gotten much, much safer than even 10 years ago. They got an engineer to look at their systems and procedures, and he came up with fixes such as "all anesthesia machines should turn up the O2 when you turn the dial clockwise," "turning the dial all the way to the left shouldn't shut off the O2 completely," and "use standardized monitors of patient vital signs to figure out that you have the O2 tube down the esophagus and are choking the patient, so you can put the tube down the windpipe before you kill the patient."

    The surgeons are these "he-men hair-on-the-chest right-stuff" types -- patients being killed or maimed by surgery is not the effect of weak practices and procedures, it is the fault of wimpy surgeons who are not good enough (remind you of the C/C++ community blaming the outcomes of dangerous language features on programmer skill?). The author of Complications suggests that even the most skilled, conscientious surgeon can and in fact does continue to have bad outcomes in this environment, and business goes on as usual among surgeons. The anesthesiologists turned their craft over to an industrial engineer and revolutionized their field.
Yeah, yeah, No Silver Bullet, and a lot of the "process" aspects of software engineering are PHB snake oil. I used to think that the proper computer language was the answer, but it seems that for all C++'s warts and for all of Ada's safeguards, C++ and Ada work out pretty much the same in terms of developer productivity and bug rates. On the other hand, you have the challenging-and-dangerous features of C++ -- copy constructors and stack-object assignment semantics, templates, macros, and what have you -- and the cry goes out "these features are needed because they are powerful, and the programmers who are scared of them are intellectual midgets." I suppose standardizing the UI of anesthesia machines crimps the art of the gifted anesthesiologist as well.

Maybe there is hope in design patterns as a guide to coding. Fowler talks about the "null object pattern," where you never assign an object variable to null but instead assign it a "null object" that responds to all method calls with some "null object behavior." Look Ma, no null-reference exceptions! I understand that something like this is built into Objective C. Maybe instead of relying on garbage collection we can come up with design patterns and procedures for when an object is constructed and when it is deleted, according to some probably-safe schemes.

I don't have an answer. But if the answer is giving people powerful but dangerous tools and minimal guidance on how to use them as a way of not crimping "expression," and if the answer is calling programmers who make mistakes stupid, with the idea that there are programmers who don't make mistakes, we are never going to solve these problems.
  • by dspeyer ( 531333 ) <dspeyer&wam,umd,edu> on Wednesday May 21, 2003 @12:26AM (#6004682) Homepage Journal
    The "obvious truth" is that most bugs occur at boundaries. It's actually not very obvious, but it is very well established at this point. That's why intelligent modularity and clean APIs are so important.

    It's also why a single system that doesn't interact with anything tends to be easy to debug. A VCR just does its thing, and doesn't worry about what anyone else may be doing -- not even the hardware it runs on!

    Single tasking systems seldom crash. Single tasking systems that maintain no state between programs crash even less. Ones which run on only their own hardware crash still less.

    But they also do less.

  • by ComputerSlicer23 ( 516509 ) on Wednesday May 21, 2003 @12:30AM (#6004711)
    Sure, writing software that will work on precisely one known system (or a handful of revisions) is far easier. Take the PS1 for instance. It's my understanding there are 4-6 revisions of the original system. They behave mildly differently (the original ones, for instance, can't do force feedback controllers, and some of them have slower CD-ROMs). I thought there were a few minor incompatibilities that affected game speed (games played slower on the original hardware).

    Now consider that there are probably only a half dozen BIOS versions, and once you boot up you're in complete control of the system.

    Now look at a stock PC. There are probably a dozen BIOS makers, with tens of thousands of BIOSes. Oh, which version of Windows do you have? Oh, you have Windows 95 with Service Pack 1, which you later removed, but certain pieces of it couldn't be uninstalled. No, you mean to say you have Windows 95 OSR1? Wait, did you say Windows 95 OSR2? Wait, you have a fresh Win 98 install. No, you mean that's a Win 3.0 upgraded to Win 3.1 upgraded to 3.11 upgraded to 3.11 with Win32s upgraded to Win95 which then had Service Pack 1 installed, which was then upgraded to Win 98, upgraded to 98 Second Edition, with several service packs installed. Wait, did you, or didn't you, install the motherboard DLLs? Do you have this week's or last week's version of the video driver? Do you have revision A, B, or C of the 3Com 3c509 (it makes a difference in which driver you should install, no lie)?

    Have you installed the Win 95 Resource kit or not? Wait, do you have the Debug, or the production version of the MFC libraries installed? Are you sure you have enough memory?

    Do you have one of those goofy non-standard VGA cards?

    Did you put a AGP 2x card in the 4x only slot?

    Does your IDE drive properly implement write-back/write-through caching? Was the standard not clear, so different manufacturers do different things, yet both are compliant?

    Now, do you have the version of wincrt.dll that came with Windows, the one that was patched, the one that was shipped with Office, the one that shipped with the goofy game you installed?

    When was the last time you flashed your BIOS? Are you sure you got the 5ns RAM, and the jumpers are set to 3.2 volts, not 3.3 volts?

    Do you have an AMD approved power supply, or is it one that doesn't supply 450 watts? Or can't sustain the output over long durations?

    Have you applied this hot fix? Or that one? Do you have IIS on, or off?

    Wait, is that a PCI 1.1, or PCI 1.0, does that need a 66Mhz PCI bus. No, hold on, do you have the goofy non-standard PCI card that has timings that are a little off? Are the IRQ's correctly allocated?

    The next time your console maker asks you what hardware you have, or what revision of the machine you have, they are starting to have problems like a PC software maker would. A console maker only has to worry about what they actually make, and they only have to worry about what they put in the console. They don't have to worry about whether another part maker is making parts to spec, or whether the spec is ambiguous on certain points. They don't have to worry about a lot of things. About the worst thing that could happen to a console maker is to have a bad run of boards/parts and have to recall a specific series of serial numbers. Other than that, all of the parts should act absolutely identically in all situations (assuming normal operating conditions, like clean power and normal temp/humidity).

    It's part of the reason Apples make such stable hardware, and why Sun equipment has such a good reputation for stability. Macs and Sun boxes are reasonably consistent. About the worst you can do is have non-standard patches. But as a base install, they are of a well-defined quality.

    Video games actually do crash. I've had all kinds of games crash on me on PC hardware. Even the original Pac Man had a crash in it: it's my understanding that once you got far enough, a counter overflowed and the game became unplayable.

  • at the End (Score:1, Insightful)

    by beavmetal ( 250116 ) <beavmetal@hot m a il.com> on Wednesday May 21, 2003 @01:00AM (#6004827) Homepage
    Besides the hardware production flaws, the software coding flaws, and the shipping process (every atom is manufactured, man-handled, assembled, and shipped somewhere), let us not forget end user abuse.

    Who is guilty of smacking/kicking the PC or mainframe? I bet this helps stability a lot.
  • by frovingslosh ( 582462 ) on Wednesday May 21, 2003 @01:00AM (#6004828)
    "I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged."

    I've been using computers a few years longer. Heck, I've owned computers a few years longer (yes, that makes my first one prior to the 8080 micro chip). But even 25 years ago I saw Data General systems with a lot less raw power than a Pentium that ran a multi-user OS and supported an office full of users, and routinely ran without crashing or even being shut down from year to year -- they were only rebooted when the tech came around to perform scheduled preventative maintenance. Sure, some systems did fail (and some in quite interesting ways), but it was the exception, not the rule. The thing that I see as having changed is that Bill Gates became the richest man in the world while giving us an OS that crashed so regularly that it just can't stay up. And somehow people accepted it. How he got away with it I don't understand.

  • by NetCurl ( 54699 ) on Wednesday May 21, 2003 @01:08AM (#6004862)

    Personally, I don't think that denying the user the option of defining any settings which could cause a malfunction is the answer. The reason? Well, it's pretty simple: when set properly, those same settings give flexibility, added functionality, and performance (at least one, sometimes two, often all three of the above).

    See, that's the thing. I like Apple's OS because at surface level, you can't get access to those features that could really break things if you screwed with them too much. If you really want to muck around with those settings, they are there and ready to be played with through various means (Terminal -- it's a freaking BSD system -- third-party tools, and power-user know-how). I would like to respectfully disagree with your statement and say that by default they don't offer the option of defining settings that may cause malfunction, but in OS X they have left almost complete wiggle-room to in fact screw EVERYTHING up, if you know what you're doing. I think it's more genius than anything...
  • by sh!va ( 312105 ) on Wednesday May 21, 2003 @01:11AM (#6004873)
    This is where an open source system such as Linux excels... it does so because a lot of the same code that goes into making those critical platforms goes into the mainstream releases, thus carrying over to the average user at home. This is a big part of why Linux is so stable even on desktops.
    What is the point you're trying to make? Are you arguing that on Windows platforms the base OSs are different between what a programmer codes on and what s/he uses at home? Whichever way you argue, how does it add to the discussion about why computers crash?

    You could argue (and you later do) that Linux/open-source code in general might tend to be less buggy because there are more people looking at it. That is what Eric Raymond believes ("...given enough eyeballs, all bugs are shallow"). This is not quite true, as researchers around the world have shown. There are lots of shallow bugs, and they get weeded out really fast in open-source or closed-source software -- perhaps faster in open source. However, the ones causing crashes are often deeper, and these do _not_ get weeded out until some expert goes after them or some new technology comes around to make them shallow.

    Open source development is free... it has no pressure to release final versions, no pressure to release features until they are stable
    You make an interesting point. Open source moves the pressure from the developers to the users (who now have to keep up with the constant fixes and patches and recompiles and what-not). Is this better than catering to a really dumb user whilst keeping the pressure on the developer? I'm not sure. At the same time, you do have a completely valid point that there is external "deadline" pressure on some of the closed software shops (Microsoft comes to mind) that often pushes out substandard code.

    So the question _really_ is -- can you make better software (no, it ain't going to happen, ever. Get real), or can you _accept_ bad software as a reality and design systems to work around it? This is probably _very_ hard to do for user-facing GUI kind of stuff (which is the stuff people complain about when things "crash"). But research is underway to help other daemon-type programs recover better in the face of a crash [ Slashdot [slashdot.org]].
  • Re:Touchy subject (Score:3, Insightful)

    by Zoarre ( 1487 ) on Wednesday May 21, 2003 @01:21AM (#6004906)
    I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo?

    The Super Nintendo used a roughly 3.58MHz 65816 (a WDC design, not a Motorola one), the same processor used in an Apple IIgs. I can't find its transistor count on the web, but it could not have had fewer than 5,000 (the 6502) nor more than 68,000 (the 68k). Compare this to a modern AMD Athlon 3000+, which has about 54.3 million transistors. The Super Nintendo might be less likely to crash than a PC because there are at least 54 million fewer things to break.

    Also, his claim that you don't find similar problems in modern hardware is incorrect. Just search Google for "intel errata" to see what I mean.

    I bought my Gamecube last week and a copy of Metroid Prime. Ironically, it runs on an IBM PowerPC chip (the IBM branding is right on the box) and it's crashed twice since I've owned it. (I <3 my Gamecube regardless).

    Industry professionals that produce glib, ignorant assertions such as this one might be part of the problem. :D

  • by TheNetAvenger ( 624455 ) on Wednesday May 21, 2003 @02:12AM (#6005072)
    Why is it that people always use Windows 98/ME, which was basically written in 1997 and 1999, and then compare it to their *nix installations that are the current versions running the latest *nix patches?

    If people want to compare MS and Windows to their *nix, at least use Windows XP as the baseline.

    It would be just as silly to compare WindowsXP to a 1997 version of any *nix out there.

    Or if you are going to use an 'old' MS OS, at least base it on Windows NT 4.0, which is at least in the same class as *nix. Our clients have had high-usage NT 4.0 installations run for years without failures.

    Windows9x is a grand extension of the DOS architecture, NT on the other hand is just a completely different ball game by design.
  • Crashing... (Score:3, Insightful)

    by JWSmythe ( 446288 ) * <jwsmythe@nospam.jwsmythe.com> on Wednesday May 21, 2003 @02:23AM (#6005118) Homepage Journal
    Crashes are a rather ambiguous topic..

    A lot of computer crashes depend on what you're doing with it.

    The machine I'm working on right now crashed on a regular basis by itself running Win98 or Win2k. I was tempted to blame bad hardware. Under Linux with a similar workload (OS, GUI, browser, mail client) it never crashes.. That I can blame on the software being run.

    Identical machines running completely MS software behave the same, so it's hard to blame non-MS software for the crashing.

    My Compaq iPaq with WinCE would lock up or shut itself off about twice a day under virtually no load and no 3rd-party software (I hadn't really figured out what to do with it yet). I was ready to return it to the store. I opted to call it a part-time paperweight, and "try" Familiar Linux on it.. Hasn't crashed since..

    Well, that's not completely true. I've done some rather silly OS upgrades (hey, lets change all the libraries while it's running, and see what happens), so the crash was user failure.

    But not to make Linux sound perfect, I've crashed machines with poorly written software. I've sent them into huge loops, and had software running that managed to suck up all the memory and hang the machine (a packet sniffer monitoring a 100Mb/s connection). Even my favorite web server, thttpd, had a poorly written beta version once that would upset the server after a couple days of running.

    Is it always the OS? Nope. I've had a set of 10 machines with "generic" memory in them.. After a few years of running, they all began crashing mysteriously about twice a day.. Swapped the memory out for name-brand memory, and they started working perfectly.

    We have a big industrial looking Dell on the network. Memory flaked out in that. Machine was dying about once a month. Swapped that out for a larger quantity of Crucial memory, and no more problems.

    In a computer store I worked in years ago, we bought the cheapest hardware possible. The motherboards didn't come with boxes, and the manuals never made a reference to a manufacturer. Most of the hardware I couldn't even track down a manufacturer name through the vendors. About 1 in 10 parts wouldn't behave properly when we turned it on. About 1 in 30 machines came back for repairs for bad hardware within a few months.

    So, it is really up to everyone involved if the machine will work right. I use Asus motherboards, Crucial memory, and Western Digital hard drives, and rarely have a hardware problem. The last problem I had was a bad IDE cable. There's always something that can fail.

    The software has to run well, and we've very very happy with Slackware's distributions, with Apache and thttpd.

    The biggest problem we have is user software or simple misconfigurations.. What happens when you have a heavy traffic web site, and the web server logs never rotate or get truncated? The drive fills up fast, and you end up with 2Gb logs.

    What happens when you write a program that ends up sucking up all the memory and CPU time? Makes it not run right (I've done it myself a few times. Oops.)

    People constantly bring their home machines in to work for repairs, for various reasons. About half are software misconfigurations (how many 3rd party applications do you really need running at boot time?). The other half, dying hardware.. The CPU fan made noise for 6 months and then stopped making noise, but you let it go? Ya your CPU is burnt. Cheap fans do that faster than most.

    Can they build a crash-proof computer? No. Just like they can't build a crash-proof car.. Cars typically crash due to user failure (users including other drivers) or component failure (Ford tire blowouts). Not really the car's fault. I had a car crash in a parking lot: a driver missed the highway and broadsided it.

    So, you can strive for perfection, but there are always going to be circumstances that can cause failures, usually attributed to users. (those damned users.).

  • by 4minus0 ( 325645 ) on Wednesday May 21, 2003 @02:28AM (#6005149)
    Free software never ceases to amaze me.

    I have set up countless email servers, firewalls, spam catching relays, web servers and dns servers. Some clients want Red Hat, others are more up on the game and have heard of Debian or Slackware, others couldn't care less. That's beside the point... It's open freaking source; hack it to your needs/liking.
    You wanna know how much I had to pay for the operating system or individual packages of said software? Nada, that's right, zero, zilch, zip.

    It baffles the mind how something that works so well can be free.
    That means a lot to a small-time contractor like myself.
    I may not have the money or the coding know-how to give back to the community, but you can bet your custom kernel that when somebody has a question on Usenet or a web forum about Linux or a particular package that I happen to know about, I help that person like I was being paid to.
    That's the beauty of 99% of the people in this community... I can even say "I have a client who needs X how do I implement this?", and more often than not someone will help me out with the answer or at least point me to the docs that will answer my question. Even knowing good and damn well I'm getting paid to find the answer to that question.
    This is a good thing we have here folks, I would imagine that I've taken far more than I've given back but every chance I get I do give back and I like to think that most users of this crazy thing called Free Software do too. So far that theory has proven itself true. Just a little soapboxing on my part here, sorry for the rambling.
  • by Overly Critical Guy ( 663429 ) on Wednesday May 21, 2003 @02:59AM (#6005271)
    Why do you say it is so insulting?

    Because I am being sarcastic.

    MS is all about money.

    Welcome to capitalism.

    That is all that they have made successfully. Their software is total crap,

    Right there, I can tell the rest of your argument will be pointless, biased drivel. You and I both know their software is not total crap.

    but due to a monopoly, they have been able to push a multi-level monopoly.

    Consumers drive that monopoly. They are doing something right.

    Personally, I abhor their behavior and software, but admire their business sense.

    No, you don't. You hate them because they are successful, they hold a monopoly, and Linux just can't seem to make the break. It frustrates you, so you use "M$."

    MS is about money, and they do it well.

    As any company should.

    So no, M$ is not an insult. It is praise.

    No, it is a pointless term Slashbots use because they think it is in some way vaguely insulting. They don't realize it is not insulting. They just see all their fellow Slashbot brethren using it, and think they are being clever and funny by doing so. Obviously, they are not.

    BTW, I would have to say that if Sun and HP had the business sense that M$ had, they would be 10x their size. Gates has never developed anything new or interesting, just stolen it.

    You and I both know that's not true. In fact, Linux is stealing all of Microsoft's ideas, like the start menu, taskbar, and so forth.

    But he has normally figured out where the future lies and taken advantage of it. Companies like Sun and HP have never really gotten it. It is too bad, as both have good engineers, but horrible business ppl.

    Microsoft employs some of the smartest software developers out there. They can afford to. Think about it.

    Next.
  • by digital photo ( 635872 ) on Wednesday May 21, 2003 @03:26AM (#6005387) Homepage Journal

    I would agree. Properly and well written code will gracefully handle runtime errors.

    Translation: Short of the user fubar'ing the program or data files themselves, the program should handle all user input in a graceful way.

    The problem though is that to do this would require quite a bit of extra work.

    Programmers are caught between getting something ready for market by a date dictated by a department which doesn't understand the underlying issues, and saying "Screw it" and making the code solid.

    That only describes one way in which the problem is caused.

    The bigger problem is the attitude people have about computers which allows for this kind of shoddy programming. People are, for the most part, okay and even expectant of their computers to crash at some point in time.

    This in turn makes it okay to release bad code which will be "fixed later".

    I say that whenever we get a crash or a problem, we report it to the company and we post it to our websites and to review sites.

    I say that the users should make it a big fat noticeable problem to the companies whenever their software breaks.

    Why? Because it means that whenever someone who's never used the software before searches Google for that software or software company's name, they will find page after page of complaints, dissuading them from using the software.

    The flip side is, if the software works, post to your sites and review sites. Give the people and companies who produce good software credit when it is due.

    As users and consumers, we should find ways to encourage the producers and companies to produce solid code.

    Solid stable code shouldn't be the exception to the rule.

  • by bm_luethke ( 253362 ) <`luethkeb' `at' `comcast.net'> on Wednesday May 21, 2003 @03:28AM (#6005392)
    I think you lose a lot (and I am sure you will quickly ignore my posts).

    The poster may not be a native english speaker.

    The poster may have my problem (dyslexia, learning disability, etc.) and still be quite competent in what they are trying to express.

    So, I can't spell. I have a physical problem (dyslexia -- an actual medical diagnosis), and I also don't have time to put every post through a spell checker. It still doesn't change whether the content of my posts is correct or incorrect. Then again -- it is only damaging to you to ignore any wisdom given by someone who doesn't speak English well or has a disability.
  • Re:Human Error (Score:5, Insightful)

    by ojQj ( 657924 ) on Wednesday May 21, 2003 @03:35AM (#6005420)
    Disclaimer: I haven't programmed in Java since my undergrad, but I learned it before C++. I've been programming in C++ professionally for 3 years straight now, not counting internships and class assignments before that.

    I'd rather have an exception than a crash. It gives me more information about what I did wrong. A crash that's not reliably repeatable and only happens in your release version under Windows NT systems with IE 4 installed is next to impossible to find and fix -- and in C++ it's only worse.

    Not only that, but memory management is more than just a nuisance. Just yesterday, I wanted to move some code from one class to another to improve the object-oriented structure of some code which I've taken over from another developer. In that code were a couple of news, and I couldn't find the deletes which matched them. So I asked the original developer. It turns out the deletes were in a base class of the class that I was moving the code to. If I had been programming in Java, this would have been a cut-and-paste job finished in 30 seconds, plus 15 minutes for testing the change before checking in. In C++, it was 15 minutes trying to find the deletes myself, 15 minutes waiting for the other developer to get to a break point in his work, another 15 minutes assuring myself that the deletes really were called in all cases, and another 15 minutes for testing the change before checking in. That's a factor of 3-4 (depending on whether I have something else I can do while waiting) for the C++ program.

    Memory management and other unnecessary tasks which C++ saddles the developer with do make an impact on either development time, program stability, or both. And that is also true for experienced C++ programmers.

    They also make an impact on language learning time, which is not to be underestimated with the number of newbies today, and people moving up from still worse languages like Cobol. In addition, even for an experienced C++ programmer, they make a difference in the time it takes to understand code which was programmed by another programmer.

    I agree with you that there are situations where every language, including C++, is the most appropriate for the problem in question. I just think that C++ is over-used, thus reducing the average stability of modern programs and the average productivity of modern programmers.

  • It's drivers (Score:3, Insightful)

    by jeti ( 105266 ) on Wednesday May 21, 2003 @03:39AM (#6005429)
    Pretty much the only thing that can crash a PC nowadays is a problem with drivers.

    The cores of both Linux and Windows have gotten very stable, and normal apps can't kill them. Basically only drivers have that privilege. And those very drivers also have to compensate for all the bugs in the hardware.

    The only way to get PCs more stable is to kick drivers out of the kernel. And the only OS doing this that I know of is QNX. Apart from being known for its stability, it also runs fast and with extremely low latencies.
  • the way as we do.. (Score:3, Insightful)

    by tshuma ( 611888 ) on Wednesday May 21, 2003 @04:46AM (#6005623) Journal
    Have you ever heard of a company which builds houses without any plans?
    Software companies are growing too fast, and they want to make more and more and more...
    there is no time to gather good requirements and no time to make a plan..

    People, and mostly managers, are not "safe thinking".. They want everything as fast as possible. This is the reason why software companies need to use software to control their process.

    But on the other hand, the hardware is going the same way.. I don't remember any C64 which had bad memory or a bad motherboard.. it was just good, full stop! But if I buy a new memory module for my computer, it could be bad, or incompatible with the others!

    So, what I believe is that we need to use programs to control the whole software design process -- a program which doesn't let me skirt around a problem. But I am sad, because we should have been using this since the '80s!!!

  • by Alan Partridge ( 516639 ) on Wednesday May 21, 2003 @05:16AM (#6005690) Journal
    who's gonna buy the product? who's gonna install it?
  • by BigBadBri ( 595126 ) on Wednesday May 21, 2003 @05:20AM (#6005697)
    Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.

    Are we to understand that Apple is good, or that Apple users are particularly stupid?

    Personally, I've never used a Mac for work (I've only dealt with them when setting networks up for others), but the UI has always seemed a few steps ahead of the competition in terms of ease of use, so I'd applaud Apple for taking the time to think of the user and making the interface easy to use.

  • by Anonymous Coward on Wednesday May 21, 2003 @05:46AM (#6005766)
    My take is that there are problems with the process of creating software.

    My first belief is that there are no "user errors" that should cause software to crash. Hitting ctrl-s should not cause the word processor to crash, taking out your thesis.

    Secondly, the software design, develop, test, redesign... process is far too often ignored. How many software managers out there believe that the sound of keys clicking is indicative of "progress"?

    Third, there are an awful lot of extremely bad software designers out there. These are people who fail to understand how to compartmentalize their designs. Compartmentalization should always be done with a view to minimizing software complexity and reducing the number of failure cases. A simple, elegant, well-thought-out design almost always performs better than a complicated, over-engineered-for-speed design.

    Finally, programmers are people -- subject to personal problems and limitations in what they can do. I've had the pleasure of working with several very arrogant, egotistic project leaders whose projects consistently fail. Nothing pisses people off faster than having their code removed from CVS and replaced with the boss's code that doesn't work and fails to handle special conditions, especially when the check-in is done after office hours and the e-mail states "it works, it's staying".

    Bottom line: hire good designers, technical writers and programmers. Never hire idiots to save money. Carefully screen project managers, and choose only those ones with success under their belt.

    When I buy software and pay good money for it, failure is not an option.

    -Brett
  • by mendred ( 634647 ) on Wednesday May 21, 2003 @06:04AM (#6005820) Homepage
    C/C++ are languages that were designed to be as low-level as possible. The language itself is therefore very minimal, meaning that it expects you to take care of every detail.

    Which makes these languages very suitable for a small team of 10-20 people who know exactly what they are doing. They can design the specific components relevant to their project/product, and the remaining 100-200 people can use a higher-level language to link these components and build the final product.

    Using only C/C++ in a team of 100-200 people is a recipe for disaster. It requires a great deal of discipline and expertise, and also a lot of time. And humans are prone to error, after all. And there is that saying: too many cooks spoil the broth.

    Also, using only a high-level language may make your code stable under limited usage, but under heavy load it will fail, and when it does you will run helter-skelter wondering where the problem is. But there will be no indication in your code. I don't know if any of you Java programmers have ever encountered an out-of-memory exception thanks to heavy object overhead and torn your hair out in despair, but I have, and it isn't pleasant.

    And a client isn't interested in excuses. He just says: get it to work on the hardware I have. At least a crash or a memory leak can be traced and fixed, but this??? We found a workaround eventually, but it was a very painful and harrowing process after consulting a lot of documentation, and it certainly belied Java's reputation as an easy language.

    Also remember some faults may be under the hood and will be there till they get fixed- beyond your control, because essentially after all these languages add a layer over the lower level, meaning more complexity. And this complexity will be very generic in nature and may not pertain to your project or your need. In contrast, C/C++ is as low as u get and so you can write components suited for ur needs.

    For example, in our project the programmers outside the core team use Java, but they make native calls to libs the core team prepares. We find that this way Java gives excellent performance. It is an excellent language for program structure and modules, but not for coding core components, as the overhead involved is significant. Significant allocations and deallocations are not done by the outer programmers at all (yes, they use new in Java, but under the hood all allocation and deallocation is taken care of by the component; the Java part is more like a wrapper with a very low memory footprint, so there's reduced work for the GC), and any module the core team develops goes through rigorous testing before it is handed over, so the others can just drop it in place. It's not as easy as it sounds, but the boss feels it's a nice balance between efficiency and ease. And besides, it helps if your boss is also a programmer and a member of the core team. :)

    Again, if you are a Java expert you could probably minimize those overheads without needing to touch C/C++; I am not sure. Also, in the future JIT compilers and other improvements may bring Java very close to C/C++ in performance, and everyday hardware may become powerful enough to drop C/C++ altogether (for example, nobody uses a 386/486 for serious work now, but here we even had problems on a 1 GHz P4 with 256 MB RAM - OK, maybe not bleeding edge, but you can't call it obsolete). Until then, this is the model we will use. Also, if we require cross-platform independence, only the core libraries need to be ported. Right now the Linux port is underway.

    What I am trying to say is that everything should be viewed in shades of gray. There is a place for everything, and a reason for everything to exist. For example, my brother, for his PhD, is using Java to run some scientific calculations - heavy number-crunching stuff - because it is easy to code and you don't have to worry about anything other than the logic. Plus his university has given him a dual-processor 2.5 GHz P4 with 1 GB RAM just for that :)) (oh, it also has a Radeon 9700, drool :)
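
    The hybrid model described above - a small core team writing low-level C components that manage their own memory, with higher-level code treating them as opaque wrappers - might look something like this minimal sketch (the names, pool sizes, and interface are all invented for illustration, not taken from any real project):

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical core component: a fixed pool of record slots.
     * Callers (e.g. a thin Java wrapper) never see malloc/free;
     * all allocation lives inside the component, so a leak or a
     * double free can only happen in one well-tested place. */
    #define POOL_SLOTS 64
    #define SLOT_SIZE  128

    static char pool[POOL_SLOTS][SLOT_SIZE];
    static int  in_use[POOL_SLOTS];

    /* Acquire a slot; returns an index, or -1 if the pool is full. */
    int record_acquire(void) {
        for (int i = 0; i < POOL_SLOTS; i++) {
            if (!in_use[i]) {
                in_use[i] = 1;
                memset(pool[i], 0, SLOT_SIZE);
                return i;
            }
        }
        return -1;  /* caller gets an error code, not a crash */
    }

    /* Release a slot; releasing twice is a caught error, not corruption. */
    int record_release(int idx) {
        if (idx < 0 || idx >= POOL_SLOTS || !in_use[idx])
            return -1;
        in_use[idx] = 0;
        return 0;
    }
    ```

    The point of the pattern is that the dangerous bookkeeping is concentrated in one small, heavily tested component, while everyone else only ever sees indices and error codes.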
  • by Afrosheen ( 42464 ) on Wednesday May 21, 2003 @06:21AM (#6005880)
    Considering that Apple's original (and perhaps enduring) core market was 'creative types', I'd say they were shooting for brilliant people that didn't know shit about computers. They originally established those guidelines so companies writing software would adhere to a standard and everything would feel right.

    Consider Adobe, for example. You open an old or new version of photoshop on macintosh..it looks and feels the same. Everything is always in the same place on a mac. File, Edit, Bla bla bla it's always in the same order regardless of the version, regardless of the app. It's called 'genius' from a user's standpoint.

    When you can take a drooling noob and turn him into a productive photo retoucher in one week, I attribute that more to apple and adobe than anything. Trust me, I had to train a few dozen people from various backgrounds and everyone became a ninja eventually.
  • by Bert64 ( 520050 ) <bert AT slashdot DOT firenzee DOT com> on Wednesday May 21, 2003 @06:23AM (#6005887) Homepage
    People are so used to unstable computers nowadays that a crash is considered normal. People EXPECT computers to crash, and couldn't imagine one that doesn't.
    This means that unstable software sells just as well as stable software, but is much cheaper to produce since you don't need to test it so thoroughly. Any commercial vendor will realise they can save a lot of money while only very slightly damaging their sales; the money saved on testing more than makes up for the lost sales, so they just continue writing buggy software.
    If the average computer user would boycott products for being unstable, stand up and say "this really isn't good enough", and seriously hurt software sales, then something would swiftly be done about it.
  • Re:Touchy subject (Score:2, Insightful)

    by Bluelive ( 608914 ) on Wednesday May 21, 2003 @06:28AM (#6005901)
    This could, of course, have been a single bit that failed on the medium you got your copy on.
  • by dipipanone ( 570849 ) on Wednesday May 21, 2003 @06:30AM (#6005912)
    You and I both know that's not true. In fact, Linux is stealing all of Microsoft's ideas, like the start menu, taskbar, and so forth.

    Damn. And there I was, believing for all these years that Microsoft had stolen these ideas from Apple. Who in turn owed Xerox no small debt. Thanks for the correction though, Overly Critical Guy. I won't believe the hype in future.

    Microsoft employs some of the smartest software developers out there.

    And if only they'd set them to working on Outlook, I might today be using a mail client that was more stable and more efficient than the copy of Eudora Light I was using ten years ago.
  • McDonald's Ware (Score:1, Insightful)

    by Anonymous Coward on Wednesday May 21, 2003 @07:54AM (#6006153)
    McD's doesn't make food that's good for you. But they make lots of it and they profit by it.

    M$ doesn't thrive on quality software, just software with lots of bells and whistles, marketed well to the masses. And they profit by it.

    Unfortunately, MickeySoft sets the BAR.

    It IS possible to write quality, bug-free software that doesn't crash. It's just that nobody has found a way to profit by it.
  • by tychoS ( 200282 ) on Wednesday May 21, 2003 @08:12AM (#6006207)
    For close to a decade I have worked as a software developer for various companies, and in that period I have read quite a few books and papers on software project management, process and the like, as well as participated in conferences and study groups on the topic. Both theoretical and anecdotal evidence points to the way we organise software development as the main limiter on quality and creativity.


    In most software companies you get promoted for political aptitude, with little or no regard for your knowledge of how to create software or - just as important - how to organise software development teams well and how to build a mutually beneficial relationship with the clients during and after the project.


    Such people tend to believe urban legends such as: in bygone days, in a country far from here, there was a software project that used the waterfall process and finished on time, within budget, and with a happy customer.


    They do this despite the reasons why waterfall processes lead nowhere pleasant having been thoroughly documented in everything from scholarly texts on organisational theory to excessive numbers of first-person horror stories. And who can blame them? They got promoted to middle or upper management not because they knew a thing about organising software projects, but because they were better politicians than the next guy, so it would not further their careers to sit down and read their first book on software project management theory.

  • by ChrisPaget ( 229422 ) on Wednesday May 21, 2003 @08:20AM (#6006249)
    Windows 2000 Server, SP3. Up for 55 days, 15 hours, 53 minutes. And that's only because I moved into my flat 55 days, 17 hours ago :) In that time it's been used extensively for C / C++ development, plenty of Quake 3, CD burning, watching DVDs, Kazaa, you name it. And it also serves my website (half a million hits over the 55 days), email, internal DNS, DHCP and file server. It's transferred over 150GB of data to either the internet or LAN, and has never crashed. Who says Windows 2000 isn't stable? I don't even need to reboot when I install patches - restarting services to trigger the updates is relatively easy on Win2K if you know your services well.

    Windows in general cops a LOT of shit for instability that it really doesn't deserve. Before you criticise Windows for being unstable, I suggest you try debugging a crashdump - 99.9% of the time it's caused by a third-party driver. Cheap sound card? Old graphics driver? Hell, maybe even you've not installed the 4in1 driver for that Via IDE controller on your motherboard? Drivers are the single biggest source of crashes and reboots in Win2K. If you want a stable system, spend some money on your hardware, and get it from a company that provides decent drivers.

    Admittedly, that's the reason why *nix is generally perceived as more stable than Windows - if a driver is bad in Windows, you're screwed. If a Linux driver is bad, you can fix it, recompile the source, and bye bye instability.

    Don't blame Microsoft for instability. Blame the third-party hardware vendors who can't be bothered to spend the time and money properly debugging their drivers.
  • by Anonymous Coward on Wednesday May 21, 2003 @08:35AM (#6006322)
    Hm. Yes, but when you're coding, you're dealing with the task in hand, which is never to write code that handles all possibilities of user interaction. Ultimately, it's testing that should find the bugs.

    Yes, there are bad coders, or coders who are simply over-worked, and managers who want everything yesterday. But if enough testing is planned into the project, then there will ultimately be fewer bugs.

    All software has bugs. That's inevitable. It's minimising the number of bugs and their impact that's important, and that, ultimately, is best done when testing.
  • by smallpaul ( 65919 ) <paul@@@prescod...net> on Wednesday May 21, 2003 @08:48AM (#6006404)

    It always bugs me at how quick people are to blame the problem for crappy coding on the language. This would be tantamount to a carpenter saying, "if my hammers weren't so damned versatile I could build a higher quality product and not break my thumb open."

    No, you've completely mischaracterized the argument. The argument is actually: "People keep using a wrench as a hammer. Yes, you can do it, but it isn't efficient and it isn't safe." C++ is not a good application programming language, but that is what it is most often used for. It is excellent as a component, operating system or runtime programming language.

  • by Carbon Unit 549 ( 325547 ) on Wednesday May 21, 2003 @08:50AM (#6006412) Homepage
    When was the last time someone crashed their Super Nintendo?

    Actually, my GameCube has crashed on several occasions with SSX Tricky and other games.
  • by c4seyj0nes ( 669515 ) on Wednesday May 21, 2003 @08:54AM (#6006440)
    In my opinion, something the user does should never cause a program or operating system crash. If it can, it is the developer who is at fault, not the user.

    C:\WINDOWS> del *.*
  • by glatiak ( 617813 ) on Wednesday May 21, 2003 @09:15AM (#6006562)
    Twenty-five years ago I worked for a database and application vendor doing internals (Amcor, in case anyone cares). Filtering for correct input and preventing large-scale logical errors was a major fetish. Much of this was not difficult, just a group agreement to use library routines for all user interaction that had input validation and condition handling. Programs were built from shells that had standard condition handling embedded -- you added custom branches as needed. What made the whole approach successful was an agreement on standards of program behavior and a willingness to share common code. Errors like the ever-popular buffer overflow just didn't happen, because moves into buffers checked first, etc. The move to RISC processor architecture attenuated synchronous error handling, to be sure. But by and large, the problem is the obsession that in IT experience is a handicap (just ask any recruiter about experience that is not 110% matched to what they want NOW) -- so junior-programmer mistakes become institutionalized. The budget is a convenient excuse, but I think the real root is an inexperienced lack of appreciation for what matters.
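    A shared library routine of the kind described above - one that checks before moving into a buffer - might look like this minimal sketch (the function name and interface here are invented for illustration):

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical shared input routine: copy src into dst, but check
     * the destination size first instead of trusting the caller.
     * Returns 0 on success, -1 if the input would not fit, so the
     * caller handles an error condition rather than overflowing. */
    int checked_copy(char *dst, size_t dst_size, const char *src) {
        if (dst == NULL || src == NULL || dst_size == 0)
            return -1;
        size_t len = strlen(src);
        if (len >= dst_size)        /* leave room for the terminator */
            return -1;
        memcpy(dst, src, len + 1);  /* +1 copies the '\0' as well */
        return 0;
    }
    ```

    If every module agrees to route input through routines like this, the overflow check lives in one place instead of being re-implemented (or forgotten) at each call site - which is exactly the kind of group agreement the parent describes.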
  • by yandros ( 38911 ) on Wednesday May 21, 2003 @10:12AM (#6006998) Homepage
    People continually ask why so many Mac OS X updates require rebooting. The answer is very simple: it's effective, and most people don't care.

    Sure, they could write complicated update scripts that `know' both which applications/subsystems are affected and how to safely restart/reset those, but this would be more complicated, more effort, and less likely to be correct. Most people are used to the idea that they need to reboot (thanks, MS), so why not?

    FWIW, Mac OS X Server doesn't require you to reboot all the time. This, I claim, is not an accident.
  • Of course, there's no need to mention Microsoft's inability to create a stable system.

    You know, my Win2K machine -- the one that has been up since our last power outage, and had been up since the power outage before that -- has never crashed. It might be because I don't overclock it, used a retail processor, Intel networking, four fans, whatever. But it has not crashed or needed a reboot since I installed Jetico BestCrypt last year, in March or something. I use it every day, have played pretty hardware-intensive games on it, and have even used it as a server.

    I think the problem here isn't with Microsoft and their inability to write a stable OS. If it is stable anywhere, that means the kernel isn't leaking ram or occasionally polling hardware that doesn't exist. The problem therefore lies with Microsoft's inherent trust that driver manufacturers and software engineers will handle their own damn errors. Linux doesn't do that. The kernel is so "low" that it recovers from just about everything. The software on top of it, that's another story. Many of the applications I've used in Linux crash after a single parsing error, bringing down anything reliant on them. Tell me you've never had an X server crash on you, taking down your entire GUI. To the average user, who isn't running a bunch of services or daemons, losing the GUI is the same thing as crashing. So what if bringing it back up is faster than rebooting the machine -- it's also more complex to support.

    Besides, hardly anybody buys a Windows installation because they wanted a more stable system. They bought it because they wanted cooler toys and a snappy GUI. People "buy" Linux, BSD, et al. for stability.
  • by Cookeisparanoid ( 178680 ) on Wednesday May 21, 2003 @10:38AM (#6007191) Homepage
    It's a well-known HCI concept that people learn by trial and error, so it's really a design flaw in the program if user error causes a crash.
  • by gurps_npc ( 621217 ) on Wednesday May 21, 2003 @11:18AM (#6007489) Homepage
    Anyone can write code for a computer.

    In order to be flexible enough to do everything a computer can do, computer languages have to be allowed to crash the computer. Otherwise you are severely limiting what they can do and slowing things down.

    Most computer crashes are caused by an INTERACTION of two pieces of code that did not know about each other and were never tested together.

    If you want a system that never crashes, then all you have to do is:

    1) accept a restricted operating system that will never be able to compete with a commercial system like Windows.

    2) Never install a program that was not A) created by the same company/group that wrote your operating system, B) specifically designed for your particular computer, and C) designed to be used with, and thoroughly tested against, all the other software that is currently installed on your PC.

    That is what companies do when they make non-PC computer equipment (cars have tiny computers), and it is the reason why such things do not crash.
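
    The "untested interaction" failure mode can be shown with a toy sketch (everything in it is invented for illustration): two modules that each pass their own team's tests, yet disagree about the meaning of a value they share.

    ```c
    /* Shared state with no agreed contract between the two teams. */
    static long g_transfer_size;

    /* Module A, written first: records a transfer size in BYTES. */
    void log_transfer_bytes(long bytes) { g_transfer_size = bytes; }

    /* Module B, added later and never tested against A: reports the
     * stored value assuming it is already in KILOBYTES. Each module
     * is "correct" in isolation; only the combination is wrong. */
    long report_transfer_kb(void) { return g_transfer_size; }
    ```

    Neither module contains a bug a unit test would catch; the defect only exists in the combination, which is why integration testing (or an explicit interface contract) is what catches this class of failure.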

  • oh come on... (Score:2, Insightful)

    by spazoid12 ( 525450 ) on Wednesday May 21, 2003 @11:57AM (#6007767)
    Why Do Computers Still Crash?

    So, he's used computers for 30 years, but apparently never programmed them.

    Other important questions:
    Why do computers still cost money?
    Why do computers still require power?
    Why can't computers yet read my mind?
    Why don't computers smell pretty?

    Well, back to the crashing... it's all my fault. I'm sorry. I won't do it again.
  • by Anonymous Coward on Wednesday May 21, 2003 @12:13PM (#6007921)
    Yes, I have 30,000 lines of code and not one crash in over a year. It is multi-user and multi-threaded and is used ALL DAY, every day, at my office. I will not say that it has never had any issues, but it has never crashed. Most of the issues seem to involve the DDE link to our contact management program. Sometimes they seem to stop talking to each other.
    The truth is, this is more about the OS than apps. An app should not take down a computer; if it does, then the OS has issues.
    I will say that a lot of OS crashes and app crashes have to do with shoddy coding.
  • Re:True but... (Score:1, Insightful)

    by Anonymous Coward on Wednesday May 21, 2003 @12:27PM (#6008038)
    Software developers, for the most part, want to write the most solid code possible. However, the sad truth is that the software development life cycle is driven by the 3 M's. Money, Management, and Marketing. They dictate who, how much, what, and when the software goes to market. How many hallway conversations have I had with one of the 3 M's asking me what features I can include by a certain date. And I say, "none because there's not much time to test it." Then the hammer comes down and I'm told to include it anyway and "we'll fix it in a patch release." And this is just to say that the feature is included. Most developers would like to spend a little more time designing software and making it as solid as possible, but there are always exceptions, incompatibilities in hardware and the like that make software development difficult. If you look at the most solid systems, (i.e. military, government, critical systems that protect life) they run in a extremely confined environment, where there are very little or no hardware variations. They also, take years and a small army to develop. Most of us "normal" developers don't have those kind of resources or time available to us.
  • by Overly Critical Guy ( 663429 ) on Wednesday May 21, 2003 @02:29PM (#6009218)
    Damn. And there I was, believing for all these years that Microsoft had stolen these ideas from Apple.

    When did Apple have a taskbar and a Start menu? And when did I say Microsoft didn't take ideas from Apple? You won't address my argument so will instead shift the focus onto something else. It is amusing, and Slashbot moderators are falling over themselves to mod it up because their precious Linux is under attack.

    And if only they'd set them to working on Outlook, I might today be using a mail client that was more stable and more efficient than the copy of Eudora Light I was using ten years ago.

    Here's hoping I someday find a compiler and IDE that is better than Visual Studio.

    Next.
  • by Overly Critical Guy ( 663429 ) on Wednesday May 21, 2003 @02:33PM (#6009270)
    Actually it is the whole Monopoly thing that is causing so many program crashes and security issues.

    No, it's not.

    The fact that Microsoft's is the most widely used software in the world is the reason there are "so many program crashes and security issues." Any piece of software will be exploited if 90% of the computing world is using and abusing it.

    By the way, do you subscribe to Bugtraq? If you did, you'd see Linux has more issues reported per month than Windows. But those are the kinds of facts you won't see posted anywhere here at Slashdot. Heck, I still remember that major filesystem bug that came out in the stable release of the Linux kernel. It was hilarious. Didn't anybody test it before releasing?
  • by VinceTronics ( 675022 ) on Wednesday May 21, 2003 @03:58PM (#6010072) Homepage
    MIT Tech Review's July 2002 cover story was titled "Why Is Software So Bad?" [techreview.com] (registration required to read the whole article). The article makes the point that because there is no liability for the makers of faulty systems, there isn't any real incentive to build systems that never crash.

    What if we could bill the HW, OS, and apps vendors for our lost time due to crashes? I'm sure systems would improve in a hurry!

    What's needed is legislation making vendors liable for losses due to faulty computer systems. Remember, carmakers cared more about styling than safety until Ralph Nader's book Unsafe at Any Speed alerted the industry and consumers to the need for things like safety belts. Now we have federal safety standards for automobiles.

    I'm sure the libertarian-leaning tech community will freak out as soon as they read this. But "self-regulation" will only take the computer industry so far towards total reliability. As computer systems govern more aspects of our modern lives, government regulation seems inevitable in my view.
