Why Do Computers Still Crash?
geoff lane asks: "I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged. Sometimes a company gets it right -- my Psion 3a has never crashed despite being switched on and in use for over five years, but my shiny new Zaurus crashed within a month of purchase (a hard reset losing all data was required to get it running again). Of course, there's no need to mention Microsoft's inability to create a stable system. So, why are modern operating systems still unable to deal with and recover from problems? Is the need for speed preventing the use of reliable software design techniques? Or is modern software just so complex that there is always another unexpected interaction that's not understood and not planned for? Are we using the wrong tools (such as C) which do not provide the facilities necessary to write safe software?" If we were to make computer crashes a thing of the past, what would we have to do, both in our software and in our operating systems, to make this come to pass?
crashes? (Score:4, Interesting)
On the workstation side they are definitely not THAT stable, but since we've switched to XP/2K on the PC side, those PCs regularly get 60+ days of uptime. Just as a note -- I had an XP computer the other day that would crash about two or three times a day. The guy who was using it kept yelling about Microsoft, etc etc etc. Turned out to be bad RAM. After swapping in new RAM it's currently at 40 days of uptime (not a single crash).
For some reason the Macs we have get turned off every night, so their uptime isn't an issue, but from what I hear OS X is quite stable.
Touchy subject (Score:5, Interesting)
Scientific American... (Score:5, Interesting)
They also propose that all computer systems should have an "undo" feature built in to allow harmful changes (either due to mistakes or malice) to be easily undone...
New features are more important than stability (Score:3, Interesting)
People upgrade for new features. That computer/OS/gizmo you have today does a lot more than the one from 10 years ago. That's a lot more code that needs to be written, and thus a lot more opportunity for errors. It's that simple.
(I'm actually ok with that. I'd rather have a moderately crashy Windows XP box capable of playing GTA:Vice City than the hypothetical alternative: a super-stable Windows95, capable only of playing "Doom 2".)
Re:Simple ... (Score:5, Interesting)
Re:OS X (Score:1, Interesting)
The ultimate solution (Score:5, Interesting)
Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.
Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA analysis, if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.
Re:Because they are complex systems (Score:1, Interesting)
Well, maybe never, but...
My calc professor once put a question on an exam that was designed to crash a TI-89.
Re:Not to point out the obvious, but (Score:2, Interesting)
Mandate memory checking tools (Score:5, Interesting)
Most errors of this kind can be found by testing under tools like Valgrind [kde.org] or Rational's Purify [rational.com]. I'm sure there are others (I've heard of ParaSoft Insure++, ATOM Third Degree, CodeGuard, and ZeroFault), but the quality of these tools really matters.
The issue is that tiny errors can cause crashes intermittently, and not immediately. For example:
uninitialized memory reads -- usually not a problem, but if this value is ever actually used, it will be.
array bounds reads -- never acceptable, but depending on the structure of memory, may not always cause an immediate crash.
array bounds writes -- like ABRs, may not be immediately fatal, but these are going to crash your code sooner or later.
Since they don't always cause an immediate crash, these errors are likely to creep into released code without the use of one of these tools. And if you want to know why we don't always run programs in an environment that checks for these kinds of things, try it once; you'll notice a speed hit of usually an order of magnitude. C/C++ is a perfectly acceptable language -- not all debugging has to be done by the compiler/interpreter or only after you notice a problem.
Anyway, hope that wasn't too pedantic....
We've got a lot of techniques in the gaming world (Score:3, Interesting)
In the GameCube, crashes are alleviated by having only a thin OS layer between the hardware and the game, and by running only a single task at a single privilege level of the CPU. This avoids context switches back and forth between user and kernel mode, which introduce complexity and can wreak havoc if malicious data is present.
Furthermore, we have a set hardware configuration, running a well defined consistent set of drivers, which are again, minimal, and this eliminates another factor that often leads to crashes in the PC world.
The most important thing, though, is robust software design. In our games we all code exception handlers, so that a single errant NULL pointer doesn't bring the whole thing down with a "Segmentation fault" message, as PC users seem to experience with their software. Instead, we recover gracefully, perhaps rolling back to the previous iteration of the game loop and "moving" the player a bit -- for instance, in an FPS where the player has entered an area in an orientation that happens to create a divide-by-zero error due to numerical imprecision.
In the future with CPU and memory speeds increasing, we are investigating new designs, such as microkernel based architectures where individual game entities are separate protected "processes" that communicate via some fast IPC mechanism such as shared memory or a "tuplespace", so that a bug in one entity doesn't bring the whole universe crashing to a halt, and I hope that such techniques are adopted by the general computing world.
Re:And (Score:1, Interesting)
People who lack facility with English generally write shitty interfaces that people loathe using, even if the code is "clean".
Re:The ultimate solution (Score:3, Interesting)
Unfortunately, this only works if you can distinguish between buggy and non-buggy code produced by the algorithm. You can do tests, but no test suite will be exhaustive (otherwise we'd just use it on human-developed code to find the bugs).
Perfect software can only be produced if a formal proof of correctness is possible. Even then, you're limited by the assumptions the proof makes.
Re:Microsoft (Score:5, Interesting)
Thoughts on why *nix is stable (Score:2, Interesting)
I'm going to claim that the prime reason systems with GUIs (and I'm including everyone) are unstable is that no one has come up with a rock-solid base for such a system. X is not solid; neither is Windows Explorer or Mac OS X's application manager. No one has it right.
The one thing I am leaving out is that drivers also tend to be a major cause of instability. I cannot run the nvidia driver on my Gentoo box, and certain USB events can bring a system to a screeching halt. What needs to happen is better design around the unstable interfaces, so that in the worst-case scenario things can still be recovered.
Re:The ultimate solution (Score:5, Interesting)
That only works if you can write a fitness algorithm that can tell whether the program did the correct thing or not -- otherwise, you have no way to decide what to "breed" and what to throw away. And for many types of program, that fitness algorithm would be more difficult to write than the program you are trying to auto-generate...
Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.
All you've done is replace a hard problem ("write a program that does X") with a harder problem ("write a program that teaches a computer to write a program that does X"). No dice.
Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA analysis, if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.
For most modern programs, there isn't nearly enough time left before the heat death of the universe to do this. Hell, for programs other than simple batch processors, the number of possible inputs and outputs is infinite (since the program can perform an arbitrary number of actions before the user quits it).
Don't single out Microsoft (Score:3, Interesting)
My Windows XP box, which is my fileserver, has been up for 5 months so far.
My OS X box, which I use for web browsing and word processing, crashes about once every three days.
Now, I certainly have some bones to pick with Microsoft, but Apple is no better.
According to complexity theory... (Score:2, Interesting)
A lot of software has been evolving so fast that there's been no time to perfect the existing features before adding new ones. At some point in the lifetime of an individual piece of software, it reaches a point where it's somewhat "stable" in the sense that no more major features are needed. For example, TeX reached its relative maturity during the '80s, and IIRC there are no known bugs at this point.
If all software were given enough time, it would all reach that kind of maturity. The problem is that not all of it can survive that long -- usually it becomes obsolete before it becomes stable...
Obligatory OpenZaurus plug (Score:3, Interesting)
Use OpenZaurus [openzaurus.org]; while crashes may still occur (I assume 3.2 will crash eventually, though I haven't had a full crash since it first came out), they will not lose all your data, since it's written to flash.
Also, my Linux box hasn't crashed this year, and I can't recall any crashes last ye-- no, wait, there was one slew of them, but it was an icky driver which I got rid of. I'd say that's a pretty good track record for a system built almost entirely from CVS.
Can't remember any crashes this year or last on any other Linux boxes I manage that I can think of (8 boxes off the top of my head).
Turing showed this (Score:4, Interesting)
Accept it. It's a fact of nature.
all systems crash, not just MS (Score:5, Interesting)
Time is Money. (Score:5, Interesting)
A couple of months ago, the company I worked for spent a lot of time and effort developing a robust testing methodology. We had a software product that, through blood, sweat, and tears, would not crash unless you basically blasted the hardware in some way.
But that led to two problems. First, we only had so many people working, and resources spent testing and bugfixing were not being used to add new features. Second, the time it took to get it that robust delayed the product's release beyond the point where we could recover the investment. [Time developing] * [Cost of operating] was greater than [expected number of units sold] * [price per unit].
What ended up happening was that we lacked the features to justify the price and the number of units we needed to sell to cover the cost of developing it. We had no bugs that would crash the machine -- and we could be certain of it.
As of last month, the company could no longer afford to pay me. I'm not there any more.
The moral of the story is that trying to make a bug-free product will bankrupt your company, especially a startup. Software tools have improved, but the benefit largely goes towards adding new whiz-bang features that sell the product for more money, not to being able to fix more bugs.
What we should do as engineers and managers of software products is not be afraid of getting the product out the door with a few bugs in it if we want our company to do well; this business reality is ultimately why bugs will be a big part of software for the foreseeable future.
Re:Because it doesn't matter to you! (Score:5, Interesting)
Is there a time where the development methods and quality control learned from these large, mission-critical projects will find their way to the consumer product market? If not, why?
What are you smoking? (Score:5, Interesting)
The Ti PowerBook G4 I am writing this post on is running Mac OS X 10.2.x. It goes in and out of sleep on an irregular basis, and not always when it is idle. I swap PCMCIA cards in and out. It hops from network to network. I do a lot more than browsing and word processing.
According to my Konfabulator uptime widget, I have 83 days, 23 hours, 20 minutes. My load average at the moment is 1.7. It has not been rebooted since I installed OS X (I did it myself after buying it just for messing around purposes).
You, sir, are either lying, have bad hardware, or have severely corrupted your installation. This operating system (which is built on BSD) is solid as a rock.
Re:Computers don't crash (Score:5, Interesting)
The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info.
You made a little "user error" there yourself-- the article says that 34%=software error and 15%=hardware error.
Oh, and those figures are just for Web applications, not software applications in general.
It's an interesting article. Unfortunately, they're not very clear about what constitutes a "user error." I've filled out Web forms that gave me an "error" when I included hyphens in my phone number or credit card number. That's hardly a user error; it's just poor user interface design.
In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.
Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.
Don't get me started. (Score:5, Interesting)
My pet explanation is that computer code is in many ways like legal code, with computers playing the part of honest criminals. They follow the law exactly, and walk through loopholes without even thinking about it.
So you patch the loophole. This, oddly enough, seems to make the code bigger. To further contribute to the code bloat, at different times different legislators/developers have different opinions about what the goal of the code is, and different areas which they own. Small patches are thought of as being safer, but interactions with other bits lead to... more loopholes. This is also why both Windows 2000 and the US tax code seem to take up a lot of storage space.
This then makes clear the value of refactoring, not that I really expect the tax code to be replaced with something sane anytime soon. Following this line of reasoning, we also see why careful encapsulation is so important -- so that one can rewrite one module of the system without affecting the others.
Advanced language features such as garbage collection and strong typing don't eliminate bugs. However, they do eliminate certain classes of bugs (segmentation faults) and so reduce the bookkeeping required to produce bug-free code. Since this class of bugs is one of the most expedient for hackers to take advantage of, there are also disproportionate security benefits to using safe languages.
On a final note, testing is a defense against bugs, but I believe that testing, especially black-box integration testing, should be a final defense that rarely sees action. If the developers can't find all but a few of the bugs with their own testing, the developers have lost their perspective on how the code works. If a separate QA team finds 1000 bugs, in my experience, the development process has failed and the system will always seem buggy.
Re:Speed (Score:4, Interesting)
Well said, I would have to agree with the majority of your post. The only thing I have an issue with is:
It took you over a decade? I've been working in "light" IT for about 4 months, and I've already come to this unfortunate conclusion. Writing commercial software just isn't fun: not only do you have to write software that you may not find all that interesting, but you are also denied the opportunity to use your skills to the fullest and create something that you are truly proud of. Corners are cut, and in the end you realize that it's just a "product", or an in-house "app"; it only needs to work "good enough", never mind if the code needs cleaning up or whatever other issues there are (they don't (seem to) exist if you're not staring at the code!).
Third-generation languages. (Score:2, Interesting)
because almost all software is still developed using third-generation ("high-level") languages. These languages place on the programmer the burden of such fiddly details as allocating and freeing memory, and checking the size of allocated memory to see that it's adequate for the data being copied in.
*Most* of the time when an application crashes seemingly at random, it's a memory allocation problem of one kind or another: a buffer that was allocated too small and gets overrun, or a pointer error, or something of that nature. When an application (or your whole system) grows more sluggish the longer you leave it running, that's usually a memory leak: something was allocated and not released properly -- repeatedly. All of these problems result from a lack of excruciating vigilance on the part of the programmers when using a language that requires it. In a large project, maintaining that ceaseless caution is a nightmarish prospect.
Languages (both interpreted and compiled) have been around for over a decade that handle these things, freeing the programmer to concentrate on developing the more high-level features of the software; but because this checking imposes some overhead (mostly in CPU time and sometimes in memory footprint), they don't get used for most applications. Yet.
The time is coming, though. The value of VHLLs is beginning to be recognised, *finally*. When software is written in a language with built-in memory management, problems like segmentation faults (core dumps in Unix; in the Windows world these are known as Illegal Operations, formerly General Protection Faults) and buffer overruns go away entirely.
Add proper garbage collection (not reference counting like Perl 5 does, but real GC, which I hope we will get in Perl 6), and you also dispense with memory leaks once and for all.
It's coming. Applications are *beginning* to be developed in this next generation of languages, but it takes time, because the existing apps are mostly C and C++, and you have to throw them out and start over, which nobody wants to do for obvious reasons.
There will of course always be room for a certain amount of inherently low-level code written in C or one of its kin: code that absolutely can't spare a nanosecond per run, and code that has to run on the bare metal (kernels, bootloaders, and whatever is needed to bootstrap the VHLL tools -- compilers and whatnot). But when C is no more common than assembly language is today, you'll be done with random crashes.
Applications will of course still have bugs -- circumstances wherein they don't perform as they ought. And you'll still have hangs, because nobody's figured out how to design a compiler or interpreter that can detect an infinite loop, and nobody except Mel[1] has coded up an implementation for completing an infinite loop and passing on to what follows. Perhaps quantum computing will one day change this, but that's outside the foreseeable future. But crashes of the sort where the app suddenly terminates should be mostly a thing of the past within twenty years, ten if we're quite lucky.
[1] Google for "The Story of Mel, A Real Programmer".
AppleWorks never crashed (Score:3, Interesting)
I'm willing to concede that the codebase was considerably smaller. It had to be, in order to produce an executable that would fit in 800K (the size of a 3.5" double-density floppy) and would run reasonably well on a 1-MHz 8-bit processor with as little as 128K of RAM...but I don't find myself doing sufficiently more advanced stuff in Word or Excel than I used to do in AppleWorks (actually, AppleWorks was probably doing more sophisticated stuff with UltraMacros added to it). I would be willing to wager that 95% of Office users use no more than 5-10% of its features. All that extra code that keeps getting added in with every new release means there's that much less time spent making sure the core functionality (and all of the chrome added in previous releases) is bug-free.
(I'll admit that I haven't had much trouble with Office...but then you've noticed that I don't push it particularly hard either.)
Software still crashes... (Score:3, Interesting)
With something like Windows XP, no amount of testing could eliminate every conceivable bug, but there is no doubt in my mind that Microsoft, along with almost every other software company in the world, rushes poorly designed, inadequately tested products to market to meet customer demand.
Remember, a product's success is due largely to a check list of features created by the marketing people. A product with 90% reliability and 100 features will sell better than a product with 98% reliability and 10 features. Otherwise, how can you explain the success of Microsoft Office? OK, bad example, MS Office is successful because it's been bundled with so much hardware, but you see my point.
The bottom line is that computers are now a commodity. They have become so ubiquitous and cheap that I can go down to the Salvation Army and purchase what would have been considered a supercomputer 10 years ago for $50. Software is quickly reaching the same state. How much software can you buy for $10 or less? A lot. And not all of it is bad, though most is. On the other hand, you can drop hundreds or thousands of dollars on software that is just as quirky, hard to use, and even just as buggy.
Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or do whatever DCs do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell. I'm sure there would be
Kinda makes me wonder...
Re:For those who are willing to pay... (Score:5, Interesting)
This reminds me of a story I read in the internal magazine of a telecommunications equipment supplier that I used to work for. It was about an international toll switch somewhere in the U.K. that had been up for 17 years (or something extreme like that). Furthermore, this included having all of its hardware upgraded and replaced. Twice.
Just stop and think about that for a while in PC terms... "I replaced my motherboard with the power on without rebooting my system, while it was serving 10,000 web pages a second."
Granted, this is a higher level of hardware with full redundancy, but it still boggles my mind.
Re:Simple, yes, for other reasons (Score:4, Interesting)
Software is incomprehensibly fragile -- any single thing can cause a crash, taking the whole system or application down. And even those critical parts of things like airplanes have multiple redundancies, something that's hard to build into software. You can do things like catching exceptions, but you typically can't recover as gracefully as if there was never a problem at all.
The shuttle is actually not a bad analogy -- it's also very fragile due to the stresses it endures. And we've effectively had two crashes in 100 runs. Most software is more stable than that.
"Stability" is in the eye of the admin (Score:2, Interesting)
-In our server room, which, admittedly, is a little crowded, a Windows 95 box was disconnected from the network but accidentally left running. It stayed up for more than a year. No load, of course, but it stayed up. It made the hair on my neck stand on end.
-In the same server room, a clone PC running Suse Linux 7.0 ran for just short of two years without a reboot. It would have gone longer had the old, 2 gig hard disk not died a clunking death. Fortunately, the web data was on a different disk. We loaded another system drive and had our departmental web/Samba server up in minutes.
-We have a Compaq Prosignia 200 running NT4 and Raptor 6.0 Firewall. It has seen uptimes exceeding 9 months on more than one occasion. Would have gone longer, I think, were it not for some memory leaks in the Raptor management console snap-in.
I point these things out so as to ask the question: how stable is stable? Hey, *nix has been my passion for years, but I've seen for myself that NT4 and, now, Windows 2000 can perform well if they are set up by someone who knows what s/he is doing. I believe impressive uptimes can be attributed to many things, but I do not always blame the OS code for the bad things that happen. We all know what bad firmware and drivers can do. I'll take NT4 on an AlphaServer over Linux on a Packard Bell any day.
Of course, Linux on the AlphaServer is better yet . . . . : )
Re:Microsoft (Score:4, Interesting)
Re:Scientific American... (Score:1, Interesting)
Memorable line: Men are from Mars. Women are from Venus. Computers are from hell.
Re:and (Score:2, Interesting)
I had a Win98 system last a bit past 30 days with regular use once, and it was terribly hosed by the time I rebooted. Win2k or XP can last until your power goes out, you kick the surge protector, or you need to reboot to install drivers/software/hotfixes.
Check this out -- lets talk some SERIOUS UPTIME. (Score:4, Interesting)
Kodiak_Rtr uptime is 6 years, 9 weeks, 3 days, 10 hours, 43 minutes
System restarted by power-on
Re:Computers don't crash (Score:2, Interesting)
it DOES cause an error (Score:5, Interesting)
Actually, "syntax errors" like this DO cause a problem for wetware systems -- they cause the brain (well, mine at least) to kind of glaze over and take the remainder of the sentence/thought much less seriously. Kind of like aborting/returning out of a subroutine.
Here in the Slashdot world of "definately" and "righting", I've learned that any posted comment that makes high-school-level grammatical or spelling errors is not worth my time and I immediately skip the post. I've been doing this quite rigorously lately -- blah blah blah "seperate" PAGE DOWN.
OK now, everybody nod and think I'm talking about someone else's posts ...
Re:Because it doesn't matter to you! (Score:5, Interesting)
Call it reciprocity. ;)
Otherwise, the functionality of cars and their safety mechanisms have evolved, and that evolution has made it from the $79,990 cars to the $13,990 cars that are being mass produced. Otherwise, who'd feel comfortable driving 160KPH in something that costs a mere 3-4 months' salary?
Probably one of the sources of problem in the software development industry, I'd say, is duplication of effort. Rather than take existing code and improve upon it, people seem either egotistically or somehow legally (copyright++) bound to constantly re-invent the wheel.
The GPL development model is great in theory; in practice, however, it tends to lead to "My camp is better than your camp" rather than "Our camp is approaching perfection".
Re:Because it doesn't matter to you! (Score:5, Interesting)
The desktop applications for Linux are less stable, but they benefit from similar development models, and sometimes from having the same coders involved, so they tend to be more stable than the competition. After all, after a hard day of coding stable server code, that programmer goes home and listens to MP3s. He runs the same platform at home that he uses at work (Linux)... but at home he's running the GUI and playing MP3s. One day he decides to scratch an itch because a feature he would like is missing. This gets him looking at the code, and like many other coders he can't stand to see instability... especially since stability is what he does. He invariably fixes things and adds a patch for whatever feature he wanted.
ok now here comes the shameless plug:
This is why I believe Linux will continue to grow and be accepted as the dominant platform. Current software in other areas has been stagnating for a while; some applications cannot significantly improve without major revamps in technology (IMs come to mind). A slow, steady approach to development (and yes, it is slow considering the number of man-hours spent on open source... there are just so many more men to spend hours that it amounts to rapid development) leads to fewer bugs in the final code that faces the test of time... and more code faces the test of time because it was done right (or closer to right) the first time, and thus gets the bugs ironed out of it. Open source development is free... it has no pressure to release final versions, no pressure to release features until they are stable... In the course of time (maybe 5 years, maybe 50) it's an eventuality that this will win, because it cannot be killed; there is nothing to fight, after all, no business to put out.
Re:Whose computers still crash? (Score:1, Interesting)
The systems themselves are a collection of spare parts and old workstations purchased off eBay. At the low end is a typically configured Pentium II 400, and at the high end is a typically configured Duron 900. All of them are running Windows 98 SE.
The game and the scripts keep all the systems at or very near 100% CPU utilization at nearly all times. The only time they are not working is when the game servers are down or my internet connection is down. Neither happens very often.
Even under that somewhat heavy load, I go months without rebooting them. In fact, the only time they are rebooted is when I lose power or I'm leaving on an extended vacation. One of them is an exception to that rule and has blue-screened on occasion, perhaps 3 or 4 times in the past year.
Of course, on the system I actually use (not one of the seven described above), I left Windows 98 a long time ago, and I remember being plagued with BSODs, lockups, and constant reboots to keep things working.
What explains these two opposing performance comparisons? I have no idea really, but I have a guess...
On systems I use, I am constantly adding/installing software and hardware. On the systems that just macro 24/7, I don't do any of that. There is nothing but the bare essentials installed. Perhaps that has something to do with it.
Anyhow, back to the main point, I disagree that Windows based systems crash even if they are not doing anything. I have a whole bunch that work hard all day and they don't have that problem.
*No, that wasn't a typo: scripts on the computers "play" the game. It is known as macroing in the MMORPG world.
Re:Because it doesn't matter to you! (Score:1, Interesting)
MS has a long history of success built on choosing to add new features rather than focusing on quality. This strategy started long before MS had any monopoly power and has always been successful. At least until recently, people have been willing to *pay* for new features while also fairly willing to overlook mediocre quality. This is understandable given the rapid advance in computer capabilities.
But that was the past. I actually think this strategy could be the eventual downfall of MS. It is so ingrained in their culture, and with such a dominant market share there isn't much pressure to change. But eventually, MS will find a hostile market if they are not able to adapt to consumers who put more importance on quality.
[OT] your sig (Score:2, Interesting)
About your sig: Actually, I currently write games on a machine with about 1.5K of memory and an 895kHz CPU. [spatula-city.org] And I am grateful.
--Joe
Re:Human Error (Score:5, Interesting)
All bridges (for the most part) must be built by people. Bridges collapse because people can't catch that one little fatal error in one or two million components.
The shit coders put out there, I swear... The reason software crashes is that by-and-large it's hacked together, not engineered. You hack a bridge together, and yes, it'll fail. You engineer software, and yes, it will run reliably. It's not fun to do - no easter eggs, no cool tricks, no cramming features in weeks before ship.
I'm stunned at the amount of code that goes out that was written by interns, by inexperienced coders, by people that just don't have a clue. The software industry really has no concept of best practices, no leadership, no authority body. The fact that buffer overflows still happen is stunning.
Small projects work well because, out of dumb luck, they happen not to fail; larger projects work okay because we have 34,000 people looking at the code. If that's 'best practices', then we're doomed.
"Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this."
Uh huh. Let's translate that to my car: "Hi. Yeah, I'd like to report a bug. I have a Saturn Ion, version 1.1v4. Yeah, when I turn on the left turn signal and then turn on the lights, the car catches on fire. You might want to fix that in the next version. Just thought you might want to know. Bye."
Whoops, bullshit alert. (Score:3, Interesting)
Bullshit, bullshit, bullshit. This urban legend deserved to die years ago.
I ran several Windows 95 OSR2 systems with uptimes approaching 90+ days, and had no problems with them locking up. Sure, 9x wasn't HAPPY with this, and if you ran a lot of applications odds are you wouldn't reach those uptimes anyway, but I did it many times at my former employment.
When the '45 days' (as I heard it first) rumor started going around, I set up a bunch of idle 95 machines for fun, and on days 45-50 watched for anything going on. Not one crashed.
Hell, for all I know, Microsoft themselves are reporting this, just to cover their asses based on some average uptime limit they worked out, but I will swear on a stack of bibles that I've had Win95 machines go at least twice this supposed limit without locking up.
Re:Try the UML (Score:4, Interesting)
The blueprint is the actual prototype of the product being designed.
The problem is that if you document every step and algorithm in exact detail, you will spend weeks, months, and yes, years without a single line of code!
This is unacceptable in today's business world, where all the projects are due yesterday and your bosses demand to know, percentage-wise, how much of the code has been developed. If you spend a month planning and not a single line of code is developed, you're canned.
My father took over a project that a clueless IT manager had been given because she slept with the CIO. Anyway, she went to a seminar which claimed that flowcharting everything would be the wave of the future. She then had all the programmers draft every single algorithm, down to the very if statements themselves, on paper. After 4 months and not a single line of code, my old man took over. From there he finished the project within 3 weeks!
My point is that drafting programs is too time consuming. In a way, your drawing is the program, and changes can be made as you go. It's essential to have good flowcharts and notes, but they need to be generalized. If there is an error in software, you can delete the line and fix it. In engineering, you would have to disassemble the actual product and redesign it. Because that would cost time and money, it is not acceptable. In software that limitation is not there, or is far less severe.
UML tries to be the blueprint of all software programs but instead is only used to explain certain subsystems and algorithms. Mostly, flowcharts are used so all the developers have a sense of how the program will work and how to invoke different pieces of the program.
I do not think this is going to change unless there is a quick and easy way to debug UML charts. Logic errors are killers, and if a chart is perfect, I suppose you could compile the UML directly into the language of choice.
Hmmm, in fact, this might be the way to do it in the future.
Re:OT: Electric overconsumption (Score:5, Interesting)
My mail/web server would run fine off of something ridiculously small, like a Sharp Zaurus. Here are my requirements, and I will pay for one if it is available.
Yes, I could probably build this with PC104 components, but I want a pre-built product, and I'm willing to pay for it (maybe $300 - $400).
Re:and (Score:4, Interesting)
I play RTCW quite a bit on my WinXP box with no issues. RTCW occasionally crashes, and I have to hit CTRL-ALT-DEL to bring up task manager and kill it, but the system remains stable.
When I first built this box I had some issues, after a while it would lock up. Turned out it was because the video card was overheating. The system itself wasn't locking up, just the video card. Put the system in a new Antec SX-835II case with better cooling and haven't had a problem since.
Why Do Computers Still Crash? (Score:2, Interesting)
'The Meaning of Correctness
1. The program contains no syntax errors that can be detected by the compiler.
2. As for 1, and it can be run.
3. There exists a set of test data for which the program will yield the correct answer.
4. For a typical (i.e. reasonable) set of data the program returns the right answer.
5. For a deliberately difficult set of data the program returns the right answer.
6. For all sets of data, valid with respect to the specification, the program returns the right answer.
7. For all possible sets of valid test data, and for all likely conditions of erroneous input, the program returns a correct (or at least reasonable) answer.
8. For all possible input the program gives the correct, or reasonable, answers.
Most programmers work at level 3 or 4.
Users at 8.'
(I am sorry but I have lost the reference to the original book)
Easy.. economics and ongoing profit (Score:3, Interesting)
1. Any programmer knows that 90% of the code is written in the first 90% of the time, and the other 10% of the code is written in the other 90% of the time. (no typo). That is to say, it takes a lot more time, effort, and hence money, to move a project from "working well" to "working perfectly".
2. Many software companies these days make very little profit on the 1.0 release of their software, and make huge amounts of money through ongoing support charges. Microsoft is a classic example of this type of company.
3. If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly. The unfortunate truth is that nobody is going to buy version 2 if they can do everything they want with version 1, and they're not getting constantly frustrated by crashes. The only carrot you have in this situation is to think up some really great ideas for version 2 in order to encourage people to upgrade - In fact, some of those ideas may have been deliberately left out of version 1 just so that they could be added later. Version 3 is more difficult still, and version 5 is right out. By comparison - how many versions of office are we up to now ?
A notable exception to this business model is the games writers. Companies like Valve and id Software consistently produce very nearly bug-free code that works well and generally impresses the masses.
In all the years since half-life was released, there have been relatively few patches and fixes, and many of those were to prevent ingenious new methods of cheating, or to add support for hardware that didn't exist when the game was first released. The unreal engine had a similar history.
People buy new games because they crave the excitement or challenge of exploring and interacting with them. That's not something that could really be said about Excel or Word, so those sorts of products have to rely on the "draw out the profit over many releases" strategy described above.
Another (big) factor is people's expectations - most people expect that word will crash from time to time, and given microsoft's past history, they have little reason to expect that to change. On the other hand, gamers have an expectation that the latest game from id software will be as solid as a rock, and that the few problems that do crop up after the release will be fixed quickly.
If a games company didn't spend that "other" 90% on the last 10% of development, and released something that crashed as often as explorer, their reputation would be mud within days, and people would stop buying their games.
And lastly, choice.
People have a choice as to which games they want to buy. It's a competitive market out there, with many people having little disposable income to spend on games. On the other hand, despite what Linux advocates (I can't believe I'm saying this on Slashdot) say, most people use MS apps and operating systems because they don't have a choice -- say, due to corporate rules.
You might think that it is the end user that gets the sharp end of the stick here, but the people that really get screwed are the dedicated and talented programmers, who are working for companies that don't care too much if they release code before it has been fully tested.
Re:Easy.. economics and ongoing profit (Score:2, Interesting)
When they transitioned from an Engineering Company to a Management Company, they surplussed all this neat software. Me, along with my software, was excessed. I was first in line to buy it from the company, being I knew exactly what it was and how I could run it on anything I could get my hands on. The company no longer exists, but I still run the software daily, albeit in another company.
Here it is, nearly 20 years later. I *still* prefer to use these programs. They are blindingly fast on a Pentium, allow me to update their libraries with all the latest parts I use, and still work perfectly.
By this time, I understand exactly what these programs do and am quite fast with them... they are so familiar by now that I no longer have to concern myself with how to get the system to do what I want... now that I have finally perfected a simple DOS-based system that's ready for work about 13 seconds after I turn on the power. I still fail to see what everyone is carrying on about over these finicky new design packages. I *try* to use them but soon become so frustrated with them that I keep reverting to the simple one.
It kinda bugs me when I have way too many choices - like do I really care what font or centering options the resistor values show up in the schematic I am preparing to feed to the SPICE simulator or the PCB Layout proggie? Just put the value where I place it and I'm happy. I just want it done NOW. I don't wanna dicker with it. If its gonna get published, I'll dump it into a .DXF file and let the AutoCad and PhotoShop guys gussy it up all they want.
See? There's anecdotal evidence supporting your claim. They did the software right, and never sold another to me. All the companies that made the software are now out of business (one got bought out, the other two are just gone).
The favorite concern of the company I now work with is that I am using completely unsupported software. But then, I used a completely unsupported hammer when I built my doghouse. Big deal. If it works, what do you need support for?
Paranoia (Score:3, Interesting)
UNIX had the opposite philosophy. The hardware was expected to work perfectly. This led to situations where a DEC operating system would run reliably on a particular machine for months at a time and UNIX would crash within minutes on the same hardware.
Re:Whoops, bullshit alert. (Score:3, Interesting)
/*
  Check every second....
  Maybe GetTickCount wraps, but I don't care,
  something else will probably break before 49 days anyway
*/
if (GetTickCount() > m_dwLastTick + 1000)
{
    DoSomeThingImportant();
    m_dwLastTick = GetTickCount();
}
GetTickCount returns the number of millisecs since reboot, after 49 days it will wrap and start over, so lazy programmers using code such as above will have a problem.
Re:Computers don't crash (Score:3, Interesting)
A quick trip to the terminal reports my uptime as "11:35AM up 57 days, 12:42..." This is by no means a long time by Unix standards, but for a laptop (iBook 600MHz) that I use every day -- sleeping, waking, starting and stopping multiple programs, working on all sorts of stuff, burning CDs, browsing the net, etc. -- I'd say it was very good.
The longest I could go on my Windows 2000 box before I'd have to reset was about a week - it wouldn't crash, it would just get confused and start swapping icon images over, so Word would have the Excel icon, and so on.
The only time I reboot my iBook is for system updates. Very few programs "Unexpectedly Quit" on me (Camino used to do it occasionally, every 2 weeks or so, but I'm using Safari right now). I've never had a kernel panic in 10.2.x (I had two in 10.1.5, but I traced it to the well known Classic environment and a USB device panic bug that was fixed).
If you want your software to crash less, buy a Mac.
Re:Good = expensive as hell (Score:3, Interesting)
I agree completely... This is the same kind of thinking that people use to try to outlaw guns: "If someone can use it to commit a crime, we should just eliminate it!"
I would say that poor development, insufficient design, (obviously) insufficient testing and a focus on features rather than security are MUCH more to blame for software quality issues than which language was chosen for the implementation.
I still think we should be able to moderate the whole article as a Troll...
T
It's not /computers/ in general, it's PC's (Score:3, Interesting)
It's interesting how little has really changed in the past 5 years...
Re:Computers don't crash (Score:2, Interesting)
How can a "user error" cause a crash? Software should do proper bounds checking and should act appropriately (which may mean giving an error message) no matter what input it is given.
About the only crash due to user error that I can imagine really being the user's fault would be the user killing the process with killall or pkill or its moral equivalent.
Other than that, it's just bad bounds checking, and blaming it on user error is really bad form.
Part of the problem IMNSHO is the commodity desktop. There are so many machines, and they are all cheap, and it's more important to get the work done than it is to make sure the crash doesn't ever happen again.
On real systems, if the system crashes, crash dumps are sent off to the OS vendor and they track down the problem and fix it. I know, we have had to collect and send off crash dumps in the past.
Each round of that makes the system more stable.
That's one of the advantages of Linux, and why there are some systems that don't crash (my Linux boxes pretty much only crash when the power goes out and the UPS battery drains). That is, OSs like Linux and BSD are used in real environments, and there are people committed to fixing the problems... so even the lowly common desktop user reaps the benefits.
See, there is the difference... Windows, even the "server" versions, grew out of a desktop OS with a desktop way of doing things: "Oh, the server crashed? Well, let's reboot and hope it doesn't happen again." Whereas Linux and BSD come from the land of the server down to the desktop: "Oh, the server crashed? Get DEC on the phone" or "Get out those crash dumps".
-Steve
programmers trending downward (Score:3, Interesting)
liability, training, capitalism (Score:2, Interesting)
Training of software engineers: with point-and-click interfaces, you have people with the average reading ability of a 5th grader writing code. Even hinting that someone wasn't a good writer of code was considered "unprofessional" at some workplaces (i.e. -- you are not a 'team player').
Capitalism -- it's not cost effective to fix bugs until a customer finds them.
Even in code for secure OSs under Common Criteria CAPP/LSPP, vendors aren't required to fix bugs that are not discovered by the independent evaluator or the customer. So even if the product manager knows of bugs in the OS that is intended for 'high security' government projects, there is no law saying he has to list them or fix them (unless they are found by a 3rd party or the customer). Spending time fixing bugs that are NOT found by the customer is not only not cost-effective, it is considered not working on "assigned priorities" and can be grounds for lower reviews.
This isn't pessimism -- it's reality. Quality doesn't pay when you can sell customers faulty products then charge the customers to fix the faulty product you sold them in the first place -- one might argue that it pays to have more bugs in the code -- you can charge more for service contracts and rack up more incidents that you then charge the customer, per incident, to handle.
More thoughts on hot button; ex: college class (Score:2, Interesting)
The price of perfection is taught early -- an early lesson came when, for a final project, we were to work with 2-3 other people on a program. The deadline was approaching and our program still wasn't running. Turning it in late meant a letter-grade drop per day. Two of us felt we were close and didn't want to turn in a non-running program. The third wanted to turn it in; he also felt that he'd done his part and there were no problems in it.
The third turned in the project with his name on it. My partner and I spent another day cleaning up his code to get it to work and turned it in. We got a "C" on the project, with a downgrade for bad coding practice in his section of the code and for being a day late. He got a "B" even though it didn't work. In the final grade both he and my partner got "D"s while I got a "C", which sorta sucked for my major -- but it turns out that 60% of the class got "D"s and "E"s. The class made a big stink about the course material being too difficult, and the teacher made a public 'booboo' comment: "It was the same material he'd taught before, it was just an exceptionally dumb class." Major ire of parents.
Anyone who got a "D" or "E" had it stricken from their academic record. It was the only "C" I got in my comp-sci curriculum (str8 A's in 300-level and above classes). But on that project, I learned that deadlines were more important than code quality.
Spin forward 15 years -- at a small startup before Xmas. The deadline for a demo was approaching, and I and the other team member had parties to go to that evening. He was programming a DSP chip (he was a PhD wizard), and I was handling the drivers on the 286 DOS box. I checked my code backwards and forwards and he swore it couldn't be his stuff. Finally, I displayed the output he was sending, and it was 'wrong'. Unfortunately, my party had been out of town and I'd already missed the deadline for getting there, because it was emphasized to me how important the project was to complete before leaving. When the problem was discovered in his code -- guess what -- he couldn't stay to fix it (I didn't know anything about the DSP chip he was using) because, the VP told me, he was married and his wife was gonna leave him if he missed the party (I don't think he was serious, but maybe). I had no such excuse -- only a partner who went to the party alone.
Again -- what did I learn? Personal relationships take precedence over product and code quality. So far we have code quality below deadlines and below personal relationships (though the latter has largely disappeared in the modern world).
more later...
-l
core problem: people people != computer people (Score:2, Interesting)
Those who spend time going to lunch, drinking beer together, palling around together -- they begin to think alike -- they develop synergy -- but they also develop a closed system. The ones who don't pal around come up with the completely off-the-wall ways of doing things, because they haven't been indoctrinated into the 'normal way' of doing things. Quite often these ideas are shot down because of their eccentricity. But Steve Jobs' personal computer idea, which he presented to HP and which corporate culture shot down, was a brilliant success. He gives countless examples of the most brilliant people generally not being very good with "people skills".
A corollary of this is that those who push for perfection far past the 'norm' are going to be unpopular outsiders -- they are the nit-pickers, the ones who aren't team players. Again, they might be the ones who would nit-pick the code to perfection, given the chance, but the larger group says "enough" -- "it's good enough, it boots, let's ship it".
In both instances, the people most likely to increase quality in software are those that have the least political clout and are often least liked by their peers. Their peers often feel the 'nitpicker' has a prideful superiority complex, and sometimes go out of their way to sabotage work that might otherwise have turned the company around and saved millions.
I specifically was involved in a group who had to choose between 2 vendors of Microsoft-compatible software. I became the lone supporter of company "B". I was adamantly opposed to "A" for reasons I couldn't articulate at the time -- my gut told me "A" was untrustworthy, but I couldn't tell why. I was overruled, and 4-5 months into the project "A" sued MS for non-cooperation, effectively killing our project. It was too late to go with company "B", whose price had doubled now that they were the only game in town. It turns out "A" had been having trouble with MS all through the negotiations with us, but no one picked up on it. Reminding anyone of the decision made me decidedly unpopular. But it was precisely because I hadn't gone out and been wined and dined by "A", and hadn't formed a "good ol' boy" relationship with them, that I could see something was amiss. It was precisely the fact that I wasn't a hobnobber/political animal that let me catch the 'off' vibes. Those who were "good team employees" went along with the majority decision and with the friendly team from "A" who came onsite to woo us. It's the same principle at work.
Those who make the world work are also those most likely to compromise, and most likely to compromise quality. It's because of their willingness to compromise that they are liked by many, but it's the same compromise that results in compromised code -- both in terms of bugs and security.
I sure as heck don't know the answer. Successful combinations are highlighted in the book mentioned above, where one person understands the almost anti-personal nature of the 'idea' person and handles the media and external interactions, but it's rare to find groups that work well like that.
It has often been said that the best software doesn't come out of committee but out of 1 or a few people -- while companies like to think that 9 women can have a baby in 1 month, it ends up more often that the 9 women argue over who