Examples of Programming Gone Wrong?

Please create an account to participate in the Slashdot moderation system

Examples of Programming Gone Wrong? 674

Posted by Cliff on Sunday October 27, 2002 @02:43PM from the oops! dept.

LightForce3 asks: "I'm a beginning CS student, and in my studies I've come across examples of programmer error causing very large problems, such as the Ariane 5 failure and the Therac-25 accidents, often as tales of caution to beginner programmers such as myself. My (morbid?) curiosity has been piqued, and I'm looking for other examples of programmer error leading to serious problems. After all, it is better to learn from the mistakes of others than from your own, right? ;) What programming-related accidents, incidents, and failures, both well-known and obscure, do Slashdot readers know about, and are there any good resources for researching these?"

This discussion has been archived. No new comments can be posted.

Examples of Programming Gone Wrong?

Load 500 More Comments

Search 674 Comments Log In/Create an Account

Comments Filter:

On Fox tonight @ 8pm (Score:4, Funny)

by darylp ( 41915 ) writes: on Sunday October 27, 2002 @02:46PM (#4542348)

When Programming goes Wrong 2! Thrill to our latest reality TV series where we show REAL LIFE footage of poorly thought out database schemas, unchecked buffers and even explicit shots of forbidden goto statements.

Share
twitter facebook
- Re:On Fox tonight @ 8pm (Score:3, Funny)
  
  by jsse ( 254124 ) writes:
  
  You thought they haven't featured that before? :)
  
  Once on TV there was a documtary on a history of computers, talking about Pascal, father of computers, the first programmer, the first vacuum tube computer, and....the first (real)bug found - in closeup shot!
  
  I found this extremely amazing and couldn't even move my eyes away throughtout the show. Then I found my wife and my mother-in-laws fell into deep coma on sofa...
  
  Damn! I should have taped the show!
The book "Fatal Defect" (Score:5, Informative)

by spanky555 ( 148893 ) writes: on Sunday October 27, 2002 @02:47PM (#4542357)

This book is devoted to just that. It's what you're looking for...go get it and read it.

Share
twitter facebook
Mars Orbiter Lost Over Metric Conversion (Score:4, Informative)

by Kircle ( 564389 ) writes: on Sunday October 27, 2002 @02:48PM (#4542358)

http://slashdot.org/articles/99/09/30/1437217.shtm l

Share
twitter facebook
- Mars Orbiter Lost Over Metric Conversion (link) (Score:2, Informative)
  
  by Kircle ( 564389 ) writes:
  
  Here's the Link [slashdot.org]
Here are my Top 4: (Score:5, Informative)

by ekrout ( 139379 ) writes: on Sunday October 27, 2002 @02:48PM (#4542362) Journal

1.) Patriot missile failure [umn.edu]
2.) Intel f*cking up floating-point calculations in one of their chips [intel.com]
3.) High-tech toilet glitch (no, really!) [ncl.ac.uk]
4.) Windows ME [microsoft.com]

Share
twitter facebook
- OT: Scuds and Patriot missile defenses (Score:5, Interesting)
  
  by GuyMannDude ( 574364 ) writes: on Sunday October 27, 2002 @05:21PM (#4543252) Journal
  
  People keep pointint to the floating point error as the cause for why the Patriot system at that time (the PAC-2) let that scud go through. But as I've already pointed out in an earlier post [slashdot.org], the PAC-2 did a crappy job (far worse than is generally known) intercepting scuds not because of coding errors but because the problem of hitting an erratically moving missile was so difficult. I think it's important to get the word out as we approach a new war with Iraq and consider a national missile defense shield. This recent article [atimes.com] briefly discusses Israel's own attempts at missile defense because they don't trust the PAC-2 (for good reason) and it's questionable whether the US is going to give them some PAC-3 batteries.
  Bottom line: that stuff about the floating point error in the PAC-2 system looks neat on paper but it's not at all clear that the faulty calculation was responsible for the loss of life.
  
  GMD
  
  Parent Share
  twitter facebook
- - Re:Toilet glitch (Score:5, Funny)
    
    by squarefish ( 561836 ) writes: on Sunday October 27, 2002 @03:17PM (#4542569)
    
    a different toilet story from about 10 years ago:
    
    This appeared in today's (2/17) Seattle Post-Intelligencer:
    It was a flush with a rush.
    Toilets and urinals in the King County Courthouse exploded yesterday after a worker in Metro's downtown bus tunnel mistakenly connected an air compressor to the building's water line. As soon as hapless individuals flushed the pressurized privies, the plumbing started popping in restrooms throughout the 72-year-old building, said building services manager Bill Kemp. "They started blowing at about 11:30 (a.m.) and it took us awhile to figure it out," he recounted."We knew it had to be air in the system but the Water Department said that was impossible." It wasn't. The source of the problem was finally tracked to the tunnel under Third Avenue, and the errant air compressor was shut down. But not before employees on every floor in the 10-story courthouse had stories to tell about gushing geysers in the john. "We think we've lost about 20 to 25 toilets," said Kemp. "The porcelain is actually cracked." Kemp said no one has admitted being hurt by the unusual blast, although several people were badly drenched. Or very surprised. Explained Kemp, "The urinals acted more like bidets." We had other reports that people were not necessarily on the toilet but close." "This has not exactly been a good day for Metro," he noted. by Mary Rothschild --P-I Reporter
    
    link [netfunny.com](story is near bottom, pun intended.
    
    Parent Share
    twitter facebook
- - Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:3, Insightful)
    
    by entrylevel ( 559061 ) writes:
    
    Actually I think that no one mentioned this since it has nothing to do with programmer error. (At least not according to what you linked to.)
  - Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:5, Insightful)
    
    by Forager ( 144256 ) writes: on Sunday October 27, 2002 @04:48PM (#4543089) Homepage
    
    From the site:
    
    "In 1969, as part of its global empire, Union Carbide Corporation set up its pesticide formulation unit in the northern end of the city of Bhopal in central India. Initially it mixed and packaged pesticides imported from the US but was gradually expanded. In December 1979 its Methyl Iso Cyanate (MC) plant with an imtalled capacity of 5000 tonnes went into production.
    
    On the night of December 2, 1984, during routine maintenance operations in the Methyl Iso Cyanate (MC) plant, at about 9.30 p.m., a large quantity of water entered storage tank no. 610 containing over 60 tonnes of AEC.
    
    This triggered off a runaway reaction resulting in a tremendous increase of temperature and pressure in the tank and 40 tonnes of MIC along with Hydrogen Cyanide and other reaction products burst past the ruptured disc and into the night air of Bhopal at around 12.30 a.m. Safety systems were grossly under-designed and inoperative. Senior factory officials knew of the lethal build-up in the tank at least one hour before the leakage, yet the siren to warn neighbourhood communities was sounded more than one hour after the leak started.
    
    By then, the poisons had enveloped an area of 40 sq.kms. killing thousands of people in its immediate wake. Over 500 thousand suffered from acute breathlessness, pain in the eyes and vomiting as they ran in panic to get away from the poison clouds that hung close to the ground for more than four hours."
    
    Nothing to do with programming errors here that I can see. Sounds more like gross negligence and incompetence to me.
    
    -A.
    
    Parent Share
    twitter facebook
    - Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:4, Informative)
      
      by jgaynor ( 205453 ) writes: <jon AT gaynor DOT org> on Sunday October 27, 2002 @07:28PM (#4543841) Homepage
      
      A "large quantity of water" entered the storage tank because an employee who had just been fired dropped a hose into it out of spite (he didnt know what would happen, he just wanted to ruin something). Yes the safety precautions were under-par, but when someone with legitimate access wants to destroy something its pretty hard to prevent.
      
      And yes, this has nothing to do with programming error :).
      
      Parent Share
      twitter facebook
  - - Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:3, Insightful)
      
      by Cyno01 ( 573917 ) writes:
      
      Now, what do you think would happen if a large number of decent hard working consumers were wiped out in a single event?
      it did, 09/11/01
    - 10,000 dead in Bhopal, India != Concern (Score:3, Funny)
      
      by Sean Clifford ( 322444 ) writes:
      
      <sarcasm>
      When something like this happens, it's little more than an embarassing public relations problem. If the news can't be completely supressed through advertising, perhaps it can be kept off the evening news and relegated to the back pages. It requires a well-coordinated PR firm, but hey that's what they're around for.
      Sure, a few independent news agencies might pick it up and make a big deal about it - until someone goes whaling or starts cutting down redwoods. Few people pay much attention to the independent media anyway. Joe Sixpack doesn't subscribe to The Progressive.
      On the local front, shut down the plant, and evacuate your American/European workers. Split them up and transfer them around. If someone makes noise, force them to sign an NDA for their severance packages. Spread liberal bribes on the local front, write the whole venture off, and wait for the hubbub to die down. If you want to stay in the region and resume operations, do so under the umbrella of a subsidiary. If it's too risky, simply relocate to another third-world region. It's not like there's a limited supply.
      Unless you stay in the region, you really don't have to worry much about the local population. They're too poor to pursue legal action or be a security threat.
      Besides, it's not as if they're white Christians, is it?
      </sarcasm>
RTM Worm (Score:5, Interesting)

by rwash ( 16296 ) writes: on Sunday October 27, 2002 @02:48PM (#4542368) Homepage

In the 80's, Robert T Morris accidentally released a worm that exploited problems in sendmail and other common internet daemons that took down most of what was the internet at that time. This was expecially bad since about half of it was military.

Share
twitter facebook
- You forgot the part that went wrong. (Score:3, Interesting)
  
  by mindstrm ( 20013 ) writes:
  
  The only reason it took thigns down was because a timing loop was messed up, and it was spreading something like 1000 times too fast. It was supposed to spread everywhere, yes, but by crawling slowly.. it was not intended to eat up all connections on all machines.
  
  Had that been the case, it would have been much more widespread and caused much less damage.
- Re:RTM Worm (Score:5, Interesting)
  
  by ProfessorPuke ( 318074 ) writes: on Sunday October 27, 2002 @09:17PM (#4544315)
  
  "accidentally released" is wrong, or prehaps whitewashing by RTM's friends. The release was fully intentional. What was accidental about it is that he hadn't realized that in addition to infecting virtually every UNIX system it found, it would also DOS them. The worm constantly tried to infect every available system, meaning that a system which was vulnerable would recieve many, MANY copies of the worm, exhausting its processing power.
  
  RTM had been aware of the possiblilty, and implemented a fix- but he did it wrong. He'd created code so that a new worm, when first arriving at a host, could check if a previous instance of the worm had been there. If so, it could abort its infection process.
  
  However, he was afraid that this would make vaccinating machines too easy (by sysops faking the "already infected" flag), so he created a 12.5% random chance that an incoming worm would ignored the fact that a machine was already compromised and infect it again. That probability had NO rational basis behind it, (in fact the whole idea of using randomizing like this is flawed), and served to postpone the shutdown of the internet by at most an hour.
  
  This was an especially bad blunder because it set a frightening example of what hackers could do. If RTM had used a 100% chance of non-reinfection, (and played his cards right from then on), he'd have been hailed as an innovative security analyst who'd prevented security-compromising violations of the Pentagon's systems. Instead he was tossed in prison for years.
  
  Parent Share
  twitter facebook
Y2K? (Score:5, Insightful)

by Monthenor ( 42511 ) writes: <monthenor&gogeek,org> on Sunday October 27, 2002 @02:49PM (#4542373) Homepage

...wherein a technique to save memory on older computers resulted in a massive media panic twenty years later. Oh, and it caused a couple glitches

Share
twitter facebook
- Re:Y2K? (Score:4, Funny)
  
  by Citizen of Earth ( 569446 ) writes: on Sunday October 27, 2002 @07:54PM (#4543941)
  
  wherein a technique to save memory on older computers resulted in a massive media panic twenty years later.
  
  Yeah, we fleeced 'em pretty good, eh. We should do that again in 2038 in order to pad my retirement account!
  
  Parent Share
  twitter facebook
- You just can't win, can you... (Score:4, Insightful)
  
  by achurch ( 201270 ) writes: on Monday October 28, 2002 @01:19AM (#4545348) Homepage
  
  If the Y2k bugs hadn't been fixed, things would have broken left and right, and we would have been blamed for not fixing them ahead of time.
  Since the Y2k bugs were fixed, very few things broke, and we got blamed for wasting tons of money to no effect.
  C'est la vie, I guess.
  
  Parent Share
  twitter facebook
- - Re:Y2K? (Score:3, Informative)
    
    by T-Ranger ( 10520 ) writes:
    
    Prehaps true, but back in the days of punchcards anc COBOL you wernt storing a integer for a date, you were storing a string.
- - Re:Y2K? (Score:4, Funny)
    
    by balloonhead ( 589759 ) writes: <doncuan.yahoo@com> on Sunday October 27, 2002 @07:45PM (#4543912)
    
    It was overhyped nonsense. A lot of people made a lot of money out of the panic spread about Y2K. A few things might have broken, but essentially the predicted disaster was never going to happen.
    Hey we sold you this! Top of the range! But it's broken, even before we sold it to you. If you pay us £500000 we'll fix them all, but if you don't your blood will boil and your head will explode, all your kids will die of pestilence, your wife will sleep around, your plane will try to reach the moon and all your elevators are belong to us.
    
    Parent Share
    twitter facebook
    - Look out.... (Score:3, Funny)
      
      by r_j_prahad ( 309298 ) writes:
      
      Hey we sold you this! Top of the range! But it's broken, even before we sold it to you. If you pay us £500000 we'll fix them all, but if you don't your blood will boil and your head will explode, all your kids will die of pestilence, your wife will sleep around, your plane will try to reach the moon and all your elevators are belong to us.
      
      Careful. Quoting from a Microsoft EULA like that without proper attribution could get you tossed into jail for a DMCA violation, sport.
How about the AT&T Switch failure in NY? (Score:5, Informative)

by Bolen ( 4896 ) writes: on Sunday October 27, 2002 @02:50PM (#4542385)

A Central Office (CO) switch is basically a mainframe-class computer programed in assembler. A few years back, a newly-installed switch failed due to a bug in the code, causing a cascading failure of the phone system for a few hours.

Share
twitter facebook
- It was a bad break in C code (Score:5, Informative)
  
  by hayne ( 545353 ) writes: on Sunday October 27, 2002 @04:13PM (#4542881)
  
  Actually, the switching code was in C and the crash was due to a programmer's apparent misunderstanding of the 'break' statement. See full details at: http://www.csc.calpoly.edu/~jdalbey/SWE/Papers/att _collapse.html [calpoly.edu]
  
  Parent Share
  twitter facebook
See RISKS Digest (Score:2, Informative)

by sotweed ( 118223 ) writes:

... distributed by SRI. You can browse the archives at http://catless.ncl.ac.uk/Risks

This is probably the broadest and best source for this kind of information.
Risks digest (Score:5, Interesting)

by wik ( 10258 ) writes: on Sunday October 27, 2002 @02:52PM (#4542394) Homepage Journal

I'd suggest reading The Risks Digest [ncl.ac.uk]. Its topics tend to be on computer systems and the risks (monetary, time, life, political, etc...) associated with them over the past 17 years or so. Just be aware of the witty remarks the moderator makes.

As far as other death-inducing systems go, you might be interested in, computer controlled plane accidents (airbus, anyone?), unmanned trains, and nuclear reactors are some classical favorites.

Share
twitter facebook
- Also on UseNet (Score:3, Informative)
  
  by xixax ( 44677 ) writes:
  
  I originally started reading it as comp.risks on UseNet.
  
  news://comp.risks
  
  Xix.
One time this guy (Score:5, Funny)

by photon317 ( 208409 ) writes: on Sunday October 27, 2002 @02:52PM (#4542396)

tried to write a peice of software called Slashcode. Look at what happened.

Share
twitter facebook
Pleanty of examples here: (Score:2)

by zulux ( 112259 ) writes:

Shared Source [microsoft.com]

Just fill out the forms, sell your soul, and you can browse programming errors for rest of your natural life - and that's supposing that viewing this code won't make you slit your wrists in dispair.
I had a professor (Score:5, Funny)

by teamhasnoi ( 554944 ) writes: <teamhasnoi.yahoo@com> on Sunday October 27, 2002 @02:53PM (#4542402) Journal

Professor Falkin was always saying, "Leave a backdoor in any program you write, just in case your code becomes self-aware."

Share
twitter facebook
- Re:I had a professor (Score:5, Funny)
  
  by Subcarrier ( 262294 ) writes: on Sunday October 27, 2002 @03:06PM (#4542496)
  
  Professor Falkin was always saying, "Leave a backdoor in any program you write, just in case your code becomes self-aware."
  
  I can't help thinking that a fairly high percentage of current Microsoft employees must be former students of his.
  
  Parent Share
  twitter facebook
RISKS Digest (Score:4, Informative)

by BinBoy ( 164798 ) writes: on Sunday October 27, 2002 @02:54PM (#4542413) Homepage

The RISKS Digest is a mailing list and usenet newsgroup that describes all kinds of situations where technology has gone wrong. Many of the stories involve programming errors.

Google's RISKs Archive [google.com]

Share
twitter facebook
Why, the world's favorite mail client, (Score:5, Interesting)

by Scarblac ( 122480 ) writes: <slashdot@gerlich.nl> on Sunday October 27, 2002 @02:55PM (#4542422) Homepage

Outlook!

Built with the idea that code in attachments should be executable, often automatically. Also full of exploitable bugs, to get even more stuff running automatically, regardless of who who sent it. Responsible for a huge amount of damage by all sorts of worms, trojans, etc.

Someone, somewhere got the idea that email would look better with html; and if it got html, it should get scripting too, that's consistent with web pages! And it's cool if attachments (like pictures) can be opened in their appropriate program automatically - let's run any executables then, that's consistent!

This is oversimplified, but I really feel that this is a case of stupid consistency that caused multi-billion dollar damage. Email should never be executed by the mail client.

Share
twitter facebook
- Re:Why, the world's favorite mail client, (Score:4, Funny)
  
  by illsorted ( 12593 ) writes: on Sunday October 27, 2002 @03:58PM (#4542797)
  
  I don't know which is funnier, this post, or the fact that it's modded "+4: Flamebait".
  
  Parent Share
  twitter facebook
- Re:Why, the world's favorite mail client, (Score:5, Insightful)
  
  by billbaggins ( 156118 ) writes: on Sunday October 27, 2002 @05:30PM (#4543290)
  
  Not quite. I don't think even Outlook was ever set to just run code automatically. What went wrong was that for a long time (and, in unpatched versions, even today), Outlook would implicitly trust the "Content-type" header for an attachment or message, and, if it was a "safe" type (like text/html or image/jpeg) then the attachment would be handed off to the document-opener to be rendered & displayed inline. Problem was, the document-opener didn't go by the MIME type but by the extension. So if you had something like
  
  Content-type: image/gif Content-disposition: attachment; filename="fux0r.scr"
  
  then the document-opener would say "ah, this is a screensaver, I should execute it" and before the poor user knew what was going on, all hell was breaking loose...
  
  Parent Share
  twitter facebook
That was an easy setup (Score:5, Interesting)

by Ghoser777 ( 113623 ) writes: <fahrenba@m a c . c om> on Sunday October 27, 2002 @02:57PM (#4542438) Homepage

A clear example is: [insert random microsoft product].
Oh wait... -1 Redundant
Here's a good site [tu-muenchen.de] though with tons of examples.
My favorite would be the infamous [fas.org] time when NASA did half its calculation in metric and the rest in SI. ;)
F-bacher

Share
twitter facebook
- Re:That was an easy setup (Score:3, Informative)
  
  by Transcendent ( 204992 ) writes:
  
  that was not an error in the programming... some dumbass gave all the calculation in English units for acceleration to the programmer who writes his program using SI for units (or metric... same thing...).
- - - Re:Thanks! (Score:3, Funny)
      
      by NeXTer ( 619524 ) writes:
      
      Especially considering that SI is short for the French term "Système International". :P
Failures (Score:4, Informative)

by Jordan Graf ( 4898 ) writes: on Sunday October 27, 2002 @02:57PM (#4542439)

MIT runs a class called 6.033: Computer Systems Engineering. These [mit.edu] lecture notes contain a list of projects that had great sums of money spent on them only to be abandoned. Also the reading list [mit.edu] has a bunch of papers that discuss the "big splash" failures like Therac 25.

Share
twitter facebook
Pretty sure this was posted earlier on slashdot (Score:2, Informative)

by Utopia ( 149375 ) writes:

but couldn't find it.

Anyway, here are a couple of links.
Software horror stories [tau.ac.il]
More horrors [yorku.ca]
A Great Story (Score:5, Interesting)

by puppetman ( 131489 ) writes: on Sunday October 27, 2002 @02:58PM (#4542452) Homepage

that was told to my class about the altitude of fighter jets.

A company was hired to rewrite the code that was used on one of the models of fighter jets, and they offered to fix an unusual bug.

The details are: apparently they had two altimeters - one was barometric, and the other I don't remember.

Anyway, the programmer was coding along, and was writing code to determine what would happen if the altimeters stopped functioning.

He came to the case where they both weren't working, and couldn't figure out what to do, so called one of the pilots that was acting as an information source for the developers, and asked him what altitude they normally flew at, and he answered, "12,000 feet" or something similar.

So the programmer wrote,

if altimeter1 not working
{
if altimeter2 not working
{
set height = 12000;
}
}

Stupid, but this code could not be changed. The pilots had the following rule deeply ingrained: if the altitude stays at 12,000 for more than a few seconds, pull up, as your altimeters aren't working.

Share
twitter facebook
- That's kind of silly (Score:4, Insightful)
  
  by Ghoser777 ( 113623 ) writes: <fahrenba@m a c . c om> on Sunday October 27, 2002 @03:02PM (#4542470) Homepage
  
  Wouldn't setting it to something like 0 be better? I mean, I could miss it sticking at 12,000 for a while, but if I notice that my altitude is suddenly 0, I think my first instinct will be to pull up as fast as possible.
  
  F-bacher
  
  Parent Share
  twitter facebook
  - Re:That's kind of silly (Score:4, Informative)
    
    by pongo000 ( 97357 ) writes: on Sunday October 27, 2002 @03:28PM (#4542627)
    
    Wouldn't setting it to something like 0 be better?
    
    In most areas of the world (unless you're flying over the Dead Sea, or Death Valley, or New Orleans), if your altimeter reads 0, you're probably already dead. Altimeters used for navigation read MSL (height above mean sea level), not AGL (height above ground). There are radar altimeters that read in AGL, but these are used for close-to-ground maneuvers like landing.
    
    Parent Share
    twitter facebook
  - Re:That's kind of silly (Score:4, Interesting)
    
    by pete-classic ( 75983 ) writes: <hutnick@gmail.com> on Sunday October 27, 2002 @05:44PM (#4543345) Homepage Journal
    
    From a text I am currently working on:
    
    The compiler requires you to declare variables, but does not require you to initialize them. Does that mean you can get away with leaving them uninitialized? Well, you might program your entire life without coming upon a reason not to. If you don't initialize them, however, you will almost certainly run into a very difficult bug, probably sooner than later. Using an uninitialized variable is perfectly valid syntax, but is always a logic error. The compiler won't complain, but you will get wild, unpredictable, and wrong results. In the worst case, you might get believable, but wrong results. This leads us to what to use as an initializer. Most people use zero. Using an "obviously wrong" value may be more useful. Often a maximal value (such as int students="65536") is more obviously wrong. [Emphasis added for this post.]
    
    This isn't variable initialization, but the principal replies. Data that you know are junk should look like junk! Trying to "fake it" or make it "look good" is exactly the wrong thing to do.
    
    -Peter
    
    Parent Share
    twitter facebook
- Re:A Great Story (Score:4, Informative)
  
  by florescent_beige ( 608235 ) writes: on Sunday October 27, 2002 @08:28PM (#4544112) Journal
  
  Speaking of aviation: This SAAB Gripen crash [canit.se] was attributed to the coding of the control laws in the flight control computer. So was this [ncl.ac.uk] one. And this [google.ca] F-22. And lets all remember the Apollo 11 [nasa.gov] incident.
  
  Parent Share
  twitter facebook
- - Re:Why this cant be right... (Score:4, Insightful)
    
    by cybercuzco ( 100904 ) writes: on Sunday October 27, 2002 @05:31PM (#4543298) Homepage Journal
    
    Thats not entirely true. Adding power will inccrease your altitude, but pulling up will too. When you pull up, you trade altitude for speed. In other words, youll go higher but your plane will be goins slower. Eventually you arent going fast enough to maintain level flight characteristics, so you have to add power or stop trying to go higher. In some cases youre right though, if you already are only going fast enough to maintain level flight, pulling back on the stick will slow you down and decrease your altitude, but this isnt always true. As for the person who didnt understand how adding power increased altitude, when you go faster, you increase the lift coming from your wings (since lift is a function of speed and angle of attack) so there is a net upward force on the aircraft, causing it to go upwards.
    
    Parent Share
    twitter facebook
  - - Re:Why this cant be right... (Score:4, Informative)
      
      by bongholio ( 609944 ) writes: on Sunday October 27, 2002 @06:39PM (#4543607)
      
      You're all sorta right.. here is one of my favorite aviation pages [monmouth.com] It'll tell you more than you ever wanted to know about airplane physics (from a pilot's point of view). Chapter 1 covers these altitude/speed/power concepts...
      
      Parent Share
      twitter facebook
Shared download class (Score:5, Funny)

by solostring ( 620535 ) writes: on Sunday October 27, 2002 @03:03PM (#4542474) Homepage

I used to work for a 1999/2000 'golden child' dot-bomb which dealt in file trading... a proposed legal form of napster. It was a fucked company from the start, but it still had a lot of traffic in the early days.

We always had problems with downloading files from the site.... the files kept getting corrupted, and occasionaly, a member would complain that they tried to download a powerpoint presentation and ended up getting 4 way anal porn.

This perplexed the developers, and it was not until 9 months after going online with the site, did they realise that the java class that dealt with the downloads was a single process shared by all users! :)

So, your download would go ok IF nobody else tried to download at the same time. If two people clicked download at about the same time, you would download the file that the second person wished to download.

No wonder they went bankrupt :)

Share
twitter facebook
Easy, (Score:5, Funny)

by Phoenix666 ( 184391 ) writes: on Sunday October 27, 2002 @03:03PM (#4542476)

When they played Heidi over the end of the greatest come-back in football history. Oh wait, you didn't mean that kind of programming, did you?

Share
twitter facebook
Don't be so narrow (Score:5, Insightful)

by coyote-san ( 38515 ) writes: on Sunday October 27, 2002 @03:03PM (#4542482)

Don't be so narrow in your approach. Is it a programming error if a stadium roof collapses because the engineers couldn't understand what the output of their computer model was saying?

What about when the construction crew quietly substituted what they thought was an equivalent design to what the computer program came up with for a skywalk over a hotel lobby?

After almost 20 years in this field, I think that at least 80% of the serious "errors" I see are because the user didn't understand the results of the program, and only 20% of them are due to classic development errors.

The lesson to learn from this: the user interface matters. Give some thought to presenting the information in a meaningful manner (e.g., the infamous pre-Challenger graphs showing O-ring erosion vs. the post-Challenger graph that mapped damage by temperature at the time of launch), and allow users to see the information in the way that makes the most sense to them.

Share
twitter facebook
- Re:Don't be so narrow (Score:4, Insightful)
  
  by FattMattP ( 86246 ) writes: on Sunday October 27, 2002 @05:00PM (#4543166) Homepage
  
  The lesson to learn from this: the user interface matters. Give some thought to presenting the information in a meaningful manner (e.g., the infamous pre-Challenger graphs showing O-ring erosion vs. the post-Challenger graph that mapped damage by temperature at the time of launch), and allow users to see the information in the way that makes the most sense to them.
  
  On a related note, a guy named Edward Tufte wrote a some books on just this type of subject. I believe it was called The Visual Display of Quantitative Information, or something like that. Basically, he goes show how thinking more about how you present the data can help you to communicate your ideas more effectivly. He also talks about the O-ring problem that you mention. He shows the charts from the NASA engineers and then shows the charts he had drawn. You could definitly see the problem much more clearly in his drawings.
  
  Parent Share
  twitter facebook
Train collision (Score:5, Interesting)

by Shaddup ( 615685 ) writes: on Sunday October 27, 2002 @03:04PM (#4542489)

A company I once worked for (as an intern) was in the business of what's called "train control" software. Briefly, it's the software that dispatchers use to monitor the status of the switches, the position of all the trains being tracked by the system, etc. One of the features of the system is to provide early-warning of potential collisions. Well, the system is quite reliable (having been in service, in one form or another, since the 70's). However, there have been some accidents.

Once such accident, in Mexico, was caused by an unexpected combination of several simultaneous failures. One day, for some reason, one of the servers needed to be reset. At the same time, two freight trains were stopped at a switch, in the process of what's called a "pass," where one train turns off onto a side track to let the other train pass by on the main track. Long story short, the status bits of the switch got lost during the server reset (there is a provision for restoring track states when the backup servers take over, but it didn't work for some reason). After asking if the track was clear, the driver for train1 recieved a green light from the dispatch office. The dispatcher, not knowing that train2 hadn't cleared the switch yet, figured everything was ok. The trains collided at very low speed, and not head-on, but nonetheless the collision cost the rail line several million in equipment and downtime. No one was hurt.

The lesson: When writing bullet-proof software, check every possible condition! More extensive field testing would have caught the failover bug.

Share
twitter facebook
- Re:Train collision (Score:3, Insightful)
  
  by Reziac ( 43301 ) writes:
  
  IANAP, but I'd think your bulletproof software should also have some way to gracefully account for "impossible" conditions, which users are so clever at creating!!
Non-life threatening, but interesting bug... (Score:5, Interesting)

by Anonymous Coward writes: on Sunday October 27, 2002 @03:05PM (#4542492)

I'm an AC for a reason...

Let's just say that two years ago a very large international shipping company suffered two days of worldwide failure in the package routings printed on labels. The bug was caused by an incorrectly placed paren in an index offset calculation, leading to truncation of an intermediate result (to a 16 bit unsigned int, when it should have been 32). The bug sat dormant for five years because the result matrix it was indexing into was smaller than 64kbytes. As soon as it grew over that size - boom! What a way to wake up at 2am when the Asian-Pacific region starts calling...

I didn't make it, but I was definitely involved with the fix. After that we did some very thorough auditing on all of the routing code - and fortunately didn't find any other surprises lurking.

Share
twitter facebook
Airbus (Score:5, Interesting)

by That_Dan_Guy ( 589967 ) writes: on Sunday October 27, 2002 @03:07PM (#4542503)

This isn't really a programming error, but a user training error.

In the Airbus if the pilot tries to correct (use the flight controls) while the computer is engaged the computer will correct the pilot's correction. Unlike in a car with cruise control where if you hit the breaks it just cuts the cruise control. Many China Airlines planes have crashed due to poor pilot training in this regard. They weren't trained well enough to shut off the computer control before taking control of the plane.

I'm also sure someone can be a little more detailed than this, but it is, IMO, at least a design error that has caused hundreds of deaths.

As a side note, my Software Engineer professor refused to ever fly on a fly by wire plane, and was opposed to SDI simply because he didn't beleive that either had been or ever would be debugged properly. (if there is one error in every 10,000 lines of code, and it has 3 or 4 million Lines of Code, how many errors is that? His answer: too many to trust)

Share
twitter facebook
- - Re:Airbus (Score:3, Informative)
    
    by s20451 ( 410424 ) writes:
    
    Unfortunately, it has been conclusively proven by experience that the risk of an incapacitated pilot causing an accident is much, much less than the risk of a pilot and computer being at odds over the correct course of action in an emergency, or the risk of computer settings confusing the pilot. I prefer the Boeing design philosophy, which is that the pilot is the final authority on the operation of the airplane, not the computer. The pilot, not the software engineer, is on board the airplane, and therefore has a much higher interest in ensuring that the vehicle gets on the ground in one piece.
I've got a course for you (Score:3, Interesting)

by generic-man ( 33649 ) writes: on Sunday October 27, 2002 @03:09PM (#4542515) Homepage Journal

The article summary sounds oddly familiar: are you sure you're not in my Dependable/Survivable Systems [cmu.edu] class? We just covered Therac-25 and Ariane-5. There are a number of other readings [cmu.edu] on topics like dependability that you might find interesting.

Share
twitter facebook
deep c secrets (Score:5, Informative)

by lylonius ( 20917 ) writes: on Sunday October 27, 2002 @03:12PM (#4542535)

Aside from being an excellent text on C Programming, the book, "Expert C Programming", by Peter van der Linden, discusses several such bugs.

The 1993 $20 million SunSoft Asynchronous I/O bug.

The 1961 Fortran subroutine used to calculate orbital trajectories at NASA for several Mercury flights.

A discussion on the 1988 RTM worm.

Sun's first internationalized Pascal compiler corrupt date strings.

1961 Mercury software failure (. used instead of ,).

1962 Mariner 1 software failure resulting in $12 million rocket and probe destroyed.

among others.

Share
twitter facebook
- Re:deep c secrets (Score:3, Interesting)
  
  by seanadams.com ( 463190 ) writes:
  
  The 1993 $20 million SunSoft Asynchronous I/O bug.
  
  So I looked this one up [ccu.edu.tw]:
  
  x==2;
  
  The programmer's finger had bounced on the "equals" key, accidentally pressing it twice instead of once.
  
  Uh, right... bounced on the key. The story is very light on details as to what the problem was, how they found it, how it slipped past QA, etc, but clearly this was a PROCESS error and not a design flaw.
  
  If a software bug is holding up the shipment of $20M worth of hardware, then Sun had some real serious problems besides shoddy programming. You don't commit millions of dllars to building hardware when you know there's a bug somewhere. that's just absurd.
  
  The programmer is really the last person to blame for the $20M backlog. I'd blame QA for signing off on the code, I'd blame the C language and their compiler for letting such a stupid typo slip through without a warning, and I'd blame the suits for trying to fit the software development cycle to their hardware release cycle. If the programmers are to blame at all, it's for structuring the program in such a way that such a bug could easily slip through - the typo itself it forgiveable. With just a few assert()s here and there, this kind of bug is almost impossible to write.
  
  You just DONT build production hardware when the software isn't ready yet. The system needs to be tested as whole. If the hardware works with a previous rev of the software, then that's all you ship, period.
  - Re:deep c secrets (Score:5, Informative)
    
    by pvdl ( 621000 ) writes: on Sunday October 27, 2002 @10:10PM (#4544544)
    
    Sean,
    This was a real bug in Sun's async I/O library back in 1993. The bug had nothing to do with hardware. It had to do with sales. There was one specific customer who had some software that used the async IO library.
    
    By bad luck, their code tickled this bug (caused it to manifest). As a result, their application failed. By chance, they were on the point of buying $20M of sun servers. Recognizing that they had a huge amount of leverage, they told the salesman "Gee, we'd really like to sign the purchase order, but our app doesn't work, and we
    think it's a bug in your library."
    
    The salesman called thru to the kernel group and explained what was happening. The right developer (probably Dan) put aside ten other urgent things, and searched for this problem. It was not easy to find, but he did find it quickly, and issued a patch the same day. This almost never happens, but $20M is $20M.
    
    The customer tested the patch, and everything worked perfectly. The customer was happy. We were happy. And the salesman with the commission on $20M of server hardware was happiest of all.
    
    I don't understand that suggestion that "you just DONT build production hardware when the software isn't ready yet". Software is never ready. The FCS date is just a milestone on the continuum of evolving and improving the software. The truth is that all systems from all computer manufacturers are developed to the hardware schedule, and they ship as soon as the hardware is ready, in whatever state the software is.
    
    One of the biggest sins you can commit as a software developer is to cause a slip in the overall product because the software isn't ready.
    There are excellent economic reasons for this, but they are too long to go into here.
    
    I was going to write a new book based on my experience developing OS software for the SunBlade 100 and 1000 workstations. But to my astonishment, Prentice-Hall were wishy-washy on the project.
    
    I still think it would have been a terrific book, but it is such a large amount of work to write a book, that I am not going to take it on unless the publisher is 100% behind the project.
    
    My working title for it was "The Whole of a New Machine - How we built the world's fastest desktop computer" (which it was at the time it launched).
    I did develop the book synopsis, and wrote the entire first chapter. I put them on the CD that comes with the 5th Edition of "Just Java" if you want to see them.
    ---
    BTW, "Expert C Programming" gets the failed rocket software details 100% correct, unlike some of the corrections below.
    
    Parent Share
    twitter facebook
Bye bye credit purchases (Score:3, Interesting)

by cat_jesus ( 525334 ) writes: on Sunday October 27, 2002 @03:23PM (#4542598)

I worked for a programmer back in the 80's who made a mistake that caused all credit card purchases to disappear from the electronic journal. This meant that their purchases were not recorded on their credit card statements. Fortunately for the company the bug did not affect the recording of the transactions on the paper journal. This bug wasn't discovered for a few days and it took quite some time to rekey all the credit transactions.

Unfortunately this was not her first or last mistake of this magnitude. Retailers often see IT as an expense rather than an asset and are as cheap as possible. This has a tendency to cause shoddy programming since they hire as few programmers as possible and overwork them and often software is put into production without being thoroughly tested. At least this was the case when I worked in retail some ten years ago--I don't think I'll do that again.

But I am finding that insurance companies have the same philosophy.

Share
twitter facebook
I can't recomend comp.risks too highly (Score:3, Informative)

by Camel Racer ( 134168 ) writes: on Sunday October 27, 2002 @03:40PM (#4542692)

I can't recomend the risks site too highly. (redundent I know)
Risks To The Public In Computers And Related Systems
http://catless.ncl.ac.uk/Risks

On how to be 0wned by other people: Counterpane: Crypto-Gram . Shares with comp.risks the reframe of "I can't belive people don't learn from this"
Counterpane: Crypto-Gram
http://www.counterpane.com/crypto-gra m.html

Don Norman's _The Design of Everyday Things_ and website also offer insight on how to avoid UI failures relating to failures.
http://www.jnd.org/index.html

Also, get a copy of _Code Complete_ and/or _Code Write_ by Steve McConnell [pub: Microsoft Press Which is rich irony) Lots of mistakes and how to avoid them.
The cautionary note might be that most of these failures are human related at some level. Whether it be at the project level, or the UI level -- there are lots of ways to cause a failure.

Finally, avoid any kind of carreer in Software QA. There is no better way to just get kicked around at the expense of the people putting the bugs in the software in the first place.

Share
twitter facebook
NASA software bugs (Score:3, Informative)

by dstone ( 191334 ) writes: on Sunday October 27, 2002 @03:51PM (#4542756) Homepage

Someone here was claiming that NASA has never had a software bug. That sounded pretty unbelievable to me. And sure enough, it's not true. In the recent Mars missions alone, they had a bunch of software bugs resulting in things varying from non-fatal vehicle failures to outright loss of spacecraft.

Regarding the loss of the Mars Climate Orbiter spacecraft [nasa.gov], from nasa.gov: "The 'root cause' of the loss of the spacecraft was the failed translation of English units into metric units in a segment of ground-based, navigation-related mission software"

Also, here are several [nasa.gov] "software bugs" (their words) relating to the Mars Surveyor Lander Vehicle are described. These bugs were detected and fixed in the field (ie, Mars). At least one of the bugs caused a heater failure in the vehicle on Mars. This failure was recovered from.

Anyways, those are just two quickies, but NASA has their share of bugs. (And generally some pretty ingenious ways to reprogram and update vehicle software post-launch.)

On a related note, here's a paper from NASA entitled "The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software" [nasa.gov].

Share
twitter facebook
Always Mount a Scratch Monkey (Score:3, Insightful)

by calyxa ( 618266 ) writes: on Sunday October 27, 2002 @04:11PM (#4542863) Homepage Journal

a classic tale:
http://www.acme.com/jef/netgems/scratch_monkey.htm l [acme.com]
-calyxa

Share
twitter facebook
Good site (Score:4, Informative)

by dumboy ( 602836 ) writes: <mkuentz.gmail@com> on Sunday October 27, 2002 @04:16PM (#4542899)

Check out this site [tu-muenchen.de]
http://wwwzenger.informatik.tu-muenchen.de/perso ns/huckle/bugse.html

Share
twitter facebook
Insidious bug from the wayback machine (Score:3, Informative)

by rufusdufus ( 450462 ) writes: on Sunday October 27, 2002 @04:24PM (#4542938)

Back when C++ was new, there was an insidious problem with the syntax that never showed up during compilation.

if(c=='\') //check for \
slashfound=1; //found one, handle path
++index;

Code similar to this delayed shipment of a commercial product because it caused serious instability.

Share
twitter facebook
- Re:Insidious bug from the wayback machine (Score:4, Informative)
  
  by rufusdufus ( 450462 ) writes: on Sunday October 27, 2002 @05:18PM (#4543234)
  
  Perhaps I should point out the bug: the comment "//check for \" ends with a pre-processor line-entesion character (\), which effective appends the next line onto the current line, thus the code "slashfound=1" is effectively commented out and the next statment (++index) only executes if c=='\'
  
  Parent Share
  twitter facebook
One of the best resources I've found (Score:5, Informative)

by PghFox ( 453313 ) writes: <afoxson @ p obox.com> on Sunday October 27, 2002 @04:27PM (#4542957) Homepage
The Pragmatic Programmer: From Journeyman to Master [amazon.com], is one of the best resources I've found to avoid common programming mistakes. This book details many of the common errors we make as software developers and describes strategies for overcoming them. Having been in the field for close to two decades, I've found this book to be of immense value, and give it a high recommendation.

Some of the tips, which may appear obvious to some of us, include:
- Always Aim for Simplicity, Clarity and Generality
- Treat all of your code as if you're going to release it
- Keep subroutines small; break-up code as you go
- Document as you go, not after the fact
- Write tests as you go, not after the fact
- Fix bugs immediately; do not delay fixing them
- Do not duplicate any code, anywhere
- Separate form and functionality
- Subroutines should do one thing and do it well
- Make your work easy to reuse
Share
twitter facebook
Airport flight schedules (Score:5, Funny)

by PyroX_Pro ( 579695 ) writes: on Sunday October 27, 2002 @05:24PM (#4543264) Journal

Here are some of the best examples of windows crashing on high visibility systems that are relied upon:

in the street [squidly.org]
At the airport [bloomu.edu]
at the atm [piemaster.co.uk]
on CNN [piemaster.co.uk]
At disneyland [piemaster.co.uk]
On your phone [piemaster.co.uk]
In an airplane [pyroxpro.com]
At the bus stop [dropbear.id.au]

Share
twitter facebook
Lets Not Forget the Best... (Score:5, Funny)

by Hott of the World ( 537284 ) writes: on Sunday October 27, 2002 @06:05PM (#4543456) Homepage Journal

Slashdot Math!

cause we all know 50 + 1 - 1 = 49!

Ok, that was lame, go ahead and mod me down...

Share
twitter facebook
Mars Pathfinder (Score:3, Interesting)

by Kerg ( 71582 ) writes: on Sunday October 27, 2002 @06:48PM (#4543647)

The little "RC" NASA sent to explore the surface of Mars had a nasty bug in its threading system (priority inversion problem in critical code section) that caused total system resets every 20 minutes or so.
You can read about it from James Gosling's home page [sun.com] (also has info on Arianne 5 [sun.com]).
Luckily the engineers were able to upload a patch to Mars. That's remote debugging/patching for you :-)

Share
twitter facebook
F-16 AOA and WOW (Score:4, Interesting)

by taaminator ( 185731 ) writes: on Sunday October 27, 2002 @07:25PM (#4543834)

"Flight instruments don't lie"

First, BEFORE YOU LEAVE THE GROUND, pilots are taught that instruments don't lie. Specifically, when the human inner ear is placed in flight, things go wrong (the inner ear canals are static, not dynamic, devices; the fluid has no dampening or rate sensors). When there is no external reference, the inner ear canals adjust to the eye's visual presentation. It's called the 'leans.' Bad joo-joo. Many a perfectly good aircraft has been flown into the ground because the pilot believed his ears and eyes and not his instruments.

Second, IN FLIGHT, angle-of-attack (AOA) is a spectacular indicator of where your airfoil exists within (or outside) the flight envelope for your aircraft. Inside the flight envelope, you can seek best range (mpg) or best endurance (loiter) or best climb.

In most aircraft, the angle-of-attack indicator is a manual instrument (on the skin is a sensor which looks like a big euro-style handle and it runs to an indicator in the cockpit).

Many pilots are correctly taught to 'fly' the angle-of-attack.

Third, ON THE GROUND, when you land, you use the aircraft shape as an airbrake. You hold the aircraft nose off the ground as long as possible to create drag.

Fourth, ON THE GROUND, when you land, you do not want to hold the aircraft nose too far off the ground or the tail will scrape the runway and your fitness report will reflect and you'll be the butt of bad jokes at Snopes for eternity.

The AOA is used to assist in the performance of aerodynmic braking. The aircraft performance manual publishes the tried and true range of AOAs for aerodynamic braking. [It also indicates when too much AOA will ding the aircraft.]

Aerodynamic braking is part art and part science and requires accurate instruments.

Enter the F-16 ... it has an electronic AOA.

F-16 pilots were taught to fly the flight direction indicators to land.

However, many old and new pilots fell back on the old AOA once the wheels touched the ground to do aerodynamic braking.

Suddenly, F-16 tails were scraping along the runway at an alarming (and expensive) rate.

[As an aside, the problem was probably ignored until a senior officer ground off a few inches of aluminum THEN there was a problem.]

The programmers who wrote the AOA routines were rightly told that the AOA is used in flight. So, when the AOA detected that the aircraft had placed weight on the wheels (weight-on-wheels - WOW), it was programmed to quit working. Unfortunately, it kept the last AOA reading ... no matter what the real AOA was.

Pilot flies, pilot lands, pilot believes instruments, pilot scrapes multi-million dollar aircraft's tail along runway.

The programming solution was simple: when there was WOW, fade the AOA.

This was another case when contracts pit spec wording against spec intent against functional application and understanding of how it's supposed to work ... Fortunately, it was expensive and not lethal.

"Why did they call you 'sparky' and why are you driving school buses in North Topeka?"

Share
twitter facebook
My favourite quote on the subject (Score:3, Funny)

by iapetus ( 24050 ) writes: on Sunday October 27, 2002 @08:02PM (#4543979) Homepage

The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents.
-- Nathaniel Borenstein

Share
twitter facebook
New Slashdot Category? (Score:3, Insightful)

by po8 ( 187055 ) writes: on Sunday October 27, 2002 @09:29PM (#4544357)

Could we have a new Slashdot category entitled Ask Slashdot To Do My Research/Homework For Me? Then I could mark this category unread and avoid some annoyance.

There is so much information readily available on the subject of software failures online and in scientific and popular publications. (See other responses to this question for examples.) IMHO, the questioner should go look for the answer to this kind of question directly before bugging the entire Slashdot audience; the editors should enforce this policy.

Share
twitter facebook
- Re:New Slashdot Category? (Score:3, Insightful)
  
  by Mac Degger ( 576336 ) writes:
  
  I'm kinda sick of seeing this kind of comment. If it where to hold any merit, WTF would 'ask slashdot' be for? To me, the whole purpose of the 'ask slashdot' catagory is to plumb the experience of the people who frequent this place. If you're not allowed to do that, what would you use it for?
London Ambulance disaster (Score:3, Insightful)

by os2fan ( 254461 ) writes: on Sunday October 27, 2002 @09:46PM (#4544430) Homepage

There was a disaster in the dispatcher software that was written for London Ambulance. This was documented in a book on computer disasters.
The system did not collapse per se but progressively became bogged down by a series of poor design issues and implementation issues.
What happened was there was a memory leak, in that not all the memory used when a call was processed was released. This meant that each call chewed up a small part of core.
As the day wore on, this loss of memory started to make the system run slower, and created more calls as users started to worry about the non-show of the ambulance.
Meanwhile, back at the control centre, the operators started getting blasted by messages about over-due ambulances, and other system warnings. They were spending time simply dismissing Error dialogues.
By the end of the day, they were still dealing with the emergency issues notified at 12.00.
Of course, in the inquiry, there were many different management and design issues to be addressed, including the reliability and scalability of the software. [It was a Visual Basic program.]
I have seen a number of instances personally, most of these tend to be ignored by management keen to see the system up and running. The most often case for dismissal of problems is "teething problems", and "Luditism".
In practice, the real issue here is the UI. Not so much "flash chrome", but that the buttons and so forth will actually do what the user expects them to do. The user must be able to understand how to process and correct errors in relation to the application data itself. That is, if I enter 1200, and I mean 1130, I should be able to correct that.
The other disaster happening out there is that the program must be useful to the operator. So apart from entering data, the operator must be able to extract useful information from it. What the back end does does not really matter.
For example, a clerk who has to enter data on the screen each sale, in addition to operating the till, would be reluctant to use it. On the other hande, if the program is part of the till operation, and it provides information on how much stock is left, the clerk is more accepting of the change.
Implementing a system is not about plonking a pc with a program on a user's desk. It's about a user process. Users are looking for outcomes, not process. So if you want to go to a shop, you want to buy something, and the clerk wants to sell it to you. All the rest is administrivia.
Software design is important. So is user training.

Share
twitter facebook
Computer-Related Risks by Peter G. Neumann (Score:4, Informative)

by Malic ( 15038 ) writes: on Sunday October 27, 2002 @09:50PM (#4544453)

I think I've recommended this book serveral times on Slashdot. Simply put, THE collection of computing related horror stories.

http://www.amazon.com/exec/obidos/tg/detail/-/02 01 55805X/qid=1035769692/sr=8-13/ref=sr_8_13/104-4078 673-1863905?v=glance&n=507846

Share
twitter facebook
Sleipner A (Score:3, Informative)

by RallyDriver ( 49641 ) writes: on Monday October 28, 2002 @02:41AM (#4545611) Homepage

On a slightly different tack - the [umn.edu]
Sleipner A oil platform sank because of a bad design, caused by inaccurate computer based modelling (using an FEA tool inappropriately). In this case it was the data not the software.

Share
twitter facebook
- Re:Challenger (Score:5, Informative)
  
  by agentZ ( 210674 ) writes: on Sunday October 27, 2002 @02:48PM (#4542367)
  
  What happened to Challenger wasn't a programming mistake, but rather a case of not following policy. The solid rocket boosters were never designed to operate in cold temperatures. The result of working outside of design specs was catastrophic failure, yes, but that wasn't the result of a programming error.
  
  Parent Share
  twitter facebook
  - Re:Challenger (Score:2)
    
    by Lemmy Caution ( 8378 ) writes:
    
    If you've ever seen Edward Tufte's canned speech, he partially attributes the event to an inability of the engineers involved to organize and present their information in a clear way to communicate the nature of the problem.
- Re:Challenger (Score:5, Informative)
  
  by Pyromage ( 19360 ) writes: on Sunday October 27, 2002 @02:51PM (#4542389) Homepage
  
  Incorrect: This was not a programming issue. Nor was it a software issue at all. The problem was the O-ring seals in the SRBs (Solid Rocket Boosters). The manufacturer stated that they should not be operated under 53 degrees, and NASA overrode the recomendation and launched anyway. The expected happened.
  
  NASA hasn't ever had a hardware problem. Or a software problem. Ever. Every problem can be directly tied to one specific person being a fscking moron. The closest you could come is that Mars probe that crashed because of mismatched units. And that was just poor communication among the software guys.
  
  Parent Share
  twitter facebook
  - Re:Challenger (Score:4, Insightful)
    
    by GileadGreene ( 539584 ) writes: on Sunday October 27, 2002 @04:53PM (#4543127) Homepage
    
    NASA hasn't ever had a hardware problem. Or a software problem. Ever.
    Well, except for Mars Polar Lander, where the failure review board determined that the lander crashed because a flag indocating contact with the ground was not intialized to zero prior to the start of the retro-thruster loop. So the flag got set by the shock of deploying the landing legs, never got reset, and caused the thrusters to switch off as soon as they were on.
    I guess maybe you forgot about Apollo 13 as well (hardware)? Or the Galileo High Gain Antenna that failed to deploy (hardware)? Or the serious telemetry system problems they had with one of the Voyagers (hardware)? Or the faulty landing bag on one of the Mercury flights (hardware)? (was it Glenn's? I don't remember) Or that funky glitch in the landing computer during Apollo 11 (software)? You know, there's a reason that most space mission tend to be heavy on redundant hardware, and invest a lot of time and effort in fault protection software.
    Every problem can be directly tied to one specific person being a fscking moron.
    Well yeah, but that's the case with a lot of bugs, isn't it? Mistakes tend to be people issues.
    The closest you could come is that Mars probe that crashed because of mismatched units. And that was just poor communication among the software guys.
    You are at least correct about that - the problem was not a software issue. Lockheed Martin Astronautics was on contract to supply everything to NASA in SI units (which is what NASA uses for everything). LMA - or at least the part the caused this problem - uses English (Imperial) units internally, and neglected to perform the appropriate conversion before they sent the data on to NASA.
    
    Parent Share
    twitter facebook
  - - - Re:Apollo 1 / hardware fault (Score:3, Interesting)
        
        by ckedge ( 192996 ) writes:
        
        .
        I've read in-depth technical analyses of the Apollo fire, and I have an MSc in Physics.
        
        Before that, *no-one* knew that a spark in one place could cause a fire TWO FEET AWAY.
        
        (You get little hot bits of burnt dust floating around in a pure oxygen atmosphere, and they keep themselves hot enough to set something else afire quite a ways away. Of course things are *easier* to set fire to in that atmosphere as well.)
- Re:Challenger -- AT&T had the biggest gaff. (Score:2, Interesting)
  
  by telecaster ( 468063 ) writes:
  
  I'll agree that some programming errors *could* be fatal, but the one that comes to mind is the "2 line change" from AT&T that essentially knocked out phone service throughout the east and mid-west in 1990. It was the topic if many quality assurance seminars for the better part of the early 90's. I only remember it because it effected my company -- we lost phone service for 2 days. It was also one of those traditional "last minute changes" that someone clearly f*cked on...
  
  http://www.soft.com/AppNotes/attcrash.html
- Re:already.. (Score:5, Insightful)
  
  by Anonymous Coward writes: on Sunday October 27, 2002 @02:57PM (#4542436)
  
  Why not provide a link instead of saying "Oh yeah, I saw it way back when."
  
  You people who say "use google to find it" or "this was already asked" are worse than the people who actualy ask the question.
  
  Their only problem (if it could be said to be a problem) is ignorance, your kind however are a much better example of the problem of self-rightous lazyness.
  
  Parent Share
  twitter facebook
  - Re:already.. (Score:4, Funny)
    
    by Anonymous Coward writes: on Sunday October 27, 2002 @03:02PM (#4542468)
    
    couldn't find it either, huh ? ;)
    
    Parent Share
    twitter facebook
- Re:already.. (Score:5, Insightful)
  
  by Java Pimp ( 98454 ) writes: on Sunday October 27, 2002 @03:22PM (#4542592) Homepage
  
  Yeah it is probably in the archives. I've read it before.
  
  Problem is, the slashdot search engine sucks. I haven't yet been able to query the archives and actually find what I'm looking for without needing to dig through hundreds of irrelevent discussions. Sometimes I think it might be faster to just scroll back through the "Older Stuff" section.
  
  Or we could just have another discussion about it. :-)
  
  Parent Share
  twitter facebook
- Re:already.. (Score:5, Informative)
  
  by gmajor ( 514414 ) writes: on Sunday October 27, 2002 @03:27PM (#4542623) Journal
  
  http://slashdot.org/articles/02/05/02/1525210.shtm l?tid=128 [slashdot.org] - Debug your code, or else!
  
  Using google's serach engine provides better results for slashdot.org that slashdot's own search engine :-)
  
  Parent Share
  twitter facebook
- Re:One Word (Score:5, Informative)
  
  by Helter ( 593482 ) writes: on Sunday October 27, 2002 @03:09PM (#4542513)
  
  Come on now, that's the lazy way!
  
  How about citing an actual example of windows code bugs causing big problems? I'll go first. The USS Yorktown [gcn.com] had to be towed back to harbor when the NT system that was automating most of the ship crashed.
  
  Parent Share
  twitter facebook
  - not true (Score:4, Informative)
    
    by PissedOffGuy ( 612092 ) writes: on Sunday October 27, 2002 @03:37PM (#4542680)
    
    the database they were using faulted on a divide by zero. nothing to do with NT.
    
    Parent Share
    twitter facebook
    - Re:not true (Score:5, Insightful)
      
      by Anonymous Coward writes: on Sunday October 27, 2002 @03:40PM (#4542695)
      
      The database failure caused NT to crash. Good software design includes failure planning.
      
      Parent Share
      twitter facebook
    - - Re:not true (Score:5, Informative)
        
        by PissedOffGuy ( 612092 ) writes: on Sunday October 27, 2002 @04:02PM (#4542812)
        
        what are you talking about? the navy inquiry found no fault in NT. here, you try this: write a program that divides by zero and run it on NT. as with any other good OS, the program shuts down and the OS keeps going. user mode code cannot cause a blue screen, makes sense.
        
        in the navy's case the crashed program was enough to call the computers "down", and that makes sense too. the only thing that doesnt make sense is the attribution of blame to the OS for an app problem.
        
        Parent Share
        twitter facebook
      - How is an app the fault of NT? (Score:5, Informative)
        
        by deadsquid ( 535515 ) writes: <asx@deadsqui[ ]om ['d.c' in gap]> on Sunday October 27, 2002 @04:04PM (#4542825) Homepage
        
        Much as I dislike NT, especially in critical environments, this problem [info-sec.com] had nothing to do with NT. It had everything to do with bad coding.
        
        As we all know, information systems are only as smart as people make them. In the case of the USS Yorktown, an admin/operator entered data which caused a divide by zero condition in the application. Because the application did not have any exception handling built into it for a divide by zero condition, it died.
        
        You can't blame the OS for this. The application should have had exception handling built into it in a couple of places. It probably should have checked any new entries before comitting them to ensure the new data would not introduce such a condition, and the app itself should have had appropriate error handling to prevent a panic/dump when a divide by zero condition was encountered.
        
        If the app was coded by the same people on another platform, the end result would have been the same.
        
        Parent Share
        twitter facebook
        
        Wind was the *cause*. . . (Score:5, Insightful)
        
        by kfg ( 145172 ) writes: on Sunday October 27, 2002 @06:09PM (#4543478)
        
        of the Tacoma Narrows bridge falling. The *fault* was with the design, and hence, the designers.
        
        An extended bolt puncturing the gas tank during a rear end collision was the *cause* of Ford Pintos exploding. The *fault* was with the design, and hence, the designers.
        
        Both of these items could have been claimed to be perfectly free of design flaws while being used as "intended."
        
        This argument did not help the designers in not being found liable for their design flaws.
        
        The divide by zero error was the *cause* of the operating system's failure. The *fault* was with the operating system. The *operating system* crashed. An operating system failure is *always* the fault of the operating system, and hence, its designers.
        
        Read any textbook on the design of operating systems and in the first page or two you find some sort of statement along the line of, " A faulty app should never cause the operating system to fail." This is correct design.
        
        Let me repeat. If an app fails, it is the fault of the app. If the operating system fails, no matter what an app has done, it is the fault of the operating system. An operating system must *assume* apps badly written by complete incompetents.
        
        It doesn't matter what operating system. Windows, Linux, Mac or just the beads on your abacus.
        
        * It is the responsibiltiy of the operating system not to fail.*
        
        The fact that such failures can be explained away as the fault of the app by people who should know better makes me grieve for the state of engineering these days. It can only result in products being produced with greater and greater "craposity" factors eventually resulting in a culture of complete "crapitude."
        
        KFG
        
        Parent Share
        twitter facebook
        
        Re:How is an app the fault of NT? (Score:3, Informative)
        
        by rakslice ( 90330 ) writes:
        
        Okay, but the only specific failure cited in the article has nothing to do with NT.
  - Re:One Word (Score:3, Insightful)
    
    by seattle2napa ( 609603 ) writes:
    
    I love how complete bullshit gets moderated to +5 Informative on slashdot. Why not do a tiny bit of fact checking and slam people for misinformation rather than praising them for anything negative having to do with Microsoft???
- Re:the harrr-rrrrror (Score:5, Informative)
  
  by s20451 ( 410424 ) writes: on Sunday October 27, 2002 @03:26PM (#4542613) Journal
  
  US shooting down Airbus 320
  
  You're referring to the destruction of Iran Air flight 655 by the USS Vincennes near the Strait of Hormuz, on July 4, 1988. For one thing, it was an Airbus A300 (bigger and older than an A320). The failure there was mostly in human decision making, not in the AEGIS radar system, which faithfully reported that the airliner was travelling at 450 knots on a steady bearing towards Vincennes, roughly four miles outside the commercial air corridor, and not broadcasting IFF information (which of course they wouldn't, as a foreign civilian airliner). It was the officers of Vincennes who interpreted this information as a threat, misidentified the target as an Iranian F14, and destroyed it.
  
  Parent Share
  twitter facebook
- Re:Can We Say Google? (Score:3, Offtopic)
  
  by Gordonjcp ( 186804 ) writes:
  
  Please stop all this "STFW" crap. It's easy to search the web, but there may be people out there reading slashdot who haven't put their tale of programming woe on a web page. In the time it took you to post that crap, you could have thought for a moment, then not said anything.
- - Re:Incorrect function usage. (Score:5, Insightful)
    
    by antis0c ( 133550 ) writes: on Sunday October 27, 2002 @04:36PM (#4543007)
    
    No, I meant read it until you understand it. I don't want anyone working for me that doesn't think understanding documentation is a good thing or doing something the correct way rather than "it works so I might be doing it right."
    
    And there's a difference between not being able to code and understanding a particular function. I may read a function's man page 2 or 3 times to make sure I understand correctly what is going on. Not nessesarly because I'm incompetent, but because the wording my be confusing (wow, confusing wording in a manpage? Who would have thought..). That doesn't mean every single function for a particular language requires you to read the documentation for it multiple times. I assume nothing. Assuming something leads to bugs and insecurity. I've been programming in C for many, many, many years. When I do a little PHP programming to create some web interfaces I don't assume that just because both C and PHP have a function called strlen, and the general documentation says it returns the length of a string, that they work identically. So I read the entire strlen documentation for PHP to understand exactly whats happening. It only took less than a minute, but now I'm not assuming. I know. This goes for lots of things. The more complex functions you use, the more important it is to fully understand them.
    
    The point is coding correctly is the most important skill to learn. I have friends that hack together scripts and programs from examples and snipits of other code and a little bit of their own code to glue it together, with little to no understanding of what they are actually doing. Then months later something breaks they can't fix and they act as if it was the author who wrote the example code's problem.
    
    No, it's there fault. Not because they hacked together examples, but because they didn't take to the time to make sure they knew what the examples were doing, that the examples were implemented correctly, and that they understood exactly how the code in the examples worked.
    
    Take a look at OpenBSD's philosphy. [openbsd.org]. You can learn a lot from it.
    
    Parent Share
    twitter facebook
- Re:Code Vaults (Score:3, Funny)
  
  by sg_oneill ( 159032 ) writes:
  
  Oh yeah...
  At a previous job , we where having some after work drinks, and I started fking around with a RAD app we had developed for a military contract. In a fit of semi drunken bordeom we whacked in lots of pink fluffy clouds and a "my little pony" logo on the boot up screen.
  
  Forgot to restore it.
  
  Next morning the mil guys came in to look at how the prototype was going, and on boot up, up pops "my little pony" with all the little clouds and all. Extremely campy.
  
  Khaki guy not impressed.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

On Fox tonight @ 8pm (Score:4, Funny)

Re:On Fox tonight @ 8pm (Score:3, Funny)

The book "Fatal Defect" (Score:5, Informative)

Mars Orbiter Lost Over Metric Conversion (Score:4, Informative)

Mars Orbiter Lost Over Metric Conversion (link) (Score:2, Informative)

Here are my Top 4: (Score:5, Informative)

OT: Scuds and Patriot missile defenses (Score:5, Interesting)

Re:Toilet glitch (Score:5, Funny)

Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:3, Insightful)

Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:5, Insightful)

Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:4, Informative)

Re:That is NOTHING -- 10,000 died in Bhopal, India (Score:3, Insightful)

10,000 dead in Bhopal, India != Concern (Score:3, Funny)

RTM Worm (Score:5, Interesting)

You forgot the part that went wrong. (Score:3, Interesting)

Re:RTM Worm (Score:5, Interesting)

Y2K? (Score:5, Insightful)

Re:Y2K? (Score:4, Funny)

You just can't win, can you... (Score:4, Insightful)

Re:Y2K? (Score:3, Informative)

Re:Y2K? (Score:4, Funny)

Look out.... (Score:3, Funny)

How about the AT&T Switch failure in NY? (Score:5, Informative)

It was a bad break in C code (Score:5, Informative)

See RISKS Digest (Score:2, Informative)

Risks digest (Score:5, Interesting)

Also on UseNet (Score:3, Informative)

One time this guy (Score:5, Funny)

Pleanty of examples here: (Score:2)

I had a professor (Score:5, Funny)

Re:I had a professor (Score:5, Funny)

RISKS Digest (Score:4, Informative)

Why, the world's favorite mail client, (Score:5, Interesting)

Re:Why, the world's favorite mail client, (Score:4, Funny)

Re:Why, the world's favorite mail client, (Score:5, Insightful)

That was an easy setup (Score:5, Interesting)

Re:That was an easy setup (Score:3, Informative)

Re:Thanks! (Score:3, Funny)

Failures (Score:4, Informative)

Pretty sure this was posted earlier on slashdot (Score:2, Informative)

A Great Story (Score:5, Interesting)

That's kind of silly (Score:4, Insightful)

Re:That's kind of silly (Score:4, Informative)

Re:That's kind of silly (Score:4, Interesting)

Re:A Great Story (Score:4, Informative)

Re:Why this cant be right... (Score:4, Insightful)

Re:Why this cant be right... (Score:4, Informative)

Shared download class (Score:5, Funny)

Easy, (Score:5, Funny)

Don't be so narrow (Score:5, Insightful)

Re:Don't be so narrow (Score:4, Insightful)

Train collision (Score:5, Interesting)

Re:Train collision (Score:3, Insightful)

Non-life threatening, but interesting bug... (Score:5, Interesting)

Airbus (Score:5, Interesting)

Re:Airbus (Score:3, Informative)

I've got a course for you (Score:3, Interesting)

deep c secrets (Score:5, Informative)

Re:deep c secrets (Score:3, Interesting)

Re:deep c secrets (Score:5, Informative)

Bye bye credit purchases (Score:3, Interesting)

I can't recomend comp.risks too highly (Score:3, Informative)

NASA software bugs (Score:3, Informative)

Always Mount a Scratch Monkey (Score:3, Insightful)

Good site (Score:4, Informative)

Insidious bug from the wayback machine (Score:3, Informative)

Re:Insidious bug from the wayback machine (Score:4, Informative)

One of the best resources I've found (Score:5, Informative)

Airport flight schedules (Score:5, Funny)

Lets Not Forget the Best... (Score:5, Funny)

Mars Pathfinder (Score:3, Interesting)

F-16 AOA and WOW (Score:4, Interesting)

My favourite quote on the subject (Score:3, Funny)

New Slashdot Category? (Score:3, Insightful)

Re:New Slashdot Category? (Score:3, Insightful)

London Ambulance disaster (Score:3, Insightful)

Computer-Related Risks by Peter G. Neumann (Score:4, Informative)

Sleipner A (Score:3, Informative)

Re:Challenger (Score:5, Informative)

Re:Challenger (Score:2)

Wind was the cause. . . (Score:5, Insightful)