What is the Worst Tech Mistake You Ever Made?
"In the interest of full disclosure, this is mine:
I was working at a Fortune 50 bank as a consultant. I was due to go on vacation for a week and the company did not have webmail. I decided that I would try forwarding emails to my corporate account. (I know this was a bad idea, and probably against several corporate policies.) I set it up so that any email that came in would forward to my consulting company's account. My mistake was I also left Delivery Receipt on. This was not Microsoft, it was Lotus Notes. The system began forwarding the incoming mail to my account. But then it would get a Delivery Receipt, which in turn would be forwarded to my account, which would generate another delivery receipt, ad infinitum. When I got back from vacation they claimed I had brought down the email system for 4 hours. This incident caused the bank to stop allowing consultants to set up email rules. What's your story?"
File errors (Score:3, Interesting)
And the library did not have the system source media anymore, so we spent the next day looking for any machines with a similar version of the deleted file and moving them back by hand.
rm (Score:0, Interesting)
A small one (Score:5, Interesting)
Thankfully, that's the worst I've done so far.
rm -rf * (Score:3, Interesting)
...five minutes later after coming back from getting coffee: D'oh!
I actually did this once... while logged in as root... at the top level in /home... on a production server. Thank baphomet for nightly backups!
Hopefully none of my clients are reading this. :-)
One that I saw... (Score:5, Interesting)
They were running 13 servers at remote locations (and I mean remote, as in out in the boonies 4 hours from nowhere on back roads) and these servers were unpatched, had out-of-date or inactive anti-virus and were connected to the net via a combination of satellite and dedicated (always on) dialup. Their communications were secured with nothing more than Windows 2000's built-in VPN.
Needless to say, my audit report told them that they had big beefy powerful angels on their side since they hadn't yet had a noticeable intrusion. (They had no way of detecting one, but at least the servers weren't hosting porn sites.) I warned them that a virus or worm would come along though and knock the whole thing out. The CIO scoffed at my report, called me an alarmist and said that my opinions were right up there with the Y2K doomsayers.
When Slammer hit, I had described the vulnerabilities and outcome so accurately that this guy actually accused me of writing it myself. Took the whole corporate network down and they couldn't bring it back up until their techs visited each site. It took two teams seven days to get to all the sites. The company lost 6 business days, three customers and a month's worth of transaction records.
Needless to say the CIO was demoted (they didn't fire him, which I consider a major tech mistake in itself) and they had me re-issue my audit report, which they then followed to the letter, taking every precaution I suggested.
Getting a cheap power supply (Score:2, Interesting)
FoxPro-based MRP & Bad Networking (Score:3, Interesting)
My mistake was to give the techie "thumbs up" under pressure. I folded to the "We needed this yesterday" argument despite my misgivings about the software. I paid for that mistake for the next year in slavish tech support. We became the software company's test bed as we found bug after bug. The software "worked", but operator efficiency dropped, and uptime was sub-optimal. "Customization" caused problems, etc., etc.
The second mistake I made was to attempt to use VPN over Broadband with Citrix MetaFrame. Although MetaFrame was a pretty secure and slim protocol for remote desktops, the Internet provider on the remote site had horrible latency problems and was run by a group of amateurs. I should have stuck with the original Sprint frame relay proposal.
Morals of the story: don't let PHB push you into a solution you don't trust, and when network reliability is important, pay for assured quality of frame relay.
My Commander told me to kill the network (Score:5, Interesting)
Commander thought I was brilliant, and so did I. I had fractured our network into at least 10 different domains. No one could talk to anyone, effectively "simulating" an enemy jamming attempt. It would take hours to restore the network, with many mad commo guys having to drive about with Pluggers, early GPS devices, to restore each radio to proper time.
Then a tank flipped. Someone died. No one could call for help. I am so damn smart.
No-moon black, at 2 in the morning, in an upside-down tank, the gunner figured out how to put his radio in plain text to call for help. It took him almost half an hour.
Doing work twice must be better than once.. (Score:2, Interesting)
In 2000, a co-worker was migrating a large Catholic Diocese, one of the top ten say, from Novell to Microsoft (I still don't know why) as I had somewhat purposefully (on my part) been asked not to come back for a while (but that's another, dumber story).
Anyway, not having done any such migrations before, after thoroughly RTFM, he set up, almost entirely correctly, the migration service and began moving users. The syncing tool was set to run just before backups, so that the backup would reflect that day's migrations and updates.
It was supposed to go like this: copy all files from the Novell directory, nightly, to the new user directories on Microsoft shares unless the Microsoft file was newer (hence indicating that user was migrated) and eventually all users, over the course of a week, would be migrated and the sync turned off. Everyone transparently suddenly works with Microsoft shares and la di da off they go.
It was an excellent plan with the exception of forgetting to check the little box that made sure that newer files were not overwritten with the old ones from the (now defunct) Novell servers during syncing. So every night the old files would overwrite the newer ones. People started to complain about the third day that their changes to documents and such weren't "sticking", and on the last day of the migration, we figured out what had happened.
So every night, before backups, the newer files were being overwritten and then backed up. This included the Accounting, Newspaper articles, judgements, spreadsheets, EVERYTHING. For a whole week, 1600 users lost their data and it wasn't backed up on purpose. Oops. Funny thing though, our company kept the account and what remains of that company still works on it to this day!!
What happened to the co-worker? Well we all just kinda laughed it off and that 19 year old kid became the second youngest CCIE up to that point in time, and a year later got his second CCIE in security and is making comfortably north of 120k/yr now.
-- This sig has a cholesterol count of 680... higher is better right?
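The unchecked box in the migration story above amounts to a "skip if the destination is newer" test. The story does not name the sync tool, so what follows is only a rough shell sketch of that rule: `copy_if_newer` is a made-up helper name, and the `-nt` file test it relies on is a widely supported extension (bash, dash, ksh, zsh) rather than strict POSIX.

```shell
# Hypothetical helper: copy src over dst only when dst is missing
# or src has the more recent modification time.
copy_if_newer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        # -p keeps the source timestamp, so the next nightly run
        # compares against the real modification time.
        cp -p -- "$src" "$dst"
    fi
}
```

Called as `copy_if_newer /old/share/report.doc /new/share/report.doc` (paths invented for illustration), it leaves a file alone once the user has edited the new copy. GNU `cp -u` and `rsync --update` apply the same rule to whole trees.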
Ahh, stories. (Score:5, Interesting)
I wasn't in
My next command was 'ls'. It returned: unable to find
AAAAARGH!
I now know how to solve that under solaris. Under
Ever since then, my prompt has had my current directory in it. That experience certainly made me more careful.
Better (or worse) was when a stupid service rep came in to replace a bad CPU on a Sun E10000. The idiot shut down the sub-system, and powered off the board correctly. He then managed to pull out the wrong board, despite the blinken lights. Of course it was the PeopleSoft domain. Running year end reporting.
AAAAARGH!
Re:Damning evidence (Score:4, Interesting)
This also gives you time to ponder the wisdom of first running a SELECT statement with the same WHERE clause and contemplate whether you want to do this.
I ruined a phone book (Score:3, Interesting)
I mean, what, I'm supposed to proofread the entire phone book by myself?
Anyway, the software used some kind of crazy soundex routine to "fix" addresses that it wasn't able to resolve, and thousands of people ended up with completely incorrect address information. The book went to press, was distributed, and a day later the phones were ringing off the hook. We had to pick up the old books, fix the data, schedule more press time (no easy feat), re-print, and re-distribute.
Total cost to correct was around $1M, got my ass chewed royally, but managed to keep my job anyway.
Must be doing *something* right!
Suggestion to avoid accidental "rm"s (Score:4, Interesting)
However, accidentally separating a wildcard from text is an infrequent mistake that can cause much pain. For example, typing rm -rf *
Zsh, by default, will complain at you and ask you if you *really* mean it if you use a bare wildcard with an rm command. Invaluable, and has saved my ass a few times.
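For shells that lack zsh's built-in check, something loosely similar can be faked with a wrapper. This is only a sketch under stated assumptions: `careful_rm` and its threshold of 3 are invented for illustration, and because the shell expands `*` before the function ever runs, it can only count the paths it was handed, not detect the wildcard itself.

```shell
# Hypothetical wrapper, not part of any shell: ask before deleting
# more than a handful of paths in one go.
careful_rm() {
    limit=3   # assumed threshold, pick whatever suits you
    if [ "$#" -gt "$limit" ]; then
        printf 'About to remove %d paths. Proceed? [y/N] ' "$#" >&2
        # Treat end-of-input the same as "no".
        read -r answer || answer=n
        case "$answer" in
            y|Y) ;;
            *) echo 'Aborted.' >&2; return 1 ;;
        esac
    fi
    # "--" stops option parsing, so every argument is treated as a path.
    rm -rf -- "$@"
}
```

Use it as `careful_rm *` or `careful_rm build/*`, passing paths only. It is a blunter tool than the zsh prompt, since it fires on any long argument list whether or not a wildcard produced it.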
Another 'mistake' that got 40 students detentions (Score:2, Interesting)
I had read/write access on one of the folders on the public drives because I requested Apache and PHP for one of my assignments that other people wrote in notepad with all those annoying javascript tricks they ripped from sites and the IT staff couldn't figure out how to install it. So, they gave me a folder and I installed it myself. Needless to say, I think they forgot I had read/write access on that folder, and I started to hide various files in there, some legitimate like Dev-C++. And some not so legitimate like some games. Some I made myself, and some not (Like this really addictive game where you have to dodge all these little dots. Sounds simple, but ain't, especially with some special dots that home in on you, 2x speed dots, etc etc. It was Japanese tho, but yeah, great game).
Anyway, like in the second-last week of school, they started catching people who were playing games. In one day, they caught 40-odd people, and thank God I wasn't one of them because I was too busy practicing for a coding competition. I was able to get out of my class to get into the computer room where another class that was the same grade was there, playing a game that involved insane amounts of clicking. It was so obvious, with all that mouse-clicking going on, why wouldn't the teacher notice? They were supposed to be working on javascript...
Anyway, the next day our grade had to go to the hall while a teacher called out student IDs... It's surprising people didn't hate me for that, heh
And I managed to get some more people in trouble after a really simple HTML page I coded that was able to show the photo and past subject details of a person when you gave it a student ID was passed around. Apparently it was confidential data, but there was this link on the school intranet that said "Student Profile" that led to a page that worked like mine, and it worked two years ago. They found out and removed the text box, but...
One day I was bored, and went to that page again and viewed the source and found a line that contained a hidden value in a form named 'student_id'. So I just coded a page that posted to the same page, with a textbox for student_id, and voila! It worked.
The original page had "Example of a SQL query in ASP" for the title, too. It's amazing how bad the IT department is at my school...
P.S. I haven't been caught for either of the events, yet
Re:Damning evidence (Score:4, Interesting)
My biggest mistake was in my first programming job years ago. I intentionally wrote an infinite loop into a program that was running on a very powerful (for the time) research Unix box used at the Naval research lab where I had an internship. It was a sonar imaging optimization routine and I would let it run for short periods (10-30 seconds typically) and then CTRL-C it to force it to stop and inspect the log file to find the results. I was new to Unix and so I would use "ps" as opposed to "ps -aux" to see what processes I had running. I had multiple sessions up and managed to leave one of my programs running, switched sessions, ran ps which showed no processes running and went to lunch. The sysadmin was also in a meeting and then at lunch. When I returned I had a bunch of nastygrams telling me to kill my job immediately, not to run processes that hog the CPU because other projects couldn't use the system and to get approval before running long running jobs because the CPU time was billed (this was around 1985). I actually sat down, ran ps again, saw no job, and wrote back saying I didn't know what they were talking about. The sysadmin (who had returned from lunch) came over to visit me and educated me on a whole bunch of things.
try it from a cronjob (Score:3, Interesting)
I once added the following to a cronjob
rm -rf $foo/*
My intention was to wipe contents of a directory that I was reusing. Unfortunately "foo" was unset. The cronjob ran overnight with rm -rf traversing every NFS mounted drive in the company. I remember coming in at 10 the next morning and thinking "christ what kind of idiot deleted all of my files?", and then "shit! that idiot deleted everyone's files" and then "shit that idiot is me!".
Ever since then I usually do something like
rm -rf "${foo:?}"
mkdir "$foo"
Later as I recovered my composure I started thinking "Now why can't those idiots set their umask correctly?".
The only positive aspect of what happened was that it revealed a weakness in the backup procedures being followed by the IS department.
Personally I count myself lucky to have had the benefit of such a humbling experience without losing my job.
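The `${foo:?}` guard from the comment above can be wrapped up for reuse. A minimal sketch, assuming a POSIX-style shell: `reset_dir` is a made-up name, and the error text is just an example. When the parameter is unset or empty, `${1:?...}` prints the message and a non-interactive shell stops right there, so the `rm` never runs with an empty path.

```shell
# Hypothetical helper: wipe a scratch directory and recreate it empty.
reset_dir() {
    # Aborts with an error if the argument is unset or empty,
    # instead of letting the path collapse to nothing.
    dir="${1:?reset_dir: directory argument is unset or empty}"
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
}
```

In a cronjob this would be called as `reset_dir "$foo"`. With `foo` unset the job dies with the error message rather than walking every mounted filesystem. Quoting the variable also keeps paths containing spaces in one piece.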
Re:Damning evidence (Score:4, Interesting)
I've adapted that idea to a lot of other situations; my SQL queries always start out as "-- delete ..." until I'm sure about what I'm typing.
Re:rm (Score:3, Interesting)
mv
I managed to get it stopped, but not before
Brought down the house, so to speak... (Score:3, Interesting)
We noticed that one of the filesystems that held the log files for an Oracle Application Server (two machines, shared storage) was filling up.
At this company, the security wannabes gave no one root access, but gave sudo privs to all UNIX admins. No big deal, huh? Well, they gave permission to everything in
Anyhow, my boss asked me to clear out the rotated logs in an attempt to free up some space.
I logged on to one of the two boxes and went to the directory in question. I typed "rm *.*"... Permission denied. Bummer. I guess I'll have to use sudo.
I typed in 'sudo chown [myid]
I got my 'attaboy' and continued working.
After about an hour, we went to lunch (boss went to lunch with me almost daily.) He gets a call on his cell from the PHB (although, to be fair, 'balding head boss' would be more appropriate.) He said that the OAS cluster for the largest app we supported was down.
After about 30 minutes of investigation and head-scratching on the part of my teammates still at the office, my boss got another call. One of my teammates asked him "who is [my id here]?"
My boss asked me if I knew, and my heart nearly exploded. I told him it was me.
I didn't even think to mention the change I made as a possible cause because so much crap happened every day that I forgot about one project about 5 minutes after completing it. I always fess up immediately when I make a mistake, so my boss knew I wasn't trying to hide anything...
Apparently, the server crashed when it had to rotate the log file (too large) and couldn't write to the directory. It wouldn't come back up again (with a completely nondescript error message, of course) after the crash for the same reason.
I'd left the directory permissions set to my user id. D'oh!
What makes this funny (in that sick kinda way) is that this app server crashed constantly, and the higher-ups tried to make themselves look good by being concerned (even though no business loss was actually incurred.) They always wanted a root cause analysis for every crash, and they were all the same - "unknown. vendor support not available because software is past end of life."
The higher-up jumped on this opportunity to make a freaking "oh my God, this guy is so dangerous" case out of it because it gave him something concrete to go to his higher-ups with, after so much "idunno" action.
I was given a written warning (my boss was forced to do so.) He smiled and laughed with me over the stupidity of it.
Re:A long time ago (Score:1, Interesting)
We had a nightly schedule of doing tape (round-reel, no less) backups of all client databases and partial backups as well (modified files since last partial backup) before starting nightly batch processing. To accomplish this, we had to kick all the users off the system, one at a time manually. Well since I was Mr. Hot Shot and had been reading the man pages online, I had come across a new command and figured to give it a try.
This is where the problem started. These machines were from the 70's and 80's mostly, and this new command - Ctrl-A Logoff, apparently wasn't tested. Plus I was doing this without authorization. To make a long story short - yep, everybody got logged off the system all right - including me, the Operator. I couldn't get back on!
I started getting panicky. Ctrl-A Logon was supposed to allow logins again. But apparently there was a bug in the OS, and the system was apparently frozen.
Ended up having to call my manager and IPL the system, delaying processing by about 2 hours IIRC. My manager was pretty cool for the most part, but she made it clear that things like this should Never Happen Again. I was in danger of being fired over that one.
Ah, those were the good old days. I still have dreams about them sometimes. We did _everything_ manually - checking in tapes, hanging them up in the library, starting jobs, tracking everything (start times, end times, backed up, printed) on paper -and- entering tape info in the computer database, as well as our own printing - but it was good times, good times.
Bobby, Art, Debbie - anybody from the old HP Data Center that might be reading this - I miss yaz.
One that nobody's posted yet (Score:2, Interesting)
Anyways: back in my post-college, pre-moving-to-Portland days, I worked at Radio Shack, and had unofficial but responsible assistant manager status after a year or so. Among other things, closing duties included putting a long-play videotape in the VCR attached to the store security cameras. No big deal, it was right by the PC in the back office where you closed everything out, impossible to forget and nothing ever happened anyway. Until, of course, the night I forgot to do it, which also happened to be the night I got a call from security around 1 AM, to let me know the alarms had been triggered and I'd have to go down to meet the police and see what had happened. About a $1000 loss in stolen display merchandise, and no evidence. Oops...
Re:easy... (Score:2, Interesting)