How Fast is Your Turnaround Time?
petrus.burdigala writes "I work for a mid-sized commercial software company (~20 Mloc) and we are frequently challenged by our supervisors to get fixes around the clock. Overall, we manage to get a 'bullet-proof' patch in about 4-5 weeks (from coding->QA->Build/Packaging->shipment), which I consider not so bad. But the other day, we got an urgent request from our support team to come up with a decent fix in 48 hours. I think they're a tiny bit unrealistic. So I wanted to get feedback from my peers: are we doing that bad? It takes months for other software vendors to issue zero-day exploit fixes, are our customers being unreasonable?"
four or five WEEKS? (Score:4, Interesting)
Exploits should be a high concern for any company
We don't do "box" software (Score:5, Interesting)
It is uncommon, but not unheard of, to have an 8-hour fix. In cases of customer data vulnerability, legislation has been made such that if we are aware of a problem, we have an automatic injunction against us continuing to do business unless the problem is resolved. So when we have a security flaw, our bank stops working until it is fixed. So yeah, 48 hours would have people fired for sure.
Compliance/security are the only two things that can spark a release with less than 72 hours notice though.
Management strategy (Score:5, Interesting)
Yeah, your turnaround time seems good and yes, the customer's request is beyond the industry norm.
That might mean one of three things:
One: Customer is being foolishly optimistic.
Two: The entire industry is bad about turnaround time and can, if pushed, improve it to 48 hours.
Three: Customer needs it really quick and is hoping to get it quicker by asking. They know 48 hours is well beyond the norm, but are hoping you can do it anyway, because the more time it is unpatched the more they are screwed. They know that if you don't ask, you can't get, so they are at least 'asking'.
Me, I think it is a combination of all three. Customer is being a bit optimistic, the industry is bad about turnaround time, and also the customer knows it is a bit optimistic but is making the request anyway in the hope you will provide amazingly good service.
Extreme Programming (Score:2, Interesting)
Depends on the team and the bug. (Score:5, Interesting)
We had a better patch later, but the initial emergency patch was VERY fast.
On the other hand, if the initial bug report is "Sometimes the program hangs, no, I don't know when. Maybe every week or two." -- well, that's gonna be hard. Exploits generally have the advantage that an exploit is by nature at least somewhat reproducible, and the hardest part is often getting a reproducer. I've had it take six hours to develop a usable reproducer, and three minutes to develop a patch.
Release time depends hugely on process and procedure. IMHO, an ideal procedure would have some kind of way to get a Temporary Patch out into the field ASAP when there's an exploit.
That's a big "it depends" (Score:4, Interesting)
The usual bug fix cycle depends on complexity, impact, and risk. High risk of breaking things and low impact? Generally gets scheduled for the next release (4ish times per year). Low complexity and risk but medium impact? Code today, regression test the rest of the week, push this weekend. On average, mission critical bugs can get fixed in 8 hours or less around here, small to medium stuff is put on a weekly(ish) cycle with *lots* and *lots* of testing, and large stuff gets rolled to the next major release, unless it just can't wait that long.
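A rough sketch of that triage rule in Python, for anyone who wants to see it spelled out. The names (`Level`, `Bug`, `schedule`) and the exact cut-offs are my own assumptions for illustration, not anything the shop above actually runs:

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Bug:
    # All fields are hypothetical; a real tracker would carry far more.
    complexity: Level
    impact: Level
    risk: Level
    mission_critical: bool = False


def schedule(bug: Bug) -> str:
    """Pick a release slot from complexity, impact, and risk."""
    if bug.mission_critical:
        # Mission-critical bugs jump the queue entirely.
        return "emergency fix (hours)"
    if bug.risk is Level.HIGH and bug.impact is Level.LOW:
        # Risky to change and barely hurts anyone: wait for the next release.
        return "next quarterly release"
    if bug.complexity is Level.HIGH:
        # Large work rolls to the next major release unless it can't wait.
        return "next major release"
    # Small to medium stuff: code today, regression test, push on the weekend.
    return "weekly cycle"


print(schedule(Bug(Level.LOW, Level.MEDIUM, Level.LOW)))
```

The order of the checks is the policy: the emergency override comes first, so nothing below it can delay a mission-critical fix.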
Turnaround time (Score:5, Interesting)
We generally get fixes for real bugs out within 24 hours, unless the problem is traceable to the OS, the only factor really out of our immediate control. Even then, we do a quick evaluation to see if we can replace the OS function. Over the years, we've replaced quite a few of them, but rarely within 24 hours.
But we know our code backwards and forwards; I wrote the majority of the current codebase myself, and I can generally get to within a few lines of the problem just by a bug's description... the rest is a matter of minutes and testing. This app is very large - comparable to Photoshop in terms of feature count - but it is also very stable after 15 years of whack-a-bug and a continuous drive to make the internal structure as orderly and regular as possible.
It is my observation that the more programmers you have involved, the slower your turnaround time (for everything from bugs to features) will be. Likewise the larger the entity, the slower it will generally move. Almost every layer of management and corporate compartmenting disease will contribute to slowing down the process.
For the apps I use and have reported bugs against, my general experience is that the bugs are often never fixed at all. One browser, "Omniweb", truly my favorite in terms of features, has bugs that make it essentially unusable for me. Crashing, slowing, lockups and so on - really serious problems. I've reported them, they never were fixed, in fact the software was never updated. Eventually, I just went back to Firefox. Then as Leopard came out, after years of doing nothing, they released a "Leopard version" in which, perhaps, I might find those bugfixes if I looked... but as I say, I have moved on and no longer have any enthusiasm for the product. Slow bug repair (or ignoring them) is synonymous with telling your customers you really don't care what kind of experience they have with your software.
Apple, with all their emphasis on customer experience, does this too. They've had bugs in hand for very long periods where they simply don't address them. If your bug isn't something they think will affect a lot of people, it isn't likely to be fixed. I've not yet purchased Leopard, preferring not to catch early-adopter syndrome bugs myself, but when I do, I would not be the least bit surprised to find you still can't refresh a remote share that's been changed by the remote OS; that the wifi differs hugely in compatibility between PPC and Intel hardware; that mail still hoses the sent mail box based on the return address; that shell fonts are poorly rendered; that shell ANSI compatibility is still broken; that the OS still provides locked-up beachballs at the most inconvenient moments; that the OS still puts the wrong things away on the HD when RAM gets tight, and consequently becomes massively unresponsive... Basically, Apple doesn't have good control of their OS, are unable to respond to bugs in a timely fashion, so much so that they triage out bugs based on report counts, and the common patter is that Apple provides a great customer experience. So while my own experience is that bug fixes are important and can be quick in turnaround, here's Apple showing us that you can make a complete thrash out of the entire bugfix issue and still come out smelling like roses. So is a few weeks too long? Probably not, if you have a good marketing department. :-)
Re:That works both ways. (Score:5, Interesting)
Then you're going to have a bigger problem! It's the same thing in any kind of relationship: just bowing and scraping and always saying "it's my fault" is going to cause bigger problems in the future than just saying "nope, we're not gonna fix that" or "sure, we'll fix it, but not now, you'll get your patch when it's tested properly, in the meantime, do this instead".
Re:Web based (Score:3, Interesting)
Sometimes (just sometimes) it's obvious what the bug is, and it's obvious that testing is meaningless. Would you want to hire a company which does meaningless things to please you?
Re:Parent is right. (Score:5, Interesting)
The fact that the parent was moderated down just shows me that the arrogance, contempt, and stupidity in corporate America is alive and well - especially in IT.
If you had a single license and no paid support... Well... We might have a general update next month with a public patch. We might not. Have a nice day.
Of course, when you sell software as a service then that's how it works.
As a side note, one customer's feature request created a completely separate build just for that customer, which was annoying to the programmers, but since they paid good money for it, they got what they asked for. Although... I remember the programmers eventually including the features for everyone else as an optional package just to avoid that, so in the end even the single-license customers benefited.
This reminds me of an "unreasonable" customer (Score:4, Interesting)
So, I saluted and said I'd try really hard for 3 weeks for the first version, then about three months longer for a version that would work all the time. Which is what happened.
Do you know the impact on this customer of not having the fix that soon? Maybe it's worth it to them...
Bug fix turnaround time (Score:4, Interesting)
Really, it depends on your environment, and what needs to be done.
I'll use one of my web sites as an example. It's all PHP and Perl, so ya, it's programming (I'm sure people will argue this).
Since I wrote all the code, I know it all inside and out. If you say "there's a problem [here]", I know exactly what file to look in, and what code to look for. I've banged out changes, tested them, and put them into production in a matter of minutes.
On a high-traffic web site, we had a Java applet which was being used by about 25,000 people per day. For little things, I'd change the code, test on all applicable platforms, and roll out the change in a few hours. Even then, the bosses were sometimes displeased with the time it took. Since I was careful to test, I never rolled out bad code, so I was never pushed into the long QA cycles.
Working with one company, things were a lot different. It went something like this.
1) Propose the change to your manager, with supporting documentation.
2) Manager would go to the project coordinator (i.e., customer liaison)
3) project coordinator would go to the customer
4) customer would approve the change.
Up to here was anywhere from an hour to a week. Sometimes the customer would put stipulations on the change, such as "there's a big event happening, or going to happen, don't make the change until X time."
5) document the proposed changes
6) hold a meeting with development, QA, the project coordinator, and management. Discuss the potential changes.
1-3 days later
7) hold another meeting with the same people to rehash the changes.
1-3 days later
8) hold another meeting with the same people to rehash the changes.
9) Write the changes. Make them available to the QA team.
3-7 days later
10) Explain to the QA team that the errors they are experiencing with the fix have nothing to do with the fix, they were preexisting problems with another piece of code.
1-7 days later
11) hold another meeting with development, QA, project coordinator, and management, to explain that the error has been fixed with the supplied changes. The other problems are elsewhere.
1-3 days later
12) hold a strategy meeting to plan on how to fix the other problems.
13) fix the other problems, and break more things.
1-3 days later
14) have QA test the other changes.
14) roll back changes in step 13
15) beta test the previous changes, and notify customer
16) Customer balks at other pre-existing problems.
17) Repeat steps 5 to 15 again, until the customer gets tired of balking.
18) Implement changes.
Then start the process all over with step 1 to fix the other pre-existing problems.
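For the curious, here's a back-of-the-envelope tally of just the waiting in that process, one pass through with no repeats of steps 5 to 15. The waits are the ones quoted in the list; the labels, the names `WAITS` and `total_wait`, and the choice to count "an hour" as 1/24 of a day are my own assumptions:

```python
HOUR = 1 / 24  # one hour expressed in days

# (where the wait falls, shortest wait in days, longest wait in days)
WAITS = [
    ("steps 1-4: approval chain", HOUR, 7),
    ("after step 6: first meeting", 1, 3),
    ("after step 7: second meeting", 1, 3),
    ("after step 9: QA gets the fix", 3, 7),
    ("after step 10: explaining unrelated errors", 1, 7),
    ("after step 11: another meeting", 1, 3),
    ("after step 13: fixing the other problems", 1, 3),
]


def total_wait(waits):
    """Sum the best-case and worst-case waits, in days."""
    best = sum(low for _, low, _ in waits)
    worst = sum(high for _, _, high in waits)
    return best, worst


best, worst = total_wait(WAITS)
print(f"one pass: {best:.1f} to {worst:.0f} days of waiting")
```

That works out to roughly 8 to 33 days of pure waiting before anyone repeats a step, and it counts none of the time spent actually writing or testing code.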
The solution really is...
1) Identify the problem.
2) Gather together the appropriate staff who won't talk outside of your group.
3) Fix, internally test, and implement the resolution.
4) If anyone asks, there was no problem to start with, and you were all really working on steps 5 to 15 of the previous plan on another problem.
Funny how that works.
But, it's a matter of, is it a trivial fix, or something that requires serious rewriting? Did someone miss trapping invalid input in one line, or is it a poor coding practice through all of the code? Is it an included library that simply needs to be upgraded and recompiled?
Re:That works both ways. (Score:2, Interesting)
Except when it's defective by design - then I don't want to hear "we're not gonna fix that," because it's going to send me to a competitor.
For example, when Microsoft said they "can't" remove IE from Windows, because it's integrated into the OS. Well, who chose to integrate it? Or when Apple says they "can't" fix the certificate bug in Safari, because of the limitations of Keychain. Who designed Safari and Keychain, guys?
48 hours is pretty reasonable if you ask me. (Score:2, Interesting)
And by immediately, I mean between 10 minutes and 10 hours -- if it's going to be fixed at all.
Certainly there are those minor cosmetic bugs that no one cares about -- client included. And there are those other usability bugs that have acceptable work-arounds. Those two get fixed in the next set of upgrades -- if the client ever wants upgrades.
But anything that actually affects the client's on-going business has to get fixed absolutely immediately.
And we're capable of this for a number of reasons:
- we build with "developer empowering" code, so it's easy to make small changes to significant areas.
- we don't have as many bugs as seems to be the average
- in general, much of our code promotes "self-healing" of user data
- sensitive data routes (financial, encryption, security, money, accounting) get extra care during initial development, so fewer bugs are emergencies.
As the owner of a business, I'm with the client on this one. If my web host had a bug that stopped me from writing a cgi script, I'd need it fixed pronto. If my pipe bursts, I need a plumber immediately. If my bank is closed when I need money, it's a problem. Any client whose business is affected by your bug is being very patient if they're willing to wait two days to resume their regular business operation.
You're stalling their business.
That said, client education is very important. That's why I've collected a list of almost 100 news articles of huge bugs in huge companies -- banks, NASA, various militaries, etc -- so when clients say ridiculous things like "I'm paying for software, why does it have bugs", I can point to a billion dollar fighter jet, with four nuclear warheads, and say that it has bugs too. But that's not to get more time, it's just so that they understand there will be bugs, and they'll be fixed right away.
And yes, that's 24/7/52. (I take off the last Friday in July, not that anyone wishes me well for it)
Re:Parent is right. (Score:3, Interesting)
The important thing is that your customer-facing person (and, for that matter, your manager) should be aware that even if he or she thinks the bug is trivial and can be fixed in an hour, you always stick to the two-week target. You can use the bug itself as the reason you don't want to rush the fix. Tell the customer that you are not happy that your development process introduced this bug, that you don't want to repeat the mistake in the fix, and promise to deliver a high-quality fix in two weeks' time.
In my experience most customers are happier to know that the problem will not be repeated than to get the fix for the current problem, because they might already have found a workaround to move their business forward. Obviously they cannot stop all their business and wait for your fix.
By the way, I was working in a company offering SaaS, and the customers' daily business was dependent on the product. The customers used to have high expectations for turnaround time.
Re:how long is a piece of string (Score:2, Interesting)
In my experience, the turnaround time is mainly driven by the urgency of the customers and the inherent delays in the company. As a dev, I find security fixes rarely take more than a few hours to find and fix. The time between then and release is entirely dependent on the company's methods and procedures, the efficiency with which you can slip into the code tree and deploy hotfixes (as opposed to entirely new builds), and the urgency and importance of the customer (which drives the willingness of the company to ship something before all the normal methods and procedures have run their course).
For example, if you (the customer) are important enough to talk to dev directly, and are happy to use a beta build in whatever state the product is in as long as it fixes the specific bug, and you're important enough for management to not care about QA and other impediments, chances are you can have a fix in a day. If one of those is not true, you'll wait for the next regular update, whenever that may be (days, months, etc.).
Re:Turnaround time (Score:3, Interesting)
Re:That works both ways. (Score:3, Interesting)
4-5 weeks for a maintenance patch is fine, but if this bug is stopping your client from doing their work, you should be getting single fixes out the door much faster than that. Patches need to be prioritized. Go back to your last rev, fix only the problem in question and release that small subset as a hotfix.
IMO unless you're writing an operating system, I'd say that taking 4-5 weeks to release a hotfix would mean there's something wrong in your architecture. Perhaps the system was designed for a much smaller implementation and people have just been growing jungles in the existing framework to meet new requirements?
The project should be broken down into manageably small subsystems. Each subsystem should be autonomous enough to work nearly independently of the other systems. That way you should be limiting the mainstay of your regression testing to the area of the bug itself. Each subsystem should have internal check code and hooks for testing (a debug mode). Even the best systems still have Achilles heels, but they should be kept few and designed with expertise. Any well-crafted system should have limited the damage by design.
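To make "internal check code and hooks for testing" a bit more concrete, here's a minimal sketch in Python. The `Inventory` subsystem, its `debug` flag, and `check_invariants` are all made up for illustration; the point is only that the subsystem can verify its own state without dragging in the rest of the product:

```python
class Inventory:
    """Hypothetical subsystem that owns its state and its own sanity checks."""

    def __init__(self, debug=False):
        self.debug = debug  # debug mode: run the internal checks after every change
        self._stock = {}

    def add(self, item, qty):
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._stock[item] = self._stock.get(item, 0) + qty
        self._self_check()

    def remove(self, item, qty):
        if qty <= 0:
            raise ValueError("qty must be positive")
        if self._stock.get(item, 0) < qty:
            raise ValueError("insufficient stock")
        self._stock[item] -= qty
        if self._stock[item] == 0:
            del self._stock[item]
        self._self_check()

    def check_invariants(self):
        """Testing hook: return a list of problems found in the internal state."""
        return [
            f"{item}: non-positive quantity {qty}"
            for item, qty in self._stock.items()
            if qty <= 0
        ]

    def _self_check(self):
        # Only pay for the checks when debug mode is switched on.
        if self.debug:
            problems = self.check_invariants()
            if problems:
                raise AssertionError("; ".join(problems))


inv = Inventory(debug=True)
inv.add("widget", 3)
inv.remove("widget", 1)
print(inv.check_invariants())
```

With a hook like `check_invariants`, QA can regression test this one subsystem after a hotfix instead of re-running everything.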
This attention to detail in design isn't always possible, deadlines are funny like that. If companies and clients only understood that spending 2x the time up front reduces the backside time by 10x, our jobs would all be a little easier.
Many vendors (MS included) develop the patch and allow the users who call in with the problem to install at their own risk before exhaustive testing has been completed.
meh, off my soapbox, need to go to work myself.