


Ask Slashdot: How To Encourage Better Research Software?

An anonymous reader writes "There is a huge amount of largely overlapping but often incompatible medical imaging research software, funded by the US taxpayer (e.g. NITRC or I Do Imaging). I imagine the situation may be similar in other fields, but it is pronounced here because of the glut of NIH funding. One reason is historical: most of the well-funded, big, software-producing labs and centers have been running for 20 or more years, since long before the advent of git, hg, and the related hosting sites that promote efficient code review and exchange, so they have established codebases. Another reason is probably territorialism and politics. As a taxpayer, I find this situation wasteful. It's great that the software is being released at all, but the duplication of effort means quality is much lower than it could be, given the large number of people involved (easily in the thousands, just counting a few developer mailing list subscriptions). No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"


  • by Anonymous Coward on Friday April 29, 2011 @04:15PM (#35978902)

    I do medical imaging as my day job. The parent understates the "spec" problem; it's just as much a testing problem. The typical spec I work against is "create a tool that distinguishes this disease state from some other disease state and from healthy normals with optimal power". Optimal power is, of course, only defined by the results you get, or by comparison against other software (which probably measures different facets of the disease). Moreover, the spec evolves as the number of images grows by orders of magnitude: 1, 10, 100, 1000, 10000 images. So the original spec is generally an idea tried on a few images; then, as the idea gels, the sample battery size is increased. A lot of places don't have 100+ image sets, particularly for cutting-edge imaging methods. There's also a catch-22: in general, if you knew how to detect algorithm failure, you'd build that into the code. By the time you get to testing on 1000 subjects there's enough code in place that it's hard to justify a rewrite using "proper SDLC". (Go off and re-read Joel on Software about the value in "rewriting" software!) Besides, do you want the creative people managing software development, or do you want them moving on to the next great idea?
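    The catch-22 and the order-of-magnitude growth in sample size can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions: `flaky_measure` is a hypothetical algorithm that silently returns 0.0 (think of an empty segmentation) on a fixed 1% of cases, and `qc_flag` is a hypothetical built-in range check that can only catch the failure mode someone already thought to encode. Nothing here comes from a real imaging package.

```python
import random


def stage_sizes(max_exponent):
    """Order-of-magnitude cohort sizes: 1, 10, 100, ..., 10**max_exponent."""
    return [10 ** k for k in range(max_exponent + 1)]


def flaky_measure(rng, failure_rate=0.01):
    """Hypothetical algorithm: usually returns a value near 1.0, but with
    probability failure_rate it silently returns 0.0 instead of raising."""
    if rng.random() < failure_rate:
        return 0.0
    return rng.gauss(1.0, 0.05)


def qc_flag(value, lo, hi):
    """Built-in failure check: True if the measurement falls outside [lo, hi].
    It only catches failures that someone already knew to look for."""
    return not (lo <= value <= hi)


def run_stage(measure, n, lo, hi, rng):
    """Run the measurement on n synthetic cases; return the flagged fraction."""
    flagged = sum(qc_flag(measure(rng), lo, hi) for _ in range(n))
    return flagged / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in stage_sizes(4):
        rate = run_stage(flaky_measure, n, lo=0.5, hi=1.5, rng=rng)
        print(f"{n:>6} cases: {rate:.2%} flagged")
```

    At 1 or 10 cases a 1% failure mode will usually go unseen; it tends to surface in the flagged fraction only once the cohort reaches the hundreds or thousands, by which point a lot of code already depends on the algorithm.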

    As for the original poster's whine, I don't buy the "didn't have git and hg" argument. SCCS, for example, wasn't pretty, but it certainly worked in the particle physics community, which has been globally distributed since the 1980s. There was a lack of sharing for two reasons: 1) if you are competing for customers and/or grant money, you publish the idea but don't give away the code (it's your competitive edge); 2) if you have a new idea, it's often the case that the available code you could find wasn't worth the effort to merge. Now, one of the problems is that most of the toolkits demand a huge buy-in: it's hard, for example, to simply lift a function out of ITK to use elsewhere. If you want to use ITK you have to buy in and create ITK apps. It's also non-trivial to drag a function from some other framework into ITK. (This is not to pick on ITK, it's a good toolkit; it applies to most other frameworks too.) Moreover, there are a couple of different classes of image processing users: those who worry about whether the software works (or seems to work) and those who worry about whether it's right. Ideally you want both, but testing for "works" is different from testing for "right".
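    The difference between testing for "works" and testing for "right" can be made concrete. The sketch below is hypothetical throughout: `threshold_segment` is a toy stand-in for a segmentation algorithm, `check_works` only verifies that the output is well-formed, and `check_right` compares against the known answer on a synthetic phantom using the Dice overlap. The phantom geometry and the 0.9 Dice cutoff are illustrative choices, not values from any real validation protocol.

```python
def threshold_segment(image, threshold):
    """Hypothetical toy segmenter: label pixels >= threshold as 1, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def check_works(image, mask):
    """'Works' test: same shape as the input and strictly binary values."""
    if len(mask) != len(image):
        return False
    for img_row, mask_row in zip(image, mask):
        if len(mask_row) != len(img_row) or any(v not in (0, 1) for v in mask_row):
            return False
    return True


def dice(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) between two binary masks."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    overlap = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * overlap / total


def make_phantom(size=16, lo=4, hi=12, background=10, foreground=100):
    """Synthetic phantom: a bright square on a dark background, plus its
    ground-truth mask (the known 'right' answer)."""
    def inside(r, c):
        return lo <= r < hi and lo <= c < hi

    image = [[foreground if inside(r, c) else background for c in range(size)]
             for r in range(size)]
    truth = [[1 if inside(r, c) else 0 for c in range(size)] for r in range(size)]
    return image, truth


def check_right(threshold, min_dice=0.9):
    """'Right' test: the segmentation must match the phantom's known answer."""
    image, truth = make_phantom()
    return dice(threshold_segment(image, threshold), truth) >= min_dice
```

    With a threshold of 5 the mask is well-formed, so the "works" test passes, yet every pixel is labelled foreground and the Dice score against the phantom is only 0.4, so the "right" test fails. A threshold of 50 passes both.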

    Heck, even until 6-7 years ago many labs had their own image format used in processing. DICOM data comes off the imaging device, but DICOM is a very flexible standard. (Here flexibility means that about half the stuff you need to know to really do large-scale processing is stored in well-defined locations; the rest is vendor-specific and vendor-software-revision-specific.) So most toolkits munge the incoming data into some standard format. Simple formats sound great, but they can often lack sufficient detail for a particular analysis. The Mayo Analyze 7.5 format, for example, spent years as a ubiquitous standard but couldn't sanely store oblique images. Things are at least settling down to a handful of decent storage formats, which helps with interop.
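    Why a format without orientation fields can't sanely store oblique images comes down to the voxel-to-world mapping. The following is a minimal sketch, assuming a header that records only voxel sizes and an origin (the hypothetical `scaling_only_affine`) versus a full 4x4 matrix that also encodes the slice tilt (the hypothetical `oblique_affine`, here a rotation about the x axis). It does not read or write any real file format; it only shows that the two mappings place the same voxel at different physical positions.

```python
import math


def apply_affine(affine, ijk):
    """Map voxel indices (i, j, k) to world coordinates (x, y, z) in mm
    with a 4x4 homogeneous voxel-to-world matrix."""
    vec = (ijk[0], ijk[1], ijk[2], 1.0)
    return tuple(sum(affine[r][c] * vec[c] for c in range(4)) for r in range(3))


def scaling_only_affine(voxel_sizes, origin):
    """Hypothetical header with voxel sizes and an origin but no orientation:
    the grid axes are forced to line up with the world axes."""
    dx, dy, dz = voxel_sizes
    ox, oy, oz = origin
    return [[dx, 0.0, 0.0, ox],
            [0.0, dy, 0.0, oy],
            [0.0, 0.0, dz, oz],
            [0.0, 0.0, 0.0, 1.0]]


def oblique_affine(voxel_sizes, origin, tilt_deg):
    """Hypothetical oblique acquisition: the same grid tilted by tilt_deg
    about the x axis (rotation matrix times voxel-size scaling)."""
    dx, dy, dz = voxel_sizes
    ox, oy, oz = origin
    c = math.cos(math.radians(tilt_deg))
    s = math.sin(math.radians(tilt_deg))
    return [[dx, 0.0, 0.0, ox],
            [0.0, dy * c, -dz * s, oy],
            [0.0, dy * s, dz * c, oz],
            [0.0, 0.0, 0.0, 1.0]]


def displacement_mm(p, q):
    """Euclidean distance between two world-space points."""
    return math.dist(p, q)


if __name__ == "__main__":
    sizes, origin, voxel = (1.0, 1.0, 3.0), (0.0, 0.0, 0.0), (10, 10, 10)
    flat = apply_affine(scaling_only_affine(sizes, origin), voxel)
    tilted = apply_affine(oblique_affine(sizes, origin, 20.0), voxel)
    print(f"same voxel, {displacement_mm(flat, tilted):.2f} mm apart")
```

    For a 20 degree tilt, voxel (10, 10, 10) lands about 11 mm away from where the orientation-free header would put it; at a tilt of 0 the two mappings agree exactly.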

    Medical image research is not software engineering.
