Open-Source Bioinformatics Programs?
An anonymous reader asks: "This summer I have the opportunity to work in a bio research lab creating a web site for data about proteins. Part of my job is to do bioinformatic analysis of the proteins to determine what types of support there are for the preliminary gene predictions. I have been using DNA Strider (a Mac program) for sequence alignments and for translating DNA sequences into protein sequences, and I was wondering if any of the Slashdot crowd knew of similar programs for Linux? I have looked into Bioperl, Biopython, EMBOSS, and BioConductor, but they seem to be more oriented towards servers and less towards stand-alone applications. What programs would you suggest, especially those that might be geared more towards biologists rather than computer scientists?"
Freshmeat (Score:2, Informative)
If you go here and have a look, you will see some interesting programs that meet your needs. I was looking for some biochem programs on this web site the other day.
Read the O'Reilly book (Score:4, Informative)
This has lots of useful information and references and is a great starting point. It might be a bit dated, though.
Re:Read the O'Reilly book (Score:3, Informative)
PS: I haven't personally checked this out, but you might want to take a gander at O'Reilly's Sequence Analysis in a Nutshell: A Guide to Tools [oreilly.com]
Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together
OS Bioinformatics software (Score:3, Informative)
On the flip side, when people more interested in the biology than the technology write software, they tend to write just enough to get the job done and then stop. Software from this camp is often buggy and has a bad UI or no UI at all. It gets the job done, but only if you know exactly how to use it.
Anyway, you might want to take a look at R - http://cran.stat.ucla.edu/ [ucla.edu]. It's more geared towards statistics but it does have some protein modules.
Hold on (Score:3, Interesting)
I'm not a bio researcher, but I am a programmer and I work in the field. My father is a biology researcher, and we've been talking about putting together a GUI app that interfaces with various tools to provide an easy interface to common tasks that bio researchers have: base calling, vector masking, clustering, and sequence alignment, along with a nice GUI that lets you play with the results (search the results, order them, associate them with different databases, relate them to gene ontologies; essentially a powerful set of data visualization tools).
It's all focused around EST management. Our goal is to get an app that a non-power-user can get up and running, out-of-the-box, for managing small sets of libraries.
It's pretty obvious that there is a very solid set of OSS base tools that implement the algorithms for doing analysis on ESTs, but in terms of glue apps that bring all of the tools together into a cohesive whole, there's not much out there.
What IS out there is far too complex for an average bio researcher with little time on his hands to get up to speed with ("Download and install mysql? wtf?").
The problem is that most of these tools are geared towards large institutions with dedicated bioinformatics departments. They have the resources to hire a couple sysadmins and programmers and set up a high-throughput management system. Most small-timers don't really have the resources to get these apps working. I want to write something targeted specifically to smaller labs.
I just started writing it last weekend, so it's not like there's much there yet. OTOH, I have programmed a high-throughput EST management system for my work, so I have a good idea of most of the design issues.
An OSS app in this area would rock. It's a great opportunity to add to the wealth of OSS tools in the field, and I think it would solve a real need.
If you want to talk about it, reply to this post and also send me a message kav062 at yahoo dot com.
-Laxitive
Re:Hold on (Score:2)
Re:Hold on (Score:2)
We looked at various applications (magpie, emboss, gboss, others) and database schemas (BioSQL, others from application papers). We
Re:Hold on (Score:2)
It's not entirely clear from your description what your goals are, but it sounds like it may be something like the GCG web interface (Wisconsin Package). That program is not open source, and I think EMBOSS is essentially an open source version of the GCG command line programs. A quick search reveals at least a few projects attemp
Perl is not optional (Score:2)
If you have a real interest in bioinformatics, I cannot stress enough that you should learn Perl. Even if you are a biologist by background, Perl is not like Java or C; it puts more emphasis on getting things done than on abstract computer science concepts.
Once you learn Perl, using something like BioPerl will give you all you need to handle sequence data. For instance, you could build a data pipeline that you use on all of your sequences of interest, instead of a graphical tool which pretty much force
ApE plasmid editor (Score:1)
It has many of the same functions as Strider, plus some that Strider doesn't have.
Works on Windows, Linux, OS X, etc.
http://www.biology.utah.edu/jorgensen/wayned/ape/ [utah.edu]
Try Chimera and BioKnoppix (Score:3, Informative)
That said, take a look at Chimera [ucsf.edu], which is an app written at UC San Francisco. It is mostly useful for visualizing, but I know there is a sequence viewer, and some other tools in there too.
Now, for all the aspiring bio geeks I give you BioKnoppix [upr.edu]. Go download and burn the ISO. Then use that CD to boot any x86 box into a full Linux install with many of the popular bioinformatics tools already installed.
Enjoy!
-Steve
My ex-job (Score:2)
My ex-employer had produced a standalone/server webserver that integrated many of these tools, but market forces, and a lack of VC forced them to
Re:My ex-job (Score:1)
Unfortunately, this is what happens when you try to do this: You get a biologist, a bioinformatician, an object/data modeler and a DBA (and 20 other people of various disciplines) in a room and they spend years concocting the 'perfect' set of protocols/models/standards. They attempt (and fail repeatedly) to create a model that is all encompassing for whatever process th
Subject Doesn't Match the Article (Score:2)
Here's a clue, you stupid twit: not all Linux programs are open source.
Oriented Towards Servers? (Score:4, Informative)
Secondly, translations? Database searches? Sounds like you're doing some very basic Bioinformatics work. Not to say that your research isn't meaningful, just that the problems you're approaching are easily solved by a computational biologist. For example, here's a snippet of Bioperl code that will read in a set of GenBank sequences, translate them and print the results to a new file:
use strict;
use warnings;
use Bio::SeqIO;
# Read GenBank records, translate each one, and write the proteins out as GenBank
my $seqin  = Bio::SeqIO->new( -file => 'myseq.gbk', -format => 'genbank' );
my $seqout = Bio::SeqIO->new( -file => '>translated.gbk', -format => 'genbank' );
while ( my $seq = $seqin->next_seq ) {
    my $translated_seq = $seq->translate;
    $seqout->write_seq( $translated_seq );
}
Seems pretty simple, right? There are similar, simple wrappers around BLAST, FASTA and some other common algorithms in computational biology. Check out the Beginners HOWTO [bioperl.org] on the Bioperl website, it explains Bioperl without requiring previous CS experience. I think it's a good intro, but I also wrote it so I'm slightly biased.
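For anyone curious what a call like $seq->translate boils down to, here is a minimal core-Perl sketch (no Bioperl needed) that translates a DNA string in reading frame 1 using the standard genetic code. This is not Bioperl's implementation: translate_dna is a hypothetical helper for illustration, and it assumes plain ACGT/U input, ignoring alternative codon tables and other frames and mapping any other codon to X.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1): 64 amino acids listed with
# codons enumerated in T, C, A, G order for each of the three positions.
my @bases = qw(T C A G);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr( $aa, $i++, 1 );
        }
    }
}

# Hypothetical helper: translate frame 1 of a DNA (or RNA) string.
# '*' marks a stop codon; 'X' marks a codon with a non-ACGT base (e.g. N).
# A trailing partial codon is ignored.
sub translate_dna {
    my ($dna) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;    # accept RNA input too
    my $protein = '';
    for ( my $pos = 0; $pos + 3 <= length $dna; $pos += 3 ) {
        my $codon = substr( $dna, $pos, 3 );
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# prints MAIVMGR*KGAR*
print translate_dna('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG'), "\n";
```

Building the table from the 64-letter string keeps the code short; for real work, Bioperl's translate also handles other codon tables and frames.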
If programming is not your style, check out JEMBOSS [sourceforge.net]. It's a Java-based GUI wrapper for EMBOSS [emboss.org].
Cheers and good luck.
Bioconductor... (Score:2)
Useful bioinformatics programs (Score:3, Informative)
Also keep an eye on POY [amnh.org], which does direct optimization on sequences. It is also available for all platforms under a BSD-style licence.
For just viewing and manual editing of alignments there is BioEdit [ncsu.edu]. Free, but not open source. Windows only.
For a general sequence assembly/analysis/kitchen sink approach try the Staden Project [sourceforge.net]. Open source and available for Windows, Linux and OSX.
Hope this is useful. I have never worked with protein sequences, but I have done a lot of DNA sequencing and alignment!
Shameless plug (Score:2, Interesting)
My company, CLC bio [clcbio.com], just released a free bioinformatics application that works for Linux, Mac and Windows (not open source). It was designed with molecular biologists and biochemists in mind, rather than bioinformaticists. Thus, a lot of effort has been put into the user interface.
The program is a 0.9 beta and so far it only has basic functionality: GenBank searching, DNA/RNA to protein translation, alignment, tree reconstruction, graphical viewers, and a few other things. More will follow in the coming
Two pointers... (Score:1)
http://biolinux.org/ [biolinux.org]
http://apt.bea.ki.se/packages.html [bea.ki.se]
The sites have RPMs for most of the basic packages along with descriptions (some packages are a bit old, however).
Here's an easy one that works on Linux + Windows (Score:1)
Genezzo as an OSS database for bioinformatics (Score:1)
Genetic Data Environment (GDE) for Linux and Mac (Score:2)
GDE [bioafrica.net]
I haven't done intense work with it recently. It appears to be a GUI for a huge collection of software.
If you just need a program for molecular biology work (DNA sequence and protein sequence analysis, organization, and publication quality layouts) then I suggest you check out Clone Manager 7 [scied.com]. It's very pricey, but if your lab can afford it, it's a good piece of software. I know it's a *dows only program, but I can confirm that it works well with the latest version of Wi
Linux for Biotech cdrom (Score:1)
where to start (Score:1)