Ask Slashdot: Statistical Analysis Packages For Libraries?
HolyLime writes "I'm a librarian in a small academic library. Increasingly, the administration is asking our department to collect data on various aspects of our activities: classes taught, students helped, circulation, collection development, and so on. This is generating a large stream of data that is making it difficult and time-consuming to qualitatively analyze. For anything complicated, I currently use Excel or an analogous spreadsheet program. I am aware of statistical analysis programs like SPSS or SAS. Can anyone give me recommendations for statistical analysis programs? I also place emphasis on anything that is open source and easy to implement, since that will allow me to bypass the convoluted purchase approval process."
R or WEKA ... Wait, What Exactly Are You Doing? (Score:5, Informative)
The other suggestion has a better GUI but is really heavyweight. WEKA [waikato.ac.nz] has helped me perform advanced statistical calculations [ibm.com] on data sets time and time again, and since it's written in Java, it runs on just about anything. Their interface occasionally improves, too; they now have an explorer that I use to prep data and remove outliers/null data [waikato.ac.nz] (don't worry, this isn't climate data). It's well documented [waikato.ac.nz].
These (probably) require an intermediate data transformation step, but they are open source and extensively supported. Any examples of what you want to do? Simple stuff like standard deviation, or complex stuff like principal component analysis (PCA)? I guess if it were just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough that you just need a good macro writer to tackle them? Whatever happens, good luck!
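To give a feel for the "prep data, remove outliers/null data, then compute simple stuff like standard deviation" workflow described above, here is a minimal plain-Python sketch. It is not WEKA's code: it assumes missing values arrive as `None` and uses the common 1.5 × IQR rule for outliers; the `clean` helper and the sample counts are hypothetical.

```python
import statistics

def clean(values):
    """Hypothetical helper: drop missing entries (None), then drop values
    outside the 1.5 * IQR fences (a common, simple outlier rule)."""
    present = [v for v in values if v is not None]
    # quartiles() needs at least two data points; n=4 returns [Q1, median, Q3]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if low <= v <= high]

# Hypothetical daily reference-desk question counts; None marks unrecorded days,
# and 250 stands in for a data-entry error.
daily_questions = [12, 15, None, 14, 13, 250, 16, None, 15]

kept = clean(daily_questions)
print("kept: ", kept)
print("mean: ", statistics.mean(kept))
print("stdev:", round(statistics.stdev(kept), 2))
```

Whether an extreme value is an error or a genuinely busy day is a judgment call; the IQR rule only flags it, so it is worth eyeballing what gets dropped before trusting the summary numbers.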
Re:R or WEKA ... Wait, What Exactly Are You Doing? (Score:3, Informative)
PSPP (Score:5, Informative)
What output do they want and what answer? (Score:4, Informative)
Blue-skying the toolset is not gonna work. Figure out what output they want first, then figure out what tools can generate that output.
If the most important thing is inserting pretty graphs into newsletters, that's one thing.
If the most important thing is hard-core data warehousing analysis (for a library?), that's another thing.
The other question is what answer they want. Maybe they're just looking for data to back up an unpopular decision, or to glorify themselves by demonstrating their amazing management talents. So figure out which it is (by asking them?) and help them get the data they want. Don't give them a graph of declining circulation if they're trying to emphasize their brilliant leadership. Don't give them a graph of increasing student help if they're trying to justify downsizing.
Do NOT stick with Excel (Score:5, Informative)
Excel and other spreadsheets suck at stats:
* Burns, P. (2005). Spreadsheet Addiction. [burns-stat.com]
* Cryer, J. (2001). Problems with using Microsoft Excel for Statistics. PDF [uiowa.edu]
* Pottel, H. (n.d.). Statistical flaws in Excel. PDF [coventry.ac.uk]
* Practical Stats (n.d.). Is Microsoft Excel an Adequate Statistics Package? [practicalstats.com]
* Heiser, D. (2008). Errors, faults and fixes for Excel statistical functions and routines [daheiser.info]
For a more comprehensive and technical discussion, see the papers by Yu (2008); Yalta (2008); and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).
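As one concrete illustration of the kind of numerical problem these critiques discuss, here is a sketch comparing the one-pass "calculator" variance formula with a two-pass formula. It is a generic demonstration, not a reproduction of any specific Excel version's code; the data are hypothetical values with a large common offset and a small spread (think Unix timestamps in seconds). The true sample variance is 30.

```python
import statistics

def naive_variance(xs):
    """One-pass 'calculator formula': (sum of squares - sum^2 / n) / (n - 1).
    Subtracting two huge, nearly equal numbers cancels most significant digits."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical data: large offset (1e9), small spread; true variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive:    ", naive_variance(data))
print("two-pass: ", two_pass_variance(data))
print("reference:", statistics.variance(data))  # stdlib uses exact arithmetic internally
```

With 64-bit floats, the squared values near 4e18 can only be represented in steps of 512, so the one-pass result cannot land on 30, while the two-pass result and the standard library's `statistics.variance` do.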
Re:R or WEKA ... Wait, What Exactly Are You Doing? (Score:4, Informative)
Why do you think R is not easy to implement? My company has been using SAS for a long time and we are finally making the change to R. As far as OP's requirements are concerned, I think R is way superior to SAS or SPSS because of its free, modular nature. It is clean, simple and suitable for a wide range of users. The commercial packages are filled with way too much business lingo garbage for me.
I personally think commercial support is overrated. I can install software on my own. I know how to browse through manuals and other information to find what I need. For a package like R, I almost always get my questions answered on online forums within a few hours at most. So what exactly do I get from commercial support for my money? But if OP needs commercial support, there is an enterprise edition of R by Revolution Analytics located here: http://www.revolutionanalytics.com/products/revolution-enterprise.php [revolutionanalytics.com]. Might be worth looking into.
Bottom line: R all the way.
Re:Maybe a slightly different tool (Score:2, Informative)
Agreed. Access is a sh*tty database, but you seem to be saying that volume is your problem, not functionality. If you've got an Excel license, you've probably got an Access license already, and Access will allow you to reuse a lot of what you've put together in Excel while handling the volume of data better.
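To make the "let a database handle the volume" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for Access (Access has its own SQL dialect and date functions, so the query would need adapting there). The `checkouts` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Access; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkouts (checkout_date TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO checkouts VALUES (?, ?)",
    [("2010-09-01", "B1"), ("2010-09-15", "B2"), ("2010-10-03", "B1")],
)

# One GROUP BY query replaces a column of per-row spreadsheet formulas:
# the first 7 characters of an ISO date ('YYYY-MM') identify the month.
monthly = conn.execute(
    """
    SELECT substr(checkout_date, 1, 7) AS month, COUNT(*) AS checkouts
    FROM checkouts
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for month, count in monthly:
    print(month, count)
conn.close()
```

The aggregated result is small enough to paste back into a spreadsheet for charting, which keeps the familiar Excel front end while the bulk data lives in the database.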
Unfortunately, I also agree with the other posters: if you're after more relevant advice, you really need to give a bit more background on:
- your skill set (Excel user/VBA hacker/Stats major/Hardcore programmer)
- what do you mean by 'statistical analysis'? This is too broad a description
- the data you're using (volumes, sources, complexity)
Another option, if volume is your only problem, is not to use all the data. Take a random sample and work from that; this is common practice even for people and organizations with high-end stats packages.
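Taking a random sample is a one-liner in most languages. Here is a minimal Python sketch of simple random sampling without replacement; the `sample_records` helper and the record IDs are hypothetical, and the fixed seed is only there so the same sample can be reproduced later.

```python
import random

def sample_records(records, k, seed=None):
    """Hypothetical helper: return k records chosen uniformly at random,
    without replacement. A fixed seed makes the sample reproducible."""
    rng = random.Random(seed)  # private generator, leaves the global one alone
    return rng.sample(records, k)

# Hypothetical record IDs standing in for a large transaction log.
all_records = list(range(1, 100_001))
subset = sample_records(all_records, 1_000, seed=42)
print(len(subset), subset[:5])
```

A plain random sample suits questions about the whole collection; if the question is about differences between groups (say, months or branches), sampling within each group separately keeps small groups from being crowded out.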
Re:Go Ahead and List Them Then (Score:4, Informative)
R with RKWard (Score:5, Informative)
I will also echo the sentiment that, by itself, R is fairly low-level and typically requires at least some simple programming to get what you want.
However, there is a very nice graphical front end for R called RKWard (http://rkward.sourceforge.net/). With RKWard, importing and exporting data, running basic analyses on it (descriptive statistics, linear regression, t-tests, etc.), and producing basic graphs is very straightforward and does not require detailed knowledge of the R language. Plus, RKWard is also a nice development environment for writing R code, so if you want to take your project further, you can easily do so. So, I'd recommend giving RKWard + R a look.
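RKWard's point-and-click analyses run R underneath, so you don't have to write this yourself, but as a rough illustration of what a two-sample t-test computes, here is a plain-Python sketch of Welch's t statistic and its approximate degrees of freedom. This is not RKWard's or R's code, and the before/after counts are toy data.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom
    (does not assume the two groups have equal variances)."""
    na, nb = len(a), len(b)
    se2_a = statistics.variance(a) / na  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Toy data: hypothetical weekly counts of students helped, before and after a change.
before = [1, 2, 3, 4, 5]
after = [2, 3, 4, 5, 6]
t, df = welch_t(before, after)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The p-value then comes from the t distribution with `df` degrees of freedom, which Python's standard library does not provide; in R, `t.test(before, after)` reports it directly and uses this Welch variant by default.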
Re:R or WEKA ... Wait, What Exactly Are You Doing? (Score:2, Informative)