Ask Slashdot: Statistical Analysis Packages For Libraries? 146

HolyLime writes "I'm a librarian in a small academic library. Increasingly, the administration is asking our department to collect data on various aspects of our activities: classes taught, students helped, circulation, collection development, and so on. This is generating a large stream of data that is difficult and time-consuming to analyze. For anything complicated, I currently use Excel or an analogous spreadsheet program. I am aware of statistical analysis programs like SPSS and SAS. Can anyone give me recommendations? I place particular emphasis on anything that is open source and easy to implement, since that will allow me to bypass the convoluted purchase approval process."
This discussion has been archived. No new comments can be posted.

  • by eldavojohn ( 898314 ) * <eldavojohn@noSpAM.gmail.com> on Wednesday November 16, 2011 @03:31PM (#38076772) Journal
    R [r-project.org] is my personal favorite, but you're going to have to get down and dirty with some high-level programming (scripting). Check out the data import package [r-project.org] (you would probably export your spreadsheets to flat text files and import them, although the functionality is ever increasing). There's no user interface in this suggestion ... what there is, however, is a massive collection [r-project.org] of packages for statistical analysis. Very well maintained, constantly updated, and ever expanding.

    The other suggestion has a better GUI but is really heavyweight. WEKA [waikato.ac.nz] has helped me time and time again perform advanced statistical calculations [ibm.com] on data sets, and it's written in Java, so it runs on just about anything. Their interface occasionally improves too; they now have an explorer that I use to prep data and remove outliers/null data [waikato.ac.nz] (don't worry, this isn't climate data). It's well documented [waikato.ac.nz].

    These (probably) require an intermediate data transformation step, but both are open source and extensively supported. Any examples of what you wanted to do? Simple stuff like standard deviation, or complex stuff like principal component analysis (PCA)? I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough that a good macro writer could tackle them? Whatever happens, good luck!
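To make the "export to flat files and import" step concrete, here is a minimal Python sketch of the kind of intermediate transformation the parent describes; the file contents and column names are invented for illustration (a real run would read an actual CSV export instead of the inline sample):

```python
import csv
import io
import statistics

# Hypothetical spreadsheet export: one row per month of circulation counts.
# In practice you would use open("circulation.csv") instead of this inline sample.
sample = io.StringIO(
    "month,checkouts,renewals\n"
    "2011-08,412,97\n"
    "2011-09,530,120\n"
    "2011-10,488,101\n"
)

rows = list(csv.DictReader(sample))
checkouts = [int(r["checkouts"]) for r in rows]

print("mean:", statistics.mean(checkouts))     # average monthly checkouts
print("stdev:", statistics.pstdev(checkouts))  # population standard deviation
```

The same loop scales to whatever columns the spreadsheet actually exports; only the column names change.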
    • Re: (Score:3, Informative)

      Came here for the mention of R, and leave satisfied. R is an excellent choice.
      • R is good for analysis (although it has a steep learning curve), but it seems to me that the poster has more of a data management problem than an analysis one. 'Administrators' are unlikely to want inference or projections; they will just want informative series of usage data (nice graphs, tables, etc.). So a good database solution is probably the most important step; exporting tables into something that will make nice reports (Excel might be okay) is the next.

        I'm a statistician s

    • I too like R. You might link it with TINN-R (http://sciviews.org/Tinn-R/) to simplify some of the coding process. Last I had heard there was also some work on a GUI for R but I don't know if that's progressed very far.

      SPSS is fairly easy to use and I would recommend it over SAS for basic analyses, but, as parent suggested, it really depends on what you want to do. You might be pretty happy just downloading some Excel macros which can be found through web searches (or, better yet, writing your own).
    • by garcia ( 6573 )

      I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough to just need a good macro writer to tackle? Whatever happens, good luck!

      Sounds like you may be correct. More information is definitely required in order to recommend the proper product.

      However, R would definitely be my go-to choice when someone is asking about SPSS/SAS. Speaking of that, being a SAS guy, I really need to take the time to get some R experience.

      Anyone with decent recommendations, aside from

      • Anyone with decent recommendations, aside from R's own website, where to do a quickstart when you're a SAS geek?

        This blog [statmethods.net] explains some of the stuff you do in R and as he does it, he compares it to SAS.

        Example:

        Unlike SAS, which has DATA and PROC steps, R has data structures (vectors, matrices, arrays, dataframes) that you can operate on through functions that perform statistical analyses and create graphs. In this way, R is similar to PROC IML.

        And here's an entire book on the topic [google.com] (although it may be difficult to find)!

      • Anyone with decent recommendations, aside from R's own website, where to do a quickstart when you're a SAS geek?

        About Quick-R

        R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for both current R users, and experienced users of other statistical packages (e.g., SAS, SPSS, Stata) who would like to transition to R. My goal is to help you quickly access this language in your work.

        http://www.statmethods.net/index.html [statmethods.net]

    • R

      There is nothing that beats it on any platform. Some links:

      http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html [bham.ac.uk]
      http://addictedtor.free.fr/graphiques/index.php [addictedtor.free.fr]
      http://opencpu.org/ [opencpu.org]
      https://r-forge.r-project.org/ [r-project.org]
      http://hlplab.wordpress.com/ [wordpress.com]
      http://rseek.org/ [rseek.org]
      http://www.r-bloggers.com/ [r-bloggers.com]

    • In addition, Mondrian is a good complement to R for some interactive data visualisations. http://rosuda.org/mondrian/ [rosuda.org] The OP really needs to make clear what he wants to do, though.
    • by Anonymous Coward

      Try SOFA (www.sofastatistics.com) alongside R. SOFA (Statistics Open For All) focuses on making some of the most important statistical tests easy to use and understand. It also has attractive charting and report tables. Disclosure - I am the lead developer of SOFA.

    • Re: (Score:2, Informative)

      by kiwigrant ( 907903 )
      Try SOFA (http://www.sofastatistics.com/ [sofastatistics.com]) alongside R. SOFA (Statistics Open For All) focuses on making some of the most important statistical tests easy to use and understand. It also has attractive charting and report tables. There are also videos, on-line documentation, and direct support from the developer. Disclosure #1 - I am the lead developer of SOFA. #2 I already posted accidentally as AC
    • I second R, and would also suggest adding in R Commander [mcmaster.ca]. Adds a fairly usable GUI simplifying lots of common tasks, while maintaining the flexibility of R.

    • I second the suggestion of R. I have only dabbled with it, but it is quite powerful and has a great community. You might also want to consider something a little more general purpose though. Python with the NumPy and SciPy packages can handle just about any statistical problem you want to consider and it has the versatility to do a whole lot more, such as handle any intermediate steps. It is completely free and you can download an excellent complete package at http://code.google.com/p/pythonxy/wiki/Welc [google.com]
    • R would be the way to go for heavy lifting, or even LibreOffice, which has a database function in it for routine things (Scientific Linux and CAELinux come packaged with R, R's GUIs, and some other useful tools; I recommend CAELinux, and you can run it directly from the DVD, so there's no need to install). There's a book I found helpful, "Introductory Statistics with R" by Dalgaard, as well as the GUI extensions rcmdr, rattle, rapid-i, and RStudio noted farther down in another post. There is also the R Journal (jo
    • by ceoyoyo ( 59147 )

      SPSS and SAS aren't exactly point-and-click either. If you want to do serious stats, you're going to have to type. R actually has a fairly straightforward syntax and is designed to be used interactively. There are also lots of good beginner tutorials.

      If the poster needs some data management help as well, there's rpy, which lets you use R from Python - all the power of a real programming language, including database access, linked to R.

    • It seems to me that using R here would be like using a sledgehammer to tap in finishing nails. Sounds to me like you need some sort of database that can be easily queried when you need to analyze the data. Hire a data modeler or a similar IT consultant, have them set up the db, and pre-write the kinds of queries you will need. From there, what type of software you need is highly dependent on the types of statistics you are running. From the sounds of it, Excel will work just fine.
    • Good to see this as the first answer; I came here to suggest R and see I'm not the only one. Go with R!
  • by bluefoxlucid ( 723572 ) on Wednesday November 16, 2011 @03:35PM (#38076828) Homepage Journal

    I find that libraries carry a lot of common information and not so much uncommon information. This sort of muckery seems to encourage concentration of information into a smaller and smaller realm, constantly sorting out first the never-used, then the minimally-used, to maximize volume of return but minimize the use of the library as a haven for obscure and long-forgotten knowledge. Effectively, like burning some books while not burning other books--removes knowledge.

    As with all things, there must be balance. A library where you don't increase holding of more useful texts is less immediately useful; although if you removed all the most used texts, you would have an interesting outcome... the obscure and oft-overlooked need retention, too.

    • Libraries don't necessarily enjoy removing materials from the collection, but the two main reasons to do so are to make sure we have current/accurate materials and make room in our always limited shelf space. (The first is of presumably higher importance in an academic library.)

      Unless libraries can get an unlimited budget for expansion of their physical space or off-site archives, weeding materials will be a necessary evil.

  • Try this giant list [wikipedia.org]

    From personal experience, I can recommend WINKS. It's ridiculously easy to use.

  • by MetalliQaZ ( 539913 ) on Wednesday November 16, 2011 @03:39PM (#38076880)

    Sage (formerly SAGE?) is an open source mathematical package that includes statistical functions. I wanted to add that to the usual mentions of R, etc.

    However, are you sure this is what you want? It sounds to me like your real problem is that you have too much data to store. If you're currently using Excel to process your data, and it has been working except that you are running out of space, perhaps what you really need is a database, like Access. If you want OSS, you could try LibreOffice Base, or engage a local student to design a web-based system backed by MySQL.
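For what it's worth, the database route sketches out in a few lines with Python's built-in sqlite3 module; the schema, table name, and numbers here are hypothetical, just to show the shape of the idea:

```python
import sqlite3

# Hypothetical schema: one row per recorded library activity.
conn = sqlite3.connect(":memory:")  # use a file path for a real database
conn.execute("CREATE TABLE activity (day TEXT, kind TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        ("2011-09-01", "circulation", 57),
        ("2011-09-02", "circulation", 64),
        ("2011-09-01", "reference", 12),
        ("2011-10-03", "circulation", 41),
    ],
)

# Monthly totals per activity type: the kind of report administrators ask for.
query = """
    SELECT substr(day, 1, 7) AS month, kind, SUM(count)
    FROM activity
    GROUP BY month, kind
    ORDER BY month, kind
"""
for month, kind, total in conn.execute(query):
    print(month, kind, total)
```

Once the raw events live in a table like this, each new administrative request is one more query rather than one more spreadsheet.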

  • SAS is a great package but is probably prohibitively expensive. An open source version like R is probably more appropriate.
  • A good database? (Score:2, Interesting)

    by Anonymous Coward

    Hear me out. We deal with about 3 million data-producing elements and track in real-time to near-real-time. We ingest everything into MySQL (via macros, scripts, tools, etc.) and normalize the data on the way in. For analysis we simply query. Those queries may have their outcome displayed in a simple report generator, or (more often than not) via HTML5 Canvas graphs/charts, Cacti graphs, etc. What we're doing doesn't lend itself well to a SAS type solution. If you could use SAS for what you're doing,

  • PSPP (Score:5, Informative)

    by Geste ( 527302 ) on Wednesday November 16, 2011 @03:46PM (#38076946)
    Look at the free SPSS work-alike PSPP. http://www.gnu.org/software/pspp/ [gnu.org] Sounds like R might be a bit much for your needs.
    • Sounds like R might be a bit much for your needs.

      Agreed. Another good alternative is MYSTAT [systat.com], the free "student" version of SYSTAT. Note also that many academic institutions negotiate site licenses for SYSTAT, so you might already have the full version available to you.

  • Depending on the type of "analysis," you might be better off with something like PowerPivot. There's a lot that you can probably glean from your data without doing sophisticated statistics, instead using PowerPivot to slice/dice/summarize/chart your data in different ways. It is easiest to use if you structure your data in a data warehouse/star schema fashion.
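The slice/dice/summarize idea isn't tied to PowerPivot; as a rough illustration, a pivot-style cross-tab of months against activity types takes only a few lines of plain Python (the records are invented for the example):

```python
from collections import defaultdict

# Hypothetical activity log: (month, activity, count)
records = [
    ("2011-09", "classes", 4),
    ("2011-09", "circulation", 530),
    ("2011-10", "classes", 6),
    ("2011-10", "circulation", 488),
]

# Pivot: rows are months, columns are activity types.
pivot = defaultdict(dict)
for month, activity, count in records:
    pivot[month][activity] = pivot[month].get(activity, 0) + count

for month in sorted(pivot):
    print(month, pivot[month])
```

A dedicated tool adds charting and drill-down on top, but the underlying aggregation is exactly this.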

  • by vlm ( 69642 ) on Wednesday November 16, 2011 @03:50PM (#38076998)

    Blue-skying the toolset is not gonna work. Figure out what output they want, then figure out what tools can generate that output.
    If the most important thing is inserting pretty graphs into newsletters, that's one thing.
    If the most important thing is hard-core data warehousing analysis (for a library?), that's another thing.

    The other thing is: what answer do they want? They may just be looking for data to back up an unpopular decision, or to glorify themselves by demonstrating their amazing management talents. So figure out what that is (by asking them?) and help them get the data they want. Don't give them a graph of declining circulation if they're trying to emphasize their brilliant leadership. Don't give them a graph of increasing student help if they're trying to justify downsizing.

    • Agreed. A lot depends on what you want to accomplish. "Analysis" can be a completely different beast from project to project; the term is thrown around loosely and encompasses a lot of things. So it's important not to dive into the analysis without a very specific goal in mind.

      • The library staff is currently working jointly with the school administration to determine what kinds of information we want to look at and analyze. Though it increasingly looks like the statistics we currently collect are going to be analyzed in more and various ways. I just wanted to take the initiative and come to the table with a potential solution in the form of a low cost software package capable of providing that functionality.
  • Stick with Excel (Score:3, Insightful)

    by syousef ( 465911 ) on Wednesday November 16, 2011 @03:50PM (#38077008) Journal

    Seriously, stick with Excel. You and anyone who comes after you would need to learn whatever statistical package you introduce. That is either overkill for the kind of data you're collecting and analysing, or it's a full-time job requiring specialist knowledge, for which they should be hiring someone else.

    Excel has a few bugs, but for the most part it's very capable. Ensure you run the service packs and can install the add-ons that come with it (the Analysis ToolPak). Get them to send you on advanced short courses for Excel and statistics. If there isn't that kind of commitment, there's no room for any statistical package.

    Almost all Ask Slashdot stories that are work related can be answered the same way - bad idea: you're already out of your depth, and if you can't be bothered to google for the information, the project is doomed.

    • Excel has a few bugs but for the most part it's very capable.

      Care to name some of those bugs? I have not come across a single one!

    • by Anonymous Coward on Wednesday November 16, 2011 @04:02PM (#38077170)

      Excel and other spreadsheets suck at stats:

      * Burns, P. (2005). Spreadsheet Addiction. [burns-stat.com]
      * Cryer, J. (2001). Problems with Using Microsoft Excel for Statistics (PDF) [uiowa.edu].
      * Pottel, H. (n.d.). Statistical Flaws in Excel (PDF) [coventry.ac.uk].
      * Practical Stats (n.d.). Is Microsoft Excel an Adequate Statistics Package? [practicalstats.com]
      * Heiser, D. (2008). Errors, Faults and Fixes for Excel Statistical Functions and Routines [daheiser.info]

      For a more comprehensive and technical discussion, see the papers by Yu (2008); Yalta (2008); and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).

      • by iroll ( 717924 )

        Did you even read the articles you linked? From Pottel:

        My overall assessment is that while Excel uses algorithms that are not robust and can lead to errors in extreme cases, the errors are very unlikely to arise in typical scientific data analysis. However, I would not advise data analysis in Excel if the final results could have a serious impact on business results, or on the health of patients. For students, it's my personal belief that the advantages of easy-to-use functions and tools counterbalance the need for extreme precision.

        Emphasis mine. I highly doubt that the OP's data require more than a couple of significant figures of precision. While their stats could influence resource allocation, differences of a few percent are unlikely to be deal-breakers--think about it; the library is likely to be dealing with budget items that range in the thousands of dollars, probably in blocks. You're not going to accidentally budget for a whole class based on a wiggle of a percent in

        • by syousef ( 465911 )

          Did you even read the articles you linked? From Pottel:

          He would not have been modded informative here if he'd actually read what he linked to. Some days this place really gets me down. If this is the level of quality at a site for geeks, no wonder society is in decline.

      • by syousef ( 465911 )

        PLEASE ACTUALLY READ WHAT YOU LINK TO.
        MODERATORS: LOOK AT WHAT YOU ARE CALLING INFORMATIVE.
        YEP, I'M YELLING. DEALING WITH STUPIDITY IS FRUSTRATING.

        Excel and other spreadsheets suck at stats:

        That is one camp of thought. There are others. Every package has its limitations.

        * Burns, P. (2005). Spreadsheet Addiction. [burns-stat.com]

        Doesn't talk about never using statistics. Talks about misusing them by pressing them past their limits. "I know there are many spreadsheets in financial companies that take all night to compute. These are complicated and commonly fail. When such spreadsheets are replaced by code more suited to the task, it is not unusual for the computation time to be cut to a few minutes and the process much easier to understand."

        * Cryer, J. (2001). Problems with Using Microsoft Excel for Statistics (PDF) [uiowa.edu].

        Focuses on poor charting in the Excel 95 era; the title should really be "Problems with Using Excel for Graphing." The article is a decade old, and Excel has had several refreshes since.

        * Pottel, H. (n.d.). Statistical Flaws in Excel (PDF) [coventry.ac.uk].

        Another article, about Excel 97 and 2000: decade-old software. Many flaws have since been addressed (and new flaws added). Clearly, Excel bashing was popular around 2000.

        * Practical Stats (n.d.), Is Microsoft Excel an Adequate Statistics Package? [practicalstats.com]

        This one suggests it's just fine for the submitter's purposes.

        "Excel’s limitations, and its errors, make this a very questionable practice for scientific applications. For business applications where questions might be simpler and precision not as necessary, Excel may be just fine"

        * Heiser, D. (2008). Errors, faults and fixes for Excel statistical functions and routines [daheiser.info]

        For a more comprehensive and technical discussion, see the papers by Yu (2008); Yalta (2008); and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).

        Gets very technical, and I bet some of those remarks are valid, but if a calculation is important, you become aware of the problem and work around it. If it's not, there is no problem. And if you don't understand what you're asking Excel to calculate and why it might be wrong, switching packages doesn't matter.

        The more you go into this, the more it requires specialist training. The idea of just replacing one software package whose flaws and features you don't understand with another, geekier, more difficult product whose flaws and features you don't understand is ridiculous. As is moderation on Slashdot. The comments are being moderated by monkeys practicing to type up Shakespeare.

    • This is awful advice that ignores everything the submitter asked for. http://www.practicalstats.com/xlsstats/excelstats.html [practicalstats.com]
  • R and Python (Rpy2) (Score:3, Interesting)

    by mpetch ( 692893 ) <mpetch@capp-sysware.com> on Wednesday November 16, 2011 @03:51PM (#38077026)
    I have grown accustomed to doing statistical analysis using Python and R using http://rpy.sourceforge.net/rpy2 [sourceforge.net]
  • by Raiford ( 599622 )
    Check out PDL (Perl Data Language). It may not be the most convenient solution but it's free and has a great, informed and responsive user group.
  • by LWATCDR ( 28044 ) on Wednesday November 16, 2011 @04:00PM (#38077142) Homepage Journal

    It almost seems like you are not doing statistics so much as creating reports from data.
    Maybe you should be using a database instead of a spreadsheet or a statistics program.
    The uber-geek way would be to set up a LAMP server and create a web-based system.
    The more convenient way would be something like Access.
    You can then use Excel, or the database program itself, to manipulate the data as needed.

    In the end, if you know Excel you may want to stick with it. I see people use Excel for databases all the time. It drives me a bit nuts, but sometimes whatever works is just fine.

    • by jgrahn ( 181062 )

      It almost seems like you are not doing statistics as much as creating reports from data. Maybe you should be using a database instead of a spreadsheet or a statistics program.

      I don't see why even a database would be needed. "Increasingly the administration is asking our department to collect data on various aspects of our activities, class taught, students helped, circulation, collection development, and so on."

      Seems to me that information already exists in a library, and the report generation is the only thing missing. And possibly looking into the database on the 1st of every month and writing down the number of books on a piece of paper.

      Or a reply to the administration "S

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Agreed. Access is a sh*tty database but you seem to be saying that volume is your problem not functionality. However if you've got an Excel license you've probably got an Access license already and Access will allow you to re-use a lot of what you've put together in Excel while handling the volume of data better.

      Unfortunately I also agree with the other posters, if you're after more relevant advice you really need to give a bit more background on:
      - your skill set (Excel user/VBA hacker/Stats major/

  • by esme ( 17526 ) on Wednesday November 16, 2011 @04:05PM (#38077210) Homepage

    I suggest you post your question to the code4lib mailing list [nd.edu]. It's going to get you much more informed and practical advice. You might even find some people who already have a good workflow who will share their tools.

    -Esme

    • Agreed ... odds are, they're not running homebrewed circulation software and someone in the library community has tried to extract metrics from whatever they're using.

    • I suggest you post your question to the code4lib mailing list [nd.edu]. It's going to get you much more informed and practical advice. You might even find some people who already have a good workflow who will share their tools.

      -Esme

      I shall try exactly that. Thank you for directing me to that mailing list!

  • Depending on how large your dataset is, you may have luck using MATLAB (or the open-source GNU Octave). These programs will let you do *whatever* you want with the data (plotting, correlation, FFT, etc.).

    With MATLAB at least, there are some MySQL plugins available that will let you get data out of your database and into arrays rather quickly. And of course, both MATLAB and GNU Octave let you import CSV and plain-text data files.

    Here is the matlab plugin I have used very successfully (and it's open source. No id

  • by Anonymous Coward

    I love R, but if you want something that looks more like SPSS, you could try the free SPSS clone PSPP:
    http://www.gnu.org/software/pspp/

  • If a full stats package is a bit heavy, try Python + http://www.scipy.org/ [scipy.org].
    Below is a session using the IPython shell:

    In [1]: import scipy

    In [2]: x = [1,3,6,8,9,4,9,0,5,3,6,8,6,8]

    In [3]: scipy.mean(x)
    Out[3]: 5.4285714285714288

    In [4]: scipy.std(x)
    Out[4]: 2.7957693986829897

    and if you need more than that you can really delve into its stats submodule http://www.scipy.org/doc/api_docs/SciPy.stats.html [scipy.org].
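If even installing SciPy is a hurdle, note that Python's standard-library statistics module covers the same basics on the same data; in particular, pstdev reproduces the population standard deviation that scipy.std reported above:

```python
import statistics

x = [1, 3, 6, 8, 9, 4, 9, 0, 5, 3, 6, 8, 6, 8]

print(statistics.mean(x))    # 5.428571..., same value as scipy.mean(x)
print(statistics.pstdev(x))  # 2.795769..., same value as scipy.std(x)
print(statistics.median(x))  # 6.0
```

The stdlib module is slower and far less complete than scipy.stats, but for descriptive statistics it needs no installation at all.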
    • Mod parent up!

      While the examples this poster gives may seem too simple to be of much use in practice, the possibilities of using Python are much greater, in the end, than learning some domain-specific language. Python has a much bigger ecosystem around it. For example, when you want to add a graphical user interface, there are many solutions to choose from.

      Plus, but this is personal, I think it is really a shame that the developers of those domain-specific solutions actually thought they needed to

  • by Anonymous Coward

    What is your ILS? Depending on what it is, you may already have access to just about all of what you need there along with Excel. Atriuum from Booksys has wonderful features like you are asking about, record tracking, and it exports to Excel very well. Voyager from Ex Libris had wonderful integration with Access and my boss could pull out some amazing statistics with it.

    If you don't have an ILS then seriously look at Atriuum as they are great for the smaller libraries.

    lordjim AT gmail DOT com

  • OK, this is a horribly shameless self-plug, but hey, it's directly relevant. I started two projects aimed at tracking reference statistics: Libstats [google.com], which is PHP-based and open-source. I'm also one of the founders of Gimlet [gimlet.us], which is hosted and closed-source but provides a similar workflow.

    If you're looking to spend some time delving in code, Libstats is looking for maintainers -- I'm no longer working in libraries, so it's largely orphaned.

  • What you think is large might be trivial even for OpenOffice/LibreOffice.

    Also, the real solution might be to automate data collection and storage in a database. Manipulation would then sort itself out.

    If you're at a university, go to the Math Dept and talk to a statistics grad student, or maybe even an econometrics grad student in the College of Business. Heck, there are probably Comp Sci undergrads looking for a project to add to their resume.

  • by PPH ( 736903 ) on Wednesday November 16, 2011 @04:39PM (#38077694)

    ... rand() serves most of my statistical needs.

  • I use and like JMP from SAS. They offer a free 30 day demo and I think it does a good job at data visualization and statistical modeling, or as they call it, discovery. It will interface with SAS, R, Excel along with various database packages for additional capability that may not exist in the core product. I found it pretty easy to pick up with a fairly active user base to help get started.
    • I agree wholeheartedly. I've been using JMP since version 2.0. Great for exploratory data analysis. SAS differentiates it from SAS proper by limiting the data sets it can handle to what fits in RAM, but with 4GB of RAM common these days that's not likely to be an impediment.

      Almost twenty years ago I compared the sort routine in JMP to Excel's. 30K rows, 28 columns, sort on 3 columns. JMP took about 1% of the clock time Excel did.

      Academic pricing is pretty good.

  • Their product list is here [rapid-i.com]. In particular, I think you would be interested in RapidMiner and RapidAnalytics. Wikipedia has a good overview of RapidMiner [wikipedia.org].

    Video tutorials for both RapidMiner [rapid-i.com] and RapidAnalytics [rapid-i.com] are available on their website. Those videos are a great way to get a good sense of what the product line is capable of. Searching on YouTube will find plenty more that focus on specific use cases and more advanced functionality.

    All of their software is dual licensed with a GPL version and closed so

  • R with RKWard (Score:5, Informative)

    by binarstu ( 720435 ) on Wednesday November 16, 2011 @05:09PM (#38078226)
    I will echo the support for the open-source statistics package R. R is incredibly powerful, and in the natural sciences it is fast becoming the standard statistics software.

    I will also echo the sentiment that, by itself, R is fairly low-level and typically requires at least some simple programming to get what you want.

    However, there is a very nice graphical front end for R called RKWard (http://rkward.sourceforge.net/). With RKWard, importing and exporting data, running basic analyses on it (descriptive statistics, linear regression, t-tests, etc.), and producing basic graphs is very straightforward and does not require detailed knowledge of the R language. Plus, RKWard is also a nice development environment for writing R code, so if you want to take your project further, you can easily do so. So, I'd recommend giving RKWard + R a look.
  • It seems to me that all you need is descriptive statistics (change from last month, mean, min, max, etc., and probably graphing). Using a general spreadsheet application like Excel or Calc will do the job just fine. Remember that Excel is designed to support business calculations, and what you are asked to provide is exactly that! Using dedicated statistics software for this task (in your environment) is a waste of resources. Full stop.

    However, the solution may not be straight-forward to solve in Excel or an
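The descriptive figures mentioned above (change from last month, mean, min, max) really are trivial; as a sketch with invented monthly totals, this is one Excel formula per cell, or a few lines in Python:

```python
# Hypothetical monthly circulation totals, oldest first.
monthly = {"2011-08": 412, "2011-09": 530, "2011-10": 488}

values = list(monthly.values())
print("min:", min(values), "max:", max(values))
print("mean:", sum(values) / len(values))

# Month-over-month percent change: the "change from last month" figure.
pairs = list(monthly.items())
for (m0, v0), (m1, v1) in zip(pairs, pairs[1:]):
    print(f"{m0} -> {m1}: {100 * (v1 - v0) / v0:+.1f}%")
```

Anything at this level of complexity argues for the spreadsheet, not against it.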

    • You're not suggesting a complex open-source application that will require intensive work and special skills to implement to solve a basic task? You must be new around here.

      • You're not suggesting a complex open-source application that will require intensive work and special skills to implement to solve a basic task? You must be new around here.

        I will concur with my colleague above.

  • http://www.sofastatistics.com/ [sofastatistics.com]
    • by Anonymous Coward

      http://www.sofastatistics.com/ [sofastatistics.com]

      Yes, this might be what you are looking for to generate your reports. It's free software and comes with video tutorials.

  • by Virtucon ( 127420 ) on Wednesday November 16, 2011 @05:33PM (#38078606)

    I've used them all, and in terms of engineering and academia, MATLAB seems to be where most theoretical prototyping is done. The license costs for academic/student use are reasonable, but it's about $2K for a commercial single-seat license. Octave is the open-source MATLAB alternative, and for most basic functions it does well; however, it doesn't have the extension packages available that MATLAB does.

    My favorite and one I use all the time is "R" because it does have great open source community support and there's not a lot it can't do.

  • As others have said, if you're mainly doing reports, stick with Excel or a database solution. Excel lets you look at your data from a variety of angles (pivot tables, etc), and has usable graphs. As usual, Microsoft has numerical issues, so you may get wrong answers under certain conditions, but hey, it's Excel.

    What is it that "anything complicated" means? Fancy graphs? Fancy partitioning/aggregation of data? Modeling and forecasting? Summary statistics? Graphs that aren't fancy, but Excel doesn't provide?

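To make the "fancy partitioning/aggregation" case concrete, here is a toy pivot-table-style rollup in plain Python; the CSV columns and counts are hypothetical, just to show the shape of the task:

```python
import csv
import io
from collections import Counter

# Hypothetical desk-statistics data; in practice you'd open a real CSV file
# exported from the spreadsheet.
raw = io.StringIO(
    "month,activity,count\n"
    "2011-09,reference,120\n"
    "2011-09,instruction,4\n"
    "2011-10,reference,98\n"
    "2011-10,instruction,6\n"
)

# Pivot-table-style totals keyed by activity
totals = Counter()
for row in csv.DictReader(raw):
    totals[row["activity"]] += int(row["count"])

print(totals)
```

Swap the key for `row["month"]` (or a tuple of both) and you have the other common cuts of the data.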

  • I also work for a (relatively) small academic library, but our campus has free licenses for SAS and JMP. I had to go through hoops to get it (bureaucracy being what it is) but I use SAS all the time for inventory and usage data. It helps that I was a SAS programmer once upon a time, but I love it for its abilities to clean data as much as its statistical chops. Check around campus if you haven't done so already. You may find access to one or both of these to be easier than you think.
  • by rmcd ( 53236 ) * on Wednesday November 16, 2011 @06:54PM (#38079664)

    If you do go with R, be sure to check out Rstudio (rstudio.org), which is a very nice front-end for R.

    In response to the posters who tell you that R is low quality because it's open source, I can tell you that's nonsense. I have Stata, Matlab, and R on my machine, and access to SAS on a research server. There are times to use each, but all else equal I use R. It's not trivial to learn, but it's a powerful high-quality piece of software, widely used in the statistics community. Whether it's appropriate for your use depends on you and the task. But it's great software.

  • Not open-source but very easy to generate reports off relation data sources. http://www.yellowfin.bi/ [yellowfin.bi]
  • It would help if you provided some more details about what you are trying to do. What sort of statistics are you looking to analyze? Or are you just collecting statistics such as average number served, variations by month, etc.? From your post it sounds like you are looking more for activity statistics than statistical analysis, in which case a stat package would just add unneeded complication to your efforts.

    A stat package isn't going to make it any easier to analyze the data; it'll just make it easier to generate reports.

  • As a Perl guy, I'll concede that Python(x,y) [google.com] has a complete scientific computing package. While Perl and Ruby can do these things, Python(x,y) [google.com] does it in a slick way.

    It is a Windows-only package as far as I can tell.

    Perl, Python, and Ruby can deal with Excel and R, but Python(x,y) [google.com] provides a nice interface for everything.
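As a sketch of that workflow in plain Python (hypothetical column names; in practice you'd open the CSV file your spreadsheet exported rather than an in-memory string):

```python
import csv
import io
import statistics

# Stand-in for a spreadsheet exported to CSV; column names are made up
exported = io.StringIO(
    "date,students_helped\n"
    "2011-11-01,14\n"
    "2011-11-02,9\n"
    "2011-11-03,17\n"
)

values = [int(row["students_helped"]) for row in csv.DictReader(exported)]
print("average students helped per day:", statistics.mean(values))
```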

  • PSPP is a nice idea, but lacks functionality. SPSS is ridiculously priced, even with IBM's "discount" for non-profits.
    Deducer and R Commander give you access to the full power of R under the R GUI. Deducer gives you spreadsheet-like
    data entry and basic stats, and then you can load R Commander for a menu-driven interface to more advanced
    functions.

  • Data mining with Rattle and R .... http://rattle.togaware.com/ [togaware.com]

    Most librarians were probably not math majors, and are unlikely to be expert in statistics. But if you can work your way through the book, you may get enough insight into your data to ask good questions from a local Math department. No doubt some graduate student(s) can get a paper out of it, or at least some applied class project credit.

    But if you don't understand what it is you are looking for, you probably won't coax them into figuring out what questions you ought to be asking. So start with the book.

    • Data mining with Rattle and R .... http://rattle.togaware.com/ [togaware.com]

      Most librarians were probably not math majors, and are unlikely to be expert in statistics. But if you can work your way through the book, you may get enough insight into your data to ask good questions from a local Math department. No doubt some graduate student(s) can get a paper out of it, or at least some applied class project credit.

      But if you don't understand what it is you are looking for, you probably won't coax them into figuring out what questions you ought to be asking. So start with the book.

      While a free version is on the site, support the work by buying a hardcopy for the library ;>

      I am actually curious as to how many librarians have math degrees. I have only met 2 so far; myself and my former professor back in grad school.

  • Businesses use Customer relationship management [wikipedia.org] systems. These tools also provide statistics.
  • You might want to take a look at LibAnalytics [springshare.com] (full disclosure: I work for Springshare). If you're actually trying to do statistical analysis, then I think others' recommendations will serve you better - but if you're looking for a way to track many different sorts of data and generate reports, then I think LibAnalytics would serve you very well.
  • Subject line fills in the blanks omitted in other responses.

  • This is not the answer you're looking for... Could you post a reply here with whatever you chose to do?
