
Ask Slashdot: Switching From SAS To Python Or R For Data Analysis and Modeling? 143

Posted by timothy
from the in-the-parlance-of-our-times dept.
An anonymous reader writes "I work for a huge company. We use SAS all the time for everything, which is great if you have a bunch of non-programmer employees and you want them to do data analysis and build models... but it ends up stifling any real innovation, and I worry we will get left behind. Python and R both seem to be emerging stars in the data science game, so I would like to steer us towards one of them. What compelling arguments can you give that would help an old company change its standard if that company is pretty set in its ways?"

  • Cost (Score:5, Informative)

    by TemporalBeing (803363) <bm_witness@yaho o . c om> on Thursday July 03, 2014 @10:38AM (#47376169) Homepage Journal
    The cost of training them to use R will be significantly cheaper than what you are spending on the SAS licenses, which (last I knew) were a yearly purchase for each user.

    And yes, while I have not used R myself, I would certainly recommend it over Python for this use case, as it is dedicated to doing the kinds of things that SAS is good at in a very efficient, friendly manner. I've seen a number of people use it to do some very neat statistical analysis, and their stuff was a lot simpler than the SAS scripts that I used to write years back.
  • I made the switch (Score:5, Informative)

    by TyFoN (12980) on Thursday July 03, 2014 @10:58AM (#47376361)

    Personally at least.
    I used to work in one of the largest banks in the world, and everything we did was SAS/MSSQL.
    I had some personal stuff in R, but most of the other analysts didn't seem interested beyond using what I made for them, apart from one PhD in the German department. I never pushed it though, since there was so much legacy code, including code I had written myself.

    Now I have switched to a start-up bank, and I am the only analyst.
    I've used R/RStudio/Shiny with PostgreSQL in the back very successfully, with all code in git. Now I can produce good analysis much faster than I could in SAS, and it can be viewed on any device, with the option of downloading the source data as Excel or CSV.

    The management loves this.

    If you show them a few good ones they will want more, but I wouldn't start to rewrite all the legacy code. SAS isn't bad when you have it set up properly.

    But another good thing about R is that you get access to innovation in the statistics fields faster, and you don't have to pay huge sums of money for extra features.

    RStudio and Shiny are a bit expensive for the pro versions, but nothing compared to SAS, and the open source versions are free.

  • Re:R... (Score:5, Informative)

    by MightyYar (622222) on Thursday July 03, 2014 @11:08AM (#47376463)

    R is definitely still ahead for data modeling, but Python has some advantages too. With a bigger set of modules (libraries) to choose from and high popularity in the financial sector, there are big improvements all the time. For the purposes of this discussion, the most important Python modules are:
    IPython [ipython.org]: powerful interactive shell
    numpy and scipy [scipy.org]: numerical, matrix, and scientific functions (matlab-ish)
    pandas [pydata.org]: R-like data structures and data analysis tools (analysis mostly limited to regression)
    statsmodels [sourceforge.net]: statistical analysis, complements pandas
    scikit-learn [scikit-learn.org]: machine learning
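
For a rough sense of how two of the modules listed above fit together, here is a minimal sketch. It is not from the original comment: the data frame, its column names, and the numbers are made up for illustration. It uses pandas for a group-wise summary (loosely comparable to a SAS PROC MEANS with a CLASS statement) and numpy for a simple least-squares line fit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: spend vs. sales for two regions (invented for illustration).
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "spend":  [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "sales":  [3.0, 5.0, 7.0, 2.0, 4.0, 6.0],
})

# Group-wise summary statistics of sales per region.
summary = df.groupby("region")["sales"].agg(["mean", "min", "max"])

# Least-squares straight-line fit of sales on spend for one region.
north = df[df["region"] == "north"]
slope, intercept = np.polyfit(north["spend"].to_numpy(),
                              north["sales"].to_numpy(), deg=1)

print(summary)
print(f"north: sales ~ {slope:.2f} * spend + {intercept:.2f}")
```

For anything beyond a line fit (standard errors, p-values, model diagnostics), statsmodels would be the natural next step, which is the "complements pandas" role described in the list.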

    So can Python do everything that R can? No. Or, at least, not as easily. But it is improving in that direction quite quickly, and if Python's data analysis capability meets your needs, then you can likely do everything in one language instead of calling R routines from another.

  • One vote for Python (Score:5, Informative)

    by werepants (1912634) on Thursday July 03, 2014 @11:37AM (#47376779)
    Granted, I don't have much experience with R, but Python has some notable benefits - it is very well established and you can find tools to do just about anything. It is fast and easy to develop in, and very easy to learn thanks to its readability and the plentiful resources online. I imagine you'll have an easy time finding people with Python experience, as well.

    I haven't used it for any "big data" tasks, but for a number of small, interactive data analysis utilities it has been really enjoyable to work with. One standout tool for me has been pyqtgraph, which is lightning fast and creates some really impressive interactive visualizations. It's also got some pretty incredible features out of the box - arbitrary user-definable ROIs, instantly switching any plot to log-log, or even doing a Fast Fourier transform with just a right click. If I sound like a fanboi, I kind of am - after trying to deal with the agony of 3D data manipulation in matplotlib (Python's MATLAB-style plotting package), it's a whole different world.
