Ask Slashdot: Switching From SAS To Python Or R For Data Analysis and Modeling?
An anonymous reader writes "I work for a huge company. We use SAS all the time for everything, which is great if you have a bunch of non-programmer employees and you want them to do data analysis and build models... but it ends up stifling any real innovation, and I worry we will get left behind. Python and R both seem to be emerging stars in the data science game, so I would like to steer us towards one of them. What compelling arguments can you give that would help an old company change its standard if that company is pretty set in its ways?"
Cost (Score:5, Informative)
And yes, while I have not used R myself, I would certainly recommend it over Python for this use case, as it is dedicated to doing the kinds of things that SAS is good at in an efficient, friendly manner. I've seen a number of people use it to do some very neat statistical analysis, and their code was a lot simpler than the SAS scripts that I used to write years back.
I made the switch (Score:5, Informative)
Personally at least.
I used to work in one of the largest banks in the world, and everything we did was SAS/MSSQL.
I had some personal stuff in R, but most of the other analysts didn't seem too interested beyond using what I made for them, except for one PhD in the German department. I never pushed it, though, since there was so much legacy code, including code I had written myself.
Now I have switched to a start-up bank, and I am the only analyst.
I've used R/RStudio/Shiny with PostgreSQL as the back end very successfully, with all code in git. Now I can deliver good analysis much faster than I used to in SAS, and it can be viewed on any device, with the option of downloading the source data as Excel or CSV.
The management loves this.
If you show management a few good examples, they will want more, but I wouldn't start rewriting all the legacy code. SAS isn't bad when you have it set up properly.
But another good thing about R is that you get access to innovation in the statistics fields faster, and you don't have to pay huge sums of money for extra features.
RStudio and Shiny are a bit expensive in their pro versions, but that is nothing compared to SAS, and the open source versions are free.
Re:R... (Score:5, Informative)
R is definitely still ahead for data modeling, but Python has some advantages too. With a bigger set of modules (libraries) to choose from and high popularity in the financial sector, there are big improvements all the time. For the purposes of this discussion, the most important Python modules are:
IPython [ipython.org]: powerful interactive shell
numpy and scipy [scipy.org]: numerical, matrix, and scientific functions (MATLAB-ish)
pandas [pydata.org]: R-like data structures and data analysis tools (analysis mostly limited to regression)
statsmodels [sourceforge.net]: statistical analysis, complements pandas
scikit-learn [scikit-learn.org]: machine learning
So can Python do everything that R can? No. Or, at least, not as easily. But it is improving in that direction quite quickly, and if Python's data analysis capability meets your needs, then you can likely do everything in one language instead of calling R routines from another.
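To give a rough feel for what the modules listed above look like in practice, here is a minimal sketch using only pandas and numpy. The data, the column names (region, spend, sales) and the choice of a straight-line fit are all made up for illustration; a real analysis would more likely reach for statsmodels to get standard errors and diagnostics.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "spend":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "sales":  [3.0, 5.0, 7.0, 9.0, 11.0],
})

# Per-group summary statistics (row count and mean of sales by region).
summary = df.groupby("region")["sales"].agg(["count", "mean"])

# Least-squares straight-line fit of sales on spend.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(df["spend"].to_numpy(), df["sales"].to_numpy(), deg=1)

print(summary)
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")
```

The point is less the numbers than the shape of the workflow: load a table, summarize by group, fit a model, all in one language.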
One vote for Python (Score:5, Informative)
I haven't used it for any "big data" tasks, but for a number of small, interactive data analysis utilities it has been really enjoyable to work with. One standout tool for me has been pyqtgraph, which is lightning fast and creates some really impressive interactive visualizations. It's also got some pretty incredible features out of the box - arbitrary user-definable ROIs, switching any plot to log-log instantly, or even running a fast Fourier transform with just a right click. If I sound like a fanboi, I kind of am - after trying to deal with the agony of 3D data manipulation in matplotlib (Python's MATLAB-style plotting package), it's a whole different world.
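For anyone wondering what that right-click Fourier transform amounts to numerically, here is a small sketch with plain numpy. This is not pyqtgraph's own code, just the same kind of computation done by hand; the 100 Hz sampling rate and the 5 Hz test tone are assumptions picked to keep the example simple.

```python
import numpy as np

fs = 100.0                      # assumed sampling rate in Hz
t = np.arange(100) / fs         # one second of sample times
signal = np.sin(2 * np.pi * 5.0 * t)   # hypothetical 5 Hz test tone

# Magnitude spectrum of a real-valued signal, and the frequency of each bin.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

# The strongest bin should sit at the tone's frequency.
peak_hz = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {peak_hz} Hz")
```

In an interactive plotting tool the same spectrum would simply be drawn in place of the time series, which is what makes a one-click toggle so handy for exploration.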