
Ask Slashdot: Switching From SAS To Python Or R For Data Analysis and Modeling?
An anonymous reader writes "I work for a huge company. We use SAS all the time for everything, which is great if you have a bunch of non-programmer employees and you want them to do data analysis and build models... but it ends up stifling any real innovation, and I worry we will get left behind. Python and R both seem to be emerging stars in the data science game, so I would like to steer us towards one of them. What compelling arguments can you give that would help an old company change its standard if that company is pretty set in its ways?"
R... (Score:3)
Why Python and not C or ERLANG or COBOL?
Re: (Score:2)
They are why I tend to recommend Python for analysis and modeling: a good set of libraries (and the community that goes with them), plus a relatively low barrier to entry in terms of installing and learning the language.
Python is better overall but R is more like SAS (Score:5, Insightful)
R has more single-function, high-level commands devoted to stats; these are done right internally and are self-consistent with other functions for further processing. But it's not as general a programming language as Python. If you want something different from the canned functions in R, then you will need to write it yourself, at which point you might as well be using Python. However, if you like SAS, then chances are R will seem more like what you are hoping for.
Re: (Score:2)
R has more single-function, high-level commands devoted to stats; these are done right internally and are self-consistent with other functions for further processing. But it's not as general a programming language as Python. If you want something different from the canned functions in R, then you will need to write it yourself, at which point you might as well be using Python. However, if you like SAS, then chances are R will seem more like what you are hoping for.
The original poster failed to define creativity in the context of the end-user and his problem solving. I think that the end-user of SAS should be the one asking if R or python or other language is more suitable.
Perhaps the poster should visit the customer(s), to see what they are doing, and return with an R equivalent or propose R as a better solution. No need to change 4 quarters for a Dollar.
Re: (Score:2)
Inefficient (two interpreters), inelegant (two syntaxes), and there is usually no point to it. Both languages are roughly as capable.
Re: (Score:2)
[...] as opposed to C which is a low-level programming language.
Assembler is a low-level programming language.
Machine language is a low-level programming language.
C is not a low-level programming language, although you can do low-level programming with it.
http://en.wikipedia.org/wiki/L... [wikipedia.org]
Re: (Score:2)
The best way to program for sure.
Re: (Score:2)
C-x M-c M-butterfly.
The best way to program for sure.
Madame butterfly?
Re: (Score:2)
C-x M-c M-butterfly.
The best way to program for sure.
Madame butterfly?
No, it's a reference to this [xkcd.com].
Re: (Score:2)
C is most certainly a low-level programming language. There's a reason people call it "portable assembly language".
"Portable assembly language" is an oxymoron. And I have never heard anyone use that phrase to describe C.
Of course, as with almost all programming languages, people build useful abstractions in C to bridge the gap somewhat. But that doesn't make C itself a high-level language, any more so than does the use of functions and macros to increase the expressive power of an assembly language.
Never mind building abstractions. The C language itself is a significant abstraction from the machine level. Only a small handful of operators and constructs in C have a close analogue to assembler statements (e.g., accumulation, shift and bitwise logical operators.) Therefore I maintain that it is not a low-level language.
Re: (Score:1)
"Portable assembly language" is an oxymoron. And I have never heard anyone use that phrase to describe C.
A quick web search will solve that for you. The phrase has been bandied about for quite a few years now, although many people disagree on the topic.
Never mind building abstractions. The C language itself is a significant abstraction from the machine level. Only a small handful of operators and constructs in C have a close analogue to assembler statements (e.g., accumulation, shift and bitwise logical operators.) Therefore I maintain that it is not a low-level language.
While I certainly agree that C is a significant (and useful) abstraction from the machine -- or more specifically, the assembler -- level, I think you've glossed over quite a few things that are much closer in C to how it works at the machine code level, compared to most other languages. Some that spring to mind are:
Re: (Score:2)
C is a mid-level language. It is low level compared to Python but high level compared to assembler.
If forced to choose low or high to describe C, I would reluctantly say low.
Re: (Score:2)
This is what R was basically designed to do.
On the other hand, I understand from several recent writings that lots of non-statistical experts have been finding that R also makes it Easy To Do It Wrong.
Re:R... (Score:5, Informative)
R is definitely still ahead for data modeling, but Python has some advantages too. With a bigger set of modules (libraries) to choose from and high popularity in the financial sector, there are big improvements all the time. For the purposes of this discussion, the most important Python modules are:
IPython [ipython.org]: powerful interactive shell
numpy and scipy [scipy.org]: numerical, matrix, and scientific functions (matlab-ish)
pandas [pydata.org]: R-like data structures and data analysis tools (analysis mostly limited to regression)
statsmodels [sourceforge.net]: statistical analysis, complements pandas
sk-learn [scikit-learn.org]: machine learning
So can Python do everything that R can? No. Or, at least, not as easily. But it is improving in that direction quite quickly, and if Python's data analysis capability meets your needs, then you can likely do everything in one language instead of calling R routines from another.
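As a rough illustration of the "everything in one language" point, here is a minimal sketch of a straight-line least-squares fit in plain Python. It deliberately uses only the standard library rather than pandas or statsmodels, and the data values and the names `fit_line` and `predict` are made up for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical toy data: advertising spend (x) versus sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(spend, sales)

def predict(x):
    """Predict sales for a given spend using the fitted line."""
    return slope * x + intercept

print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

In practice the libraries listed above would replace `fit_line` and add standard errors, diagnostics and so on; the sketch only shows that the glue code and the analysis can live in the same script.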
Re:R... (Score:5, Interesting)
So can Python do everything that R can?
No, but Rpy can.
I've used R, and it really has a lot of strong points, but I prefer to access it these days via Rpy, which gives me all the power of R along with everything else I get from Python (other libraries, better application development frameworks, etc.)
Both R and Python are real programming languages that are going to be completely useless to non-programmers, so neither of them is a SAS replacement, but of the two, I'd choose Python+Rpy over R for flexibility, power and ease of use (the latter is of course a strongly personal preference... if you really think like a traditional stats geek R will likely seem nicer, as it is clearly created for and by such people.)
Re: (Score:2)
Things have progressed recently. I've finally made the jump to 3. All of those libraries that I linked work with 3.
Re: (Score:2)
Not all libraries. OpenCV for instance.
Re: (Score:2)
I didn't link to that one, but if you are persistent you can make it work with 3. Their next release is 3.0, so they will then support Python 3. For now you'd have to run the development version. If you are dependent on something that OpenCV can do that sk-learn can't then yeah, stick with 2.7 for now.
Re: (Score:1)
Why Python and not C or ERLANG or COBOL? ..
While the question is interesting, it's off topic. You may as well ask the same question about any development task. Clearly the person asking the question already decided that the advantages of Python outweigh the advantages of C, ERLANG and COBOL. He is now asking whether the advantages of R outweigh the advantages of Python, which is an entirely different topic.
Re: (Score:2)
R always gets the analysis job done for me, but when I recommend it I feel a need to include a warning that its data typing is strange.
For example, there are about five types which are like arrays, but which are only sometimes compatible with each other.
Re: (Score:1)
Without a list of the installed SAS components this is pretty much impossible to answer.
If you have the full eBI/eDI suite and a host of "solutions", then you're going to be spending a lot of time in R/Python trying to replicate the years that SAS has spent on developing that environment.
If you have core modules - BASE, STAT, ETL etc, then you can "program" at a relatively low level to your heart's content.
More info please.
Pandas (Score:5, Interesting)
Python and R are sort-of converging via Pandas [pydata.org]. I'm partial to Python, but Pandas really starts to blur the lines conceptually.
Re: (Score:2)
Using R vs Python with Pandas brought home the microlanguage vs libraries debate for me. I'm more experienced and comfortable programming in Python, so generally prefer it. But writing a program to solve the same problem in R or Python, I found the R version would be much faster. On the other hand, the Python version tended to give the correct answer, whereas the R version tended to have weird bugs I couldn't figure out.
As an open source enthusiast, I'd say an unfortunate advantage Python has is its "benevol
Innovation is more than tools (Score:5, Insightful)
Re: (Score:2)
mod up spot on
it ain't the tools, it is the tool users or their culture
Cost (Score:5, Informative)
And yes, while I have not used R myself, I would certainly recommend it over Python for this use case as it is very dedicated to doing the kinds of things that SAS is good at in a very efficient, friendly manner. I've seen a number of people use it to do some very neat statistical analysis, and their stuff was a lot simpler than the SAS scripts that I used to write years back.
Re: (Score:3)
Slightly different beasts I think. R is a really impressive analysis tool. Python is a scripting language. The latter is quite a bit more versatile, but ... probably isn't the right tool to solve the problem outlined in the OP.
Re: (Score:2)
The latter has a number of libraries that are geared towards the same problem area, such as Pandas.
Re: (Score:2)
Slightly different beasts I think. R is a really impressive analysis tool. Python is a scripting language. The latter is quite a bit more versatile, but ... probably isn't the right tool to solve the problem outlined in the OP.
However, his question was related to coming from SAS. SAS scripting is not a general language either; it is very much like using GNU Octave, Mathematica, Matlab, and R: able to do some general things (open/read/write/close files) but generally very data set oriented. So R is very much a suitable replacement.
Belief vs Experience (Score:3)
The cost of training them to use R will be significantly cheaper than what you are spending on the SAS licenses
And yes, while I have not used R myself, I would certainly recommend it over Python for this use case
So not having used R yourself, why do you believe it is the better and cheaper solution?
Re: (Score:2)
The cost of training them to use R will be significantly cheaper than what you are spending on the SAS licenses. And yes, while I have not used R myself, I would certainly recommend it over Python for this use case
So not having used R yourself, why do you believe it is the better and cheaper solution?
B/c I don't do much in data modeling and working with that kind of data. The SAS scripts I wrote were over 10 years ago (2002). R isn't that old. I had one time which it might have (last summer), but then forgot about it. But for me that's 1 time in >10 years. If I needed to get into doing the stuff I did with SAS again, then yes I'd be looking at using R; but that's unlikely for me.
And yes, I have seen others use it so I know how much easier it would be for me to get into using R than updating myself
More than cost (Score:2)
For R there exist attempts at GUI's (like e.g. R-commander) that offer point-and-click functionality but they're more sketchy.
I think that giving non-programmers access to R will result in a flood of help requests because they
Re: (Score:2)
I know both SAS and R, and I think that for people who've never programmed, the GUI-based version of SAS wins on end-user usability because end-users can click together (simple and limited) analyses on really big datasets. This has far-reaching consequences for the learning curve.
For R there exist attempts at GUI's (like e.g. R-commander) that offer point-and-click functionality but they're more sketchy.
Others have mentioned RStudio, and from a cursory glance it looks like it would fit the bill just fine for those users; and if they could drop the money on SAS they could certainly drop the money for commercial versions of RStudio and get the extra help.
I think that giving non-programmers access to R will result in a flood of help requests because they really do need some notion of programming to use the R language. With SAS that's more in the background because the GUI tool is relatively well done, and use of the butt-ugly, antiquated and clumsy mainframe-style SAS language can usually be avoided.
Never touched that version. I only had a single desktop license for the small company that I worked at. We had it b/c the guy I replaced knew SAS very well and sold the management on it. Management just wanted the functionality; they didn't care and had th
Re: (Score:2)
R has an excellent GUI in RStudio: www.rstudio.com. I would recommend it as a much better interface to R.
Looking at that website and seeing their prices they are almost certainly aiming to compete with SAS and using an Open Source product to boot. Good for them!
What's the business case? (Score:5, Insightful)
Is it your feeling that SAS is "stifling any real innovation" or do you have examples of projects that are impossible with SAS but possible with Python or R?
Do those example projects actually help the bottom line of the company or are they just "cooler"?
If you can think of examples that have clear financial benefits to the company, you have a solid business case already.
If there are no such examples or other factors negate the benefits, then the company has nothing to gain by switching and should not switch.
Short answer; if you're asking on Slashdot for reasons to switch from product X to product Y, you probably have no real reason to switch.
Re: (Score:2)
or there are real reasons and he just doesn't know them. he didn't even tell us what his company does, except that it's "huge". if it's finance then there probably are good reasons. if it's healthcare, then not so much, though since R developers are more common (=cheaper) now, the benefit of just ditching those extremely expensive SAS licenses may still be enough.
i agree, though. he should do his own research and then ask slashdot if necessary, which it really shouldn't be. still, i kind of want to see him
Re: (Score:2)
This right here.
Nothing happens in a company of any size without a business case.
To amplify upon what has already been said, you need to show the financial benefit to the company. You need to justify the cost to acquire the technology and train people on it. You need to quantify the ROI so that management can weigh the cost of the technology versus all of the other costs that they have to cover every year.
A good thing to research is whether or not any of your competitors are using what you want to use. H
Re: (Score:2)
Short answer; if you're asking on Slashdot for reasons to switch from product X to product Y, you probably have no real reason to switch.
The long answer was pretty good, but I disagree with the short one. Asking a (presumably) knowledgeable group of people questions like this is a good way to get a more complete picture of the problem space, and asking people from other companies might just score him a few stories about what worked for them and what didn't work.
Here's an anecdote from me: back when I was a fresh-faced, naive junior engineer I wanted to sell management on an open source alternative to an expensive commercial package by targe
Unlikely to change (Score:1)
You're using SAS because it's a closed-source, paid-support software package. R and Python aren't replacements for that.
Having said that: if you're building statistical models, and you want to provide some interfacing for your non-programming employees, you don't want Python, you want R, in particular, RStudio. You can build routines and packages in R and interface them through RStudio that will still allow your employees to (mostly) ignore programming. However, there's still a huge jump in competence that'
Unlikely to change (Score:1)
While healthcare companies have a history of using SAS, the FDA does not *require* SAS.
"The FDA does not endorse or require any particular software to be used for clinical trial submissions, and there are no regulations that restrict the use of open source software (including R) at the FDA." -- http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html
Official FDA policy and requirements are outlined here: http://www.fda.gov/iceci/enforcementactions/bioresearchmonitoring/ucm135196.htm
Re: (Score:1)
He's right. FDA itself uses R, and FDA employees have even written R packages. Here's a poster (pdf) about this.
http://blog.revolutionanalytics.com/downloads/FDA-Janice-Brodsky-UseR-2012.pdf
The right competitor to SAS is Statistica (Score:2)
R isn't a replacement for SAS---in practical use it requires much more command line programming ability and although it has an enormous number of packages, many of them are 'academic quality' (meaning good enough to make papers) and fewer are highly validated production quality with all the edge cases & stability tested.
Some SAS capabilities can run 'out of core' (unlike R) so you can process data sets which would not fit into RAM.
Statistica (StatSoft) is the closest direct competitor (Windows only unfo
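To make the "out of core" point above concrete, here is a minimal sketch of the general idea in Python: computing a mean and standard deviation in a single streaming pass, so that only one row is held in memory at a time. This is not how SAS implements it; it is just Welford's online algorithm, and the function name `streaming_mean_std`, the column name and the sample data are assumptions made for the example.

```python
import csv
import io
import math

def streaming_mean_std(rows, column):
    """One pass over `rows` (an iterable of dicts), keeping O(1) state.

    Uses Welford's online algorithm, so the full data set never has to
    fit in memory at once. Returns (count, mean, sample std deviation).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, std

# Stand-in for a large file on disk. csv.DictReader yields one row at a
# time, so the same call works on a real file handle of any size.
sample = io.StringIO("id,amount\n1,2\n2,4\n3,4\n4,4\n5,5\n6,5\n7,7\n8,9\n")
n, mean, std = streaming_mean_std(csv.DictReader(sample), "amount")
print(n, mean, std)
```

The same pattern extends to sums, counts and group-wise aggregates; models that need several passes over the data are harder to stream this way.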
There are no compelling arguments... (Score:5, Insightful)
Re: (Score:1)
this
Re: (Score:2)
Wisdom right here.
"real" innovation? Re:no complelling arguments... (Score:2)
Because if not... if this is really large company, you may be perceived as a "precious little snowflake that also complains a lot."
And if this is a really large company, they're going to be able to coast along on the status quo for LONGGG time and I don't know why anybody would listen or care about a whiny
Apples, meet Oranges... (Score:5, Insightful)
SAS is not a language; it's a full multi-tiered solution for the aggregation, normalization, and analysis of data. There's a language as well, but that's just one part of the whole solution. Python and R, while absolutely fantastic languages, are not a full solution.
So, first step...if you're going to offer an alternative, actually have an alternative. I don't know your SAS buildout nor do I know the data sources it consumes, so I can't really point to what else you need to add or how you need to construct it to produce a more flexible replacement to your existing and current SAS infrastructure.
Second step...a roadmap for migration. It's one thing to sign a lease for a new apartment or to buy a new house, and another to shift your life from the old place to the new. If you don't have a plan, at least in broad strokes, then you're going to be doomed when you look for executive sponsorship. You need to make sure that you get all the stakeholders' input as well, lest you leave something out in your roadmap...and then end up with someone who sees you as a problem. That person will most likely be in a position to scuttle the whole thing, as well.
Third step...figure out how to define the benefits in terms of the stakeholders' needs. You're going to replace a system they use; why should they want you to do so? And you have to define it from their perspective, with regard to things they care about. Beware of getting geeky on this...it's very likely that at least one of the people whose support you will need will not be a geek and will be concerned with the output more than the technical means used to produce it. Don't hard-sell, either...pushing too hard will get the door slammed in your face, and even potentially polarize people against you. (See above, under "in a position to scuttle the whole thing.")
There will be steps after that, but those will be largely determined by how the first three steps go. It may involve bringing in outside vendors, doing requirements analysis...a lot of it depends on details of your company as well and how they normally do things. But above all else, remember this: don't buck the system too hard, and don't knock the company you work for. Trying to get a lot of people to support and cooperate with you while telling them that their way of doing things sucks is suicide.
Re: (Score:2)
Yes, but, you can run R and Python over data in pretty much any backend (Teradata, Hadoop, etc.). That's usually the second conversation - how you want to accomplish your goals.
That you think that just "Teradata" or "Hadoop" is the other thing needed in addition to Python or R to replace an SAS implementation tells volumes about how much you don't know about SAS and what it really does to satisfy customer requirements. You don't replace SAS with nothing more than a bare database and a Python interpreter.
And you can't just say "I want you to throw out your existing infrastructure just so that I can use X programming language...you figure out how to make it happen" to the company y
Re: (Score:1)
You can run SAS in pretty much any backend too.
I made the switch (Score:5, Informative)
Personally at least.
I used to work in one of the largest banks in the world, and everything we did was SAS/MSSQL.
I had some personal stuff in R, but most of the other analysts didn't seem too interested beyond using what I made for them, except for one PhD in the German department. I never pushed it though, since there was so much legacy code, including code I had written myself.
Now I have switched to a start-up bank, and I am the only analyst.
I've used R/RStudio/Shiny with PostgreSQL in the back very successfully, with all code in git. Now I can bring good analysis forth much faster than I used to in SAS, viewable on any device, with the option of downloading the source data in Excel and CSV.
The management loves this.
If you show them a few good ones they will want more, but I wouldn't start to rewrite all the legacy code. SAS isn't bad when you have it set up properly.
But another good thing about R is that you get access to innovation in the statistics fields faster, and you don't have to pay huge sums of money for extra features.
RStudio and Shiny are a bit expensive for the pro versions, but nothing compared to SAS, and the open source versions are free.
Re: (Score:3)
If you show them a few good ones they will want more, but I wouldn't start to rewrite all the legacy code.
This. Submitter should build a few small projects that give a different end result than the current code base. If you're just swapping R for SAS but delivering the exact same output, no management will care. The sample projects need to either report the data in different ways, or visualize the data [d3js.org], or even, as the parent suggested, simply provide a copy of the output as a spreadsheet.
Innovation will come by thinking about the problem differently and exploring different ways to ask questions to gain insigh
Re: (Score:1)
You made the switch by switching jobs. That isn't the answer to the original question.
Python FTW (Score:1)
You have to *demonstrate* that SAS is better (Score:3)
Go do something in R or Python that is useful to the company but impossible or very difficult in SAS.
Then show it to the hard-core SAS users. If they're interested, demonstrate it to your boss along with how it can save the company (and especially your cost center) money.
Re: (Score:1)
they won't be interested. SAS users are dolts; it's one of those languages where you can do a handful of (admittedly useful) things very easily, but is a tarpit for any kind of general procedural development.
just develop something cool, show it to the SAS people (who won't understand) as a pro forma exercise, then go to the boss.
SAS and data science? (Score:1)
if your company is using SAS, then i don't think what you're doing is data science. analysis is not data science.
Re: (Score:2)
yes it is. to borrow your sig, data science is just "exxxtreme data analysis."
R is better for non-programmers (Score:3, Insightful)
Re: (Score:1)
I agree that R is better for non-programmers. R is a tool you can use to answer all kinds of questions. It is popular with economists, psychologists, mathematicians and people who need a computer to get their work done.
I'm more of a computer person. R drives me nuts. To me, R feels like a hodge-podge of features that aggregated together over decades. Python is different. It has a Benevolent Dictator For Life and it feels cohesive. If Python is the Parthenon, then R is the Grand Bazaar. Your individual mileage ma
Try and prove it (Score:1)
SAS on Z/OS ?
Good luck with your python code.
I think you should learn SAS.
Research and Recruitment (Score:4, Interesting)
I work for a large Fortune 25 company. We have an existing SAS presence and we do some good work in SAS. There are two main reasons that we are bringing R into our environment: research and recruitment/retention.
R is extremely common across research right now. When a new paper comes out describing a new algorithm or modeling technique, the odds are extremely good that it comes with R source code. With R in-house, there is very little time or effort to try these things out to see if they can help our current work. With SAS, we would need to invest time recoding everything or worse, wait until it is baked into SAS itself. That is a huge barrier to adopting new approaches.
Recruitment and retention are related to R's popularity in research. Let's face it, data scientists are a hot commodity right now. Lots of companies are looking to hire them and there aren't enough good people to go around. We're seeing that a lot of the new talent have been using R in their graduate work rather than SAS, and are interested in an environment where they can continue using R. Additionally, it's harder to retain people once you've hired them if they can't use what's become a lingua franca.
SAS remains a great tool, and we're not going to get rid of it. Rather, we want to add R to the toolbox.
(I don't mention python here... We've got some folks working with Python especially for NLP, but for the work we do there's a lot more folks using R across industry and academia.)
As someone who moved from SAS 1 year ago... (Score:3, Interesting)
I work in IT at a large company (>30k employees) who recently dropped SAS. Before we did, we tried out R but what we found out was that except for IT and some tech savvy engineers, nobody seemed to get anything done without help, even after training.
We had decided to drop SAS due to the ludicrous license costs (at one point we were paying more on renewals than we did when we purchased it! WTF?) and due to some issues with their installation/upgrade process that they were not able to resolve within a reasonable timeframe. We ended up switching to StatSoft's STATISTICA, which has a much lower price point (~30% of what we paid for SAS), predictable renewal fees (20% of purchase price), vast feature set (in the Data Miner package we have), excellent Office integration and import/export compatibility with SAS data files. Oh, and it also features R integration so you can still use R from within it if you want. Users became proficient very quickly, after receiving some training.
I recommend you consider their solutions... Open source is not always best, especially when it comes to borderline tech-illiterate business users.
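The licensing arithmetic in this comment can be sketched as follows. Only the two ratios (a price of about 30% of SAS, and renewals at 20% of the purchase price) come from the comment; the SAS purchase price, the SAS renewal rate and the five-year horizon are hypothetical placeholders, chosen only to show the shape of the calculation.

```python
def total_cost(purchase_price, renewal_percent, years):
    """Purchase in year 1, then a renewal fee in each following year.

    Amounts are whole currency units; renewal_percent is a percentage
    of the original purchase price.
    """
    renewal_fee = purchase_price * renewal_percent // 100
    return purchase_price + renewal_fee * (years - 1)

# Hypothetical figures, for illustration only.
sas_purchase = 1_000_000       # assumed SAS purchase price
sas_renewal_percent = 100      # assumed: renewals roughly equal to the purchase price
years = 5                      # assumed planning horizon

# Ratios quoted in the comment above.
alt_purchase = sas_purchase * 30 // 100   # about 30% of the SAS price
alt_renewal_percent = 20                  # renewals at 20% of purchase price

sas_total = total_cost(sas_purchase, sas_renewal_percent, years)
alt_total = total_cost(alt_purchase, alt_renewal_percent, years)
print(sas_total, alt_total, sas_total - alt_total)
```

Plugging in real contract figures, plus training and migration costs, would turn this into the start of the business case other commenters ask for.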
R - Consider Which R (Score:2)
I would recommend R. It's the language college grads are getting trained in. The reason for that is simple: there are no licensing costs for a simple R dev environment. However, I wouldn't use the free stuff for anything that isn't ad hoc. If you have a production big data job I would look at something like Vertica (purchased by HP a couple of years ago), an extremely fast big data DB engine. Not only will it run R, but it has the ability to break the R up into smaller chunks at execution time and distribute the e
One vote for Python (Score:5, Informative)
I haven't used it for any "big data" tasks, but for a number of small, interactive data analysis utilities it has been really enjoyable to work with. One standout tool for me has been pyqtgraph, which is lightning fast and creates some really impressive interactive visualizations. It's also got some pretty incredible features out of the box: arbitrary user-definable ROIs, instantly changing any plot to log-log, or even doing a fast Fourier transform with just a right click. If I sound like a fanboi, I kind of am. After trying to deal with the agony of 3D data manipulation in matplotlib (Python's MATLAB-style plotting package), it's a whole different world.
JASP! (Score:1)
i'd take a look at JASP.
http://jasp-stats.org/ [jasp-stats.org]
- it has an attractive UI like SAS and SPSS, and you're not stuck writing code for an analysis that should be quite straightforward.
- the analyses are themselves implemented in R, and python is to be supported.
- an API is in the works for implementing arbitrary analyses
Use both (Score:2)
Python seems to be gaining favor but IMHO the downside is that it's a general purpose language and not built with statistics in mind.
R is quite easy to use, from both an installation and a language standpoint. It's trivial to install and there are many, many packages (of differing quality) on CRAN. You can easily take advantage of multiple processors, GPUs, even Hadoop (to an extent). The main downside is that it's mostly constrained by the memory of the host system. So even though it's easy to load a 20G dataset
Julia (language)? (Score:2)
http://julialang.org/ [julialang.org]
Re: Julia (language)? (Score:1)
I'd recommend Julia for traditional scientific computing- things based on continuous math like systems of equations. Julia's sweet spot is similar to MATLAB.
While R has a lot of similarities to MATLAB, it "feels" like it is aimed at the stats & machine learning user.
Re: (Score:1)
julia is nowhere near, in intent, as well as development, the statistical tasks needed by the poster.
R for Speed of Implementation, Python for Scale (Score:2)
Re: (Score:3)
It isn't, but many of the modules are written in C or other thread-capable languages. For instance, if you are using sk-learn to analyze a dataset with a machine-learning algorithm, your Python code will run on a single processor but the calls to sk-learn to do your heavy lifting will distribute across cores.
Use both (Score:2)
I find R's syntax really annoying for actually doing anything. So I do all the data acquisition, manipulation, etc. in Python and use the RPy2 bridge to just run the actual analysis in R. Best of both worlds.
Python is the better programming language (Score:1)
The arguments in favor of R boil down to this: R is more widely used by statisticians and has a much larger library of statistical packages. But R is not a very good programming language [r4stats.com], is difficult to learn, and is not well suited to integrate with or be used for more general purpose programming tasks.
Python, on the other hand, has a vast library of packages but does not yet have nearly as many packages specialized for the statistical computing domain. The arguments in favor of Python are, in essence, t
Python because (Score:1)
Simple arithmetic shows us (Score:1)
I was able to figure that out with this bit of C code:
printf("%d", 'R' - 'C');
I'm not sure how to do that in R though.
Suck it up and Program in SAS (Score:3, Insightful)
Supporting all 3 options (Score:1)
I work for a large University in a division that provides financial data to ourselves, as well as other academic institutions. We had been a SAS only shop since our inception in the early '90s, save for a few FORTRAN users here and there.
We wanted to support more options for the researchers using our service, and today, we support SAS, R, and Python. One nice thing about SAS is SAS/SHARE. Basically, it makes your native SAS files (*.sas7bdat) available as tables in a database over ODBC or JDBC, with full in
R is not an emerging star (Score:2)
R has been around for a long time and has long been a standard.
Python's sklearn is indeed an 'emerging star'.
Personally I use both.
Also have a look at some of the many standalone tools: Vowpal Wabbit (blazingly fast for regression learning, scales to ridiculous amounts of data) is superb, as is sofia-ml (for clustering, again scales quite well).
I tie them all together in Python, since there are Python bindings for R, and you can use Python's 'subprocess' module to pipe commands and data for command-line tools t
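The subprocess approach mentioned in this comment can be sketched roughly like this. The helper name `run_tool` is made up for the example, and the "tool" is a stand-in Python one-liner rather than any of the programs named above, since their command-line options are not given in the thread.

```python
import subprocess
import sys

def run_tool(command, input_text):
    """Send input_text to a command-line tool's stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout

# Stand-in "tool": a tiny Python one-liner that sums the numbers on stdin.
# A real pipeline would put the actual tool's command line here instead.
stand_in_tool = [
    sys.executable, "-c",
    "import sys; print(sum(float(line) for line in sys.stdin))",
]

output = run_tool(stand_in_tool, "1.5\n2.5\n4.0\n")
print(output.strip())
```

For data too large to pass as one string, `subprocess.Popen` with explicit pipes or temporary files is the usual next step.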
Do it! (Score:1)
why is cool desirable (Score:2)
I mean, if SAS works, why waste time on hot cool stuff that may be obsolete in a year or two ?
this whole innovation for the sake of innovation thing is so last century
(see a post on crooked timber about a week or so ago, also P Krugman in his blog flagged a New Yorker article on the cult of innovation)
R for current productivity (Score:1)
Very different user requirements (Score:1)
Both Python & R great, yet do check under the (Score:1)
Don't forget JavaScript (Score:2)
I particularly like what Joyent is doing with their big data analysis. Node.js is catching up to Python in performance and in time it should be a very strong server-side language for many applications, including data analysis and modeling. So don't forget JavaScript. It may be very immature for use in this field, but there is no reason that trailblazing can't happen shortly with Node.
Depends what you want (Score:2)
I work in bioinformatics, and use both R and python. The data models in R are stronger than in python, and packages like ggplot are easier to use than matplotlib. That makes it a relatively easy entry. It's also much more similar to SAS than python is. However, R has some big limitations. It is _very_ slow and is a memory clogging beast. It also has some very annoying quirks, like the horrible object model. I find python to be much more flexible, and absolutely required for larger data sets. With the right
Re: (Score:2)
Yes, the submitter is an idiot incapable of his own research, but this sounds like SAS astroturf FUD (yes, there is such a thing).
Re: (Score:2)
"infrastructure and connectivity involved with SAS." "You need to do a lot more research on data analysis, data mining, analytics, and integration before even talking about a solution."
nope, that's a string of buzzwords written by marketing. at least we agree that the submitter is an idiot, i'm just adding the AC to the list as well.