HardernTim
Calcite | Level 5

I've been making plans for getting my master's in statistics and I want to work on other skills that would help me get a stats job down the line. I know both are in demand, but I'm not sure which is best for me to learn right now. All I have for programming experience is a little bit of Visual Basic and C++ in high school and some Java in undergrad, but I've definitely never gotten far enough with any of these to be very skilled. I'd rather work in an academic setting, which I've heard uses R more, but I think SAS might be easier for me to learn considering the little programming experience I have. I got the free university edition of SAS and found the free first course offered on their site, but then I found out about the whole SAS vs. R debate and I figured I'd ask about it before I invested too much time in learning something I maybe shouldn't bother with. Of course, the ideal would be to learn both, but where should I start?

 
6 REPLIES
Kurt_Bremser
Super User

Learn both.

R is the emerging, open-source statistics package, and seems to be widely used. But it has its drawbacks (no real authority for the checking of modules for validity, among others).

SAS is the data warehousing environment, which is much more than "just" statistics. For the work I do (ETL, preparing data for statisticians), R is useless.

For extracting data from multiple sources, getting it into shape, and keeping track of everything (data, users, codes, the whole metadata), SAS is the way to go.

jimbarbour
Meteorite | Level 14

Well, you have to understand that you're asking on a SAS-focused forum. We're all SAS guys, so we may be a bit biased.

 

I have read that R is quite popular in academia, but that's about all I know. 

 

I've worked with other languages that take a lot more programming to get things done like Assembler, C, Java, etc.  Even though Java is more modern, I stuck with SAS because SAS does so much work for me.  It's much faster at least for me to write in SAS.  

 

Now, that's the reason I've stuck with SAS, but that doesn't tell you much about R unfortunately.  I just don't have much depth with R other than a few elementary programs.

 

That said, I do look at the Tiobe and Red Monk indices regarding the popularity of programming languages. 

 

On the Tiobe Index, R is 14, and SAS is 21.  A language that you didn't mention that has shown tremendous growth in recent years is Python, which is currently #2 on the Tiobe index.  Python has a number of statistical packages and now rivals R and SAS in functionality from what I've read.  The CEO of Tiobe has some interesting things to say about the popularity of R; see the preceding link.  Another up and coming statistical language is Julia, currently at #26.  Julia is a project out of MIT whose goal is to be a language that can do statistical processing, machine learning, etc. but be very fast and efficient.  In other words, Julia strives for all of the features of Python, R, and SAS while having the performance of something like C or C++.  I'd say Julia is one to watch, but who am I?

 

The Red Monk index lists Python at #2 and R at #13.  Red Monk only lists the top 20; SAS isn't mentioned.

 

Popularity isn't everything, and how popularity is determined is the subject of no small amount of debate, but these indices are nonetheless interesting. They're at least something to throw in the mix.

 

Sorry I can't be more definitive,

 

Jim

andreas_lds
Jade | Level 19

Having used SAS for some time, I looked at R code last summer and was impressed by how ugly a programming language can look in the 21st century. @jimbarbour mentioned that some members of the SAS community may be biased 😉

Jim had another good point: Python. Learning that language will open other career opportunities besides statistics.

ballardw
Super User

Adding a somewhat biased response.

 

A couple of years ago my organization sponsored a couple of days of "introduction to R" training for 30 or so folks that are involved in various types of analysis. The instructor of the course installed the software on our training computers.

 

During the training demonstrations, about once every hour of practice, one or more of the students couldn't run the example the instructor was providing because the machines he had set up did not have the required modules active. To my way of thinking, that means there is going to be a lot of "can't use this example code" floating around on the internet or in someone's process, because everyone has their own favorite module to do essentially the same thing.

 

None of the examples in the class used anything close to what I would call a moderate-sized data set. The largest was about 1000 records of 20 variables or so (yes, I know that isn't the proper R-ese since R is matrix based, but that's the way I think). I shuddered to contemplate cleaning up data with 100,000+ rows.

 

I was also not very happy with how text files are read into the data models (a common source of problems during the demonstrations), having dealt with some pretty complex and obnoxious text files in the past.

 

The lack of a single source of authoritative help would be, to me, a severe concern in an actual corporate organization when you have a software package that practically requires users to constantly download software from unknown sources to use "features". The response time for resolving problems when critical processes start failing for some reason after they were created would be another strike against this software.

 

In my career and hobby activities I have only used something like 35 or 40 programming or scripting languages. Outside of Assembler for the IBM360 this was the most obtuse code I had ever encountered.

ammarhm
Lapis Lazuli | Level 10

I will try to give my own humble opinion as a doctor and an academic who uses SAS, R and Python.

I agree with Kurt_Bremser, you should learn both… and not only that, you should also learn Python in addition to these two.

SAS has a number of major advantages over other languages/ environments:

  1. Its speed and ability to handle large databases are miles ahead of R. Proc SQL is an absolute beast in this area (yes, you can run SQL code in R, but there is no comparison to the way SAS implements it).
  2. All procedures in SAS are thoroughly and comprehensively tested and validated before being released. You know you can trust the output from any SAS procedure to be correct, period! (That is, if you code it correctly, of course.) Most R libraries (apart from base R) are written by individuals, and that makes them less reliable (see below).
  3. SAS just works out of the box once installed. You do not need to install and load libraries as in R, or packages as in Python, which not infrequently fail to load (especially in an enterprise setting with security protocols that might interfere with the installation of external packages).
  4. Importantly, the support you get from forums (and I mean SAS Communities) is just incredible. You cannot get this level of friendly and quick support for R (you have to turn to Stack Overflow, and in my experience it is not as good as SAS Communities… by light years).

SAS is just good, efficient and reliable, and there is a massive enterprise supporting it. It is here to stay, and it is not used by large enterprises for no reason.

 

SAS has drawbacks….

  1. It is commercial software, meaning it is expensive, and every module you add costs a fortune… you spend hundreds of dollars on an annual licence and then you only get Base SAS.
  2. SAS lacks support for machine learning. You need to (you guessed it) spend more money to get a separate license for SAS Viya… a different module to run AI analytics. This is a HUGE disadvantage, I think.
  3. It is sometimes cumbersome to do some tasks in SAS. Graphics editing sucks in SAS, and you might need long code in proc template to make a minor adjustment to a graph.
  4. Code sharing (GitHub integration) between users is not as easy as in R

R is incredibly diverse when it comes to the different statistical analyses you can do with it. There is a library for basically everything you can think of… This means it is easier (and takes shorter code) to solve a specific problem in R, and while this is a strength, it is also R's major weakness. It sometimes becomes confusing to see that the same problem can be solved in many different ways using different libraries that sometimes give you marginally different answers. Furthermore, these libraries are maintained by individuals, and if that person decides for some reason not to maintain the library in question, then it might not work with an R update, etc. And while there is good documentation for all R libraries, there is no comparison to the extensive SAS documents supporting each and every procedure in SAS.

Also, R is not as memory efficient as SAS. This is especially noticeable when dealing with really big datasets, where R can become incredibly slow.

BUT… R is open source, FREE, versatile, can be installed on Windows or Mac (SAS has no native Mac version), supports machine learning, and its graphics are much easier to manipulate than in SAS… Most academics use R nowadays. At my place of work, all researchers use R and none uses SAS, and working on a project with others is impossible without a reasonable basic knowledge of R.

Some people might find R syntax difficult to get used to… but it gets nicer as you become more comfortable with the code. There are also libraries (such as dplyr) that improve the syntax.

Also, more jobs nowadays require knowledge of R than of SAS:

https://blog.revolutionanalytics.com/2017/02/job-trends-for-r-and-python.html

 

 

Python is a must today. It is the go-to language for machine learning (and certainly not SAS Viya). It is a magnificent environment to work with. The code is very easy to learn, and it is a very powerful environment for manipulating databases. It is also FREE, with a lot of resources online, but it falls well behind R and SAS in terms of its statistical abilities. The packages to deal with complex statistical analyses are just not there yet, but they are rapidly growing. I cannot emphasise enough that it is the go-to language when it comes to machine learning.
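To give a rough feel for the point above about Python and data manipulation, here is a minimal sketch that uses only the standard library. The table name `trial`, the columns `grp` and `response`, and the toy rows are all made up for illustration, and real projects would more likely reach for third-party packages, so treat this as one possible approach rather than the recommended one.

```python
import sqlite3
import statistics

# Hypothetical toy data: (treatment group, measured response).
rows = [("a", 1.0), ("a", 3.0), ("b", 2.0), ("b", 4.0), ("b", 6.0)]


def group_means(records):
    """Return {group: mean response} computed with SQL in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trial (grp TEXT, response REAL)")
        conn.executemany("INSERT INTO trial VALUES (?, ?)", records)
        cur = conn.execute(
            "SELECT grp, AVG(response) FROM trial GROUP BY grp ORDER BY grp"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()


# Group-wise means via SQL, overall sample standard deviation via the stdlib.
means = group_means(rows)
overall_sd = statistics.stdev([response for _, response in rows])
print(means, overall_sd)
```

The SQL part will look familiar if you have written Proc SQL, which is part of why picking up Python alongside SAS tends to be less painful than it sounds.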

 

So, I do not see SAS, R and Python as competitors, they are complementary to each other depending on the task at hand. I have on many occasions previously made the decision to try to use only one of these three as the only tool to do my analyses for ALL my projects, but failed spectacularly, as each is sometimes better suited for a specific problem. 

Good luck!

Ksharp
Super User
If you are planning to stay in the pharmacy field, then learn SAS.

Discussion stats
  • 6 replies
  • 3000 views
  • 9 likes
  • 7 in conversation