We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Back to School - Analysing Ontario's University Data

by Super Contributor on ‎09-23-2016 06:53 AM - edited on ‎09-23-2016 02:29 PM by Community Manager (330 Views)

Welcome to SAS Analytics U!  This is my second year of writing this blog, and I’m very excited.  I have spent the summer brushing up on some of my skills, and learning some new ones.  I have some pretty cool ideas for this year’s series, but if you have any suggestions or topics you’d like explored, please reach out to me. I’m happy to help out where I can!  If you’re new to SAS, welcome to the community – you’re going to have a great time, and learn a lot about data analysis, statistics, and more!



Get the data

Because this is my first post of the school year, I wanted to do something related.  I was very excited when I found out the Council of Ontario Universities made their data available through the Common Universities Data of Ontario (CUDO) http://cou.on.ca/numbers/cudo/. I downloaded a couple of different datasets but only used two for this article,  Number of Applicants and number of Registrants.


How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions from the .pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.


Getting the data ready

I did have to do some pretty heavy data manipulation in Excel to get the data down to what I want; for example, I deleted all the University-specific data and just kept the Annual totals.  I also had to transpose the data from a “wide” dataset (categories in rows and columns) to a “long” dataset (only three or four columns but all the categories in rows; also depending on the data, the categories may be duplicated). 


The reason I’ve done this is to make the graphs easier to create; because I’ve planned my data, I know that I want Year along the X-axis and the Counts along the Y.  The way the data was initially set up, the years were individual columns, which would make the graph I wanted impossible.  Transposition allows for a much easier analytical process, and this brings us to a key point – planning is essential when doing analytics.


The results

Because the data is in annual format, I wanted to do a number of different line graphs.  Because I’m using SAS University Edition, I have a full suite of Tasks that are going to make analysis and data management exponentially easier.  For the purpose of this article, I’ll only be using the one task but a couple of different ways. 


First, here’s a screenshot of the Task as I’ve set it up for the first graph:




When I run the task, the graph is below.  A couple of points about the data: 1) Applicants are first-time students who are applying to first year studies; 2) Undergrads are counted as of November 1 of the year; 3) Students are only counted if they are enrolled in full-time studies.




Context matters when working with data

A couple of things are apparent in this data that raise some key points.  First, the significant increase in Applications in 2003 seems to be an anomaly, either a data entry error or possibly something else that has gone wrong.  Anyone living in Ontario in 2002 or 2003 and was following the news will remember that this was the year that the Ontario Academic Credit (OAC, or more familiarly Grade 13) was taken out of the Ontario curriculum and so this was the year that both Grade 12 and Grade 13 Graduates applied to universities.  The following year, only the Grade 12 students applied so the numbers continued back on the previous gradual increase. 


The other point to note here is that the number of Undergrads seems to increase at a higher rate than the number of applicants.  At first this doesn’t make sense, but I think this is because many students are staying in undergraduate programs longer, doing double majors, and people leaving the workforce and going back to school.  Obviously both these would need to be investigated further to come to conclusions.


Trends by gender

The next graph is looking at Undergrads by sex.  Using the Group variable (group=’Undergrad’) and Category as my legend, it’s interesting to note that Females are consistently and significantly higher than Males in undergraduate programs.  Unfortunately, I was unable to find any data on Gender by Program by Year, so I can’t say if this is across all programs or if there are programs that are still male-dominated.




I’m really intrigued by the graph for Applicants by Gender.  As noted above, 2003 saw a huge spike in students applying to university – for some reason, the spike was largely comprised of female students.  For the rest of the periods, the Male / Female numbers do seem to parallel each other. 




Now it’s your turn!


Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.


Need data for learning?


The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:




We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:




Click Analytics U, then select "Subscribe" from the Options menu.


Happy Learning!





Your turn
Sign In!

Want to write an article? Sign in with your profile.

Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.