BookmarkSubscribeRSS Feed

How to group your data using SAS

Started ‎03-31-2017 by
Modified ‎08-04-2021 by
Views 3,716

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

 

Access Now

SAS University Edition lets you group your data any way you like.jpg

 

One of the most useful capabilities in any data analysis is taking some data and grouping it into logical categories. Whether its numeric, date, or character, being able to take ABC, DEF, and GHI and group them to “Category 1” makes life a lot easier, and may potentially tease out relationships previously unseen.

 

There are any number of ways of doing this in SAS, but the one I wanted to focus on today is PROC FORMAT.

 

Get the data

The data can be downloaded from the province of Nova Scotia’s Open Data site, here. Multiple files formats of the data are available; I chose CSV for Excel. I did clean up the data prior to importing it, specifically removing the 90th percentile columns and deleting the “Rolling totals” rows. Otherwise, the data was imported as it is.

 

Get started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:
 

Get Started

 

Get the data ready

The data was already in a format that I could use, as it was in standard CSV format. 

 

The results

Here’s a few rows of the data as it was imported:

 

 1.png

 

You’ll note the procedures are highly varied, and the Consult_Median column seems to have a wide range of values (including missing). I’d like to be able to have a little more logic applied to these for graphing and analysis.

 

The first step is to build the groups, and because there are over 100 distinct procedures, I’m going to pick on two – Knee Scope and Knee Scope with ACL Repair.

 

proc format;
 value $procedure 'Knee Scope' = 'Ortho'
                 'Knee Scope with ACL repair' = 'Ortho'
                 other = 'Other';
run;

 

 

You’ll notice that I explicitly state the values I want grouped, and then I indicate other = ‘Other’. SAS University Edition is smart enough to know that by specifying other, I want all remaining values to be in the one group.

 

The next step is to apply my new format to my data using PROC PRINT. The noobs suppresses the column with the Row Number, and label just uses the variable names as the column headers.

 

I’m also specifying I want to see Year and Consult_Median in my output, without any specific formatting applied.

 

proc print data=work.import noobs label;
label procedure='Procedure Type';
format procedure $procedure.;
run;

 

 

Here is the new output, with “ORTHO” highlighted:

 

 2.png

 

So that’s the first step – what about the numeric data? I’ll use the Consult_Median column, and I want to bucket it into weeks / months; I’m keeping the format for the Procedure in the code.

 

proc format;
value $procedure 'Knee Scope' = 'Ortho'
   'Knee Scope with ACL repair' = 'Ortho’
   other = 'Other';               
value consult 0 - 14 = '1-2weeks’ 
             15 - 30 = '2-4weeks'
               30 - 60 = '1-2months'
               other = '2+months';
run;

 

 

Here I am specifying numeric ranges (notice without the ‘ around the actual values) and then

providing the value I want created.

 

In the PROC PRINT statement, I only have to add the new format to the consult_median column. Because the original values are numeric, I don’t need the $ like I do with the procedure (which is character).

 

proc print data=work.import noobs label;
label procedure='Procedure Type';
format procedure $procedure. consult_median consult.;
run;

 

 

Here’s my output with the brand new Consult_Median groups. You can obviously carry this through to Year (‘Before 2015’ and ‘After 2015’) Zones (‘Zones1_2’,’Zones3_4’) or any other variable.

 

3.png

 

As with everything, it is extremely important to validate, cross-check and revalidate your data. As an example, if you typed ‘Znoe 1’ for the Zone1_2 group, and had an “Other” bucket, all your Zone 1 data would end up in the wrong category and seriously throw your numbers off.

 

Now it’s your turn!

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

 

Comments

It is always good to remember that you can use formats to assign categories. You can do the same thing for numeric data, where the process is called "binning."

Version history
Last update:
‎08-04-2021 10:08 AM
Updated by:

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags