DATA Step, Macro, Functions and more

Sorting, subsetting

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Sorting, subsetting

So, probably a simple question.  I need to find the mean of a subset of a subset of a variable (3 variables total).   So lets say I am trying to find the mean of NFC team scores, first by division, and then by team in the division.  I have a scores variable, a variable, division, that includes NFC EAST, NFC WEST, NFC SOUTH, and NFC NORTH.  Then a third variable, team,  that contains all of the teams in the conference.  I can get the mean of the scores of the division, but when I try to go to that deeper level of teams in the division, i cannot do it.  Here is the code I would use to get the mean by division:

 

proc sort data = football;

by division;

run;

 

proc means data = football;

var scores;

by division;

run;

 

this works just fine.

 

when I try to go by team,

 

it gives me an error because I haven't sorted by team, which then messes up the sort my division.

 

any suggestions?


Accepted Solutions
Solution
‎09-20-2016 11:27 PM
Super User
Super User
Posts: 6,502

Re: Sorting, subsetting

Use the CLASS statement instead of the BY statement and you can get summary for any of the combinations you want.

proc summary data=football ;
   class division team ;
   var score ;
   output out=want mean=mean_score ;
run;

You will get overall mean, mean for each level of division, mean for each level of team and also mean of each team within division.

The _TYPE_ variable will tell you which of the class variables are contributing to the result record.

_TYPE_=0  will be the overall mean. 

_TYPE_=1 will be the TEAM means.

_TYPE_=2 will be the DIVISION means.

_TYPE_=3 will the the TEAM within DIVISION means (which for NFL teams will be the same as TEAM means since each team is in one and only one DIVISION).

 

View solution in original post


All Replies
Respected Advisor
Posts: 4,663

Re: Sorting, subsetting

Do you want:

 

proc sort data = football;
by division team;
run;
 
proc means data = football;
var scores;
by division team;
run;

?

PG
Solution
‎09-20-2016 11:27 PM
Super User
Super User
Posts: 6,502

Re: Sorting, subsetting

Use the CLASS statement instead of the BY statement and you can get summary for any of the combinations you want.

proc summary data=football ;
   class division team ;
   var score ;
   output out=want mean=mean_score ;
run;

You will get overall mean, mean for each level of division, mean for each level of team and also mean of each team within division.

The _TYPE_ variable will tell you which of the class variables are contributing to the result record.

_TYPE_=0  will be the overall mean. 

_TYPE_=1 will be the TEAM means.

_TYPE_=2 will be the DIVISION means.

_TYPE_=3 will the the TEAM within DIVISION means (which for NFL teams will be the same as TEAM means since each team is in one and only one DIVISION).

 

Occasional Contributor
Posts: 6

Re: Sorting, subsetting

Tom,

 

Using the class statement was dead on.  I racked my brain for a few hours trying to figure out how to code the problem correctly.  Thanks for your help!

 

Adam

Super User
Posts: 17,963

Re: Sorting, subsetting

In addition to @Tom solution, if you want control over the levels look at TYPEs and WAYS statements in PROC MEANS.

They allow you to control the different combinations of class variables, super useful and under utilized. 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 344 views
  • 3 likes
  • 4 in conversation