*EDIT* I probably should have added that I have to use a data step to complete this.
So I have a problem where I have the data of Rose Bowl football games including the winning college and the margin of victory (mov). I have to get a table with proc print that produces three columns: The college, how many times they've won, and their average mov for all of the winning games.
I can get my data sorted into groups by the school with their total number of wins, but I don't know how to get the average of their margins of victory without it being a running total of all schools rather than school by school. This is my current code that ends up with the avgmov value being a running total of all schools, rather than the school by school which I need.
proc sort data=rosebowl;
by winteam mov;
run;
data WinsBySchool;
set rosebowl;
count + 1;
by winteam;
retain totmov 0;
if first.winteam then count = 1;
totmov+mov;
if Last.winteam=1;
avgmov=totmov/count;
run;
Sounds more like a task for Proc MEANS
maybe
Proc means data = rosebowl ;
by winteam;
var MOV;
output out = winsbyschool n= mean= /autoname ;
run;
This would be a whole lot easier in PROC SUMMARY/PROC MEANS, since PROC SUMMARY/PROC MEANS has built-in the code to compute statistics within BY groups.
Something like this (UNTESTED CODE)
proc summary data=rosebowl nway;
class winteam;
var mov;
output out=_stats_ mean=avgmov n=count;
run;
I add that computing means in a DATA step is usually inefficient, and takes longer to program that using PROC SUMMARY/PROC MEANS. Also, the way you are doing things by computing numerators which are sums, and then dividing by the count, might work fine for Rose Bowl data, but in the real world, if you have missing values, then your method gives the wrong answers, while PROC SUMMARY/PROC MEANS gives the right answers. So I recommend that you don't compute means (or other widely available statistics) by writing your own DATA step code.
proc sort data=rosebowl;
by winteam mov;
run;
data WinsBySchool;
set rosebowl;
by winteam;
retain totmov4Wins 0 numWins;
*reset everything at the beginning of each team;
if first.winteam then do;
numWins=0;
totmov4wins=0;
end;
*if they win increment values;
if win=1 then do:
totmov+mov;
numWins+1;
end;
*if last record for team produce count and average;
if last.winteam then do;
avgmov=totmov/count;
output;
end;
*keep only variables of interest;
keep winTeam numWIns avgMov;
run;
I don't see any logic about checking if a team has won? I added that in, but it obviously depends on how your data is structured which I have no idea of.
I would highly recommend you add comments to your code.
@mysasusername wrote:
*EDIT* I probably should have added that I have to use a data step to complete this.
So I have a problem where I have the data of Rose Bowl football games including the winning college and the margin of victory (mov). I have to get a table with proc print that produces three columns: The college, how many times they've won, and their average mov for all of the winning games.
I can get my data sorted into groups by the school with their total number of wins, but I don't know how to get the average of their margins of victory without it being a running total of all schools rather than school by school. This is my current code that ends up with the avgmov value being a running total of all schools, rather than the school by school which I need.
proc sort data=rosebowl;
by winteam mov;
run;data WinsBySchool;
set rosebowl;
count + 1;
by winteam;
retain totmov 0;
if first.winteam then count = 1;
totmov+mov;
if Last.winteam=1;
avgmov=totmov/count;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.