I have a game score dataset that looks like the below. I am trying to calculate new three columns that conditions on Rank using something like:
Dataset:
Player Player_Rank Score_Game1 Score_Game2 Score_Game3
Tom 1 87 84 92
John 2 91 84 87
Peter 3 81 83 83
Hank 4 85 72 57
Alfred 5 60 58 54
Mike 6 79 74 64
Joe 7 49 81 62
Marc 8 70 87 47
Sean 9 73 51 69
New Column calculations PROC SQL code:
case when Player_Rank <= 3 then Score_Game1 else . end as Score_Game1_r;
case when Player_Rank <= 3 then Score_Game2 else . end as Score_Game2_r;
case when Player_Rank <= 3 then Score_Game3 else . end as Score_Game3_r;
Is there a way that I just do one case statement, something like the below instead of repeating it for all columns?
case when Player_Rank <= 3 then
Score_Game1 else . end as Score_Game1_r;
Score_Game2 else . end as Score_Game2_r;
Score_Game3 else . end as Score_Game3_r;
The end data would look like:
Player Player_Rank Score_Game1 Score_Game2 Score_Game3 Score_Game1_r Score_Game2_r Score_Game3_r
Tom 1 87 84 92 87 84 92
John 2 91 84 87 91 84 87
Peter 3 81 83 83 81 83 83
Hank 4 85 72 57 . . .
Alfred 5 60 58 54 . . .
Mike 6 79 74 64 . . .
Joe 7 49 81 62 . . .
Marc 8 70 87 47 . . .
Sean 9 73 51 69 . . .
The reason I want such is that I want to calculate the AVG(Score) for each game and also the AVG Score for each game just for players in all times top 3. Is there a more efficient way to calculate conditional averages, using proc sql or just data step? Thanks.
To my knowledge it is not possible to create more than one variable with only one CASE WHEN expression.
But you could accomplish that with a data step:
data want;
set have;
if 1<=Player_Rank<=3 then do;
Score_Game1_r=Score_Game1;
Score_Game2_r=Score_Game2;
Score_Game3_r=Score_Game3;
end;
else call missing(of Score_Game1_r--Score_Game3_r); /* this line can be omitted */
run;
Maybe you don't need the additional variables, because you can calculate the desired mean scores most easily by using PROC MEANS:
proc means data=have;
var score:;
run;
proc means data=have;
where 1<=Player_Rank<=3;
var score:;
run;
(The abbreviation score: assumes that only variables Score_Game1, Score_Game2 and Score_Game3 have names starting with "SCORE".)
If, for some reason, you had to use PROC SQL for the calculation, you could compute the averages like this:
proc sql;
select avg(Score_Game1) as avg1,
avg(Score_Game2) as avg2,
avg(Score_Game3) as avg3,
avg(ifn(1<=Player_Rank<=3,Score_Game1,.)) as avg1top3,
avg(ifn(1<=Player_Rank<=3,Score_Game2,.)) as avg2top3,
avg(ifn(1<=Player_Rank<=3,Score_Game3,.)) as avg3top3
from have;
quit;
Edit: Slightly simplified definition of AVGTOP1--AVGTOP3 and changed their names to AVG1TOP3 etc. in order to avoid possible confusion with the "top 3".
You can achieve your final goal by putting both summarisations (whole and top 3) side by side
proc sql;
select * from
(select
mean(score_game1) as mean_score1,
mean(score_game2) as mean_score2,
mean(score_game3) as mean_score3
from players) ,
(select
mean(score_game1) as mean_top_score1,
mean(score_game2) as mean_top_score2,
mean(score_game3) as mean_top_score3
from players where player_rank <= 3);
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.