I have a game score dataset that looks like the below. I am trying to calculate new three columns that conditions on Rank using something like:
Player Player_Rank Score_Game1 Score_Game2 Score_Game3 Tom 1 87 84 92 John 2 91 84 87 Peter 3 81 83 83 Hank 4 85 72 57 Alfred 5 60 58 54 Mike 6 79 74 64 Joe 7 49 81 62 Marc 8 70 87 47 Sean 9 73 51 69
New Column calculations PROC SQL code:
case when Player_Rank <= 3 then Score_Game1 else . end as Score_Game1_r; case when Player_Rank <= 3 then Score_Game2 else . end as Score_Game2_r; case when Player_Rank <= 3 then Score_Game3 else . end as Score_Game3_r;
Is there a way that I just do one case statement, something like the below instead of repeating it for all columns?
case when Player_Rank <= 3 then Score_Game1 else . end as Score_Game1_r; Score_Game2 else . end as Score_Game2_r; Score_Game3 else . end as Score_Game3_r;
The end data would look like:
Player Player_Rank Score_Game1 Score_Game2 Score_Game3 Score_Game1_r Score_Game2_r Score_Game3_r Tom 1 87 84 92 87 84 92 John 2 91 84 87 91 84 87 Peter 3 81 83 83 81 83 83 Hank 4 85 72 57 . . . Alfred 5 60 58 54 . . . Mike 6 79 74 64 . . . Joe 7 49 81 62 . . . Marc 8 70 87 47 . . . Sean 9 73 51 69 . . .
The reason I want such is that I want to calculate the AVG(Score) for each game and also the AVG Score for each game just for players in all times top 3. Is there a more efficient way to calculate conditional averages, using proc sql or just data step? Thanks.
To my knowledge it is not possible to create more than one variable with only one CASE WHEN expression.
But you could accomplish that with a data step:
data want; set have; if 1<=Player_Rank<=3 then do; Score_Game1_r=Score_Game1; Score_Game2_r=Score_Game2; Score_Game3_r=Score_Game3; end; else call missing(of Score_Game1_r--Score_Game3_r); /* this line can be omitted */ run;
Maybe you don't need the additional variables, because you can calculate the desired mean scores most easily by using PROC MEANS:
proc means data=have; var score:; run; proc means data=have; where 1<=Player_Rank<=3; var score:; run;
(The abbreviation score: assumes that only variables Score_Game1, Score_Game2 and Score_Game3 have names starting with "SCORE".)
If, for some reason, you had to use PROC SQL for the calculation, you could compute the averages like this:
proc sql; select avg(Score_Game1) as avg1, avg(Score_Game2) as avg2, avg(Score_Game3) as avg3, avg(ifn(1<=Player_Rank<=3,Score_Game1,.)) as avg1top3, avg(ifn(1<=Player_Rank<=3,Score_Game2,.)) as avg2top3, avg(ifn(1<=Player_Rank<=3,Score_Game3,.)) as avg3top3 from have; quit;
Edit: Slightly simplified definition of AVGTOP1--AVGTOP3 and changed their names to AVG1TOP3 etc. in order to avoid possible confusion with the "top 3".
You can achieve your final goal by putting both summarisations (whole and top 3) side by side
proc sql; select * from (select mean(score_game1) as mean_score1, mean(score_game2) as mean_score2, mean(score_game3) as mean_score3 from players) , (select mean(score_game1) as mean_top_score1, mean(score_game2) as mean_top_score2, mean(score_game3) as mean_top_score3 from players where player_rank <= 3); quit;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.