Re: PROC SQL - Block CASE statement instead of repeating across lines

dmz2000 · Posted 11-23-2015 03:39 PM

I have a game score dataset that looks like the below. I am trying to calculate new three columns that conditions on Rank using something like:

Dataset:

Player  Player_Rank Score_Game1 Score_Game2 Score_Game3
Tom     1   87  84  92
John    2   91  84  87
Peter   3   81  83  83
Hank    4   85  72  57
Alfred  5   60  58  54
Mike    6   79  74  64
Joe     7   49  81  62
Marc    8   70  87  47
Sean    9   73  51  69

New Column calculations PROC SQL code:

case when Player_Rank <= 3 then Score_Game1 else . end as Score_Game1_r;
case when Player_Rank <= 3 then Score_Game2 else . end as Score_Game2_r;
case when Player_Rank <= 3 then Score_Game3 else . end as Score_Game3_r;

Is there a way that I just do one case statement, something like the below instead of repeating it for all columns?

case when Player_Rank <= 3 then
    Score_Game1 else . end as Score_Game1_r;
    Score_Game2 else . end as Score_Game2_r;
    Score_Game3 else . end as Score_Game3_r;

The end data would look like:

Player  Player_Rank Score_Game1 Score_Game2 Score_Game3 Score_Game1_r   Score_Game2_r   Score_Game3_r
Tom     1   87  84  92  87  84  92
John    2   91  84  87  91  84  87
Peter   3   81  83  83  81  83  83
Hank    4   85  72  57  .   .   .
Alfred  5   60  58  54  .   .   .
Mike    6   79  74  64  .   .   .
Joe     7   49  81  62  .   .   .
Marc    8   70  87  47  .   .   .
Sean    9   73  51  69  .   .   .

The reason I want such is that I want to calculate the AVG(Score) for each game and also the AVG Score for each game just for players in all times top 3. Is there a more efficient way to calculate conditional averages, using proc sql or just data step? Thanks.

FreelanceReinh · Posted 11-23-2015 04:13 PM

To my knowledge it is not possible to create more than one variable with only one CASE WHEN expression.

But you could accomplish that with a data step:

data want;
set have;
if 1<=Player_Rank<=3 then do;
  Score_Game1_r=Score_Game1;
  Score_Game2_r=Score_Game2;
  Score_Game3_r=Score_Game3;
end;
else call missing(of Score_Game1_r--Score_Game3_r); /* this line can be omitted */
run;

Maybe you don't need the additional variables, because you can calculate the desired mean scores most easily by using PROC MEANS:

proc means data=have;
var score:;
run;

proc means data=have;
where 1<=Player_Rank<=3;
var score:;
run;

(The abbreviation score: assumes that only variables Score_Game1, Score_Game2 and Score_Game3 have names starting with "SCORE".)

If, for some reason, you had to use PROC SQL for the calculation, you could compute the averages like this:

proc sql;
select avg(Score_Game1) as avg1,
       avg(Score_Game2) as avg2,
       avg(Score_Game3) as avg3,
       avg(ifn(1<=Player_Rank<=3,Score_Game1,.)) as avg1top3,
       avg(ifn(1<=Player_Rank<=3,Score_Game2,.)) as avg2top3,
       avg(ifn(1<=Player_Rank<=3,Score_Game3,.)) as avg3top3
from have;
quit;

Edit: Slightly simplified definition of AVGTOP1--AVGTOP3 and changed their names to AVG1TOP3 etc. in order to avoid possible confusion with the "top 3".

PGStats · Posted 11-23-2015 10:27 PM

You can achieve your final goal by putting both summarisations (whole and top 3) side by side

proc sql;
select * from
(select 
    mean(score_game1) as mean_score1,
    mean(score_game2) as mean_score2,
    mean(score_game3) as mean_score3 
 from players) ,
(select 
    mean(score_game1) as mean_top_score1,
    mean(score_game2) as mean_top_score2,
    mean(score_game3) as mean_top_score3 
 from players where player_rank <= 3);
quit;

PG

PROC SQL - Block CASE statement instead of repeating across lines

Re: PROC SQL - Block CASE statement instead of repeating across lines

Re: PROC SQL - Block CASE statement instead of repeating across lines

Registration is open

SAS Training: Just a Click Away