Hello,
I know this topic appeared already in this forum however either I do not understand how to apply the %hop macro to my problem or it does not fit because I would need to 'hop' over multiple entries.
Here is the problem:
data number_game;
length player $1 result $1;
input player starttime :time5. playtime :time5. result $;
format starttime playtime time5.;
datalines;
1 12:00 12:00 L
1 12:00 12:15 L
1 12:00 12:30 W
1 12:00 12:45 W
1 12:00 13:00 W
1 12:00 13:15 W
1 12:00 13:30 W
1 12:00 13:45 W
2 12:00 12:00 W
2 12:00 12:15 L
2 12:00 12:30 L
2 12:00 12:45 W
2 12:00 13:00 L
2 12:00 13:15 W
2 12:00 13:30 W
2 12:00 13:45 W
3 12:00 12:00 L
3 12:00 12:15 W
3 12:00 12:30 L
3 12:00 12:45 L
3 12:00 13:00 W
3 12:00 13:15 W
3 12:00 13:30 W
3 12:00 13:45 W
;
run;
data number_game;
length win_streak 3.;
set number_game;
by player starttime;
retain win_streak;
if first.player then win_streak = 0;
if result = 'W' then win_streak + 1;
else win_streak = 0;
run;
In a series of games the player wins with the shortest time to a winstreak of 3 (consecutive wins, series of 3).
The time of the start of the winstreak should be calculated.
E.g. for player 1 this would be 12:30-12:00 --> 30minutes
Thank you in advance and best regards.
I'm sure someone will do this with a hash table, but how about this?
proc sql noprint;
select count(distinct player) into :np trimmed from number_game;
quit;
data winner;
set number_game end=last;
by player starttime playtime;
length win_streak 3 tot_time 4;
array Tp {&np} $6 _temporary_;
array Tt {&np} 4 _temporary_;
retain tpos 0;
if first.player or result='L' then win_streak=0;
win_streak+(result='W');
if win_streak=3 then do;
tot_time=playtime-starttime;
tpos+1;
Tp[tpos]=player;
Tt[tpos]=tot_time;
end;
if last then do;
best_time=min(of Tt[*]);
do i=1 to tpos;
if Tt[i]=best_time then do;
player=Tp[i];
output;
end;
end;
end;
keep player best_time;
run;
proc print data=winner; run;
This allows ties.
I'm sure someone will do this with a hash table, but how about this?
proc sql noprint;
select count(distinct player) into :np trimmed from number_game;
quit;
data winner;
set number_game end=last;
by player starttime playtime;
length win_streak 3 tot_time 4;
array Tp {&np} $6 _temporary_;
array Tt {&np} 4 _temporary_;
retain tpos 0;
if first.player or result='L' then win_streak=0;
win_streak+(result='W');
if win_streak=3 then do;
tot_time=playtime-starttime;
tpos+1;
Tp[tpos]=player;
Tt[tpos]=tot_time;
end;
if last then do;
best_time=min(of Tt[*]);
do i=1 to tpos;
if Tt[i]=best_time then do;
player=Tp[i];
output;
end;
end;
end;
keep player best_time;
run;
proc print data=winner; run;
This allows ties.
Very cool! I am always quite hesitant when it comes to arrays in SAS.
In terms of lines of codes my solution might be the simpler:
data number_game;
set number_game;
lag_playtime = lag2(playtime);
if win_streak = 3 then time_to_winstreak = intck('Minutes', starttime, lag_playtime);
drop lag_playtime;
run;
proc sql;
select player from number_game where time_to_winstreak in (select min(time_to_winstreak) from number_game where time_to_winstreak is not null);
quit;
However your example seems to be more versatile (lag function goes only up to 3).
Thank you!
@LuGa - actually there's a problem with my code -- if a person has more than one win streak of 3, the subsequent ones will overwrite earlier times, resulting in a worse time. To fix:
** change this line: ;
if win_streak=3 then do;
** ... to this -- this way, it will only consider a person's first win streak of 3 ;
if win_streak=3 and player not in Tp then do;
Slightly fancier - this will spit out all the players with a win streak of 3, ranked from best to worst. Btw, when defining the array Tp, be sure the length allocated for each bin (currently $6) is at least as long as the length of the player variable ($1 in your sample data).
proc sql noprint;
select count(distinct player) into :np trimmed from number_game;
quit;
data ranked;
set number_game end=last;
by player starttime playtime;
length win_streak 3 tot_time 4;
array Tp {&np} $6 _temporary_;
array Tt {&np} 4 _temporary_;
retain tpos 0;
if first.player or result='L' then win_streak=0;
win_streak+(result='W');
if win_streak=3 and player not in Tp then do;
tot_time=playtime-starttime;
tpos+1;
Tp[tpos]=player;
Tt[tpos]=tot_time;
end;
if last then do;
best_time=min(of Tt[*]);
rank=0;
do while (best_time>.);
rank+1;
do i=1 to tpos;
if Tt[i]=best_time then do;
player=Tp[i];
Tt[i]=.;
output;
end;
end;
best_time=min(of Tt[*]);
end;
end;
keep player best_time rank;
run;
proc print data=ranked; run;
Building off of your code, to calculate the start of each player's first winning streak, you could use the lag() function to look back.
data number_game_want;
length win_streak 3.;
set number_game;
by player starttime;
retain win_streak;
if first.player then do ;
win_streak = 0;
_done=0 ;
end ;
if result = 'W' then win_streak + 1;
else win_streak = 0;
lag2playtime=lag2(playtime) ;
if win_streak=3 and not _done then do ;
retain _done ;
_done=1 ;
TimeToWinStreak=lag2playtime-starttime ;
output ;
end ;
format TimeToWinStreak time5. ;
run;
You can use lag for more than 2 -- here's an example using lag10. I just tend to avoid it, but not for any great reason. Sometimes it's useful and certainly convenient. Just be careful not to use the lag function conditionally.
data test;
do i=1 to 15;
x=ranuni(0);
l10=lag10(x);
output;
end;
run;
proc print data=test; run;
A minor detail: do any of the "time" variables cross midnight? If you have a starttime of 23:59 and playtime values like 00:10 you have to deal with possible negative time interval values for comparisons of start of win streak.
I'm not quite sure how your problem would work if the same player had multiple starttime values, play sessions in effect.
For consideration, when you have a two value variable, such as your Result, you may want to consider using a numeric 1/0 value as there a many things that can be done with such much easier than with character values. For example if you code 1 as win and 0 as lose then the SUM of the result over any group of observations is the number of wins, the Mean would be the percent of wins in decimal form, i.e. .60 = 60% wins.
You might also try a double "look-ahead":
data want;
merge
number_game
number_game (
firstobs=2
keep=player result
rename=(player=player2 result=result2)
)
number_game (
firstobs=3
keep=player result
rename=(player=player3 result=result3)
)
;
retain start have_streak;
format start time time5.;
if player ne lag(player)
then do;
start = starttime;
have_streak = 0;
end;
if
not have_streak
and result = "W"
and player2 = player
and result2 = "W"
and player3 = player
and result3 = "W"
then do;
have_streak = 1;
time = playtime - start;
output;
end;
keep player start time;
run;
As you can see, the parts depending on the number of wins for a streak are repetitive, so they could be auto-created with a macro if different streak lengths are needed.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.