Hey All,
I am looking for some help, working with a large set of sports data. I have data that takes the following form:
Date Team Venue Score
XXX A Home 80
XXX B Away 90
XXX C Home 78
XXX D Away 65
YYY A Home 100
YYY D Away 87
And I want to create a column or something that looks like the following:
Date Team Venue Score didWin
XXX A Home 80 0
XXX B Away 90 1
XXX C Home 78 1
XXX D Away 65 0
YYY A Home 100 1
YYY D Away 87 0
There are no variables currently in the dataset in which I could do a transpose so that I could compare the home score and the away score. The games are always in pairs of home/away teams. I just need a way to compare the scores for each pair of teams on a given date, and then assign either a 1 or 0 for a win during that game.
AS A BONUS: I am also going to be tallying a total wins and total losses column at each point in the season, going back to the beginning of the season, so the data would look like this:
Date Team Venue Score didWin totalWins totalLosses
XXX A Home 80 0 0 0
XXX B Away 90 1 0 0
XXX C Home 78 1 0 0
XXX D Away 65 0 0 0
YYY A Home 100 1 0 1
YYY D Away 87 0 0 1
So coming into date YYY, I would know that the two teams that are playing each other both have zero wins and one loss up to that point in the season!
I attached a sample dataset to work with!
Thanks!
This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.
/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
set [library].teamdatasample;
/* games are in pairs */
gamenum=ceil(_n_/2);
label f = "Final Score";
run;
/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
by gamenum descending f;
run;
/* assign a winner or loser */
data scores;
set scores;
by gamenum descending f;
if first.gamenum then
didwin = 1;
else
didwin = 0;
run;
/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
by teams date;
run;
/* retain wins and losses */
data scores;
set scores;
by teams date;
retain totalwin totalloss 0;
if first.teams then
do;
totalwin = 0;
totalloss = 0;
end;
if didwin = 1 then
totalwin = totalwin + 1;
else
totalloss = totalloss + 1;
run;
/* resort by game number to order as original data */
proc sort data=scores;
by gamenum descending didwin;
run;
Awesome!
Post updated with data sample.
This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.
/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
set [library].teamdatasample;
/* games are in pairs */
gamenum=ceil(_n_/2);
label f = "Final Score";
run;
/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
by gamenum descending f;
run;
/* assign a winner or loser */
data scores;
set scores;
by gamenum descending f;
if first.gamenum then
didwin = 1;
else
didwin = 0;
run;
/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
by teams date;
run;
/* retain wins and losses */
data scores;
set scores;
by teams date;
retain totalwin totalloss 0;
if first.teams then
do;
totalwin = 0;
totalloss = 0;
end;
if didwin = 1 then
totalwin = totalwin + 1;
else
totalloss = totalloss + 1;
run;
/* resort by game number to order as original data */
proc sort data=scores;
by gamenum descending didwin;
run;
This looks really great! I was actually working on doing it in a similar way! Thanks for finishing the thoughts for me!
John
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: