Hey All,
I am looking for some help, working with a large set of sports data. I have data that takes the following form:
Date Team Venue Score
XXX A Home 80
XXX B Away 90
XXX C Home 78
XXX D Away 65
YYY A Home 100
YYY D Away 87
And I want to create a column or something that looks like the following:
Date Team Venue Score didWin
XXX A Home 80 0
XXX B Away 90 1
XXX C Home 78 1
XXX D Away 65 0
YYY A Home 100 1
YYY D Away 87 0
There are no variables currently in the dataset in which I could do a transpose so that I could compare the home score and the away score. The games are always in pairs of home/away teams. I just need a way to compare the scores for each pair of teams on a given date, and then assign either a 1 or 0 for a win during that game.
AS A BONUS: I am also going to be tallying a total wins and total losses column at each point in the season, going back to the beginning of the season, so the data would look like this:
Date Team Venue Score didWin totalWins totalLosses
XXX A Home 80 0 0 0
XXX B Away 90 1 0 0
XXX C Home 78 1 0 0
XXX D Away 65 0 0 0
YYY A Home 100 1 0 1
YYY D Away 87 0 0 1
So coming into date YYY, I would know that the two teams that are playing each other both have zero wins and one loss up to that point in the season!
I attached a sample dataset to work with!
Thanks!
This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.
/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
set [library].teamdatasample;
/* games are in pairs */
gamenum=ceil(_n_/2);
label f = "Final Score";
run;
/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
by gamenum descending f;
run;
/* assign a winner or loser */
data scores;
set scores;
by gamenum descending f;
if first.gamenum then
didwin = 1;
else
didwin = 0;
run;
/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
by teams date;
run;
/* retain wins and losses */
data scores;
set scores;
by teams date;
retain totalwin totalloss 0;
if first.teams then
do;
totalwin = 0;
totalloss = 0;
end;
if didwin = 1 then
totalwin = totalwin + 1;
else
totalloss = totalloss + 1;
run;
/* resort by game number to order as original data */
proc sort data=scores;
by gamenum descending didwin;
run;
Awesome!
Post updated with data sample.
This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.
/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
set [library].teamdatasample;
/* games are in pairs */
gamenum=ceil(_n_/2);
label f = "Final Score";
run;
/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
by gamenum descending f;
run;
/* assign a winner or loser */
data scores;
set scores;
by gamenum descending f;
if first.gamenum then
didwin = 1;
else
didwin = 0;
run;
/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
by teams date;
run;
/* retain wins and losses */
data scores;
set scores;
by teams date;
retain totalwin totalloss 0;
if first.teams then
do;
totalwin = 0;
totalloss = 0;
end;
if didwin = 1 then
totalwin = totalwin + 1;
else
totalloss = totalloss + 1;
run;
/* resort by game number to order as original data */
proc sort data=scores;
by gamenum descending didwin;
run;
This looks really great! I was actually working on doing it in a similar way! Thanks for finishing the thoughts for me!
John
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.