BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mahler_ji
Obsidian | Level 7

Hey All,

 

I am looking for some help, working with a large set of sports data.  I have data that takes the following form:

 

Date     Team     Venue     Score

XXX       A           Home     80

XXX       B           Away       90

XXX       C          Home       78

XXX       D          Away         65

YYY       A          Home        100

YYY       D         Away          87

 

 

And I want to create a column or something that looks like the following:

 

Date     Team     Venue     Score       didWin

XXX       A           Home     80              0

XXX       B           Away       90             1

XXX       C          Home       78             1

XXX       D          Away         65            0

YYY       A          Home        100          1

YYY       D         Away          87            0

 

There are no variables currently in the dataset in which I could do a transpose so that I could compare the home score and the away score.  The games are always in pairs of home/away teams.  I just need a way to compare the scores for each pair of teams on a given date, and then assign either a 1 or 0 for a win during that game.

 

AS A BONUS:  I am also going to be tallying a total wins and total losses column at each point in the season, going back to the beginning of the season, so the data would look like this:

 

Date     Team     Venue     Score       didWin        totalWins       totalLosses

XXX       A           Home     80              0                  0                       0

XXX       B           Away       90             1                  0                       0

XXX       C          Home       78             1                  0                       0

XXX       D          Away         65            0                  0                      0

YYY       A          Home        100          1                  0                       1 

YYY       D         Away          87            0                  0                       1

 

 

So coming into date YYY, I would know that the two teams that are playing each other both have zero wins and one loss up to that point in the season!

 

I attached a sample dataset to work with!

 

 

Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
ToddB
SAS Employee

This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.

/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
 set [library].teamdatasample;
 /* games are in pairs */
 gamenum=ceil(_n_/2);
 label f = "Final Score";
run;

/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
 by gamenum descending f;
run;

/* assign a winner or loser */
data scores;
 set scores;
 by gamenum descending f;
 if first.gamenum then
 	didwin = 1;
 else
 	didwin = 0;
run;

/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
 by teams date;
run;

/* retain wins and losses */
data scores;
 set scores;
 by teams date;
 retain totalwin totalloss 0;
 if first.teams then
 do;
 	totalwin = 0;
	totalloss = 0;
 end;
 if didwin = 1 then
	totalwin = totalwin + 1;
 else
	totalloss = totalloss + 1;
run;
 
/* resort by game number to order as original data */
proc sort data=scores;
 by gamenum descending didwin;
run;

View solution in original post

4 REPLIES 4
Reeza
Super User
Zip the file before uploading.
mahler_ji
Obsidian | Level 7

Awesome!

 

Post updated with data sample.

ToddB
SAS Employee

This is one way to do it but requires going through the data three times. I am sure someone has a more efficient solution.

/* create a gamenum variable as key for each game */
data scores(keep=date teams venue f gamenum);
 set [library].teamdatasample;
 /* games are in pairs */
 gamenum=ceil(_n_/2);
 label f = "Final Score";
run;

/* sort scores by gamenum and final score descending to assign winner and loser */
proc sort data=scores;
 by gamenum descending f;
run;

/* assign a winner or loser */
data scores;
 set scores;
 by gamenum descending f;
 if first.gamenum then
 	didwin = 1;
 else
 	didwin = 0;
run;

/* now sort scores data by team and date to get win lose record over time */
proc sort data=scores;
 by teams date;
run;

/* retain wins and losses */
data scores;
 set scores;
 by teams date;
 retain totalwin totalloss 0;
 if first.teams then
 do;
 	totalwin = 0;
	totalloss = 0;
 end;
 if didwin = 1 then
	totalwin = totalwin + 1;
 else
	totalloss = totalloss + 1;
run;
 
/* resort by game number to order as original data */
proc sort data=scores;
 by gamenum descending didwin;
run;
mahler_ji
Obsidian | Level 7

This looks really great! I was actually working on doing it in a similar way!  Thanks for finishing the thoughts for me!


John

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2980 views
  • 0 likes
  • 3 in conversation