BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cxkev182
Fluorite | Level 6

Good Evening,

I am required to do some analysis of Football (Soccer) teams. In a dataset of match history (sorted ascending by date), I have the following fields:

HomeTeam

AwayTeam

FTR

FTR being the full time result and can take the value of "H" (Home Win) "D" (Draw) or "A" (Away Win)

For each match, I need to count the number of Wins/Draws/Losses for each teams last 36 games.

So for instance, say an observation in the dataset is Manchester United v Arsenal, I want to count number of Win/Draw/Losses for Manchester United in their previous 36 games (consisting of both their home & away games) and also the same for Arsenal.

So the output would look like the following (sample data used):

HomeTeam     AwayTeam     FTR     H_Form_W     H_Form_D     Home_Form_L     A_Form_W     A_Form_D     A_Form_L

Man UTD          Arsenal          H          25                    5                    6                    19                    6                    10                   

This would seemingly be easy if only home games were needed for the home team and away games for the away team, however, I suspect a more complex macro is needed to handle the fact that when querying the dataset, the object we are looking for can appear in both the home & away team field.

Many Thanks for any help,

cxkev182

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Assuming you need the last 36 games, no matter whether they are home or away:

data split;

   set have;

   team = HomeTeam;

   if FTR='H' them result='W';

   else if FTR='A' then result='L';

   else result='D';

   output;

   team = AwayTeam;

   if FTR='A' then result='W';

   else if FTR = 'H' then result='L';

   else result='D';

   output;

   keep team date result;

run;

proc sort data=split;
   by team descending date;

run;

data want;

   set split;

   by team;

   if first.team then do;

     wins = 0;

     losses = 0;

     draws = 0;

  end;

  if wins + losses + draws < 36 then do;

      if result='W' then wins + 1;

      else if result='L' then losses + 1;

      else draws + 1;

   end;

   if last.team;

run;

Does this meet all the requirements?

View solution in original post

10 REPLIES 10
BruceBrad
Lapis Lazuli | Level 10

The 'brute force' method would be to use the 'point' option on the set statement to loop back through the file identifying whether the team in question was in the match, recording the result, and stopping once you have reached 36 matches or the start of the file. No macros required.

cxkev182
Fluorite | Level 6

BruceBrad, first of all, thanks very much for the reply. I am not familiar with the point option. I have googled to get some examples but can't seem to get my head round it. Could I ask for a sample piece of code how I could code this?

Thanks again for your help, I really appreciate it.

Astounding
PROC Star

Reasonably simple solutions should be possible.  Is there an additional variable in your data set that tells you the date of the match?  How do you know that the observations are in order?  Start there, and I'll be able to post something later this afternoon.

cxkev182
Fluorite | Level 6

Astounding, yes sorry there is a variable simply called 'Date', It's from this that I know and can keep the dataset in order. Look forward to any help you can give.

Reeza
Super User

Have you considered changing your data structure?

Team  Location FTR form_w form_d form_l

Manchester Away H 25 5 6

Arsenal Home AL 19 6 10

I think this may be more conducive to data analysis. 

BruceBrad
Lapis Lazuli | Level 10

Actually, I think Reeza's approach is the neatest way to go.

Step 1. Create two records from each of your records - one for each team as Reeza suggests

Step 2. Sort by date within team, and create a variable COUNT that counts the number of times the team has competed. That is, starting at 1 for the first occurence of the team etc.

Step 3. Create a dataset which only has the last record for each team.

Step 3. Merge this dataset with the step 2 dataset, selecting records where count is greater than the current value of count minus 16.

Step 4. Summarise this merged data, counting the number of wins etc.

Astounding
PROC Star

Bruce,

This sounds about right, and was along the lines of what I was thinking as well.  One suggestion:  If you sort within team by DESCENDING date, you can accomplish the rest in a single additional DATA step.

Good luck.

cxkev182
Fluorite | Level 6

Thanks again for the reply all. I think this approach would only get me the last x home games and last x away games for each teams, however, because teams don't always play home, away, home I wouldn't be capturing the right data in some instances for example if a team plays 2 home games in a row. I will definitely go with this approach however, if there is no alternative. 

Astounding
PROC Star

Assuming you need the last 36 games, no matter whether they are home or away:

data split;

   set have;

   team = HomeTeam;

   if FTR='H' them result='W';

   else if FTR='A' then result='L';

   else result='D';

   output;

   team = AwayTeam;

   if FTR='A' then result='W';

   else if FTR = 'H' then result='L';

   else result='D';

   output;

   keep team date result;

run;

proc sort data=split;
   by team descending date;

run;

data want;

   set split;

   by team;

   if first.team then do;

     wins = 0;

     losses = 0;

     draws = 0;

  end;

  if wins + losses + draws < 36 then do;

      if result='W' then wins + 1;

      else if result='L' then losses + 1;

      else draws + 1;

   end;

   if last.team;

run;

Does this meet all the requirements?

cxkev182
Fluorite | Level 6

Thanks to everyone for the help as you have all contributed. I wish I could give you all the correct answer as Reeza had the initial idea of structuring the data in this way, Bruce elaborated further on it, but I have set the correct answer to Astounding for taking the ideas and putting them into code for me.

Many thanks to all.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1794 views
  • 7 likes
  • 4 in conversation