DATA Step, Macro, Functions and more

Loop through each observation and count previous occurrences of values in dataset based on current observation

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Loop through each observation and count previous occurrences of values in dataset based on current observation

Good Evening,

I am required to do some analysis of Football (Soccer) teams. In a dataset of match history (sorted ascending by date), I have the following fields:

HomeTeam

AwayTeam

FTR

FTR being the full time result and can take the value of "H" (Home Win) "D" (Draw) or "A" (Away Win)

For each match, I need to count the number of Wins/Draws/Losses for each teams last 36 games.

So for instance, say an observation in the dataset is Manchester United v Arsenal, I want to count number of Win/Draw/Losses for Manchester United in their previous 36 games (consisting of both their home & away games) and also the same for Arsenal.

So the output would look like the following (sample data used):

HomeTeam     AwayTeam     FTR     H_Form_W     H_Form_D     Home_Form_L     A_Form_W     A_Form_D     A_Form_L

Man UTD          Arsenal          H          25                    5                    6                    19                    6                    10                   

This would seemingly be easy if only home games were needed for the home team and away games for the away team, however, I suspect a more complex macro is needed to handle the fact that when querying the dataset, the object we are looking for can appear in both the home & away team field.

Many Thanks for any help,

cxkev182


Accepted Solutions
Solution
‎03-10-2014 04:52 PM
Super User
Posts: 5,096

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Assuming you need the last 36 games, no matter whether they are home or away:

data split;

   set have;

   team = HomeTeam;

   if FTR='H' them result='W';

   else if FTR='A' then result='L';

   else result='D';

   output;

   team = AwayTeam;

   if FTR='A' then result='W';

   else if FTR = 'H' then result='L';

   else result='D';

   output;

   keep team date result;

run;

proc sort data=split;
   by team descending date;

run;

data want;

   set split;

   by team;

   if first.team then do;

     wins = 0;

     losses = 0;

     draws = 0;

  end;

  if wins + losses + draws < 36 then do;

      if result='W' then wins + 1;

      else if result='L' then losses + 1;

      else draws + 1;

   end;

   if last.team;

run;

Does this meet all the requirements?

View solution in original post


All Replies
Frequent Contributor
Posts: 129

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

The 'brute force' method would be to use the 'point' option on the set statement to loop back through the file identifying whether the team in question was in the match, recording the result, and stopping once you have reached 36 matches or the start of the file. No macros required.

Contributor
Posts: 23

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

BruceBrad, first of all, thanks very much for the reply. I am not familiar with the point option. I have googled to get some examples but can't seem to get my head round it. Could I ask for a sample piece of code how I could code this?

Thanks again for your help, I really appreciate it.

Super User
Posts: 5,096

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Reasonably simple solutions should be possible.  Is there an additional variable in your data set that tells you the date of the match?  How do you know that the observations are in order?  Start there, and I'll be able to post something later this afternoon.

Contributor
Posts: 23

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Astounding, yes sorry there is a variable simply called 'Date', It's from this that I know and can keep the dataset in order. Look forward to any help you can give.

Super User
Posts: 17,936

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Have you considered changing your data structure?

Team  Location FTR form_w form_d form_l

Manchester Away H 25 5 6

Arsenal Home AL 19 6 10

I think this may be more conducive to data analysis. 

Frequent Contributor
Posts: 129

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Actually, I think Reeza's approach is the neatest way to go.

Step 1. Create two records from each of your records - one for each team as Reeza suggests

Step 2. Sort by date within team, and create a variable COUNT that counts the number of times the team has competed. That is, starting at 1 for the first occurence of the team etc.

Step 3. Create a dataset which only has the last record for each team.

Step 3. Merge this dataset with the step 2 dataset, selecting records where count is greater than the current value of count minus 16.

Step 4. Summarise this merged data, counting the number of wins etc.

Super User
Posts: 5,096

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Bruce,

This sounds about right, and was along the lines of what I was thinking as well.  One suggestion:  If you sort within team by DESCENDING date, you can accomplish the rest in a single additional DATA step.

Good luck.

Contributor
Posts: 23

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Thanks again for the reply all. I think this approach would only get me the last x home games and last x away games for each teams, however, because teams don't always play home, away, home I wouldn't be capturing the right data in some instances for example if a team plays 2 home games in a row. I will definitely go with this approach however, if there is no alternative. 

Solution
‎03-10-2014 04:52 PM
Super User
Posts: 5,096

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Assuming you need the last 36 games, no matter whether they are home or away:

data split;

   set have;

   team = HomeTeam;

   if FTR='H' them result='W';

   else if FTR='A' then result='L';

   else result='D';

   output;

   team = AwayTeam;

   if FTR='A' then result='W';

   else if FTR = 'H' then result='L';

   else result='D';

   output;

   keep team date result;

run;

proc sort data=split;
   by team descending date;

run;

data want;

   set split;

   by team;

   if first.team then do;

     wins = 0;

     losses = 0;

     draws = 0;

  end;

  if wins + losses + draws < 36 then do;

      if result='W' then wins + 1;

      else if result='L' then losses + 1;

      else draws + 1;

   end;

   if last.team;

run;

Does this meet all the requirements?

Contributor
Posts: 23

Re: Loop through each observation and count previous occurrences of values in dataset based on current observation

Thanks to everyone for the help as you have all contributed. I wish I could give you all the correct answer as Reeza had the initial idea of structuring the data in this way, Bruce elaborated further on it, but I have set the correct answer to Astounding for taking the ideas and putting them into code for me.

Many thanks to all.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 429 views
  • 7 likes
  • 4 in conversation