BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Taasby
Calcite | Level 5

Hi everyone

 

I have a dataset of 3.257 observations on Premier League matches the last 10 seasons. It looks like this:

"HomeTeam","AwayTeam","Season","Date","Hometeam_goal_expectancy","Awayteam_goal_expectancy"
"Man City","Blackpool","2011","01/01/2011","3.19298245614035","1.36058252427184"
"Stoke","Everton","2011","01/01/2011","0.64210526315789","1.486875"
"Sunderland","Blackburn","2011","01/01/2011","1.93038596491228","1.34933333333333"
"Fulham","West Brom","2011","01/04/2011","1.34095238095238","1.98110552763819"
"Aston Villa","Sunderland","2011","01/05/2011","2.33229629629629","1.0790206185567"
"Newcastle","West Ham","2011","01/05/2011","1.83804347826086","2.00910891089108"
"Wolves","Chelsea","2011","01/05/2011","3.10180505415162","1.40392156862745"
"Chelsea","Blackburn","2011","01/15/2011","4.55042345276873","1.47929203539823"
"Man City","Wolves","2011","01/15/2011","3.48054237288135","0.55891402714932"
"Stoke","Bolton","2011","01/15/2011","1.24107744107744","1.18506787330316"
"West Brom","Blackpool","2011","01/15/2011","1.70254966887417","2.33219730941704"
"West Ham","Arsenal","2011","01/15/2011","0.85485197368421","2.13094170403587"
"Wigan","Fulham","2011","01/15/2011","0.15635179153094","1.67400881057268"
"Liverpool","Everton","2011","01/16/2011","1.86122448979591","0.9035294117647"
"Sunderland","Newcastle","2011","01/16/2011","1.014","1.85294117647058"
"Arsenal","Wigan","2011","01/22/2011","1.34653465346534","1.58260869565217"
"Everton","West Ham","2011","01/22/2011","2.49180327868852","0.63436123348017"
"Fulham","Stoke","2011","01/22/2011","1.49837133550488","0.69868995633187"
"Man United","Birmingham","2011","01/22/2011","0.97087378640776","0.78260869565217"
"Newcastle","Tottenham","2011","01/22/2011","1.16504854368932","1.07079646017699"
"Wolves","Liverpool","2011","01/22/2011","0.70358306188925","1.23893805309734"
"Blackburn","West Brom","2011","01/23/2011","1.19736842105263","0.6695652173913"
"Bolton","Chelsea","2011","01/24/2011","1.18032786885245","1.85022026431718"
"Wigan","Aston Villa","2011","01/25/2011","1.29449838187702","2.21973094170403"
"Liverpool","Fulham","2011","01/26/2011","3","1.14285714285714"
"Arsenal","Everton","2011","02/01/2011","2.75496688741721","0.76018099547511"
"Man United","Aston Villa","2011","02/01/2011","1.64705882352941","0.50224215246636"
"Sunderland","Chelsea","2011","02/01/2011","0.27814569536423","2.48888888888888"
"Birmingham","Man City","2011","02/02/2011","0.29220779220779","2.88288288288288"

 

For every observation I want to calculate the probability of all possible scorelines using the Poisson distribution. I restrict the team to score a maximum of 10 goals.

 

For instance for the first observation (see data above) I can manually plug in "3.19" and "1.36" into this code:

data final_6;
 do Home_goals=0 to 10; 
 do Away_goals=0 to 10;
 format probability 8.3;
 probability = pdf ('poission', Home_goals, 3.19) * pdf('Poisson', Away_goals, 1.36)*100;
 output;
 end;
 end;
 run;

(…) to get this table (containing 121 observations):

 

image.png

I aim to avoid manually typing in the data for every game (3.257 games). Also it would be neat if the output for the probability of the different scorelines could somehow be added horisontally to the observation for the match in question. So for the first game I would end up with something like this (including all the possible scorelines):

HomeTeam AwayTeam Season Date Hometeam_goal_expectancy Awayteam_goal_expectancy 0-0 1-0 2-0 3-0 4-0 ...
Man City Blackpool 2011 01/01/2011 3.1929824561 1.3605825243 1.057 3.3117 5.377 5.717

I know this will add 121 variables to every observation. But I will use them to sum up the joint winning probability for the home team (simply summing alle the probabilities where the hometeam have got more goals than the awayteam) and the away team (simply summing alle the probabilities where the awayteam have got more goals than the hometeam). 

 

Any help is very much appreciated!!

 

Best,

 

Frederik

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Taasby,

 

You can simply use the SET statement to pull the expectancies from your base dataset. I would stick to the "long" format with 121 observations rather than 121 variables for the individual scores because it's easier to work with. Here's an example:

/* Create sample data */

data have;
infile cards dlm='",';
length HomeTeam AwayTeam $20;
input HomeTeam AwayTeam Season Date :mmddyy. Hometeam_goal_expectancy Awayteam_goal_expectancy;
format date mmddyy10.;
cards;
"Man City","Blackpool","2011","01/01/2011","3.19298245614035","1.36058252427184"
"Stoke","Everton","2011","01/01/2011","0.64210526315789","1.486875"
"Sunderland","Blackburn","2011","01/01/2011","1.93038596491228","1.34933333333333"
;

/* Compute probabilities for individual scores */

data final_6;
set have;
do Home_goals=0 to 10; 
  do Away_goals=0 to 10;
    format probability 8.3;
    probability = pdf('Poisson', Home_goals, Hometeam_goal_expectancy) * pdf('Poisson', Away_goals, Awayteam_goal_expectancy)*100;
    output;
  end;
end;
run;

/* Compute probabilities for home wins */

proc summary data=final_6 nway;
class HomeTeam AwayTeam Season;
where home_goals>away_goals;
var probability;
output out=home_wins(drop=_:) sum=p_home_win;
run;

View solution in original post

5 REPLIES 5
FreelanceReinh
Jade | Level 19

Hi @Taasby,

 

You can simply use the SET statement to pull the expectancies from your base dataset. I would stick to the "long" format with 121 observations rather than 121 variables for the individual scores because it's easier to work with. Here's an example:

/* Create sample data */

data have;
infile cards dlm='",';
length HomeTeam AwayTeam $20;
input HomeTeam AwayTeam Season Date :mmddyy. Hometeam_goal_expectancy Awayteam_goal_expectancy;
format date mmddyy10.;
cards;
"Man City","Blackpool","2011","01/01/2011","3.19298245614035","1.36058252427184"
"Stoke","Everton","2011","01/01/2011","0.64210526315789","1.486875"
"Sunderland","Blackburn","2011","01/01/2011","1.93038596491228","1.34933333333333"
;

/* Compute probabilities for individual scores */

data final_6;
set have;
do Home_goals=0 to 10; 
  do Away_goals=0 to 10;
    format probability 8.3;
    probability = pdf('Poisson', Home_goals, Hometeam_goal_expectancy) * pdf('Poisson', Away_goals, Awayteam_goal_expectancy)*100;
    output;
  end;
end;
run;

/* Compute probabilities for home wins */

proc summary data=final_6 nway;
class HomeTeam AwayTeam Season;
where home_goals>away_goals;
var probability;
output out=home_wins(drop=_:) sum=p_home_win;
run;
Taasby
Calcite | Level 5

THANK YOU so much! Did exactly what I wished for!

 

/Frederik

Rick_SAS
SAS Super FREQ

You almost solved the problem yourself. All you need to do is use an array to define the new 121 variables. Using a two-dimensional array will make the indexing easier.  I didn't double-check the following computation for correctness, but you should be able to use technique to accomplish your goal:

data Have;
input Home_expect Away_expect;
datalines;
3.19298245614035 1.36058252427184
0.64210526315789 1.486875
;

data Want;
set Have;
array Prob[0:10,0:10] score00_00 score01_00 score02_00 score03_00 score04_00 score05_00 
score06_00 score07_00 score08_00 score09_00 score10_00 
score00_01 score01_01 score02_01 score03_01 score04_01 score05_01 
score06_01 score07_01 score08_01 score09_01 score10_01 
score00_02 score01_02 score02_02 score03_02 score04_02 score05_02 
score06_02 score07_02 score08_02 score09_02 score10_02 
score00_03 score01_03 score02_03 score03_03 score04_03 score05_03 
score06_03 score07_03 score08_03 score09_03 score10_03 
score00_04 score01_04 score02_04 score03_04 score04_04 score05_04 
score06_04 score07_04 score08_04 score09_04 score10_04 
score00_05 score01_05 score02_05 score03_05 score04_05 score05_05 
score06_05 score07_05 score08_05 score09_05 score10_05 
score00_06 score01_06 score02_06 score03_06 score04_06 score05_06 
score06_06 score07_06 score08_06 score09_06 score10_06 
score00_07 score01_07 score02_07 score03_07 score04_07 score05_07 
score06_07 score07_07 score08_07 score09_07 score10_07 
score00_08 score01_08 score02_08 score03_08 score04_08 score05_08 
score06_08 score07_08 score08_08 score09_08 score10_08 
score00_09 score01_09 score02_09 score03_09 score04_09 score05_09 
score06_09 score07_09 score08_09 score09_09 score10_09 
score00_10 score01_10 score02_10 score03_10 score04_10 score05_10 
score06_10 score07_10 score08_10 score09_10 score10_10 ;
format score: 8.3;
do away=0 to 10;
   do home=0 to 10; 
      Prob[home, away] = pdf('Poisson', home, Home_expect) * 
                         pdf('Poisson', away, Away_expect)*100;
   end;
end;
output;
drop home away;
run;
Taasby
Calcite | Level 5

Hi Rick

 

The code didn't work out on my data unfortunately - but I found the solution in a suggestion above, so nevermind! Thank you so much for your time anyway!!

 

Best,

Frederik

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 608 views
  • 0 likes
  • 4 in conversation